Topical Review

Higher-order total variation approaches and generalisations

Kristian Bredies and Martin Holler

Published 3 December 2020 © 2020 The Author(s). Published by IOP Publishing Ltd

Citation: Kristian Bredies and Martin Holler 2020 Inverse Problems 36 123001, DOI 10.1088/1361-6420/ab8f80


Abstract

Over the last decades, the total variation (TV) has evolved to be one of the most broadly-used regularisation functionals for inverse problems, in particular for imaging applications. When first introduced as a regulariser, higher-order generalisations of TV were soon proposed and studied with increasing interest, which led to a variety of different approaches being available today. We review several of these approaches, discussing aspects ranging from functional-analytic foundations to regularisation theory for linear inverse problems in Banach space, and provide a unified framework concerning well-posedness and convergence for vanishing noise level for respective Tikhonov regularisation. This includes general higher orders of TV, additive and infimal-convolution multi-order total variation, total generalised variation, and beyond. Further, numerical optimisation algorithms are developed and discussed that are suitable for solving the Tikhonov minimisation problem for all presented models. Focus is laid in particular on covering the whole pipeline starting at the discretisation of the problem and ending at concrete, implementable iterative procedures. A major part of this review is finally concerned with presenting examples and applications where higher-order TV approaches turned out to be beneficial. These applications range from classical inverse problems in imaging such as denoising, deconvolution, compressed sensing, optical-flow estimation and decompression, to image reconstruction in medical imaging and beyond, including magnetic resonance imaging, computed tomography, magnetic-resonance positron emission tomography, and electron tomography.


Original content from this work may be used under the terms of the Creative Commons Attribution 4.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.

1. Introduction

In this paper we give a review of higher-order regularisation functionals of total-variation type, encompassing their development from their origins to generalisations and most recent approaches. Research in this field has been triggered, on the one hand, by the success of the total variation (TV) as a regularisation functional for inverse problems and, on the other hand, by the insight that tailored regularisation approaches are indispensable for solving ill-posed inverse problems in theory and in practice. The last decades have seen active development of the latter topic, which resulted in a variety of different strategies for TV-based regularisation functionals that model data with some inherent smoothness, possibly of higher order or multiple orders. For these functionals, this paper especially aims at providing a unified presentation of the underlying regularisation aspects, giving an overview of numerical algorithms suitable to solve associated regularised inverse problems as well as showing the breadth of respective applications.

Let us put classical and higher-order total-variation regularisation into an inverse problems context. From the inverse problems point of view, the central theme of regularisation is the stabilisation of the inversion of an ill-posed operator equation, which is commonly phrased as finding a u ∈ X such that

$Ku=f$

for given K : X → Y and f ∈ Y, where X and Y are usually Banach spaces. Various approaches for regularisation exist, e.g., iterative regularisation, Tikhonov regularisation, regularisation based on spectral theory in Hilbert spaces, or regularisation by discretisation. Being a regularisation and providing a stable inversion is mathematically well-formalised [84], and usually comprises regularisation parameters. Essentially, stable inversion means that each regularised inverse mapping from data to solution space is continuous in some topology, and being a regularisation requires in addition that, in case the measured data approximates the noiseless situation, a suitable choice of the regularisation parameters allows to approximate a solution that is meaningful and matches the noiseless data. These properties are typically referred to as stability and convergence for vanishing noise, respectively. For general non-linear inverse problems, they usually depend on an interplay between the selected regularisation strategy and the forward operator K, where often, derivative-based assumptions on the local behaviour around the sought solution are made [84, 113]. In contrast, for linear forward operators, unified statements are commonly available such that regularisation properties solely depend on the regularisation strategy. We therefore consider linear inverse problems throughout the paper, i.e., the solution of Ku = f where K : X → Y is always assumed to be linear and continuous.

Variational regularisation, which is the stabilised solution of such an inverse problem via energy minimisation methods, then encompasses—and is often identified with—Tikhonov regularisation (but comprises, for instance, also Morozov regularisation [136] or Ivanov regularisation [114]). Driven by its success in practical applications, it has become a major direction of research in inverse problems. Part of its success may be explained by the fact that variational regularisation allows to incorporate a modelling of expected solutions via regularisation functionals. In a Tikhonov framework, this means that the solution of the operator equation Ku = f is obtained via solving

$\underset{u\in X}{\mathrm{min}}\enspace {S}_{f}\left(Ku\right)+{\mathcal{R}}_{\alpha }\left(u\right)$

where Sf : Y → [0, ∞] is an energy that measures the discrepancy between Ku and the measured data f, and ${\mathcal{R}}_{\alpha }:X\to \left[0,\infty \right]$ is the regularisation functional that depends on regularisation parameters α. From the analytical perspective, two main features of ${\mathcal{R}}_{\alpha }$ are important: first, it needs to possess properties that allow to guarantee that the corresponding solution map enjoys the stability and convergence properties as mentioned above (typically, lower semi-continuity and coercivity in some topology). Second, it needs to provide a good model of reasonable/expected solutions of Ku = f in the sense that ${\mathcal{R}}_{\alpha }\left(u\right)$ is small for such reasonable solutions and ${\mathcal{R}}_{\alpha }\left(u\right)$ is large for unreasonable solutions that suffer, for instance, from artefacts or noise.
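To make the interplay of discrepancy and regularisation functional concrete, the following minimal sketch (our own illustration, not part of the original text) assembles a finite-dimensional Tikhonov functional with a quadratic discrepancy and the classical quadratic penalty, for which the minimiser can be computed by solving a linear system; the TV-type penalties discussed below replace this quadratic choice of ${\mathcal{R}}_{\alpha }$. The operator, data and parameter values are arbitrary assumptions for demonstration purposes.

```python
import numpy as np

# Toy Tikhonov regularisation: S_f(Ku) = 0.5*||Ku - f||^2, R_alpha(u) = alpha*||u||^2.
# For this quadratic choice, the minimiser solves the normal equations
# (K^T K + 2*alpha*I) u = K^T f.
rng = np.random.default_rng(0)
K = rng.standard_normal((20, 50))            # underdetermined, hence ill-posed
u_true = np.zeros(50); u_true[10:20] = 1.0   # piecewise constant "ground truth"
f = K @ u_true + 0.01 * rng.standard_normal(20)

alpha = 0.1
u_reg = np.linalg.solve(K.T @ K + 2 * alpha * np.eye(50), K.T @ f)

def tikhonov_value(u):
    # value of the Tikhonov functional S_f(Ku) + R_alpha(u)
    return 0.5 * np.sum((K @ u - f) ** 2) + alpha * np.sum(u ** 2)

print(tikhonov_value(u_reg))
```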

While the first requirement is purely qualitative and known to be true for a wide range of norms and seminorms, the second requirement involves the modelling of expected solutions as well as suitable quantification, having in particular in mind that the outcome should be simple enough to be amenable to numerical solution algorithms. Suitable models are for instance provided by various classical smoothness measures such as Hilbert scales of smooth functions, i.e., by Hs -norms where s ⩾ 0, but also reflexive Banach-space norms such as Lp -norms, associated Sobolev-space seminorms in Hk,p for 1 < p < ∞, and Besov-space seminorms based on wavelet-coefficient expansions [39, 69, 170]. The reflexivity of the underlying spaces then helps to turn an ill-posed equation into a well-posed one, since the direct method in the calculus of variations can be employed with weak convergence.

However, there are reasons to consider Banach spaces that lack reflexivity, with L1-spaces and spaces of Radon measures being prominent examples. Indeed, L1-type norms as penalties in variational energies have seen a tremendous rise in popularity in the past two decades, most notably in the theory of compressed sensing [77]. This is due to their property of favouring sparsity in solutions, which allows to model more specific a priori assumptions on the expected solutions than generic smoothness, for instance. While sparsity in L1-type spaces over discrete domains, such as spaces of wavelet coefficients, is directly amenable to analysis, sparsity for continuous domains requires to consider spaces of Radon measures and corresponding Radon-norm-type energies which are natural generalisations of L1-type norms. Being the dual of a separable normed space then mitigates the non-reflexivity of these spaces. As a consequence, they play a major role in continuous models for sparsity-promoting variational regularisation strategies.

A particular example is the total variation functional [59, 162], see section 2 below for a precise definition, which can be interpreted as the Radon norm realised as a dual norm on the distributional derivative of u. As such, TV(u) is finite if and only if the distributional derivative of u can be represented by a finite Radon measure. The TV functional then penalises variations of u via a norm on its derivative while still being finite in the case of jump discontinuities, i.e., when u is piecewise smooth. In particular, its minimisation realises sparsity of the derivative which is often considered a suitable model for piecewise constant functions. In addition, it is convex and lower semi-continuous with respect to Lp -convergence for any p ∈ [1, ∞], and coercive up to constants in suitable Lp -norm topologies. These features make TV a reasonable model for piecewise constant solutions and allow to obtain well-posedness of TV regularisation for a broad class of inverse problems. They can be considered as some of the main reasons for the overwhelming popularity of TV in inverse problems, imaging sciences and beyond.

Naturally, the simplicity and desirable properties of TV come with a cost. As previously mentioned, interpreting TV as a functional that generalises the L1-norm of the image gradient, compressed sensing theory suggests that this enforces sparsity of the gradient and hence piecewise constancy, i.e., one might expect that a TV-regularised function is non-constant only on low-dimensional subsets of its domain. While this might in fact be a feature if the sought solution is piecewise constant, it is not appropriate for general piecewise smooth data. Indeed, for non-piecewise-constant data, TV has the defect of producing artificial plateau-like structures in the reconstructions which became known as the staircasing effect of TV. This effect is nowadays well-understood analytically in the case of denoising [54, 139, 158], and recent results also provide an analytical confirmation of this fact in the context of inverse problems with finite-dimensional measurement data [26, 29]. The appearance of staircasing artefacts is in particular problematic since jump discontinuities are features which are, on the one hand, very prominent in visual perception and typically associated with relevant structures, and, on the other hand, important for automatic post-processing or interpretation of the data. As a result, it became an important research question in the past two decades how to improve upon this defect of TV regularisation while maintaining its desirable features, especially the sparsity-enforcing properties.

One first possible answer to this question is to consider the second-order total variation which is the Radon norm of the distributional second derivative. Indeed, if first-order total variation enforces sparse gradients, i.e., a distributional derivative that is supported on lower-dimensional subsets and hence corresponds to a piecewise constant function, one might expect an analogous behaviour for the second derivative, resulting in piecewise affine functions. The smooth linear ramps that are characteristic for the latter are then typically not perceived as staircasing artefacts, so that one might hope for a reduction of these artefacts by employing second-order total variation. These considerations motivate, generally, higher-order total variation as sparsity-enforcing regulariser for smooth regions in an image. However, as also described in section 3 below, pure higher-order total variation regularisation comes with some drawbacks, mainly the inability to recover jump discontinuities. To account for this, several extensions and alternatives have been proposed in the literature which still have the common theme of using higher-order derivatives and sparsity-enforcing functionals to model smooth regions in images.

This review is concerned with these developments with a focus on approaches that are related to the incorporation of higher-order derivatives and maintenance of the sparsity concepts realised by the Radon norm and the underlying spaces of Radon measures. These developments indeed resulted in a variety of different variational regularisation strategies, some of which are very successful in achieving the goal of providing an amenable model for piecewise smooth solutions. It is also a central message of this review that the success of higher-order TV models in terms of modelling and regularisation effect depends very much on the structure and the functional-analytic setting in which the higher-order derivatives are included. Following this insight, we will discuss different higher-order regularisation functionals such as higher-order total variation, the infimal convolution of higher-order TV as well as the total generalised variation (TGV), which carries out a cascadic decomposition into different orders of differentiation. Starting from the analytical framework of the total-variation functional and functions of bounded variation, we will introduce and analyse several higher-order approaches in a continuous setting, discuss their regularisation properties in a Tikhonov regularisation framework, introduce appropriate discretisations as well as numerical solution strategies for the resulting energy minimisation problems, and present various applications in image processing, computer vision, biomedical imaging and beyond.

Nevertheless, due to the broad range of the topic as well as the many works published in its environment, it is impossible to give a complete overview. The various references to the literature given throughout the paper therefore only represent a selection. Let us also point out that we selected the presented material in particular on a basis that, on the one hand, enables a treatment that is as unified as possible. On the other hand, a clear focus is put on approaches for which the whole pipeline ranging from mathematical modelling, embedding into a functional-analytic context, proof of regularisation properties, numerical discretisation, optimisation algorithms and efficient implementation can be covered. In addition, extensions and further developments will shortly be pointed out when appropriate. Especially, many of the applications in image processing, computer vision, medical imaging and image reconstruction refer to these extensions. The applications were further chosen to represent a wide spectrum of inverse problems, their variational modelling and higher-order TV-type regularisation, and, not least, a successful realisation of the presented theory. We finally aimed at providing a maximal amount of useful information regarding theory and practical realisation in this context.

2. Total-variation (TV) regularisation

Before discussing higher-order total variation and how it may be used to regularise ill-posed inverse problems, let us begin with an overview of first-order total variation. Throughout the review, we mainly adopt a continuous viewpoint which means that the objects of interest are usually functions on some fixed domain Ω, i.e., a non-empty, open and connected subset Ω ⊂ Rd of the d-dimensional Euclidean space. This requires in particular a common functional-analytic context with which we assume the reader to be familiar; we refer to the books [1, 85, 204] for further information. In the following, we will make, for instance, use of the Lebesgue spaces Lp (Ω, H) for H-valued functions where H is a finite-dimensional real Hilbert space as well as their measure-theoretic and functional-analytic properties. Also, concepts of weak differentiability and properties of the associated Sobolev spaces Hk,p (Ω, H) will be utilised without further introduction. This moreover applies to the classical spaces such as $\mathcal{C}\left(\overline{{\Omega}},H\right)$, ${\mathcal{C}}_{\mathrm{c}}\left({\Omega},H\right)$ and ${\mathcal{C}}_{0}\left({\Omega},H\right)$, i.e., the space of uniformly continuous functions on $\overline{{\Omega}}$, the space of compactly supported continuous functions on Ω and the closure of the latter with respect to the supremum norm, respectively. As usual, the respective spaces of k-times continuously differentiable functions are denoted by ${\mathcal{C}}^{k}\left(\overline{{\Omega}},H\right)$, ${\mathcal{C}}_{\mathrm{c}}^{k}\left({\Omega},H\right)$ and ${\mathcal{C}}_{0}^{k}\left({\Omega},H\right)$ where k could also be infinity, leading to spaces of test functions.

We further employ, throughout this section, basic concepts from convex analysis and optimisation. At this point, we would like to recall that for a convex function $F:X\to \left.\right]- \infty ,\infty \left.\right]$ defined on a Banach space X, the subgradient ∂F(u) at a point u ∈ X is the collection of all w ∈ X* that satisfy the subgradient inequality

$F\left(v\right){\geqslant}F\left(u\right)+{\langle w,\enspace v-u\rangle }_{{X}^{{\ast}}{\times}X}\quad \text{for all}\enspace v\in X.$
For F proper, the Fenchel dual or Fenchel conjugate of F is the function ${F}^{{\ast}}:{X}^{{\ast}}\to \left.\right]- \infty ,\infty \left.\right]$ defined by

${F}^{{\ast}}\left(w\right)=\underset{u\in X}{\mathrm{sup}}\enspace {\langle w,\enspace u\rangle }_{{X}^{{\ast}}{\times}X}-F\left(u\right).$
The Fenchel inequality then states that ${\langle w,\enspace u\rangle }_{{X}^{{\ast}}{\times}X}{\leqslant}F\left(u\right)+{F}^{{\ast}}\left(w\right)$ for all u ∈ X and w ∈ X* with equality if and only if w ∈ ∂F(u). For more details regarding these notions and convex analysis in general, we refer to research monographs covering this subject, for instance [83, 198].

2.1. Functions of bounded variation

Generally, when solving a specific ill-posed inverse problem with, for instance, Tikhonov regularisation, one usually has many choices regarding the regularisation functional. Now, while functionals associated with Hilbertian norms or seminorms possess several advantages such as smoothness and allow, in addition, for regularisation strategies that can be computed by solving a linear equation, they are often not able to provide a good model for piecewise smooth functions. This can, for instance, be illustrated as follows.

Example 2.1. Classical Sobolev spaces cannot contain non-trivial piecewise constant functions. Let Ω ⊂ Rd be a domain and Ω' ⊂ Ω be non-empty, open with ∂Ω' a null set. Then, the characteristic function u = χΩ', i.e., u(x) = 1 if x ∈ Ω' and 0 otherwise, is not contained in H1,p (Ω) for any p ∈ [1, ∞]. To see this, suppose that v ∈ Lp (Ω, Rd ) is the weak derivative of u. Let $\varphi \in {\mathcal{C}}_{\mathrm{c}}^{\infty }\left({{\Omega}}^{\prime }\right)$ be a test function. Clearly,

${\int }_{{\Omega}}v\varphi \enspace \mathrm{d}x=-{\int }_{{\Omega}}u\nabla \varphi \enspace \mathrm{d}x=-{\int }_{{{\Omega}}^{\prime }}\nabla \varphi \enspace \mathrm{d}x=0.$
Hence, v = 0 on Ω'. Likewise, one sees that also v = 0 on ${\Omega}{\backslash}\overline{{{\Omega}}^{\prime }}$. In total, v = 0 almost everywhere and as v is the weak derivative of u, u must be constant which is a contradiction.

The defect which is responsible for the failure of characteristic functions being (classical) Sobolev functions can, however, be remedied by allowing weak derivatives to be Radon measures. These are in particular able to concentrate on Lebesgue null-sets, a property that is necessary as the previous example just showed. In the following, we introduce some basic notions and results about vector-valued Radon measures, in particular with an eye towards embedding them into a functional-analytic framework. Moreover, we would like to have these notions readily available when dealing with higher-order derivatives and the associated higher-order total variation.

Throughout this section, let Ω ⊂ Rd be a domain and H a non-trivial finite-dimensional real Hilbert space with ⋅ and $\left\vert \cdot \right\vert $ denoting the associated scalar product and norm, respectively. As usual, the case H = R corresponds to the scalar case and H = Rd to the vector-field case, but, as we will see later, H could also be a space of higher-order tensors. The following definitions and statements regarding basic measure theory can, for instance, be found in [7].

Definition 2.2. A vector-valued Radon measure or H -valued Radon measure on Ω is a function $\mu :\mathcal{B}\left({\Omega}\right)\to H$ on the Borel σ-algebra $\mathcal{B}\left({\Omega}\right)$ associated with the standard topology on Ω satisfying the following properties:

  • (a)  
    It holds that μ(Ø) = 0,
  • (b)  
    For each pairwise disjoint countable collection A1, A2, .... in $\mathcal{B}\left({\Omega}\right)$ it holds that $\mu \left({\bigcup }_{i\in \mathbf{N}}{A}_{i}\right)={\sum }_{i=1}^{\infty }\mu \left({A}_{i}\right)$ in H.

A positive Radon measure is a function $\mu :\mathcal{B}\left({\Omega}\right)\to \left[0,\infty \right]$ satisfying (a), (b) (with H replaced by [0, ∞]) as well as μ(K) < ∞ for each compact K ⊂⊂ Ω. It is called finite if μ(Ω) < ∞.

Naturally, vector-valued Radon measures can be associated to an integral. For μ an H-valued Radon measure and step functions $u={\sum }_{j=1}^{N}{c}_{j}{\chi }_{{A}_{j}}$, $v={\sum }_{j=1}^{N}{v}_{j}{\chi }_{{A}_{j}}$ with c1, ..., cN R, v1, ..., vN H and ${A}_{1},\dots ,{A}_{N}\in \mathcal{B}\left({\Omega}\right)$, the following integrals make sense:

For uniformly continuous functions $u:\overline{{\Omega}}\to \mathbf{R}$ and $v:\overline{{\Omega}}\to H$, the integrals are given as

where {un } and {vn } are sequences of step functions converging uniformly to u and v, respectively. Of course, the above integrals are well-defined, meaning that there are approximating sequences as stated and the above limits exist independently of the specific choice of the approximating sequences. The following definition is the basis for introducing a norm for H-valued Radon measures.

Definition 2.3. For a vector-valued Radon measure μ on Ω the positive Radon measure $\left\vert \mu \right\vert $ given by

$\left\vert \mu \right\vert \left(A\right)=\mathrm{sup}\left\{\sum _{i=1}^{\infty }\left\vert \mu \left({A}_{i}\right)\right\vert \enspace :\enspace {A}_{i}\in \mathcal{B}\left({\Omega}\right)\enspace \text{pairwise disjoint},\enspace A={\bigcup }_{i\in \mathbf{N}}{A}_{i}\right\}$
is called the total-variation measure of μ.

The total-variation measure is always positive and finite, i.e., $0{\leqslant}\left\vert \mu \right\vert \left(A\right){< }\infty $ for all $A\in \mathcal{B}\left({\Omega}\right)$. By construction, μ is absolutely continuous with respect to $\left\vert \mu \right\vert $, i.e., μ(A) = 0 whenever $\left\vert \mu \right\vert \left(A\right)=0$ for a $A\in \mathcal{B}\left({\Omega}\right)$. By Radon–Nikodým's theorem, we thus have that each H-valued Radon measure μ can be written as $\mu ={\sigma }_{\mu }\left\vert \mu \right\vert $ with ${\sigma }_{\mu }\in {L}_{\left\vert \mu \right\vert }^{\infty }\left({\Omega},H\right)$ such that ${{\Vert}{\sigma }_{\mu }{\Vert}}_{\infty }{\leqslant}1$ and $\left\vert {\sigma }_{\mu }\right\vert =1$ almost everywhere with respect to $\left\vert \mu \right\vert $. In this light, integration can also be phrased as

for $u:\overline{{\Omega}}\to \mathbf{R}$, $v:\overline{{\Omega}}\to H$ uniformly continuous. The following proposition, which is a direct consequence of [163, theorem 6.19], provides a useful characterisation of the space of vector-valued measures as the dual of a separable space.

Proposition 2.4. The space $\mathcal{M}\left({\Omega},H\right)$ of all vector-valued Radon measures equipped with the norm ${{\Vert}\mu {\Vert}}_{\mathcal{M}}=\left\vert \mu \right\vert \left({\Omega}\right)$ for $\mu \in \mathcal{M}\left({\Omega},H\right)$ is a Banach space.

It can be identified with the dual space ${\mathcal{C}}_{0}{\left({\Omega},H\right)}^{{\ast}}$ as follows. For each $T\in {\mathcal{C}}_{0}{\left({\Omega},H\right)}^{{\ast}}$ there exists a unique $\mu \in \mathcal{M}\left({\Omega},H\right)$ such that

$T\left(\varphi \right)={\int }_{{\Omega}}\varphi \cdot \mathrm{d}\mu \quad \text{for all}\enspace \varphi \in {\mathcal{C}}_{0}\left({\Omega},H\right).$
In particular, one has a notion of weak*-convergence of Radon measures. For a sequence {μn } and an element μ* in $\mathcal{M}\left({\Omega},H\right)$ we have that ${\mu }^{n}{\ast}{\rightharpoonup }{\mu }^{{\ast}}$ in $\mathcal{M}\left({\Omega},H\right)$ if

$\underset{n\to \infty }{\mathrm{lim}}{\int }_{{\Omega}}\varphi \cdot \mathrm{d}{\mu }^{n}={\int }_{{\Omega}}\varphi \cdot \mathrm{d}{\mu }^{{\ast}}\quad \text{for all}\enspace \varphi \in {\mathcal{C}}_{0}\left({\Omega},H\right).$
As the predual space ${\mathcal{C}}_{0}\left({\Omega},H\right)$ is separable, the Banach–Alaoglu theorem yields in particular the sequential relative weak*-compactness of bounded sets. That means for instance that a bounded sequence always admits a weakly*-convergent subsequence, a property that may compensate for the lack of reflexivity of $\mathcal{M}\left({\Omega},H\right)$.

The interpretation as a dual space as well as the density of test functions in ${\mathcal{C}}_{0}\left({\Omega},H\right)$ also allows to conclude that in order for a linear functional T to define a Radon measure, it suffices to test against $\varphi \in {\mathcal{C}}_{\mathrm{c}}^{\infty }\left({\Omega},H\right)$ and to establish $\left\vert T\left(\varphi \right)\right\vert {\leqslant}C{{\Vert}\varphi {\Vert}}_{\infty }$ for all $\varphi \in {\mathcal{C}}_{\mathrm{c}}^{\infty }\left({\Omega},H\right)$ and C > 0 independent of φ. This is useful for derivatives, i.e., the derivative of a $u\in {L}_{\mathrm{l}\mathrm{o}\mathrm{c}}^{1}\left({\Omega},H\right)$ defines a Radon measure in $\mathcal{M}\left({\Omega},{H}^{d}\right)$ if

$\mathrm{sup}\left\{{\int }_{{\Omega}}u\cdot \mathrm{d}\mathrm{i}\mathrm{v}\enspace \varphi \enspace \mathrm{d}x\enspace :\enspace \varphi \in {\mathcal{C}}_{\mathrm{c}}^{\infty }\left({\Omega},{H}^{d}\right),\enspace {{\Vert}\varphi {\Vert}}_{\infty }{\leqslant}1\right\}{< }\infty .\qquad (1)$

In this case, we denote by $\nabla u\in \mathcal{M}\left({\Omega},{H}^{d}\right)$ the unique Hd -valued Radon measure for which ${\int }_{{\Omega}}$ φ ⋅ d∇u = −${\int }_{{\Omega}}$ u  div  φ dx for all $\varphi \in {\mathcal{C}}_{\mathrm{c}}^{\infty }\left({\Omega},{H}^{d}\right)$. Here, Hd is equipped with the scalar product $x\cdot y={\sum }_{i=1}^{d}{x}_{i}\cdot {y}_{i}$ for x, y ∈ Hd . In the case where (1) fails, there exists a sequence {φn } in ${\mathcal{C}}_{\mathrm{c}}^{\infty }\left({\Omega},{\mathbf{R}}^{d}\right)$ with ${{\Vert}{\varphi }^{n}{\Vert}}_{\infty }=1$ and $\left\vert {\int }_{{\Omega}}u\cdot \mathrm{d}\mathrm{i}\mathrm{v}\enspace {\varphi }^{n}\enspace \mathrm{d}x\right\vert \to \infty $ as n → ∞. Thus, allowing the supremum to take the value ∞, this yields the following definition.

Definition 2.5. The total variation of a $u\in {L}_{\mathrm{l}\mathrm{o}\mathrm{c}}^{1}\left({\Omega},H\right)$ is the value

$\mathrm{T}\mathrm{V}\left(u\right)=\mathrm{sup}\left\{{\int }_{{\Omega}}u\cdot \mathrm{d}\mathrm{i}\mathrm{v}\enspace \varphi \enspace \mathrm{d}x\enspace :\enspace \varphi \in {\mathcal{C}}_{\mathrm{c}}^{\infty }\left({\Omega},{H}^{d}\right),\enspace {{\Vert}\varphi {\Vert}}_{\infty }{\leqslant}1\right\}.$
Clearly, in case TV(u) < ∞, we have $\nabla u\in \mathcal{M}\left({\Omega},{H}^{d}\right)$ with ${{\Vert}\nabla u{\Vert}}_{\mathcal{M}}=\mathrm{T}\mathrm{V}\left(u\right)$. Trivially, for scalar functions, i.e., H = R, one recovers the well-known definition [7, 162]. Also, one immediately sees that TV is invariant to translations and rotations, or, more generally, to Euclidean-distance preserving transformations. This is the reason that this definition is also referred to as the isotropic total variation.
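On the implementation side, the supremum in definition 2.5 is usually not evaluated directly; instead, on a regular pixel grid one replaces the gradient by finite differences and sums the pointwise Euclidean norms. The following sketch is such an assumed forward-difference discretisation with unit grid spacing (one of several common choices, not the definition itself); applied to the characteristic function of a set, it roughly reproduces the perimeter appearing in example 2.6 below.

```python
import numpy as np

def discrete_tv(u, h=1.0):
    """Isotropic total variation of a 2D array u using forward differences;
    differences across the boundary are set to zero (Neumann-type boundary)."""
    dx = np.diff(u, axis=0, append=u[-1:, :])   # u[i+1, j] - u[i, j]
    dy = np.diff(u, axis=1, append=u[:, -1:])   # u[i, j+1] - u[i, j]
    return h * float(np.sum(np.sqrt(dx ** 2 + dy ** 2)))

# For a binary square, the discrete TV is close to its perimeter (cf. example 2.6).
u = np.zeros((64, 64))
u[16:48, 16:48] = 1.0
print(discrete_tv(u))   # approximately 4 * 32 = 128 for this axis-aligned square
```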

Example 2.6. Piecewise constant functions may have a Radon measure as derivative. Let Ω' ⊂ Ω be a subdomain such that ∂Ω' ∩ Ω can be parameterised by finitely many Lipschitz mappings. Then, the outer normal ν exists almost everywhere in ∂Ω' ∩ Ω with respect to the Hausdorff ${\mathcal{H}}^{d-1}$ measure and one can employ the divergence theorem. This yields, for u = χΩ' and $\varphi \in {\mathcal{C}}_{\mathrm{c}}^{\infty }\left({\Omega},{\mathbf{R}}^{d}\right)$ with ||φ||∞ ⩽ 1 that

${\int }_{{\Omega}}u\enspace \mathrm{d}\mathrm{i}\mathrm{v}\enspace \varphi \enspace \mathrm{d}x={\int }_{{{\Omega}}^{\prime }}\mathrm{d}\mathrm{i}\mathrm{v}\enspace \varphi \enspace \mathrm{d}x={\int }_{\partial {{\Omega}}^{\prime }\cap {\Omega}}\varphi \cdot \nu \enspace \mathrm{d}{\mathcal{H}}^{d-1},$
so $\nabla u=-\nu {\mathcal{H}}^{d-1}\enspace \llcorner \enspace \left(\partial {{\Omega}}^{\prime }\cap {\Omega}\right)$ is a Radon measure. One sees, for instance via approximation, that ${{\Vert}\nabla u{\Vert}}_{\mathcal{M}}={\mathcal{H}}^{d-1}\left(\partial {{\Omega}}^{\prime }\cap {\Omega}\right)$.

The class of sets Ω' ⊂ Ω for which χΩ' possesses a Radon measure as weak derivative is actually much larger than the class of bounded Lipschitz domains. These are the sets of finite perimeter, denoted by $\mathrm{P}\mathrm{e}\mathrm{r}\left({{\Omega}}^{\prime }\right)={{\Vert}\nabla {\chi }_{{{\Omega}}^{\prime }}{\Vert}}_{\mathcal{M}}$. On the other hand, for u ∈ H1,1(Ω), we have $\mathrm{T}\mathrm{V}\left(u\right)={\int }_{{\Omega}}\left\vert \nabla u\right\vert \enspace \mathrm{d}x$ and the weak derivative as Radon measure is just $\nabla u{\mathcal{L}}^{d}$, i.e., the Sobolev derivative interpreted as a weight on the Lebesgue measure. Collecting all functions whose weak derivative is a Radon measure, we arrive at the following space.

Definition 2.7. The space

$\mathrm{B}\mathrm{V}\left({\Omega},H\right)=\left\{u\in {L}^{1}\left({\Omega},H\right)\enspace :\enspace \mathrm{T}\mathrm{V}\left(u\right){< }\infty \right\}$
is the space of H-valued functions of bounded variation. In case H = R, we denote by BV(Ω) = BV(Ω, R) and just refer to functions of bounded variation.

Proposition 2.8. The space BV(Ω, H) endowed with the norm ${{\Vert}u{\Vert}}_{\mathrm{B}\mathrm{V}}={{\Vert}u{\Vert}}_{1}+\mathrm{T}\mathrm{V}\left(u\right)$ is a Banach space. The total variation functional TV is a continuous seminorm on BV(Ω, H) which vanishes exactly at the constant functions, i.e., ker(TV) = H 1, with H 1 being the set of constant, H-valued functions.

The total variation functional possesses many convenient properties [7].

Proposition 2.9. 

  • The functional TV is proper, convex and lower semi-continuous on each Lp (Ω, H), i.e., for 1 ⩽ p ⩽ ∞.
  • For 1 ⩽ p < ∞, each u ∈ BV(Ω, H) ∩ Lp (Ω, H) can smoothly be approximated as follows: for ɛ > 0, there exists ${u}^{\varepsilon }\in {\mathcal{C}}^{\infty }\left({\Omega},H\right)\cap \mathrm{B}\mathrm{V}\left({\Omega},H\right)\cap {L}^{p}\left({\Omega},H\right)$ such that
  • If Ω is a bounded Lipschitz domain, then there exists a constant C > 0 such that for each u ∈ BV(Ω, H) with ${\int }_{{\Omega}}$ u  dx = 0, the Poincaré–Wirtinger estimate ${{\Vert}u{\Vert}}_{{L}^{d/\left(d-1\right)}\left({\Omega},H\right)}{\leqslant}C\enspace \mathrm{T}\mathrm{V}\left(u\right)$ holds.

From the regularisation-theoretic point of view, the fact that TV is proper, convex and lower semi-continuous on Lebesgue spaces is relevant, a property that fails for the Sobolev-seminorm ||∇ ⋅ ||1. The Poincaré–Wirtinger estimate can be interpreted as a coercivity property on a subspace with codimension 1. Also note that this estimate is the same as for H1,1(Ω, H)-functions and the respective constants C coincide. Consequently, the embedding properties of the latter space transfer immediately.

Proposition 2.10. Let Ω be a bounded Lipschitz domain. Then,

  • The embedding BV(Ω, H) ↪ Ld/(d−1)(Ω, H) (with d/(d − 1) = ∞ for d = 1) exists and is continuous.
  • The embedding BV(Ω, H) ↪ Lp (Ω, H) is compact for each 1 ⩽ p < d/(d − 1).
  • Each bounded sequence {un } in BV(Ω, H) possesses a subsequence $\left\{{u}^{{n}_{k}}\right\}$ which converges to a u ∈ BV(Ω, H) weak* in BV(Ω, H), which we define as ${u}^{{n}_{k}}\to u$ in L1(Ω, H), $\nabla {u}^{{n}_{k}}{\ast}{\rightharpoonup }\nabla u$ in $\mathcal{M}\left({\Omega},{H}^{d}\right)$ as k → ∞.

Consequently, the total variation is suitable for regularising ill-posed inverse problems in certain Lp -spaces.

2.2. Tikhonov regularisation

Let us now turn to solving ill-posed inverse problems with Tikhonov regularisation and BV-based penalty, i.e., solving

for some data f in a Banach space Y. As mentioned in the introduction, since the focus of this review is on regularisation terms rather than tackling inverse problems in the greatest possible generality, we restrict ourselves here to linear and continuous forward operators K : Ld/(d−1)(Ω) → Y. Nevertheless we note that, building on the results developed here for the linear setting, an extension to non-linear operators typically boils down to ensuring additional requirements on the non-linear forward model rather than the regularisation term, see for instance [84, 106, 182].

Measuring the discrepancy in terms of the norm in Y, the problem is then to solve

$\underset{u\in {L}^{d/\left(d-1\right)}\left({\Omega}\right)}{\mathrm{min}}\enspace \frac{1}{q}{{\Vert}Ku-f{\Vert}}_{Y}^{q}+\alpha \mathrm{T}\mathrm{V}\left(u\right)$
for some exponent q ⩾ 1. Usually, Y is some Hilbert space and q = 2, resulting in a quadratic discrepancy, which is often used in case of Gaussian noise. For impulsive noise (or salt-and-pepper noise), the space Y = L1(Ω'), with Ω' a domain, turns out to be useful. In case of Poisson noise, however, it is not advisable to take the norm but rather the Kullback–Leibler divergence between Ku and f, i.e. KL(Ku, f), where KL is given, for f ∈ L1(Ω') with f ⩾ 0 almost everywhere, according to the non-negative integral

$\mathrm{K}\mathrm{L}\left(v,f\right)={\int }_{{{\Omega}}^{\prime }}v-f+f\enspace \mathrm{log}\left(\frac{f}{v}\right)\mathrm{d}x\qquad (2)$

provided that v ⩾ 0 a.e., and ∞ else. In particular, in this context, we agree to set the integrand to v where f = 0 and to ∞ where v = 0 and f > 0.
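For completeness, a discrete version of the Kullback–Leibler discrepancy with the conventions just agreed upon may be written as follows; this is a plain pixelwise sum (quadrature weights of one are an assumption made here) and is only meant to illustrate the special cases f = 0 and v = 0.

```python
import numpy as np

def kl_divergence(v, f):
    """Discrete Kullback-Leibler discrepancy  sum( v - f + f*log(f/v) ),
    with the conventions of the text: the summand equals v where f == 0 and
    +infinity where v == 0 while f > 0.  Both inputs must be non-negative."""
    v = np.asarray(v, dtype=float)
    f = np.asarray(f, dtype=float)
    if np.any(v < 0) or np.any(f < 0):
        raise ValueError("v and f must be non-negative")
    if np.any((v == 0) & (f > 0)):
        return np.inf
    total = np.sum(v[f == 0])        # summand reduces to v where f vanishes
    mask = f > 0                     # on this set v > 0 is guaranteed
    vf, ff = v[mask], f[mask]
    total += np.sum(vf - ff + ff * np.log(ff / vf))
    return float(total)

print(kl_divergence([1.0, 2.0, 0.5], [1.0, 1.5, 0.0]))   # about 0.5685
```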

In the following, we assume to have given a discrepancy functional Sf : Y → [0, ∞] that is proper, convex, lower semi-continuous and coercive. This is not the most general case but will be sufficient for us in order to ensure existence of minimisers of the Tikhonov functional.

Theorem 2.11. Let Ω be a bounded Lipschitz domain, Y be a Banach space, K : Ld/(d−1)(Ω) → Y linear and continuous (weak*-to-weak-continuous in case d = 1), Sf : Y → [0, ∞] a proper, convex, lower semi-continuous and coercive discrepancy functional associated with some data f and α > 0. Then, there exist solutions of

$\underset{u\in {L}^{d/\left(d-1\right)}\left({\Omega}\right)}{\mathrm{min}}\enspace {S}_{f}\left(Ku\right)+\alpha \mathrm{T}\mathrm{V}\left(u\right).\qquad (3)$

If Sf is strictly convex and K is injective, the solution is unique whenever the minimum is finite.

We provide the proof for the sake of completeness and as a prototype for the generalisation to higher-order functionals.

Proof. Assume that the objective functional in (3) is proper, otherwise, there is nothing to show. For a minimising sequence {un }, the Poincaré–Wirtinger inequality gives boundedness of $\left\{{u}^{n}-{\left\vert {\Omega}\right\vert }^{-1}{\int }_{{\Omega}}{u}^{n}\enspace \mathrm{d}x\right\}$ in Ld/(d−1)(Ω) while the coercivity of Sf yields the boundedness of {Kun }. By continuity, $\left\{K\left({u}^{n}-{\left\vert {\Omega}\right\vert }^{-1}{\int }_{{\Omega}}{u}^{n}\enspace \mathrm{d}x\right)\right\}$ must be bounded, so if K 1 ≠ 0, then {${\int }_{{\Omega}}$ un  dx} is bounded as otherwise, {Kun } would be unbounded. In the case that K 1 = 0, we can without loss of generality assume that ${\int }_{{\Omega}}$ un  dx = 0 for all n as shifting along constants does not change the functional value. In each case, {${\int }_{{\Omega}}$ un  dx} is bounded, so {un } must be bounded in Ld/(d−1)(Ω). Hence, by compact embedding (proposition 2.10) we have ${u}^{{n}_{k}}\to {u}^{{\ast}}$ in L1(Ω) as k → ∞ for a subsequence $\left\{{u}^{{n}_{k}}\right\}$ and u* ∈ BV(Ω). Reflexivity and continuity of K (weak* sequential compactness and weak*-to-weak continuity in case d = 1) give $K{u}^{{n}_{k}}\rightharpoonup K{u}^{{\ast}}$ in Y for another subsequence (not relabelled). By lower semi-continuity, u* has to be a solution to (3).

Finally, if Sf is strictly convex and K is injective, then Sf K is already strictly convex, so minimisers have to be unique. □

Example 2.12. 

  • The discrepancy functional ${S}_{f}\left(v\right)=\frac{1}{q}{{\Vert}v-f{\Vert}}_{Y}^{q}$ for some f ∈ Y is obviously proper, convex, lower semi-continuous and coercive.
  • It follows from lemma A.1 in the appendix that the discrepancy Sf (v) = KL(v, f) defined on Y = L1(Ω') for f ∈ L1(Ω') with f ⩾ 0 almost everywhere is proper, convex and coercive in L1(Ω'). Lower semi-continuity in turn follows as a special case of lemma A.2.

Remark 2.13. Note that if the inversion of K : Lp (Ω) → Y is well-posed for some p ∈ [1, ∞], then solutions of (3) still exist (even for α = 0). Clearly, the TV penalty is not necessary for obtaining a regularising effect for these problems. In this case, minimising the Tikhonov functional with TV penalty may be interpreted as denoising. The most prominent example might be the Rudin–Osher–Fatemi problem [162] which reads as

$\underset{u\in {L}^{2}\left({\Omega}\right)}{\mathrm{min}}\enspace \frac{1}{2}{{\Vert}u-f{\Vert}}_{2}^{2}+\alpha \mathrm{T}\mathrm{V}\left(u\right)$

for f ∈ L2(Ω). Here, as the identity is 'inverted', the effect of total-variation regularisation can be studied in detail. Minimisation problems of this type with other regularisation functionals are thus a good benchmark test for the properties of this functional.
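To give the reader something executable for this benchmark, the sketch below solves a discrete version of the Rudin–Osher–Fatemi problem with a first-order primal–dual iteration in the spirit of Chambolle and Pock. This anticipates the algorithmic part of the review and is only one of many possible schemes; the forward-difference discretisation, step sizes and the synthetic test image are assumptions made purely for illustration.

```python
import numpy as np

def grad(u):
    gx = np.zeros_like(u); gy = np.zeros_like(u)
    gx[:-1, :] = u[1:, :] - u[:-1, :]
    gy[:, :-1] = u[:, 1:] - u[:, :-1]
    return gx, gy

def div(px, py):
    # negative adjoint of grad, i.e. <grad u, p> = -<u, div p>
    dx = np.zeros_like(px); dy = np.zeros_like(py)
    dx[0, :] = px[0, :]; dx[1:-1, :] = px[1:-1, :] - px[:-2, :]; dx[-1, :] = -px[-2, :]
    dy[:, 0] = py[:, 0]; dy[:, 1:-1] = py[:, 1:-1] - py[:, :-2]; dy[:, -1] = -py[:, -2]
    return dx + dy

def rof_primal_dual(f, alpha, iters=300):
    """Approximately solve min_u 0.5*||u - f||^2 + alpha*TV(u) on a pixel grid."""
    u = f.copy(); u_bar = f.copy()
    px = np.zeros_like(f); py = np.zeros_like(f)
    tau = sigma = 1.0 / np.sqrt(8.0)            # tau*sigma*||grad||^2 <= 1
    for _ in range(iters):
        gx, gy = grad(u_bar)
        px, py = px + sigma * gx, py + sigma * gy
        scale = np.maximum(1.0, np.sqrt(px ** 2 + py ** 2) / alpha)  # project onto |p| <= alpha
        px, py = px / scale, py / scale
        u_old = u
        u = (u + tau * div(px, py) + tau * f) / (1.0 + tau)          # prox of 0.5*||.-f||^2
        u_bar = 2.0 * u - u_old
    return u

# Synthetic usage: denoise a noisy piecewise constant image.
rng = np.random.default_rng(1)
clean = np.zeros((64, 64)); clean[20:44, 20:44] = 1.0
noisy = clean + 0.2 * rng.standard_normal(clean.shape)
denoised = rof_primal_dual(noisy, alpha=0.15)
```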

The stability of solutions in case of varying f depends, of course, on the dependence of Sf on f. The appropriate notion here is the convergence of the discrepancy functional, i.e., for a sequence {fn } and limit f, we say that ${S}_{{f}^{n}}$ converges to Sf if

${S}_{f}\left(v\right){\leqslant}\underset{n\to \infty }{\text{lim inf}}\enspace {S}_{{f}^{n}}\left({v}^{n}\right)\enspace \text{whenever}\enspace {v}^{n}\rightharpoonup v\enspace \text{in}\enspace Y\quad \text{and}\quad \underset{n\to \infty }{\text{lim sup}}\enspace {S}_{{f}^{n}}\left(v\right){\leqslant}{S}_{f}\left(v\right)\enspace \text{for all}\enspace v\in Y.\qquad (4)$

Moreover, we say that $\left\{{S}_{{f}^{n}}\right\}$ is equi-coercive if there is a coercive function S0 : Y → [0, ∞] such that ${S}_{{f}^{n}}{\geqslant}{S}_{0}$ in Y for each n.

Theorem 2.14. In the situation of theorem 2.11, assume that ${S}_{{f}^{n}}$ converges to Sf in the sense of (4) and $\left\{{S}_{{f}^{n}}\right\}$ is equi-coercive. Then, for each sequence of minimisers {un } of (3) with discrepancy ${S}_{{f}^{n}}$,

  • Either ${S}_{{f}^{n}}\left(K{u}^{n}\right)+\alpha \mathrm{T}\mathrm{V}\left({u}^{n}\right)\to \infty $ as n → ∞ and (3) with discrepancy Sf does not admit a finite solution.
  • Or ${S}_{{f}^{n}}\left(K{u}^{n}\right)+\alpha \mathrm{T}\mathrm{V}\left({u}^{n}\right)\to {\mathrm{min}}_{u\in {L}^{d/\left(d-1\right)}\left({\Omega}\right)}{S}_{f}\left(Ku\right)+\alpha \mathrm{T}\mathrm{V}\left(u\right)$ as n → ∞ and there is, possibly up to constant shifts, a weak accumulation point u ∈ Ld/(d−1)(Ω) (weak* accumulation point for d = 1) that minimises (3) with discrepancy Sf .

For each subsequence $\left\{{u}^{{n}_{k}}\right\}$ weakly converging to some u in Ld/(d−1)(Ω) (${u}^{{n}_{k}}{\ast}{\rightharpoonup }u$ in case d = 1), it holds that $\mathrm{T}\mathrm{V}\left({u}^{{n}_{k}}\right)\to \mathrm{T}\mathrm{V}\left(u\right)$ as k → ∞ and u solves (3) with discrepancy Sf . If solutions to the latter are unique, we have un ⇀ u in Ld/(d−1)(Ω) (${u}^{n}{\ast}{\rightharpoonup }u$ in case d = 1).

Proof. Let, in the following, ${\int }_{{\Omega}}$ un  dx = 0 for all n if K 1 = 0 and denote by F = Sf ∘ K + αTV as well as ${F}_{n}={S}_{{f}^{n}}\enspace {\circ}\enspace K+\alpha \mathrm{T}\mathrm{V}$. First of all, suppose that {Fn (un )} is bounded. As $\left\{{S}_{{f}^{n}}\right\}$ is equi-coercive, we can conclude as in the proof of theorem 2.11 that {un } is bounded. Therefore, a weak accumulation point (weak* in case d = 1) exists.

Suppose that ${u}^{{n}_{k}}\rightharpoonup u$ as k → ∞. Then,

as well as, for each u' ∈ Ld/(d−1)(Ω)

Thus, u is a minimiser for F and plugging in u' = u we see that ${\mathrm{lim}}_{k\to \infty }{F}_{{n}_{k}}\left({u}^{{n}_{k}}\right)=F\left(u\right)$. In order to obtain ${\mathrm{lim}}_{k\to \infty }\mathrm{T}\mathrm{V}\left({u}^{{n}_{k}}\right)=\mathrm{T}\mathrm{V}\left(u\right)$, suppose that ${\text{lim sup}}_{k\to \infty }\mathrm{T}\mathrm{V}\left({u}^{{n}_{k}}\right){ >}\mathrm{T}\mathrm{V}\left(u\right)$, such that

which is a contradiction. Thus, ${\mathrm{lim}}_{k\to \infty }\mathrm{T}\mathrm{V}\left({u}^{{n}_{k}}\right)=\mathrm{T}\mathrm{V}\left(u\right)$. Finally, if u is the unique minimiser for (3) with discrepancy Sf , then un ⇀ u as n → ∞ for the whole sequence (${u}^{n}{\ast}{\rightharpoonup }u$ in case d = 1) as any subsequence has to contain another subsequence that converges weakly (weakly*) to u.

In order to conclude the proof, suppose that lim infn→∞ Fn (un ) < ∞. In that case, the above arguments yield an accumulation point as stated as well as a minimiser u ∈ BV(Ω) of F with F(u) ⩽ lim infn→∞ Fn (un ). In particular, F is proper. By convergence of ${S}_{{f}^{n}}$ to Sf and minimality, we have

so the whole sequence of functional values converges.

Finally, in case Fn (un ) → ∞ as n → ∞, F cannot be proper: otherwise, we obtain analogously to the above that ∞ > F(u) ⩾ lim supn→∞ Fn (u) ⩾ lim infn→∞ Fn (un ) for some u ∈ BV(Ω) which is a contradiction. □

Remark 2.15. The convergence of discrepancies as in (4) is related to gamma-convergence. Indeed, the difference is that, for the latter, on the right-hand side of the lim sup inequality, an arbitrary sequence converging to v is allowed (instead of the constant sequence). In this context, as can be seen in the proof of the stability result above, one could still weaken the lim sup-assumption in (4) by allowing not only the constant recovery sequence but any sequence for which the regularisation functional converges. However, in order to maintain an assumption on the discrepancy term that is independent of the choice of regularisation, we chose the slightly stronger condition.

Example 2.16. 

  • A typical discrepancy is some power of the norm-distance in Y, i.e., ${S}_{f}\left(v\right)=\frac{1}{q}{{\Vert}v-f{\Vert}}_{Y}^{q}$ for some q ⩾ 1. It is easy to show that whenever fn → f in Y, ${S}_{{f}^{n}}$ converges to Sf in the above sense. Also, the equi-coercivity of $\left\{{S}_{{f}^{n}}\right\}$ is immediate.
  • For the Kullback–Leibler divergence, let Y = L1(Ω') for some Ω' and assume that {fn }, f in L1(Ω') are such that fn ⩽ Cf a.e. in Ω' for some C > 0 and KL(f, fn ) → 0 as n → ∞. Then, it follows from lemma A.2 in the appendix that ${S}_{{f}^{n}}=\mathrm{K}\mathrm{L}\left(\cdot ,{f}^{n}\right)$ converges to Sf = KL(⋅, f), and also that ||fn − f||1 → 0. The latter in particular implies boundedness of {fn } in L1(Ω') which, together with the coercivity estimate of lemma A.1 shows that $\left\{{S}_{{f}^{n}}\right\}$ is equi-coercive.

In addition to well-posedness of the Tikhonov-functional minimisation, one is of course interested in regularisation results, i.e., the convergence of solutions to a minimum-TV-solution provided that the data converges and α → 0 in some sense. For this purpose, let u† ∈ BV(Ω) be a minimum-TV-solution of Ku = f† for some data f† in Y, i.e., TV(u†) ⩽ TV(u) for each u with Ku = f†, suppose that for each δ > 0 one has given a fδ ∈ Y such that ${S}_{{f}^{\delta }}\left({f}^{{\dagger}}\right){\leqslant}\delta $, and denote by uα,δ a solution of (3) for parameter α > 0 and data fδ .

Theorem 2.17. In the situation of theorem 2.11, let the discrepancy functionals $\left\{{S}_{{f}^{\delta }}\right\}$ be equi-coercive and converge to ${S}_{{f}^{{\dagger}}}$ in the sense of (4) for some data f† ∈ Y with ${S}_{{f}^{{\dagger}}}\left(v\right)=0$ if and only if v = f†. Choose for each δ > 0 the parameter α > 0 such that

$\alpha \to 0\quad \text{and}\quad \frac{\delta }{\alpha }\to 0\quad \text{as}\enspace \delta \to 0.$

Then, again up to constant shifts, {uα,δ } has at least one weak accumulation point in Ld/(d−1)(Ω) (weak* in case d = 1). Each such accumulation point is a minimum-TV-solution of Ku = f† and limδ→0 TV(uα,δ ) = TV(u†).

Proof. Again we assume that ${\int }_{{\Omega}}$ uα,δ dx = 0 for all (α, δ) if K 1 = 0. Using the optimality of uα,δ for (3) compared to u† gives

${S}_{{f}^{\delta }}\left(K{u}^{\alpha ,\delta }\right)+\alpha \mathrm{T}\mathrm{V}\left({u}^{\alpha ,\delta }\right){\leqslant}{S}_{{f}^{\delta }}\left({f}^{{\dagger}}\right)+\alpha \mathrm{T}\mathrm{V}\left({u}^{{\dagger}}\right){\leqslant}\delta +\alpha \mathrm{T}\mathrm{V}\left({u}^{{\dagger}}\right).$

Since α → 0 as δ → 0, we have that ${S}_{{f}^{\delta }}\left(K{u}^{\alpha ,\delta }\right)\to 0$ as δ → 0. Moreover, as also δ/α → 0, it follows that lim supδ→0 TV(uα,δ ) ⩽ TV(u†). This allows to conclude that {uα,δ } is bounded in BV(Ω) and, by embedding, admits a weak accumulation point in Ld/(d−1)(Ω) (weak* in case d = 1).

Next, let u* be such an accumulation point associated with {δn }, δn → 0 as well as the corresponding parameters {αn }. Then, ${S}_{{f}^{{\dagger}}}\left(K{u}^{{\ast}}\right){\leqslant}{\text{lim inf}}_{n\to \infty }{S}_{{f}^{{\delta }_{n}}}\left(K{u}^{{\alpha }_{n},{\delta }_{n}}\right)=0$, so Ku* = f†. Moreover, $\mathrm{T}\mathrm{V}\left({u}^{{\ast}}\right){\leqslant}{\text{lim inf}}_{n\to \infty }\mathrm{T}\mathrm{V}\left({u}^{{\alpha }_{n},{\delta }_{n}}\right){\leqslant}\mathrm{T}\mathrm{V}\left({u}^{{\dagger}}\right)$, hence u* is a minimum-TV-solution. In particular, TV(u*) = TV(u†), so ${\mathrm{lim}}_{n\to \infty }\mathrm{T}\mathrm{V}\left({u}^{{\alpha }_{n},{\delta }_{n}}\right)=\mathrm{T}\mathrm{V}\left({u}^{{\dagger}}\right)$.

Finally, each sequence {δn } with δn → 0 contains a subsequence along which $\mathrm{T}\mathrm{V}\left({u}^{{\alpha }_{n},{\delta }_{n}}\right)\to \mathrm{T}\mathrm{V}\left({u}^{{\dagger}}\right)$ as n → ∞, so TV(uα,δ ) → TV(u†) as δ → 0. □

Finally, if a respective source condition is satisfied, we can, under some circumstances, give rates for some Bregman distance with respect to TV associated with respect to a particular subgradient element [48]. Recall that the Bregman distance ${D}_{{x}^{{\ast}}}^{F}\left(y,x\right)$ of x, y ∈ X for a convex functional $F:X\to \enspace \left.\right]- \infty ,\infty \left.\right]$ and subgradient element x* ∈ ∂F(x) is given by

${D}_{{x}^{{\ast}}}^{F}\left(y,x\right)=F\left(y\right)-F\left(x\right)-{\langle {x}^{{\ast}},\enspace y-x\rangle }_{{X}^{{\ast}}{\times}X}.$
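As an elementary illustration (not taken from the text): for the squared norm of a Hilbert space the Bregman distance reduces to the squared distance, while for TV it is in general neither symmetric nor a metric.

$\text{For}\enspace F=\tfrac{1}{2}{{\Vert}\cdot {\Vert}}_{H}^{2}\enspace \text{and}\enspace {x}^{{\ast}}=x:\quad {D}_{x}^{F}\left(y,x\right)=\tfrac{1}{2}{{\Vert}y{\Vert}}^{2}-\tfrac{1}{2}{{\Vert}x{\Vert}}^{2}-\langle x,\enspace y-x\rangle =\tfrac{1}{2}{{\Vert}y-x{\Vert}}^{2}.$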

The convergence rate results are then a consequence of the following proposition.

Proposition 2.18. In the situation of theorem 2.17, let K*w† ∈ ∂TV(u†) for some w† ∈ Y*. Then,

Equation (5)

Proof. Using the minimality of uα,δ yields ${S}_{{f}^{\delta }}\left(K{u}^{\alpha ,\delta }\right)+\alpha \mathrm{T}\mathrm{V}\left({u}^{\alpha ,\delta }\right){\leqslant}\alpha \mathrm{T}\mathrm{V}\left({u}^{{\dagger}}\right)+\delta $. Rearranging, adding ⟨K*w†, u† − uα,δ ⟩ on both sides as well as using Fenchel's inequality twice yields

Subtracting ${S}_{{f}^{\delta }}\left(K{u}^{\alpha ,\delta }\right)$ and dividing by α gives the result. □

For well-known discrepancy terms, one easily gets parameter choice rules that lead to rates for ${D}_{{K}^{{\ast}}{w}^{{\dagger}}}^{\mathrm{T}\mathrm{V}}\left({u}^{\alpha ,\delta }\right)$.

Example 2.19. 

  • For ${S}_{{f}^{\delta }}\left(v\right)=\frac{1}{q}{{\Vert}v-{f}^{\delta }{\Vert}}_{Y}^{q}$ with q > 1, ${S}_{{f}^{\delta }}^{{\ast}}\left(w\right)=\frac{1}{{q}^{{\ast}}}{{\Vert}w{\Vert}}_{{Y}^{{\ast}}}^{{q}^{{\ast}}}+\langle {f}^{\delta },\enspace w\rangle $ where 1/q + 1/q* = 1, hence (5) reads as
    In the non-trivial case of w† ≠ 0, the right-hand side becomes minimal for $\alpha ={{\Vert}{w}^{{\dagger}}{\Vert}}_{{Y}^{{\ast}}}^{-1}{\left(\frac{{q}^{{\ast}}}{{q}^{{\ast}}-1}\right)}^{1/{q}^{{\ast}}}{\delta }^{1/{q}^{{\ast}}}$ giving the well-known rate of $\mathcal{O}\left({\delta }^{1/q}\right)=\mathcal{O}\left({{\Vert}{f}^{\delta }-{f}^{{\dagger}}{\Vert}}_{Y}\right)$ for the Bregman distance.
  • For the Kullback–Leibler discrepancy, i.e., ${S}_{{f}^{\delta }}\left(v\right)=\mathrm{K}\mathrm{L}\left(v,{f}^{\delta }\right)$ on L1(Ω'), a direct, pointwise computation shows that the dual functional obeys ${S}_{{f}^{\delta }}^{{\ast}}\left(w\right)+{S}_{{f}^{\delta }}^{{\ast}}\left(-w\right)={\int }_{{{\Omega}}^{\prime }}-{f}^{\delta }\enspace \mathrm{log}\left(1-{w}^{2}\right)\enspace \mathrm{d}x$ if $\left\vert w\right\vert {\leqslant}1$ almost everywhere, setting −t  log(0) = ∞ for t > 0 and −0  log(0) = 0, and ${S}_{{f}^{\delta }}^{{\ast}}\left(w\right)+{S}_{{f}^{\delta }}^{{\ast}}\left(-w\right)=\infty $ else. As w† ∈ L∞(Ω'), we may choose α > 0 such that $\alpha {{\Vert}{w}^{{\dagger}}{\Vert}}_{\infty }{\leqslant}\frac{1}{\sqrt{2}}$. Then, the equivalence
    holds. Assuming ${\int }_{{{\Omega}}^{\prime }}{f}^{{\dagger}}{\left({w}^{{\dagger}}\right)}^{2}\enspace \mathrm{d}x{ >}0$, the weak convergence fδ ⇀ f† in L1(Ω') (see lemma A.2) implies ${S}_{{f}^{\delta }}^{{\ast}}\left(\alpha {w}^{{\dagger}}\right)+{S}_{{f}^{\delta }}^{{\ast}}\left(-\alpha {w}^{{\dagger}}\right)\sim {\alpha }^{2}$ independently of δ. Hence, choosing $\alpha \sim \sqrt{\delta }$ yields the rate $\mathcal{O}\left(\sqrt{\delta }\right)$ for the Bregman distance as δ → 0.

2.3. Further first-order approaches

Besides these functional-analytic properties, functions of bounded variation admit interesting structural and fine properties. Let us briefly discuss the structure of the gradient ∇u for a u ∈ BV(Ω). By Lebesgue's decomposition theorem, ∇u can be split into an absolutely continuous part ∇a u with respect to the Lebesgue measure and a singular part ∇s u. We tacitly identify ∇a u with the Radon–Nikodým derivative, i.e., ∇a uL1(Ω, Rd ) via the measure ${\nabla }^{a}u{\mathcal{L}}^{d}$.

The singular part ∇s u therefore has to capture the jump discontinuities of u. Indeed, introducing the jump set, it can further be decomposed. Recall that a u ∈ L1(Ω) is almost everywhere approximately continuous, i.e., for almost every x ∈ Ω there exists a z ∈ R such that

$\underset{r\to 0}{\mathrm{lim}}\enspace \frac{1}{\left\vert {B}_{r}\left(x\right)\right\vert }{\int }_{{B}_{r}\left(x\right)}\left\vert u\left(y\right)-z\right\vert \enspace \mathrm{d}y=0.$

The collection Su of all points for which u is not approximately continuous is called the discontinuity set of u.

Definition 2.20. Let $u\in {L}_{\mathrm{l}\mathrm{o}\mathrm{c}}^{1}\left({\Omega}\right)$ and x ∈ Ω.

  • (a)  
    The function u is called approximately differentiable in x if there exists a v ∈ Rd such that
    The vector ∇ u(x) = v is called the approximate gradient of u at x.
  • (b)  
    The point x is an approximate jump point of u if there exist u+(x) > u−(x) and a ν ∈ Rd , $\left\vert \nu \right\vert =1$ such that
    where ${B}_{r}^{+}\left(x,\nu \right)$ and ${B}_{r}^{-}\left(x,\nu \right)$ are balls cut by the hyperplane perpendicular to ν and containing x, i.e.,
    The set Ju of all approximate jump points is called the jump set of u.

Theorem 2.21. ([7]). Let u ∈ BV(Ω). Then,

  • (a)  
    u is almost everywhere approximately differentiable with ∇a u = ∇ u in L1(Ω, Rd ).
  • (b)  
    The jump set satisfies ${\mathcal{H}}^{d-1}\left({S}_{u}{\backslash}{J}_{u}\right)=0$ and we have $\nabla u\enspace \llcorner \enspace {J}_{u}=\left({u}^{+}-{u}^{-}\right){\nu }_{u}{\mathcal{H}}^{d-1}$.
  • (c)  
    The restriction ∇u ⌞ (Ω\Su ) is absolutely continuous with respect to ${\mathcal{H}}^{d-1}$.

In particular, the involved sets and functions are Borel sets and functions, respectively.

Denoting by

${\nabla }^{j}u=\nabla u\enspace \llcorner \enspace {J}_{u},\qquad {\nabla }^{c}u={\nabla }^{s}u\enspace \llcorner \enspace \left({\Omega}{\backslash}{S}_{u}\right),$

where ∇j u and ∇c u are the jump and Cantor part of ∇u, respectively, the gradient of a u ∈ BV(Ω) can be decomposed into

$\nabla u={\nabla }^{a}u\enspace {\mathcal{L}}^{d}+\left({u}^{+}-{u}^{-}\right){\nu }_{u}{\mathcal{H}}^{d-1}\enspace \llcorner \enspace {J}_{u}+{\nabla }^{c}u\qquad (6)$

with ∇c u being singular with respect to ${\mathcal{L}}^{d}$ and absolutely continuous with respect to ${\mathcal{H}}^{d-1}$.

This construction allows in particular to define penalties beyond the total variation seminorm (see, for instance [7, section 5.5]). Letting g : Rd → [0, ∞] be a proper, convex and lower semi-continuous function and g∞ be given according to

${g}^{\infty }\left(x\right)=\underset{t\to \infty }{\mathrm{lim}}\enspace \frac{g\left(tx\right)}{t}$

with ∞ allowed, then the functional

${\mathcal{R}}_{g}\left(u\right)={\int }_{{\Omega}}g\left({\nabla }^{a}u\right)\enspace \mathrm{d}x+{\int }_{{J}_{u}}{g}^{\infty }\left(\left({u}^{+}-{u}^{-}\right){\nu }_{u}\right)\enspace \mathrm{d}{\mathcal{H}}^{d-1}+{\int }_{{\Omega}}{g}^{\infty }\left({\sigma }_{{\nabla }^{c}u}\right)\enspace \mathrm{d}\left\vert {\nabla }^{c}u\right\vert \qquad (7)$

where ${\sigma }_{{\nabla }^{c}u}$ is the sign of ∇c u, i.e., ${\nabla }^{c}u={\sigma }_{{\nabla }^{c}u}\left\vert {\nabla }^{c}u\right\vert $, is proper, convex and lower semi-continuous on BV(Ω). With the Fenchel-dual functional, i.e., ${g}^{{\ast}}\left(y\right)=\underset{x\in {\mathbf{R}}^{d}}{\mathrm{sup}}\enspace x\cdot y-g\left(x\right)$, it can also be expressed in (pre-)dual form as

Obviously, the usual TV-case corresponds to g being the Euclidean norm on Rd . Also, g∞(x) = ∞ for some $\left\vert x\right\vert =1$ does not allow jumps in the direction of x, so one usually assumes that g∞(x) < ∞ for each $\left\vert x\right\vert =1$ in order to obtain a genuine penalty in BV(Ω). In addition, if there are c0 > 0 and R > 0 such that $g\left(x\right){\geqslant}{c}_{0}\left\vert x\right\vert $ for each $\left\vert x\right\vert {\geqslant}R$, then there is a constant C > 0 such that

for all u ∈ BV(Ω), i.e., ${\mathcal{R}}_{g}$ is as coercive as TV. Consequently, the well-posedness and convergence statements in theorems 2.11, 2.14 and 2.17 as well as in proposition 2.18 can be adapted to ${\mathcal{R}}_{g}$ in a straightforward manner with the proofs following the same line of argumentation.

Example 2.22. There are several possibilities for replacing the non-differentiable norm function $\left\vert \cdot \right\vert $ in the TV-functional by a smooth approximation in 0.

Choosing an ɛ > 0, consider

both being continuously differentiable in Rd and approximating $\left\vert \cdot \right\vert $ for ɛ → 0.

The associated penalties ${\mathcal{R}}_{{g}_{\varepsilon }^{1}}$ and ${\mathcal{R}}_{{g}_{\varepsilon }^{2}}$ are often referred to as Huber-TV and smooth TV, respectively.
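Since the explicit formulas for ${g}_{\varepsilon }^{1}$ and ${g}_{\varepsilon }^{2}$ are displayed only in the published version, the sketch below uses the standard Huber function and the smoothed norm $\sqrt{{\left\vert x\right\vert }^{2}+{\varepsilon }^{2}}-\varepsilon $, the latter matching the penalty quoted in the caption of figure 1; both concrete forms are assumptions made here for illustration and approximate $\left\vert \cdot \right\vert $ as ɛ → 0.

```python
import numpy as np

def huber(x, eps):
    """Assumed form of g_eps^1 (standard Huber function):
    quadratic near the origin, linear growth further out, continuously differentiable."""
    r = float(np.linalg.norm(x))
    return r ** 2 / (2 * eps) if r <= eps else r - eps / 2

def smooth_norm(x, eps):
    """Assumed form of g_eps^2, cf. the penalty in the caption of figure 1."""
    return float(np.sqrt(np.sum(np.asarray(x) ** 2) + eps ** 2) - eps)

x = np.array([0.3, -0.1])
for eps in (1.0, 0.1, 0.01):
    print(eps, huber(x, eps), smooth_norm(x, eps), float(np.linalg.norm(x)))
    # both values approach |x| ~ 0.3162 as eps decreases
```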

Example 2.23. Taking g as a non-Euclidean norm on Rd yields functionals of anisotropic total-variation type. The common choice is g = |⋅|1 which is also often referred to as anisotropic TV.

Remark 2.24. It is worth noting that g as above can also be made spatially dependent, which has applications for instance in the context of regularisation for inverse problems involving multiple modalities or multiple spectra. Under some assumptions, functionals ${\mathcal{R}}_{g}$ as in (7) with spatially dependent g are again lower semi-continuous on BV [6] and well-posedness results for TV apply [105].

2.4. Colour and multichannel images

Colour and multichannel images are usually represented by functions mapping into a vector-space. Total-variation functionals and regularisation approaches can easily be extended to such vector-valued functions; definition 2.5 already contains an isotropic variant for functions with values in a finite-dimensional space H, where we used the Hilbert-space norm $\vert x\vert ={\left({\sum }_{i=1}^{d}{x}_{i}\cdot {x}_{i}\right)}^{1/2}$ as pointwise norm on Hd for the test functions $\varphi \in {\mathcal{C}}_{\mathrm{c}}^{\infty }\left({\Omega},{H}^{d}\right)$.

However, in contrast to the scalar case, this is not the only choice yielding TV-functionals that are invariant under distance-preserving transformations. The essential property for a norm |⋅| on Hd needed for the latter is

where ${\left(Ox\right)}_{i}={\sum }_{j=1}^{d}{o}_{ij}{x}_{j}$. We call such norms unitarily left invariant. Denoting by |⋅|* the dual norm, the associated total variation for a $u\in {L}_{\mathrm{l}\mathrm{o}\mathrm{c}}^{1}\left({\Omega},H\right)$ is given by

and invariant to distance-preserving transformations. If the norm |⋅| is moreover unitarily right invariant, i.e.,

where ${\left(xO\right)}_{i}={\left(O\left(x\right)\right)}_{i}$, then it can be written as a unitarily invariant matrix norm and hence |x| only depends on the singular values of the mapping associated with x in a permutation- and sign-invariant manner. More precisely, there exists a norm |⋅|Σ on Rd with |Pσ|Σ = |σ|Σ for all σ ∈ Rd and P ∈ Rd×d with $\left\vert P\right\vert $ being a permutation matrix, such that |x| = |σ|Σ for all x ∈ Hd , where σ are the singular values of the mapping H → Rd given by $y{\mapsto}{\left({x}_{i}\cdot y\right)}_{i}$. Conversely, any such norm on Rd induces a unitarily invariant matrix norm. Common choices are the norms generated by the p-vector norms, the Schatten-p-norms. For p = 1, p = 2 and p = ∞, they correspond to the nuclear norm, the Frobenius norm and the usual spectral norm, respectively, all of which have been proposed in the literature for use in conjunction with TV, see, e.g., [79, 164]. Among those possibilities, the nuclear norm appears particularly attractive as it provides a relaxation of the rank functional [156]. Hence, solutions with low-rank gradients and more pronounced edges can be expected from nuclear-norm-TV regularisation.
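The pointwise Schatten norms just described are straightforward to evaluate from the singular values of each pixel's Jacobian. The sketch below is an assumed forward-difference discretisation for a multichannel image (unit grid spacing, our own helper names) that contrasts the nuclear, Frobenius and spectral variants of the resulting vectorial TV.

```python
import numpy as np

def vectorial_tv(u, p=1):
    """Discrete colour TV: a Schatten-p norm of each pixel's (2 x channels)
    Jacobian of forward differences; p=1 nuclear, p=2 Frobenius, p=np.inf spectral."""
    dx = np.diff(u, axis=0, append=u[-1:, :, :])       # shape (H, W, C)
    dy = np.diff(u, axis=1, append=u[:, -1:, :])
    J = np.stack([dx, dy], axis=-2)                    # per-pixel Jacobian, shape (H, W, 2, C)
    s = np.linalg.svd(J, compute_uv=False)             # singular values, shape (H, W, 2)
    if p == 1:
        pixel_norm = s.sum(axis=-1)                    # nuclear norm
    elif p == 2:
        pixel_norm = np.sqrt((s ** 2).sum(axis=-1))    # Frobenius norm
    else:
        pixel_norm = s.max(axis=-1)                    # spectral norm
    return float(pixel_norm.sum())

rgb = np.random.default_rng(2).random((32, 32, 3))
print(vectorial_tv(rgb, 1), vectorial_tv(rgb, 2), vectorial_tv(rgb, np.inf))
```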

Also here, the well-posedness and convergence results in theorems 2.11, 2.14 and 2.17 as well as in proposition 2.18 are transferable to the vector-valued case, as can be seen from equivalence of norms.

Moreover, functionals of the type (7) are possible with g : Hd → [0, ∞] proper, convex and lower semi-continuous such that g∞ exists. However, u takes values in H which calls for some adaptations which we briefly describe in the following. First, concerning definition 2.20(a), we are able to generalise in a straightforward way by considering v ∈ Hd , the norm in H and the scalar product in Hd such that the approximate gradient of u at x is ∇ u(x) ∈ Hd . For jump points according to (b), we are no longer able to require u+(x) > u−(x) such that we have to replace this by u+(x) ≠ u−(x) and arrive at a meaningful definition replacing the absolute value by the norm in H. However, u+, u− and ν are then only unique up to a sign. Nevertheless, (u+ − u−) ⊗ ν according to ${\left(\left({u}^{+}-{u}^{-}\right)\otimes \nu \right)}_{i}=\left({u}^{+}-{u}^{-}\right){\nu }_{i}$ is still unique. The analogue of theorem 2.21 and (6) holds with these notions, with the following adaptation:

with the Cantor part being of rank one, i.e., ${\nabla }^{c}u={\sigma }_{{\nabla }^{c}u}\left\vert {\nabla }^{c}u\right\vert $ where ${\sigma }_{{\nabla }^{c}u}$ is rank one $\left\vert {\nabla }^{c}u\right\vert $-almost everywhere [7, theorem 3.94]. The functional ${\mathcal{R}}_{g}$ according to

then realises a regulariser with the same regularisation properties as its counterpart for scalar functions.

3. Higher-order TV regularisation

First-order regularisation for imaging problems might not always lead to results of sufficient quality. Recall that taking the total variation as regularisation functional has the advantage that the solution space BV(Ω) naturally allows for discontinuities along hypersurfaces ('jumps') which correspond, for imaging applications, to object boundaries. Indeed, TV has a good performance in edge preservation which can also be observed numerically.

However, for noisy data, the solutions suffer from non-flat regions appearing flat in conjunction with the introduction of undesired edges. This effect is called the staircasing effect, see figure 1, in particular panel (c). Thinking of TV as a one-norm type penalty for the gradient, this is, on the one hand, due to the 'linear growth' of the Euclidean norm $\left\vert \cdot \right\vert $ at infinity (which implies BV(Ω) as solution space). On the other hand, $\left\vert \cdot \right\vert $ is non-differentiable in 0 which can be seen to be responsible for the flat regions in the solutions.

As we have seen in subsection 2.3, the latter can be remedied by considering convex functions of the measure ∇u instead of TV which are smooth in the origin and have linear growth at ∞, also see example 2.22. Then, ${\mathcal{R}}_{g}$ can be taken as a first-order regulariser under the same conditions as for TV regularisation leading to solutions which are still in BV(Ω) and may, in particular, admit jumps. Additionally, less flat regions tend to appear in solutions for noisy data as we no longer have a singularity at 0. However, this feature comes with two drawbacks: first, compared to TV, noise removal seems not to be as strong in numerical solutions. Second, in addition to the regularisation parameter for the inverse problem, one has to choose the parameter ɛ appropriately. Too small a choice might again cause staircasing to appear, while choosing ɛ too big may lead to edges being lost, see figure 1(d). The question remains whether we can improve on this.

Here, we would like to discuss and study the use of higher-order derivatives for regularisation in imaging. This can be motivated by modelling images as piecewise smooth functions, i.e., assuming that an image is several times differentiable (in some sense) while still allowing for object boundaries where the function may jump. With this model in mind, higher-order variational approaches arise quite naturally and we refer for instance to [17, 73, 104] for spaces and regularisation approaches related to second-order variational approaches.

3.1. Symmetric tensor calculus

For smooth functions, higher-order derivatives can be represented as tensor fields, i.e., the derivative represents a tensor in each point. As the order of partial differentiation may be interchanged, these tensors turn out to be symmetric. Symmetric tensors are therefore a suitable tool for representing these objects independently of indices. There are several ways to introduce and motivate tensors and vector spaces of tensors. For our purposes, the following definition will be sufficient. Note that there and throughout this chapter, l ⩾ 0 will always be a tensor order.

Figure 1. First-order denoising example. (a) Ground truth, (b) noisy image with additive Gaussian noise (PSNR: 13.9 dB), (c) TV-regularised solution (PSNR: 29.3 dB), (d) regularisation with smooth TV-like penalty $\varphi \left(x\right)=\sqrt{{x}^{2}+{\varepsilon }^{2}}-\varepsilon $ (PSNR: 29.7 dB). All parameters were manually tuned via grid search to yield highest PSNR.

Definition 3.1. We define

${\mathcal{T}}^{l}\left({\mathbf{R}}^{d}\right)=\left\{\xi :{\mathbf{R}}^{d}{\times}\cdots {\times}{\mathbf{R}}^{d}\to \mathbf{R}\enspace \vert \enspace \xi \ \text{multilinear}\right\}\quad \text{and}\quad {\mathrm{S}\mathrm{y}\mathrm{m}}^{l}\left({\mathbf{R}}^{d}\right)=\left\{\xi \in {\mathcal{T}}^{l}\left({\mathbf{R}}^{d}\right)\enspace \vert \enspace \xi \ \text{symmetric}\right\}$

as the vector space of l -tensors and symmetric l -tensors, respectively.

Here, $\xi \in {\mathcal{T}}^{l}\left({\mathbf{R}}^{d}\right)$ is called symmetric, if ξ(a1, ..., al ) = ξ(aπ(1), ..., aπ(l)) for all a1, ..., al ∈ Rd and π ∈ Sl , where Sl denotes the permutation group of {1, ..., l}.

For $\xi \in {\mathcal{T}}^{k}\left({\mathbf{R}}^{d}\right)$, k ⩾ 0 and $\eta \in {\mathcal{T}}^{l}\left({\mathbf{R}}^{d}\right)$ the tensor product is defined as the element $\xi \otimes \eta \in {\mathcal{T}}^{k+l}\left({\mathbf{R}}^{d}\right)$ obeying

$\left(\xi \otimes \eta \right)\left({a}_{1},\dots ,{a}_{k+l}\right)=\xi \left({a}_{1},\dots ,{a}_{k}\right)\eta \left({a}_{k+1},\dots ,{a}_{k+l}\right)$

for all a1, ..., ak+l ∈ Rd .

Note that the space of l-tensors is actually the space of (0, l)-covariant tensors, however, we will not need to distinguish between co- and contravariant tensors. We have

while for low orders, the symmetric tensor spaces coincide with well-known spaces Sym0(Rd ) ≡ R, Sym1(Rd ) ≡ Rd and Sym2(Rd ) ≡ Sd×d , the space of symmetric d × d matrices.

In the following, we give a brief overview of the tensor operations that are the most relevant to define regularisation functionals on higher-order derivatives.

Remark 3.2. The space ${\mathcal{T}}^{l}\left({\mathbf{R}}^{d}\right)$ can be associated with a unit basis. Indexed by p ∈ {1, ..., d}l , its elements are given by ${e}_{p}\left({a}_{1},\dots ,{a}_{l}\right)={\prod }_{i=1}^{l}{a}_{i,{p}_{i}}$ while the respective coefficient for a $\xi \in {\mathcal{T}}^{l}\left({\mathbf{R}}^{d}\right)$ is given by ${\xi }_{p}=\xi \left({e}_{{p}_{1}},\dots ,{e}_{{p}_{l}}\right)$. Each $\xi \in {\mathcal{T}}^{l}\left({\mathbf{R}}^{d}\right)$ thus has the representation

$\xi =\sum _{p\in {\left\{1,\dots ,d\right\}}^{l}}{\xi }_{p}{e}_{p}.$

The identity of vector spaces ${\mathcal{T}}^{l}\left({\mathbf{R}}^{d}\right)={\mathbf{R}}^{d{\times}\cdots {\times}d}$ is evident from that.

The space Syml (Rd ) is obviously a (generally proper) subspace of ${\mathcal{T}}^{l}\left({\mathbf{R}}^{d}\right)$. A (non-symmetric) tensor $\xi \in {\mathcal{T}}^{l}\left({\mathbf{R}}^{d}\right)$ can be symmetrised by averaging over all permuted arguments, i.e.,

$\left(\vert \vert \vert \xi \right)\left({a}_{1},\dots ,{a}_{l}\right)=\frac{1}{l!}\sum _{\pi \in {S}_{l}}\xi \left({a}_{\pi \left(1\right)},\dots ,{a}_{\pi \left(l\right)}\right).$

The symmetrisation operator $\vert \vert \vert :{\mathcal{T}}^{l}\left({\mathbf{R}}^{d}\right)\to {\mathrm{S}\mathrm{y}\mathrm{m}}^{l}\left({\mathbf{R}}^{d}\right)$ obviously defines a projection. A basis for Syml (Rd ) is given by ${\mathrm{e}}_{p}^{\mathrm{S}\mathrm{y}\mathrm{m}}=\vert \vert \vert {e}_{p}$ for p ranging over all tuples in {1, ..., d}l with non-decreasing entries. The coefficients ξp can still be obtained by ${\xi }_{p}=\xi \left({e}_{{p}_{1}},\dots ,{e}_{{p}_{l}}\right)$.
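As a concrete illustration of these operations (our own sketch; the helper names are ad hoc), an l-tensor can be stored as an l-dimensional array of shape (d, ..., d); the tensor product then corresponds to an outer product and the symmetrisation ||| to an average over axis permutations:

```python
import itertools
import numpy as np

def tensor_product(xi, eta):
    """(xi ⊗ eta)(a_1, ..., a_{k+l}) = xi(a_1, ..., a_k) * eta(a_{k+1}, ..., a_{k+l})."""
    return np.multiply.outer(xi, eta)

def symmetrise(xi):
    """The operator |||: average of xi over all permutations of its arguments."""
    perms = list(itertools.permutations(range(xi.ndim)))
    return sum(np.transpose(xi, p) for p in perms) / len(perms)

rng = np.random.default_rng(1)
d, l = 3, 3
xi = rng.standard_normal((d,) * l)             # a generic (non-symmetric) 3-tensor on R^3
sym = symmetrise(xi)
assert np.allclose(symmetrise(sym), sym)       # ||| is a projection
a, b = rng.standard_normal(d), rng.standard_normal(d)
assert np.allclose(tensor_product(a, b), np.outer(a, b))   # for 1-tensors: outer product
print(sym[0, 1, 2], sym[2, 1, 0])              # coefficients of a symmetric tensor agree
```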

We would like to equip the spaces with a Hilbert space structure.

Definition 3.3. For $\xi ,\eta \in {\mathcal{T}}^{l}\left({\mathbf{R}}^{d}\right)$, the scalar product and Frobenius norm are defined as

$\xi \cdot \eta =\sum _{p\in {\left\{1,\dots ,d\right\}}^{l}}{\xi }_{p}{\eta }_{p}\quad \text{and}\quad \vert \xi \vert =\sqrt{\xi \cdot \xi }.$
Example 3.4. For ξ ∈ Syml (Rd ), the norm corresponds to the absolute value for l = 0, the Euclidean norm in Rd for l = 1 and in case l = 2, we can identify ξ ∈ Sym2(Rd ) with

With the Frobenius norm, tensor spaces become Hilbert spaces of finite dimension and the symmetrisation becomes an orthogonal projection, see, e.g., [99].

Proposition 3.5. 

  • (a)  
    With the above scalar-product and norm, the spaces ${\mathcal{T}}^{l}\left({\mathbf{R}}^{d}\right)$, Syml (Rd ) are finite-dimensional Hilbert spaces with $\mathrm{dim}{\mathcal{T}}^{l}\left({\mathbf{R}}^{d}\right)={d}^{l}$ and $\mathrm{dim}{\mathrm{S}\mathrm{y}\mathrm{m}}^{l}\left({\mathbf{R}}^{d}\right)=\left(\genfrac{}{}{0pt}{}{d+l-1}{l}\right)$.
  • (b)  
    The symmetrisation | | | is the orthogonal projection in ${\mathcal{T}}^{l}\left({\mathbf{R}}^{d}\right)$ onto Syml (Rd ).

Tensor-valued mappings ${\Omega}\to {\mathcal{T}}^{l}\left({\mathbf{R}}^{d}\right)$ on the domain Ω ⊂ Rd are called tensor fields. The tensor-field spaces $\mathcal{C}\left(\overline{{\Omega}},{\mathcal{T}}^{l}\left({\mathbf{R}}^{d}\right)\right)$, ${\mathcal{C}}_{\mathrm{c}}\left(\overline{{\Omega}},{\mathcal{T}}^{l}\left({\mathbf{R}}^{d}\right)\right)$ and ${\mathcal{C}}_{0}\left(\overline{{\Omega}},{\mathcal{T}}^{l}\left({\mathbf{R}}^{d}\right)\right)$ as well as the Lebesgue spaces ${L}^{p}\left({\Omega},{\mathcal{T}}^{l}\left({\mathbf{R}}^{d}\right)\right)$ are then given in the usual manner. Also, measures can be tensor-valued, giving $\mathcal{M}\left({\Omega},{\mathcal{T}}^{l}\left({\mathbf{R}}^{d}\right)\right)$, the space of l-tensor-valued Radon measures. Duality according to proposition 2.4 holds, i.e., $\mathcal{M}\left({\Omega},{\mathcal{T}}^{l}\left({\mathbf{R}}^{d}\right)\right)={\mathcal{C}}_{0}{\left({\Omega},{\mathcal{T}}^{l}\left({\mathbf{R}}^{d}\right)\right)}^{{\ast}}$. Note that for all spaces, the Frobenius norm is used as pointwise norm in the respective definitions of the tensor-field norm. Furthermore, all the above applies analogously to symmetric tensor fields, i.e., mappings between Ω → Syml (Rd ).

Turning to differentiation, the kth Fréchet derivative of a sufficiently smooth l-tensor field, where from now on k ⩾ 1 will always denote an order of differentiation, is naturally a (k + l)-tensor field which we denote by ${\nabla }^{k}\otimes u:{\Omega}\to {\mathcal{T}}^{k+l}\left({\mathbf{R}}^{d}\right)$ according to

The fact that gradient tensor fields are not symmetric in general gives rise to considering the k th symmetrised derivative given by ${\mathcal{E}}^{k}u=\vert \vert \vert {\nabla }^{k}\otimes u$. This definition is consistent as ${\mathcal{E}}^{{k}_{2}}{\mathcal{E}}^{{k}_{1}}={\mathcal{E}}^{{k}_{1}+{k}_{2}}$ for k1, k2 ⩾ 0. Divergence operators are then, up to the sign, formal adjoints of these differentiation operators. They are given as follows. Introducing the trace of a tensor $\xi \in {\mathcal{T}}^{l+2}\left({\mathbf{R}}^{d}\right)$ according to

gives an l-tensor. It can be interpreted as the tensor contraction of the first and the last component of the tensor. As for the vector-field case, the divergence is now the trace of the derivative. For k-times differentiable $v:{\Omega}\to {\mathcal{T}}^{k+l}\left({\mathbf{R}}^{d}\right)$, the kth divergence is thus given by

Again, this is consistent with repeated application, i.e., ${\mathrm{d}\mathrm{i}\mathrm{v}}^{{k}_{1}+{k}_{2}}={\mathrm{d}\mathrm{i}\mathrm{v}}^{{k}_{2}}{\mathrm{d}\mathrm{i}\mathrm{v}}^{{k}_{1}}$. Note that there might be other choices of the divergence, such as contracting the derivative with any component of the tensor other than the last. This affects, however, only non-symmetric tensor fields. For symmetric tensor fields, the result is independent of the choice of the contraction components and is always a symmetric tensor field.
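The adjointness between the symmetrised derivative and the divergence can also be checked in a simple discrete setting; the following sketch (our own, one of many possible discretisations, using forward/backward differences on a periodic 2D grid) verifies the identity ⟨Ev, w⟩ = −⟨v, div w⟩ for symmetric 2 × 2 tensor fields numerically:

```python
import numpy as np

def grad(u):
    """Forward differences with periodic boundary; returns shape (2,) + u.shape."""
    return np.stack([np.roll(u, -1, axis=0) - u,
                     np.roll(u, -1, axis=1) - u])

def sym_grad(v):
    """Symmetrised gradient E v of a vector field v (shape (2, n, m)) as a 2x2 field."""
    g = np.stack([grad(v[0]), grad(v[1])])        # g[i, j] = d_j v_i
    return 0.5 * (g + g.transpose(1, 0, 2, 3))

def div_sym(w):
    """Divergence of a symmetric 2x2 tensor field w (shape (2, 2, n, m)),
    using backward differences so that -div is the adjoint of sym_grad."""
    def dback(f, axis):
        return f - np.roll(f, 1, axis=axis)
    return np.stack([dback(w[0, 0], 0) + dback(w[0, 1], 1),
                     dback(w[1, 0], 0) + dback(w[1, 1], 1)])

rng = np.random.default_rng(2)
v = rng.standard_normal((2, 16, 16))              # a random vector field
w = rng.standard_normal((2, 2, 16, 16))
w = 0.5 * (w + w.transpose(1, 0, 2, 3))           # make the tensor field symmetric
lhs = np.sum(sym_grad(v) * w)                     # discrete  ∫ E v · w dx
rhs = -np.sum(v * div_sym(w))                     # discrete -∫ v · div w dx
assert np.isclose(lhs, rhs)
```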

Example 3.6. The symmetrised gradient of scalar functions Ω → Sym0(Rd ) coincides with the usual gradient while the divergence for mappings Ω → Sym1(Rd ) coincides with the usual divergence.

The cases ${\mathcal{E}}^{2}{u}^{0}$ and $\mathcal{E}{u}^{1}$ for u0 : Ω → Sym0(Rd ) and u1 : Ω → Sym1(Rd ) can be handled with the identification of Sym2(Rd ) and symmetric matrices Sd×d :

Analogously, for the divergence of a v : Ω → Sym2(Rd ), we have that

In particular, for k ⩾ 1, there are the usual spaces of continuously differentiable tensor fields which are denoted by ${\mathcal{C}}^{k}\left(\overline{{\Omega}},{\mathcal{T}}^{l}\left({\mathbf{R}}^{d}\right)\right)$ and equipped with the usual norm ${{\Vert}u{\Vert}}_{k,\infty }={\mathrm{max}}_{0{\leqslant}m{\leqslant}k}\enspace {{\Vert}{\nabla }^{m}\otimes u{\Vert}}_{\infty }$. Likewise, we consider k-times continuously differentiable tensor fields with compact support ${\mathcal{C}}_{\mathrm{c}}^{k}\left({\Omega},{\mathcal{T}}^{l}\left({\mathbf{R}}^{d}\right)\right)$ where k = ∞ leads to the space of test tensor fields. Also, for finite k, the space ${\mathcal{C}}_{0}^{k}\left({\Omega},{\mathcal{T}}^{l}\left({\mathbf{R}}^{d}\right)\right)$ is given as the closure of ${\mathcal{C}}_{\mathrm{c}}^{k}\left({\Omega},{\mathcal{T}}^{l}\left({\mathbf{R}}^{d}\right)\right)$ in ${\mathcal{C}}^{k}\left(\overline{{\Omega}},{\mathcal{T}}^{l}\left({\mathbf{R}}^{d}\right)\right)$. Of course, the analogous constructions apply to symmetric tensor fields, leading to the spaces ${\mathcal{C}}^{k}\left(\overline{{\Omega}},{\mathrm{S}\mathrm{y}\mathrm{m}}^{l}\left({\mathbf{R}}^{d}\right)\right)$, ${\mathcal{C}}_{\mathrm{c}}^{k}\left({\Omega},{\mathrm{S}\mathrm{y}\mathrm{m}}^{l}\left({\mathbf{R}}^{d}\right)\right)$ and ${\mathcal{C}}_{0}^{k}\left({\Omega},{\mathrm{S}\mathrm{y}\mathrm{m}}^{l}\left({\mathbf{R}}^{d}\right)\right)$ as well as the space of test symmetric tensor fields ${\mathcal{C}}_{\mathrm{c}}^{\infty }\left({\Omega},{\mathrm{S}\mathrm{y}\mathrm{m}}^{l}\left({\mathbf{R}}^{d}\right)\right)$.

As Ω is assumed to be a connected set, we are able to describe the kernels of ∇k and ${\mathcal{E}}^{k}$ for (symmetric) tensor fields in terms of finite-dimensional spaces of polynomials.

Proposition 3.7. Let $u\in {\mathcal{C}}^{k}\left(\overline{{\Omega}},{\mathcal{T}}^{l}\left({\mathbf{R}}^{d}\right)\right)$ such that ∇k u = 0. Then, u is a ${\mathcal{T}}^{l}\left({\mathbf{R}}^{d}\right)$-valued polynomial of maximal order k − 1, i.e., there are ${\xi }^{m}\in {\mathcal{T}}^{l+m}\left({\mathbf{R}}^{d}\right)$, m = 0, ..., k − 1 such that

Equation (8)

If ${\mathcal{E}}^{k}u=0$ for $u\in {\mathcal{C}}^{k+l}\left(\overline{{\Omega}},{\mathrm{S}\mathrm{y}\mathrm{m}}^{l}\left({\mathbf{R}}^{d}\right)\right)$, then u is a Syml (Rd )-valued polynomial of maximal order k + l − 1, i.e., the above representation holds for ξm ∈ Syml+m (Rd ), m = 0, ..., k + l − 1 with the sum ranging from 0 to k + l − 1.

Proof. At first we note that any ${\mathcal{T}}^{l}\left({\mathbf{R}}^{d}\right)$- and Syml (Rd )-valued polynomial of maximal order k − 1 and k + l − 1, respectively, admits a representation as claimed. In case ∇k u = 0 for $u\in {\mathcal{C}}^{k}\left(\overline{{\Omega}},{\mathcal{T}}^{l}\left({\mathbf{R}}^{d}\right)\right)$ it follows directly from a basis representation of u(x) that u is a ${\mathcal{T}}^{l}\left({\mathbf{R}}^{d}\right)$-valued polynomial of maximal order k − 1.

Now in case ${\mathcal{E}}^{k}u=0$ for $u\in {\mathcal{C}}^{k+l}\left(\overline{{\Omega}},{\mathrm{S}\mathrm{y}\mathrm{m}}^{l}\left({\mathbf{R}}^{d}\right)\right)$, we get that ∇k+l u = 0, see lemma A.3. This implies that u is a Syml (Rd )-valued polynomial of maximal degree k + l − 1 as claimed. □

Next, we would like to introduce and discuss weak forms of differentiation for (symmetric) tensor fields. Starting point for this is a version of the well-known Gauss–Green theorem for smooth (symmetric) tensor fields [28].

Proposition 3.8. Let Ω ⊂ Rd be a bounded Lipschitz domain, $u\in \mathcal{C}\left(\overline{{\Omega}},{\mathcal{T}}^{l}\left({\mathbf{R}}^{d}\right)\right)$, $v\in {\mathcal{C}}^{1}\left(\overline{{\Omega}},{\mathcal{T}}^{l+1}\left({\mathbf{R}}^{d}\right)\right)$. Then, a Gauss–Green theorem holds in the following form:

with ν being the outward unit normal on ∂Ω.

If $u\in \mathcal{C}\left(\overline{{\Omega}},{\mathrm{S}\mathrm{y}\mathrm{m}}^{l}\left({\mathbf{R}}^{d}\right)\right)$, $v\in {\mathcal{C}}^{1}\left(\overline{{\Omega}},{\mathrm{S}\mathrm{y}\mathrm{m}}^{l+1}\left({\mathbf{R}}^{d}\right)\right)$, the identity reads as

If one of the tensor fields u or v has compact support in Ω, the boundary term does not appear and the identities are valid for arbitrary domains Ω.

As usual, being able to express integrals of the form ${\int }_{{\Omega}}$(∇ ⊗ u) ⋅ v  dx and ${\int }_{{\Omega}}\mathcal{E}u\cdot v\enspace \mathrm{d}x$ for test tensor fields without the derivative of u allows us to introduce a weak notion of ∇ ⊗ u and $\mathcal{E}u$, respectively, as well as associated Sobolev spaces.

Definition 3.9. For $u\in {L}_{\mathrm{l}\mathrm{o}\mathrm{c}}^{1}\left({\Omega},{\mathcal{T}}^{l}\left({\mathbf{R}}^{d}\right)\right)$, $w\in {L}_{\mathrm{l}\mathrm{o}\mathrm{c}}^{1}\left({\Omega},{\mathcal{T}}^{l+1}\left({\mathbf{R}}^{d}\right)\right)$ is the weak derivative of u, denoted w = ∇ ⊗ u, if for all $\varphi \in {\mathcal{C}}_{\mathrm{c}}^{\infty }\left({\Omega},{\mathcal{T}}^{l+1}\left({\mathbf{R}}^{d}\right)\right)$, it holds that

${\int }_{{\Omega}}w\cdot \varphi \enspace \mathrm{d}x=-{\int }_{{\Omega}}u\cdot \mathrm{d}\mathrm{i}\mathrm{v}\enspace \varphi \enspace \mathrm{d}x.$

Likewise, for $u\in {L}_{\mathrm{l}\mathrm{o}\mathrm{c}}^{1}\left({\Omega},{\mathrm{S}\mathrm{y}\mathrm{m}}^{l}\left({\mathbf{R}}^{d}\right)\right)$, $w\in {L}_{\mathrm{l}\mathrm{o}\mathrm{c}}^{1}\left({\Omega},{\mathrm{S}\mathrm{y}\mathrm{m}}^{l+1}\left({\mathbf{R}}^{d}\right)\right)$ is the weak symmetrised derivative of u, denoted $w=\mathcal{E}u$, if the above identity holds for all $\varphi \in {\mathcal{C}}_{\mathrm{c}}^{\infty }\left({\Omega},{\mathrm{S}\mathrm{y}\mathrm{m}}^{l+1}\left({\mathbf{R}}^{d}\right)\right)$.

Like the scalar versions, ∇ and $\mathcal{E}$ are well-defined and constitute closed operators between the respective Lebesgue spaces with dense domain.

Definition 3.10. The Sobolev space of tensor fields of order l of differentiation order k and exponent p ∈ [1, ∞] is defined as

while ${H}_{0}^{k,p}\left({\Omega},{\mathcal{T}}^{l}\left({\mathbf{R}}^{d}\right)\right)$ is the closure of the subspace ${\mathcal{C}}_{\mathrm{c}}^{\infty }\left({\Omega},{\mathcal{T}}^{l}\left({\mathbf{R}}^{d}\right)\right)$ with respect to the ||⋅||k,p -norm.

Replacing ${\mathcal{T}}^{l}\left({\mathbf{R}}^{d}\right)$ by Syml (Rd ) and letting

defines the Sobolev space of symmetric tensor fields, denoted by Hk,p (Ω, Syml (Rd )). The space ${H}_{0}^{k,p}\left({\Omega},{\mathrm{S}\mathrm{y}\mathrm{m}}^{l}\left({\mathbf{R}}^{d}\right)\right)$ is again the closure ${\mathcal{C}}_{\mathrm{c}}^{\infty }\left({\Omega},{\mathrm{S}\mathrm{y}\mathrm{m}}^{l}\left({\mathbf{R}}^{d}\right)\right)$ with respect to the corresponding norm.

By closedness of the differential operators, the Sobolev spaces are Banach spaces. Also, since weak derivatives are symmetric, we have that ${H}^{k,p}\left({\Omega},{\mathcal{T}}^{0}\left({\mathbf{R}}^{d}\right)\right)={H}^{k,p}\left({\Omega},{\mathrm{S}\mathrm{y}\mathrm{m}}^{0}\left({\mathbf{R}}^{d}\right)\right)$ in the sense of Banach space isometry, as well as coincidence with the usual Sobolev spaces. For l ⩾ 1, the space ${H}^{k,p}\left({\Omega},{\mathcal{T}}^{l}\left({\mathbf{R}}^{d}\right)\right)$ corresponds to the space where all components of u are in Hk,p (Ω). However, generally, for l ⩾ 1, the norm of Hk,p (Ω, Syml (Rd )) is weaker than the norm in ${H}^{k,p}\left({\Omega},{\mathcal{T}}^{l}\left({\mathbf{R}}^{d}\right)\right)$, such that only ${H}^{k,p}\left({\Omega},{\mathcal{T}}^{l}\left({\mathbf{R}}^{d}\right)\right)\hookrightarrow {H}^{k,p}\left({\Omega},{\mathrm{S}\mathrm{y}\mathrm{m}}^{l}\left({\mathbf{R}}^{d}\right)\right)$ in the sense of continuous embedding and the latter is a strictly larger space.

Nevertheless, equality holds if some kind of Korn's inequality can be established which is, for instance, the case for the spaces ${H}_{0}^{1,p}\left({\Omega},{\mathrm{S}\mathrm{y}\mathrm{m}}^{1}\left({\mathbf{R}}^{d}\right)\right)$ for 1 < p < ∞ [120, section 5.6] as well as the spaces ${H}_{0}^{1,2}\left({\Omega},{\mathrm{S}\mathrm{y}\mathrm{m}}^{l}\left({\mathbf{R}}^{d}\right)\right)$ for l ⩾ 1 (which follows from [28, proposition 3.6] via smooth approximation).

Finally, let us briefly discuss (symmetric) tensor-valued distributions and the distributional forms of ∇k and ${\mathcal{E}}^{k}$.

Definition 3.11. A ${\mathcal{T}}^{l}\left({\mathbf{R}}^{d}\right)$ -valued distribution on Ω is a linear mapping $u:{\mathcal{C}}_{\mathrm{c}}^{\infty }\left({\Omega},{\mathcal{T}}^{l}\left({\mathbf{R}}^{d}\right)\right)\to \mathbf{R}$ that satisfies the following continuity estimate: for each K ⊂⊂ Ω, there is an m ∈ N and a C > 0 such that

$\vert u\left(\varphi \right)\vert {\leqslant}C{{\Vert}\varphi {\Vert}}_{m,\infty }\quad \text{for all}\enspace \varphi \in {\mathcal{C}}_{\mathrm{c}}^{\infty }\left({\Omega},{\mathcal{T}}^{l}\left({\mathbf{R}}^{d}\right)\right)\enspace \text{with}\enspace \mathrm{supp}\left(\varphi \right)\subset K.$

The distribution u is regular if there is a $\bar{u}\in {L}_{\mathrm{l}\mathrm{o}\mathrm{c}}^{1}\left({\Omega},{\mathcal{T}}^{l}\left({\mathbf{R}}^{d}\right)\right)$ such that

$u\left(\varphi \right)={\int }_{{\Omega}}\bar{u}\cdot \varphi \enspace \mathrm{d}x\quad \text{for all}\enspace \varphi \in {\mathcal{C}}_{\mathrm{c}}^{\infty }\left({\Omega},{\mathcal{T}}^{l}\left({\mathbf{R}}^{d}\right)\right).$
A Syml (Rd )-valued distribution on Ω and its regularity is analogously defined by replacing ${\mathcal{T}}^{l}\left({\mathbf{R}}^{d}\right)$ by Syml (Rd ) in the above definition.

Then, the distributional (symmetrised) derivatives are given by (∇k u)(φ) = (−1)k u(divk φ), $\varphi \in {\mathcal{C}}_{\mathrm{c}}^{\infty }\left({\Omega},{\mathcal{T}}^{k+l}\left({\mathbf{R}}^{d}\right)\right)$ and $\left({\mathcal{E}}^{k}u\right)\left(\varphi \right)={\left(-1\right)}^{k}u\left({\mathrm{d}\mathrm{i}\mathrm{v}}^{k}\varphi \right)$, $\varphi \in {\mathcal{C}}_{\mathrm{c}}^{\infty }\left({\Omega},{\mathrm{S}\mathrm{y}\mathrm{m}}^{k+l}\left({\mathbf{R}}^{d}\right)\right)$ which makes them a ${\mathcal{T}}^{k+l}\left({\mathbf{R}}^{d}\right)$- and Symk+l (Rd )-valued distribution, respectively. We then have the following generalisation of proposition 3.7 which will be useful for analysing functionals that depend on (symmetrised) distributional derivatives.

Proposition 3.12. If ∇k u = 0 for a ${\mathcal{T}}^{l}\left({\mathbf{R}}^{d}\right)$-valued distribution, then u is regular and a ${\mathcal{T}}^{l}\left({\mathbf{R}}^{d}\right)$-valued polynomial of maximal degree k − 1.

If ${\mathcal{E}}^{k}u=0$ for a Syml (Rd )-valued distribution, then u is regular and a Syml (Rd )-valued polynomial of maximal degree k + l − 1.

Proof. This can be deduced from proposition 3.7 via mollification arguments similar as in [28, proposition 3.3]. □

3.2. Functions of higher-order bounded variation

In the following, we discuss functions whose derivative is a Radon measure for a fixed order of differentiation. As higher-order derivatives of scalar functions are always symmetric, it suffices to consider only the symmetrised higher-order derivative ${\mathcal{E}}^{k}$ in this case as well as symmetric tensor fields. However, as we are also interested in intermediate differentiation orders, we moreover discuss spaces of symmetric tensors for which the symmetrised derivative of some order is a Radon measure.

In the following, recall that k ⩾ 1 denotes a differentiation order and l ⩾ 0 denotes a tensor order.

Definition 3.13. Let Ω ⊂ Rd be a domain.

  • (a)  
    In the case l = 0, for $u\in {L}_{\mathrm{l}\mathrm{o}\mathrm{c}}^{1}\left({\Omega}\right)$, the total variation of order k is defined as
    ${\mathrm{T}\mathrm{V}}^{k}\left(u\right)=\mathrm{sup}\enspace \left\{{\int }_{{\Omega}}u\enspace {\mathrm{d}\mathrm{i}\mathrm{v}}^{k}\varphi \enspace \mathrm{d}x\enspace \vert \enspace \varphi \in {\mathcal{C}}_{\mathrm{c}}^{k}\left({\Omega},{\mathrm{S}\mathrm{y}\mathrm{m}}^{k}\left({\mathbf{R}}^{d}\right)\right),\enspace {{\Vert}\varphi {\Vert}}_{\infty }{\leqslant}1\right\}.$
    For general l ⩾ 0 and $u\in {L}_{\mathrm{l}\mathrm{o}\mathrm{c}}^{1}\left({\Omega},{\mathrm{S}\mathrm{y}\mathrm{m}}^{l}\left({\mathbf{R}}^{d}\right)\right)$, the total deformation of order k is
    ${\mathrm{T}\mathrm{D}}^{k}\left(u\right)=\mathrm{sup}\enspace \left\{{\int }_{{\Omega}}u\cdot {\mathrm{d}\mathrm{i}\mathrm{v}}^{k}\varphi \enspace \mathrm{d}x\enspace \vert \enspace \varphi \in {\mathcal{C}}_{\mathrm{c}}^{k}\left({\Omega},{\mathrm{S}\mathrm{y}\mathrm{m}}^{k+l}\left({\mathbf{R}}^{d}\right)\right),\enspace {{\Vert}\varphi {\Vert}}_{\infty }{\leqslant}1\right\}.$
  • (b)  
    The normed space according to
    ${\mathrm{B}\mathrm{D}}^{k}\left({\Omega},{\mathrm{S}\mathrm{y}\mathrm{m}}^{l}\left({\mathbf{R}}^{d}\right)\right)=\left\{u\in {L}^{1}\left({\Omega},{\mathrm{S}\mathrm{y}\mathrm{m}}^{l}\left({\mathbf{R}}^{d}\right)\right)\enspace \vert \enspace {\mathrm{T}\mathrm{D}}^{k}\left(u\right){< }\infty \right\},\qquad {{\Vert}u{\Vert}}_{{\mathrm{B}\mathrm{D}}^{k}}={{\Vert}u{\Vert}}_{1}+{\mathrm{T}\mathrm{D}}^{k}\left(u\right),$
    is called the space of symmetric tensor fields of bounded deformation of order k. The scalar case, i.e., l = 0, is referred to as the space of functions of bounded variation of order k. The latter spaces are denoted by BVk (Ω).

We note that the Hilbert-space norm on the tensor space for the definition of TVk leads to a corresponding pointwise norm on the derivatives. While this choice is rather natural, and does not require to distinguish primal and dual norms, also other choices are possible for which we refer to [128] in the second-order case.
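For a discretised image, one common (but by no means unique) realisation of TV2 with the Frobenius pointwise norm sums the Frobenius norms of finite-difference Hessians; the following sketch (our own ad-hoc discretisation) also illustrates that affine images yield the value zero, in line with the kernel description of proposition 3.21 below:

```python
import numpy as np

def tv2(u):
    """Discrete second-order total variation: sum of Frobenius norms of the
    finite-difference Hessian (interior points only)."""
    uxx = u[2:, 1:-1] - 2 * u[1:-1, 1:-1] + u[:-2, 1:-1]
    uyy = u[1:-1, 2:] - 2 * u[1:-1, 1:-1] + u[1:-1, :-2]
    uxy = 0.25 * (u[2:, 2:] - u[2:, :-2] - u[:-2, 2:] + u[:-2, :-2])
    return np.sum(np.sqrt(uxx**2 + uyy**2 + 2 * uxy**2))

n = 64
x, y = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
affine = 0.3 * x - 0.1 * y + 2.0           # a polynomial of degree <= 1
bumpy = np.sin(0.2 * x) * np.cos(0.3 * y)
print(tv2(affine))                         # 0 up to round-off: affine images are in the kernel
print(tv2(bumpy))                          # strictly positive
```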

Let us analyse some of the basic properties of these spaces.

Proposition 3.14. Let Ω ⊂ Rd be a domain, p ∈ [1, ∞]. Then:

  • (a)  
    TDk is proper, convex and a lower semi-continuous seminorm on Lp (Ω, Syml (Rd )).
  • (b)  
    TDk (u) = 0 if and only if ${\mathcal{E}}^{k}u=0$. In particular, TDk (u) = 0 implies that u is a Syml (Rd )-valued polynomial of maximal degree k + l − 1.

Proof. With p* being the dual exponent to p, each test tensor field obeys ${\mathrm{d}\mathrm{i}\mathrm{v}}^{k}\varphi \in {L}^{{p}^{{\ast}}}\left({\Omega},{\mathrm{S}\mathrm{y}\mathrm{m}}^{l}\left({\mathbf{R}}^{d}\right)\right)$ for $\varphi \in {\mathcal{C}}_{\mathrm{c}}^{k}\left({\Omega},{\mathrm{S}\mathrm{y}\mathrm{m}}^{k+l}\left({\mathbf{R}}^{d}\right)\right)$. The functional TDk is thus a pointwise supremum over a set of continuous linear functionals and, consequently, convex and lower semi-continuous. By definition, it is obviously proper and positively homogeneous since if divk φ is a test vector field, then also −divk φ is.

By definition of TDk we see that TDk (u) = 0 if and only if ${\int }_{{\Omega}}$ u ⋅ divk φ dx = 0 for each $\varphi \in {\mathcal{C}}_{\mathrm{c}}^{k}\left({\Omega},{\mathrm{S}\mathrm{y}\mathrm{m}}^{k+l}\left({\mathbf{R}}^{d}\right)\right)$. But this is equivalent to ${\mathcal{E}}^{k}u=0$ in the distributional sense such that in particular, the polynomial representation follows from proposition 3.12. □

In order to show more properties, for instance, that BDk (Ω, Syml (Rd )) is a Banach space, let us adopt a more abstract viewpoint. We say that a function $\left\vert \cdot \right\vert :X\to \left[0,\infty \right]$ for X a Banach space is a lower semi-continuous seminorm on X if $\left\vert \cdot \right\vert $ is positive homogeneous, satisfies the triangle inequality and is lower semi-continuous. The kernel of $\left\vert \cdot \right\vert $, denoted $\mathrm{ker}\left(\left\vert \cdot \right\vert \right)$, is the set $\left\{x\in X\enspace \vert \enspace \left\vert x\right\vert =0\right\}$ which is a closed linear subspace of X.

Lemma 3.15. Let $\left\vert \cdot \right\vert $ be a lower semi-continuous seminorm on the Banach space X with norm ||⋅||X . Then,

$Y=\left\{x\in X\enspace \vert \enspace \left\vert x\right\vert {< }\infty \right\},\qquad {{\Vert}x{\Vert}}_{Y}={{\Vert}x{\Vert}}_{X}+\left\vert x\right\vert ,$

is a Banach space. The seminorm $\left\vert \cdot \right\vert $ is continuous in Y.

Proof. It is immediate that Y is a normed space. Let {xn } be a Cauchy sequence in Y which is obviously a Cauchy sequence in X. Hence, a limit xX exists for which the lower semi-continuity yields $\left\vert x\right\vert {\leqslant}{\text{lim inf}}_{n\to \infty }\left\vert {x}^{n}\right\vert {< }\infty $, the latter since $\left\{\left\vert {x}^{n}\right\vert \right\}$ is a real Cauchy sequence. In particular, xY.

To obtain convergence with respect to $\left\vert \cdot \right\vert $, choose, for ɛ > 0, an n such that for all m ⩾ n, $\left\vert {x}^{n}-{x}^{m}\right\vert {\leqslant}\varepsilon $. Letting m → ∞ gives, as xn − xm → xn − x in X,

$\left\vert {x}^{n}-x\right\vert {\leqslant}\underset{m\to \infty }{\text{lim inf}}\left\vert {x}^{n}-{x}^{m}\right\vert {\leqslant}\varepsilon .$

This implies xn → x in Y which is what we intended to show.

Finally, the continuity of $\left\vert \cdot \right\vert $ follows from the standard estimate $\left\vert \left\vert {x}^{1}\right\vert -\left\vert {x}^{2}\right\vert \right\vert {\leqslant}\left\vert {x}^{1}-{x}^{2}\right\vert {\leqslant}{{\Vert}{x}^{1}-{x}^{2}{\Vert}}_{Y}$ for x1, x2Y. □

It is then obvious from proposition 3.14 and lemma 3.15 that BDk (Ω, Syml (Rd )) is a Banach space. In order to examine the structure of these spaces, it is crucial to understand the case k = 1, i.e., BD(Ω, Syml (Rd )) = BD1(Ω, Syml (Rd )), where the symmetrised derivative is only a measure. For l ⩾ 1, these spaces are strictly larger than BV(Ω, Syml (Rd )) as a consequence of the failure of Korn's inequality. Important properties of these spaces are summarised as follows.

Theorem 3.16. ([32, theorem 2.6]). If u is a Syml (Rd )-valued distribution on a bounded Lipschitz domain Ω with $\mathcal{E}u\in \mathcal{M}\left({\Omega},{\mathrm{S}\mathrm{y}\mathrm{m}}^{l+1}\left({\mathbf{R}}^{d}\right)\right)$, then u ∈ BD(Ω, Syml (Rd )).

Theorem 3.17. ([28, theorems 4.16 and 4.17]). For Ω a bounded Lipschitz domain and 1 ⩽ p ⩽ d/(d − 1), the space BD(Ω, Syml (Rd )) is continuously embedded in Lp (Ω, Syml (Rd )). Moreover, for p < d/(d − 1), the embedding is compact.

Theorem 3.18. (Sobolev–Korn inequality [28, corollary 4.20]). For Ω a bounded Lipschitz domain and ${R}_{l}:{L}^{d/\left(d-1\right)}\left({\Omega},{\mathrm{S}\mathrm{y}\mathrm{m}}^{l}\left({\mathbf{R}}^{d}\right)\right)\to \mathrm{ker}\left(\mathcal{E}\right)$ a linear and continuous projection onto the kernel of $\mathcal{E}$, there exists a constant C > 0 such that for each u ∈ BD(Ω, Syml (Rd )) it follows that

${{\Vert}u-{R}_{l}u{\Vert}}_{d/\left(d-1\right)}{\leqslant}C{{\Vert}\mathcal{E}u{\Vert}}_{\mathcal{M}}.\qquad \left(9\right)$

Note that the projection Rl as stated always exists as $\mathrm{ker}\left(\mathcal{E}\right)$ is finite-dimensional (see proposition 3.12).

Now, for general k and u ∈ BDk (Ω, Syml (Rd )) fixed, $w={\mathcal{E}}^{k-1}u$ is a Syml+k−1(Rd )-valued distribution with the property

for $\varphi \in {\mathcal{C}}_{\mathrm{c}}^{\infty }\left({\Omega},{\mathrm{S}\mathrm{y}\mathrm{m}}^{k+l}\left({\mathbf{R}}^{d}\right)\right)$. In other words, $\mathcal{E}w={\mathcal{E}}^{k}u\in \mathcal{M}\left({\Omega},{\mathrm{S}\mathrm{y}\mathrm{m}}^{k+l}\left({\mathbf{R}}^{d}\right)\right)$, thus theorem 3.16 implies that ${\mathcal{E}}^{k-1}u=w\in \mathrm{B}\mathrm{D}\left({\Omega},{\mathrm{S}\mathrm{y}\mathrm{m}}^{k+l-1}\left({\mathbf{R}}^{d}\right)\right)$ and, in particular, we have u ∈ BDk−1(Ω, Syml (Rd )). Hence, the spaces are nested:

Let us look at the norms: by the Sobolev–Korn inequality (9), for some linear projection ${R}_{k+l-1}:\mathrm{B}\mathrm{D}\left({\Omega},{\mathrm{S}\mathrm{y}\mathrm{m}}^{k+l-1}\left({\mathbf{R}}^{d}\right)\right)\to \mathrm{ker}\left(\mathcal{E}\right)$, we see

which implies

Now, $u{\mapsto}{R}_{k+l-1}{\mathcal{E}}^{k-1}u$ is well-defined on BDk (Ω, Syml (Rd )), linear, has finite-dimensional image and is hence continuous. We may therefore estimate

Proceeding inductively, we arrive at the estimate

Equation (10)

for some C > 0 independent of u. Therefore, we obtain the following theorem.

Theorem 3.19. If Ω ⊂ Rd is a bounded Lipschitz domain, then the norm equivalence

Equation (11)

holds on BDk (Ω, Syml (Rd )). The embeddings

are continuous.

Proof. The nontrivial estimate to establish norm equivalence has just been shown in (10). The continuity of the embedding follows from the fact that the norm on the right-hand side in (11) is increasing with respect to k. □

In the scalar case, we can furthermore establish Sobolev embeddings.

Theorem 3.20. Let Ω be a bounded Lipschitz domain and 0 ⩽ m < k.

For k − m ⩽ d: the space BVk (Ω) is continuously embedded in Hm,p (Ω) for $1{\leqslant}p{\leqslant}\frac{d}{d-\left(k-m\right)}$, where we set $\frac{d}{d-\left(k-m\right)}=\infty $ for k − m = d.

If $p{< }\frac{d}{d-\left(k-m\right)}$, then the embedding is compact.

For k − m > d: the space BVk (Ω) is compactly embedded in ${\mathcal{C}}^{m,\alpha }\left(\overline{{\Omega}}\right)$ for each $\alpha \in \left.\right]0,1 \left[\right. $.

Proof. In the scalar case, ${{\Vert}u{\Vert}}_{1}+{\sum }_{\left\vert \beta \right\vert {\leqslant}k-1}{{\Vert}\nabla {\partial }^{\beta }u{\Vert}}_{\mathcal{M}}$ for β ∈ Nd a multiindex and u ∈ BVk (Ω) constitutes an equivalent norm on BVk (Ω), as a consequence of theorem 3.19. By the Poincaré inequality in BV(Ω),

for each $\left\vert \beta \right\vert {\leqslant}k-1$. This establishes the continuous embedding BVk (Ω) ↪ Wk−1,d/(d−1)(Ω). Application of the well-known embedding theorems for Sobolev spaces (see [1, theorems 5.4 and 6.2]) then gives the results for the cases k − m < d and k − m > d as well as for the case k − m = d and p < ∞.

For the case k − m = d and p = ∞ we note that again by Sobolev embeddings [1, theorem 5.4] we get for a constant C > 0 and all u ∈ Hk,1(Ω) that

Approximating u ∈ BVk (Ω) with a sequence {un } in ${\mathcal{C}}^{\infty }\left({\Omega}\right)$ ∩ BVk (Ω) strictly converging to u in BVk (Ω) as in lemma A.4, the result follows from applying this estimate to each un and using lower semi-continuity of the L∞-norm with respect to convergence in L1. □

We would like to employ TDk as a regulariser and first characterise its kernel. For that purpose, we note that TDk (u) = 0 for some u ∈ BDk (Ω, Syml (Rd )) implies that ${\mathcal{E}}^{k}u=0$ in the distributional sense, hence proposition 3.12 implies that u is a Syml (Rd )-valued polynomial of maximal degree k + l − 1. This yields the following result.

Proposition 3.21. The space ker(TDk ) is a subspace of polynomials of degree less than k + l. If l = 0, then ker(TVk ) = Pk−1 = {u : Ω → R|u polynomial of degree ⩽ k − 1}.

Next, we would like to discuss coercivity of the higher-order total variation functionals.

Proposition 3.22. Let k ⩾ 1, l ⩾ 0 and Ω be a bounded Lipschitz domain. Then, TDk is coercive in the following sense: for each linear and continuous projection R : Ld/(d−1)(Ω, Syml (Rd )) → ker(TDk ), there is a C > 0 such that

${{\Vert}u-Ru{\Vert}}_{d/\left(d-1\right)}{\leqslant}C\enspace {\mathrm{T}\mathrm{D}}^{k}\left(u\right)\quad \text{for all}\enspace u\in {\mathrm{B}\mathrm{D}}^{k}\left({\Omega},{\mathrm{S}\mathrm{y}\mathrm{m}}^{l}\left({\mathbf{R}}^{d}\right)\right).$
Proof. At first note that by the embeddings BDk (Ω, Syml (Rd )) ↪ BD(Ω, Syml (Rd )) ↪ Ld/(d−1)(Ω, Syml (Rd )) the left-hand side of the claimed inequality is well defined and finite.

We use a contradiction argument in conjunction with compactness. Suppose for R as stated above there is a sequence {un } such that ${{\Vert}{u}^{n}-R{u}^{n}{\Vert}}_{d/\left(d-1\right)}=1$ and TDk (un ) → 0 as n → ∞. This implies $\left\{{{\Vert}{u}^{n}-R{u}^{n}{\Vert}}_{1}\right\}$ being bounded, TDk (un − Run ) → 0 and by theorems 3.19 and 3.17 {un − Run } has to be precompact in L1(Ω, Syml (Rd )), i.e., without loss of generality, we may assume that un − Run → u in L1(Ω, Syml (Rd )). By lower semi-continuity,

hence u ∈ ker(TDk ) = rg(R). On the other hand, R(un − Run ) = 0 for each n as R is a projection, thus, Ru = 0 and, consequently, u = 0. In total, we have limn→∞(un − Run ) = 0 in BDk (Ω), and again by continuous embedding, also in Ld/(d−1)(Ω) which is a contradiction to ${{\Vert}{u}^{n}-R{u}^{n}{\Vert}}_{d/\left(d-1\right)}=1$ for all n. Consequently, coercivity has to hold. □

Corollary 3.23. In the scalar case, for p ∈ [1, ∞] with $p{\leqslant}\frac{d}{d-k}$ if k < d, we also have

${{\Vert}u-Ru{\Vert}}_{p}{\leqslant}C\enspace {\mathrm{T}\mathrm{V}}^{k}\left(u\right)\quad \text{for all}\enspace u\in {\mathrm{B}\mathrm{V}}^{k}\left({\Omega}\right).$
Proof. This follows with the embedding theorem 3.20:

Remark 3.24. The above coercivity estimate also implies that the Fenchel conjugate of TVk is the indicator functional of a closed convex set in ${L}^{{p}^{{\ast}}}\left({\Omega}\right)\cap \mathrm{ker}{\left({\mathrm{T}\mathrm{V}}^{k}\right)}^{\perp }$ with non-empty interior. Indeed, for $\xi \in {L}^{{p}^{{\ast}}}\left({\Omega}\right)\cap \mathrm{ker}{\left({\mathrm{T}\mathrm{V}}^{k}\right)}^{\perp }$ such that ${{\Vert}\xi {\Vert}}_{{p}^{{\ast}}}{\leqslant}{C}^{-1}$ it follows for any u ∈ Lp (Ω) that

which means that ${\left({\mathrm{T}\mathrm{V}}^{k}\right)}^{{\ast}}\left(\xi \right)=0$. On the other hand, if $\xi \in {L}^{{p}^{{\ast}}}\left({\Omega}\right){\backslash}\mathrm{ker}{\left({\mathrm{T}\mathrm{V}}^{k}\right)}^{\perp }$, then ⟨ξ, u⟩ > 0 for some u ∈ ker(TVk ). Thus, ⟨ξ, u⟩ > TVk (u) so ${\left({\mathrm{T}\mathrm{V}}^{k}\right)}^{{\ast}}\left(\xi \right)=\infty $.

It is interesting to note that a coercivity estimate similar to the one of corollary 3.23 also holds between two higher-order TV functionals of different order.

Lemma 3.25. Let Ω be a bounded Lipschitz domain, 1 ⩽ k1 < k2 be two orders of differentiation, $p\in \left[\right.1,\infty \left[\right.$ with pd/(dk2) if k2 < d and $R:{L}^{p}\left({\Omega}\right)\to \mathrm{ker}\left({\mathrm{T}\mathrm{V}}^{{k}_{2}}\right)$ be a continuous, linear projection. Then there exists a constant C > 0 such that

${\mathrm{T}\mathrm{V}}^{{k}_{1}}\left(u-Ru\right){\leqslant}C\enspace {\mathrm{T}\mathrm{V}}^{{k}_{2}}\left(u\right)\qquad \left(12\right)$

holds for each $u\in {\mathrm{B}\mathrm{V}}^{{k}_{2}}\left({\Omega}\right)$.

Proof. Assume the opposite, i.e., the existence of {un } such that ${\mathrm{T}\mathrm{V}}^{{k}_{1}}\left({u}^{n}-R{u}^{n}\right)=1$ and ${\mathrm{T}\mathrm{V}}^{{k}_{2}}\left({u}^{n}\right)\to 0$ as n → ∞. Then, by compact embedding ${\mathrm{B}\mathrm{D}}^{{k}_{2}-{k}_{1}}\left({\Omega},{\mathrm{S}\mathrm{y}\mathrm{m}}^{{k}_{1}}\left({\mathbf{R}}^{d}\right)\right)\to {L}^{1}\left({\Omega},{\mathrm{S}\mathrm{y}\mathrm{m}}^{{k}_{1}}\left({\mathbf{R}}^{d}\right)\right)$, we have ${\nabla }^{{k}_{1}}\left({u}^{n}-R{u}^{n}\right)\to v$ as n → ∞ for some $v\in {L}^{1}\left({\Omega},{\mathrm{S}\mathrm{y}\mathrm{m}}^{{k}_{1}}\left({\mathbf{R}}^{d}\right)\right)$ for a subsequence (not relabelled). On the other hand, the Poincaré estimate gives ${{\Vert}{u}^{n}-R{u}^{n}{\Vert}}_{p}{\leqslant}C{\mathrm{T}\mathrm{V}}^{{k}_{2}}\left({u}^{n}\right)$, so un − Run → 0 as n → ∞ in L1(Ω). By closedness of ${\nabla }^{{k}_{1}}$ this yields v = 0. By convergence in ${L}^{1}\left({\Omega},{\mathrm{S}\mathrm{y}\mathrm{m}}^{{k}_{1}}\left({\mathbf{R}}^{d}\right)\right)$, this gives the contradiction ${\mathrm{T}\mathrm{V}}^{{k}_{1}}\left({u}^{n}-R{u}^{n}\right)\to 0$ as n → ∞. □

3.3. Tikhonov regularisation

The coercivity which has just been established can be regarded as the most important step towards existence for variational problems with TVk -regularisation. Here, we first prove an existence result for linear inverse problems in a general abstract version.

Theorem 3.26. Let X be a reflexive Banach space, Y be a Banach space, K : X → Y be linear and continuous, Sf : Y → [0, ∞] a proper, convex, lower semi-continuous and coercive discrepancy functional associated with some data f, $\left\vert \cdot \right\vert :X\to \left[0,\infty \right]$ a lower semi-continuous seminorm and α > 0. Assume that there exists a linear and continuous projection $R:X\to \mathrm{ker}\left(\left\vert \cdot \right\vert \right)$ and a C > 0 such that

${{\Vert}u-Ru{\Vert}}_{X}{\leqslant}C\left\vert u\right\vert \quad \text{for all}\enspace u\in X,$
and either

  • (a)  
    $\mathrm{ker}\left(\left\vert \cdot \right\vert \right)$ is finite-dimensional or, more generally,
  • (b)  
    $\mathrm{ker}\left(K\right)\cap \mathrm{ker}\left(\left\vert \cdot \right\vert \right)$ admits a complement Z in $\mathrm{ker}\left(\left\vert \cdot \right\vert \right)$ and ||u||X C||Ku||Y for some C > 0 and all uZ.

Then, the Tikhonov minimisation problem

Equation (13)

is well-posed, i.e., there exists a solution and the solution mapping is stable in the sense that, if ${S}_{{f}^{n}}$ converges to Sf as in (4) and $\left\{{S}_{{f}^{n}}\right\}$ is equi-coercive, then for each sequence of minimisers {un } of (13) with discrepancy ${S}_{{f}^{n}}$,

  • Either ${S}_{{f}^{n}}\left(K{u}^{n}\right)+\alpha \left\vert {u}^{n}\right\vert \to \infty $ as n → ∞ and (13) with discrepancy Sf does not admit a finite solution,
  • Or ${S}_{{f}^{n}}\left(K{u}^{n}\right)+\alpha \left\vert {u}^{n}\right\vert \to {\mathrm{min}}_{u\in X}\enspace {S}_{f}\left(Ku\right)+\alpha \left\vert u\right\vert $ as n → ∞ and there is, possibly up to shifts by functions in $\mathrm{ker}\left(K\right)\cap \mathrm{ker}\left(\left\vert \cdot \right\vert \right)$, a weak accumulation point u ∈ X that minimises (13) with discrepancy Sf .

Further, in case (13) with discrepancy Sf admits a finite solution, for each subsequence $\left\{{u}^{{n}_{k}}\right\}$ weakly converging to some u ∈ X, it holds that $\left\vert {u}^{{n}_{k}}\right\vert \to \left\vert u\right\vert $ as k → ∞. Also, if Sf is strictly convex and K is injective, finite solutions u of (13) are unique and un ⇀ u in X.

The same result is true if, for instance, instead of being reflexive, X is the dual of a separable space, and we replace weak convergence by weak* convergence in the (lower semi-) continuity assumptions on K, $\left\vert \cdot \right\vert $, Sf and in (4).

Proof. At first note that $\mathrm{ker}\left(\left\vert \cdot \right\vert \right)$ being finite-dimensional implies condition (b) above, hence we can assume that (b) holds. We start with existence. Assume that the objective functional in (13) is proper as otherwise, there is nothing to show. For a minimising sequence {un }, by the coercivity assumption, {un − Run } is bounded in X. Now, (b) implies the existence of a linear and continuous projection ${P}_{Z}:\mathrm{ker}\left(\left\vert \cdot \right\vert \right)\to Z$ such that id − PZ projects $\mathrm{ker}\left(\left\vert \cdot \right\vert \right)$ onto $\mathrm{ker}\left(K\right)\cap \mathrm{ker}\left(\left\vert \cdot \right\vert \right)$. With vn = PZ Run , we see that also {un − Run + vn } is a minimising sequence and it suffices to show boundedness of {vn } to obtain a convergent subsequence. But the latter holds true since by assumption ${{\Vert}{v}^{n}{\Vert}}_{X}{\leqslant}C{{\Vert}K{v}^{n}{\Vert}}_{Y}$, such that ${{\Vert}K{v}^{n}{\Vert}}_{Y}{\leqslant}{{\Vert}K\left({u}^{n}-R{u}^{n}+{v}^{n}\right){\Vert}}_{Y}+{\Vert}K{\Vert}{{\Vert}{u}^{n}-R{u}^{n}{\Vert}}_{X}$, with the right-hand side being bounded as a consequence of the coercivity of Sf and the boundedness of {un − Run }. Hence, as X is reflexive, a subsequence of {un − Run + vn } converges weakly to a limit u ∈ X. By continuity of K and lower semi-continuity of both Sf and $\left\vert \cdot \right\vert $ it follows that u is a solution to (13). In case Sf is strictly convex and K is injective, Sf ∘ K is already strictly convex, so finite minimisers of (13) have to be unique.

Now let {un } be a sequence of minimisers of (13) with discrepancy ${S}_{{f}^{n}}$. We denote by $F={S}_{f}\enspace {\circ}\enspace K+\alpha \left\vert \cdot \right\vert $ as well as ${F}_{n}={S}_{{f}^{n}}\enspace {\circ}\enspace K+\alpha \left\vert \cdot \right\vert $ and first suppose that {Fn (un )} is bounded. We can then add ${v}^{n}-R{u}^{n}\in \mathrm{ker}\left(K\right)\cap \mathrm{ker}\left(\left\vert \cdot \right\vert \right)$ to un , with vn = PZ Run , and from equi-coercivity of $\left\{{S}_{{f}^{n}}\right\}$ obtain boundedness of {un − Run + vn } as before.

This shows that shifting the minimisers within $\mathrm{ker}\left(K\right)\cap \mathrm{ker}\left(\left\vert \cdot \right\vert \right)$ always leads to a bounded sequence, i.e., we may assume without loss of generality that {un } is bounded such that a weak accumulation point exists. Suppose that ${u}^{{n}_{k}}\rightharpoonup u$ as k → ∞. Then, estimating as in the proof of theorem 2.14, we can obtain that u is a minimiser for F and that ${\mathrm{lim}}_{k\to \infty }{F}_{{n}_{k}}\left({u}^{{n}_{k}}\right)=F\left(u\right)$ as well as ${\mathrm{lim}}_{k\to \infty }\left\vert {u}^{{n}_{k}}\right\vert =\left\vert u\right\vert $. Also, if u is the unique minimiser for (13) with discrepancy Sf , un ⇀ u as n → ∞ follows since any subsequence has to contain another subsequence that converges weakly to u.

The result for the two remaining cases lim infn→∞ Fn (un ) < ∞ and Fn (un ) → ∞, respectively, finally follows analogously to theorem 2.14. □

Given that ker(TVk ) is finite-dimensional, the above result immediately implies well-posedness for $\left\vert \cdot \right\vert ={\mathrm{T}\mathrm{V}}^{k}$ with X = Lp (Ω), as stated in the following corollary. The crucial ingredient here is the estimate ||u − Ru||p ⩽ C TVk (u), which restricts the exponent of the underlying Lp -space to p ⩽ d/(d − k) if k < d. This shows that the higher the order of differentiation used in the regularisation, the weaker the requirements on the underlying spaces and, consequently, on the continuity of the operator K.

Corollary 3.27. With X = Lp (Ω), Ω being a bounded Lipschitz domain, and Sf and K as in theorem 3.26,

$\underset{u\in {L}^{p}\left({\Omega}\right)}{\mathrm{min}}\enspace {S}_{f}\left(Ku\right)+\alpha \enspace {\mathrm{T}\mathrm{V}}^{k}\left(u\right)\qquad \left(14\right)$

is well-posed in the sense of theorem 3.26 whenever $p\in \left.\right]1,\infty \left[\right.$ with p ⩽ d/(d − k) if k < d.
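As a small concrete instance of (14), the following sketch sets up a one-dimensional deconvolution problem with a discrete TV2 penalty, using CVXPY as a generic convex solver; the forward operator, noise level and regularisation parameter are ad-hoc choices for illustration only and do not correspond to the numerical methods developed in section 6.

```python
import numpy as np
import cvxpy as cp

n = 100
x = np.linspace(0, 1, n)
u_true = np.where(x < 0.5, x, 1.2 - x)                  # piecewise linear ground truth

# forward operator K: (circular) convolution with a Gaussian kernel
kernel = np.exp(-0.5 * ((np.arange(n) - n // 2) / 3.0) ** 2)
K = np.array([np.roll(kernel, k - n // 2) for k in range(n)]) / kernel.sum()

rng = np.random.default_rng(3)
f = K @ u_true + 0.01 * rng.standard_normal(n)

D2 = np.diff(np.eye(n), n=2, axis=0)                    # discrete second differences
u = cp.Variable(n)
alpha = 1e-3
problem = cp.Problem(cp.Minimize(0.5 * cp.sum_squares(K @ u - f)
                                 + alpha * cp.norm1(D2 @ u)))
problem.solve()
print(np.linalg.norm(u.value - u_true) / np.linalg.norm(u_true))
```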

As can be easily seen from the respective proofs, also the convergence result of theorem 2.17 and the result on convergence rates as in proposition 2.18 transfer to TVk regularisation.

Theorem 3.28. With the assumptions of corollary 3.27, let u† ∈ BVk (Ω) be a minimum-TVk -solution of Ku = f† for some data f† in Y and for each δ > 0 let fδ be such that ${S}_{{f}^{\delta }}\left({f}^{{\dagger}}\right){\leqslant}\delta $ and denote by uα,δ a finite solution of (14) for parameter α > 0 and data fδ . Let the discrepancy functionals $\left\{{S}_{{f}^{\delta }}\right\}$ be equi-coercive and converge to ${S}_{{f}^{{\dagger}}}$ in the sense of (4) and ${S}_{{f}^{{\dagger}}}\left(v\right)=0$ if and only if v = f†. Choose for each δ > 0 the parameter α > 0 such that

$\alpha \to 0\quad \text{and}\quad \frac{\delta }{\alpha }\to 0\quad \text{as}\enspace \delta \to 0.$

Then, up to shifts by functions in ker(K) ∩ Pk−1, {uα,δ } has at least one Lp -weak accumulation point. Each Lp -weak accumulation point is a minimum-TVk -solution of Ku = f† and limδ→0 TVk (uα,δ ) = TVk (u†).

Proposition 3.29. In the situation of theorem 3.28, let K*w ∈ ∂TVk (u†) for some w ∈ Y*. Then,

Equation (15)

The last result in particular guarantees convergence rates for the settings of example 2.19. Note also that the above results remain true in case p = 1 or in case p = d/(d − k) = ∞ and K is weak*-to-weak continuous.

Let us finally note some first-order optimality conditions. For this purpose, recall that for X a Banach space, the normal cone ${\mathcal{N}}_{K}\left(u\right)$ of a set K ⊂ X at u ∈ K is given by the collection of all w ∈ X* for which ${\langle w,\enspace v-u\rangle }_{{X}^{{\ast}}{\times}X}{\leqslant}0$ for all v ∈ K. If we set ${\mathcal{N}}_{K}\left(u\right)=\varnothing$ for u ∉ K, we have that ${\mathcal{N}}_{K}=\partial {\mathcal{I}}_{K}$ where ${\mathcal{I}}_{K}$ is the indicator function of K, i.e., ${\mathcal{I}}_{K}\left(u\right)=0$ if u ∈ K and ∞ otherwise.

Proposition 3.30. In the situation of corollary 3.27, if ${S}_{f}\left(v\right)=\frac{1}{2}{\Vert}v-f{{\Vert}}_{Y}^{2}$ and Y is a Hilbert space, u* ∈ Lp (Ω) is a solution of

$\underset{u\in {L}^{p}\left({\Omega}\right)}{\mathrm{min}}\enspace \frac{1}{2}{{\Vert}Ku-f{\Vert}}_{Y}^{2}+\alpha \enspace {\mathrm{T}\mathrm{V}}^{k}\left(u\right)\qquad \left(16\right)$

if and only if

where ${\mathcal{N}}_{{\mathrm{T}\mathrm{V}}^{k}}$ is the normal cone associated with the set $\overline{{\mathcal{B}}_{{\mathrm{T}\mathrm{V}}^{k}}}$ where

Proof. As $u{\mapsto}\frac{1}{2}{{\Vert}Ku-f{\Vert}}_{Y}^{2}$ is Gâteaux differentiable, it is continuous with unique subgradient, so, by subdifferential calculus, optimality of u* is equivalent to K*(fKu*) ∈ α∂TVk (u*) which can also be expressed as

Now since ${\mathrm{T}\mathrm{V}}^{k}={\mathcal{I}}_{{\mathcal{B}}_{{\mathrm{T}\mathrm{V}}^{k}}}^{{\ast}}$, it follows that ${\left({\mathrm{T}\mathrm{V}}^{k}\right)}^{{\ast}}={\mathcal{I}}_{{\mathcal{B}}_{{\mathrm{T}\mathrm{V}}^{k}}}^{{\ast}{\ast}}={\mathcal{I}}_{\overline{{\mathcal{B}}_{{\mathrm{T}\mathrm{V}}^{k}}}}$, so $\partial {\left({\mathrm{T}\mathrm{V}}^{k}\right)}^{{\ast}}={\mathcal{N}}_{{\mathrm{T}\mathrm{V}}^{k}}$. □

Remark 3.31. In the situation of proposition 3.30, it is also possible to give an a-priori estimate for the solutions of (16) in case K is injective on Pk−1. Indeed, with R : Lp (Ω) → Pk−1 a continuous, linear projection onto the kernel of TVk and C > 0 the coercivity constant, i.e., ||u − Ru||p ⩽ C TVk (u) for all u ∈ Lp (Ω), by optimality, a solution u* satisfies $\alpha {\mathrm{T}\mathrm{V}}^{k}\left({u}^{{\ast}}\right){\leqslant}\frac{1}{2}{{\Vert}f{\Vert}}_{Y}^{2}$ and consequently, ${{\Vert}{u}^{{\ast}}-R{u}^{{\ast}}{\Vert}}_{p}{\leqslant}\frac{1}{2\alpha }C{{\Vert}f{\Vert}}_{Y}^{2}$. Likewise, comparing with u* − Ru*, optimality also gives ${{\Vert}K{u}^{{\ast}}-f{\Vert}}_{Y}^{2}{\leqslant}{{\Vert}K\left({u}^{{\ast}}-R{u}^{{\ast}}\right)-f{\Vert}}_{Y}^{2}$, which is equivalent to ${{\Vert}KR{u}^{{\ast}}{\Vert}}_{Y}^{2}{\leqslant}2\langle KR{u}^{{\ast}},\enspace f-K\left({u}^{{\ast}}-R{u}^{{\ast}}\right)\rangle $. Using $ab{\leqslant}\frac{1}{4}{a}^{2}+{b}^{2}$, the latter leads to ${{\Vert}KR{u}^{{\ast}}{\Vert}}_{Y}^{2}{\leqslant}4{{\Vert}f-K\left({u}^{{\ast}}-R{u}^{{\ast}}\right){\Vert}}_{Y}^{2}$, where the right-hand side can further be estimated, using ${\left(a+b\right)}^{2}{\leqslant}\left(1+\varepsilon \right)\left({a}^{2}+\frac{1}{\varepsilon }{b}^{2}\right)$ with $\varepsilon =\frac{1}{4{\alpha }^{2}}{C}^{2}{{\Vert}K{\Vert}}^{2}$ to give

Now, as K is injective on Pk−1 = rg(R), there is a c > 0 such that c||Ru||p ⩽ ||KRu||Y for all u ∈ Lp (Ω). Consequently, employing the triangle inequality and estimating yields

Equation (17)

which is an a priori bound that only requires the knowledge of the Poincaré–Wirtinger-type constant C, the constant c in the inverse estimate for K on Pk−1, as well as an estimate on ||K||. Beyond being of theoretical interest, such a bound can for instance be used in numerical algorithms, see section 6, example 6.24.

If the Kullback–Leibler divergence is used instead of the quadratic Hilbert space discrepancy, i.e., Sf (v) = KL(v, f), Y = L1(Ω'), and data f ⩾ 0 a.e., then one has to choose a u0 ∈ BVk (Ω) such that KL(Ku0, f) < ∞. Set Cf = KL(Ku0, f) + αTVk (u0). Then, an optimal solution u* will satisfy ${\mathrm{T}\mathrm{V}}^{k}\left({u}^{{\ast}}\right){\leqslant}\frac{{C}_{f}}{\alpha }$. Further, we have ||v||1 ⩽ 2KL(v, f) + 2||f||1 for v ∈ L1(Ω') with v ⩾ 0 a.e., see lemma A.1, such that, if c > 0 is a constant with c||Ru||p ⩽ ||KRu||1 for all u ∈ Lp (Ω), we get

and finally arrive at

Equation (18)

This constitutes an a priori estimate similar to (17) for the Kullback–Leibler discrepancy, however, with the difference that a suitable constant Cf also has to be determined.

Remark 3.32. In order to show the effect of TV2 regularisation in contrast to TV regularisation, we performed a numerical denoising experiment for f shown in figure 2(a), i.e., solved ${\mathrm{min}}_{u\in {L}^{2}\left({\Omega}\right)}\frac{1}{2}{{\Vert}u-f{\Vert}}_{2}^{2}+{\mathcal{R}}_{\alpha }\left(u\right)$ where ${\mathcal{R}}_{\alpha }=\alpha \mathrm{T}\mathrm{V}$ or ${\mathcal{R}}_{\alpha }=\alpha {\mathrm{T}\mathrm{V}}^{2}$. One clearly sees that TV2 regularisation (figure 2(c)) reduces the staircasing effect of TV regularisation (figure 2(b)) and piecewise linear structures are well recovered. However, TV2 regularisation also blurs the object boundaries which appear less sharp in contrast to TV regularisation.

This is due to the fact that TVk regularisation for k ⩾ 2 is not able to produce solutions with jump discontinuities. Indeed, TVk regularisation implies that a solution u has to be contained in BVk (Ω) which embeds into the Sobolev space Hk−1,1(Ω) ↪ H1,1(Ω). As we have seen, for instance, in example 2.1, this means that characteristic functions cannot be solutions. More generally, for u ∈ H1,1(Ω) ⊂ BV(Ω), the derivative ∇u interpreted as a measure is absolutely continuous with respect to the Lebesgue measure such that the singular part satisfies ∇s u = 0. Theorem 2.21 then implies that the jump set Ju is a ${\mathcal{H}}^{d-1}$-negligible set, i.e., u cannot jump on (d − 1)-dimensional hypersurfaces.
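The qualitative behaviour described in remark 3.32 can be reproduced with very simple means; the sketch below (our own, using smoothed surrogates of TV and TV2, periodic finite differences and ad-hoc parameters rather than the method behind figure 2) performs gradient descent on the two denoising functionals:

```python
import numpy as np

def d(u, axis):              # forward difference, periodic
    return np.roll(u, -1, axis=axis) - u

def dT(p, axis):             # backward difference (negative adjoint of d)
    return p - np.roll(p, 1, axis=axis)

def tv_grad(u, eps=0.1):     # gradient of sum(sqrt(|grad u|^2 + eps^2))
    ux, uy = d(u, 0), d(u, 1)
    nrm = np.sqrt(ux**2 + uy**2 + eps**2)
    return -(dT(ux / nrm, 0) + dT(uy / nrm, 1))

def tv2_grad(u, eps=0.1):    # gradient of sum(sqrt(|Hessian u|_F^2 + eps^2))
    uxx, uxy = d(d(u, 0), 0), d(d(u, 0), 1)
    uyx, uyy = d(d(u, 1), 0), d(d(u, 1), 1)
    nrm = np.sqrt(uxx**2 + uxy**2 + uyx**2 + uyy**2 + eps**2)
    return (dT(dT(uxx / nrm, 0), 0) + dT(dT(uxy / nrm, 0), 1)
            + dT(dT(uyx / nrm, 0), 1) + dT(dT(uyy / nrm, 1), 1))

def denoise(f, penalty_grad, alpha, step=0.05, iters=500):
    """Gradient descent on 0.5*||u - f||^2 + alpha * R(u) for a smooth surrogate R."""
    u = f.copy()
    for _ in range(iters):
        u -= step * ((u - f) + alpha * penalty_grad(u))
    return u

rng = np.random.default_rng(4)
x, y = np.meshgrid(np.linspace(0, 1, 64), np.linspace(0, 1, 64), indexing="ij")
clean = (x + y) * (x + y < 1.0)                     # linear ramp with a jump along x + y = 1
noisy = clean + 0.1 * rng.standard_normal(clean.shape)
u_tv = denoise(noisy, tv_grad, alpha=0.1)
u_tv2 = denoise(noisy, tv2_grad, alpha=0.02)
for name, u in [("TV", u_tv), ("TV^2", u_tv2)]:
    print(name, np.sqrt(np.mean((u - clean) ** 2)))
```

One expects the TV result to keep the jump sharper while staircasing the ramp, and the TV2 result to recover the ramp while smearing the jump, mirroring figure 2.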

Figure 2. Second-order total-variation denoising example. (a) Noisy image (PSNR: 13.9 dB), (b) regularisation with TV (PSNR: 29.3 dB), (c) regularisation with TV2 (PSNR: 27.8 dB), (d) regularisation with ${{\Vert}{\Delta}\cdot {\Vert}}_{\mathcal{M}}$ (PSNR: 26.3 dB). All parameters were manually tuned via grid search to give highest PSNR with respect to the ground truth (figure 1(a)).

Remark 3.33. Instead of taking higher-order TV, which is based on the full gradient, one could also try to regularise with other differential operators, for instance with the (weak) Laplacian:

${\mathcal{R}}_{\alpha }\left(u\right)=\alpha {{\Vert}{\Delta}u{\Vert}}_{\mathcal{M}}=\alpha \enspace \mathrm{sup}\enspace \left\{{\int }_{{\Omega}}u{\Delta}\varphi \enspace \mathrm{d}x\enspace \vert \enspace \varphi \in {\mathcal{C}}_{\mathrm{c}}^{\infty }\left({\Omega}\right),\enspace {{\Vert}\varphi {\Vert}}_{\infty }{\leqslant}1\right\}.$
However, the kernel of this seminorm is the space of p-integrable harmonic functions on Ω, the Bergman spaces, which are infinite-dimensional. Therefore, in view of theorem 3.26, to use ${\mathcal{R}}_{\alpha }$ for the regularisation of ill-posed linear inverse problems, the forward operator K must be continuously invertible on a complement of $\mathrm{ker}\left({\mathcal{R}}_{\alpha }\right)\cap \mathrm{ker}\left(K\right)$, i.e., well-posed. This limits the applicability of this regulariser. Nevertheless, denoising problems can, for instance, still be solved, see figure 2(d), leading to 'speckle' artefacts in the solutions. Another possibility would be to add more regularising functionals, which is discussed in the next section.

Higher order TV for multichannel images. In analogy to TV, higher-order TV can also be extended to colour and multichannel images represented by functions mapping into a vector space, say Rm . This is achieved by testing with ${\mathrm{S}\mathrm{y}\mathrm{m}}^{k}{\left({\mathbf{R}}^{d}\right)}^{m}$-valued tensor fields, where

and requires to choose a norm for this space. While, also in view of the Frobenius norm used in Symk (Rd ), the most natural choice seems to pick the norm that is induced by the inner product

as with TV, this is not the only possible choice and different norms imply different types of coupling of the multiple channels. Generally, we can take |⋅| to be any norm on ${\mathrm{S}\mathrm{y}\mathrm{m}}^{k}{\left({\mathbf{R}}^{d}\right)}^{m}$, set |⋅|∗ to be the corresponding dual norm and extend TVk to functions $u\in {L}_{\mathrm{l}\mathrm{o}\mathrm{c}}^{1}\left({\Omega},{\mathbf{R}}^{m}\right)$ as

Equation (19)

where ||φ||∞,∗ is the pointwise supremum of the scalar function x ↦ |φ(x)|∗. By equivalence of norms in finite dimensions, the functional-analytic properties of TVk and the results on regularisation for inverse problems transfer one-to-one to its multichannel extension. Further, TVk is invariant under rotations whenever the tensor norm |⋅| is unitarily invariant in the sense that for any orthogonal matrix O ∈ Rd×d and $\left({\xi }_{1},\dots ,{\xi }_{m}\right)\in {\mathrm{S}\mathrm{y}\mathrm{m}}^{k}{\left({\mathbf{R}}^{d}\right)}^{m}$ it holds that

where we define (ξi O)(a1, ..., ak ) = ξi (Oa1, ..., Oak ) for i = 1, ..., m.
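In the discrete first-order case (k = 1), for instance, the difference between a Frobenius-type coupling of the channels and a fully decoupled channel-by-channel penalty can be written down in a few lines (our own sketch):

```python
import numpy as np

def grad(u):                                   # forward differences per channel, periodic
    return np.stack([np.roll(u, -1, axis=1) - u,
                     np.roll(u, -1, axis=2) - u])   # shape (2, m, n1, n2)

def tv_coupled(u):
    """Channels coupled through the Frobenius norm over (derivative, channel)."""
    g = grad(u)
    return np.sum(np.sqrt(np.sum(g**2, axis=(0, 1))))

def tv_decoupled(u):
    """Each channel penalised separately (sum of scalar total variations)."""
    g = grad(u)
    return np.sum(np.sqrt(np.sum(g**2, axis=0)))

rng = np.random.default_rng(5)
u = rng.standard_normal((3, 32, 32))           # an m = 3 channel image
print(tv_coupled(u), tv_decoupled(u))          # decoupled >= coupled by the norm inequality
```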

Fractional-order TV. Recently, ideas from fractional calculus started to be transferred to construct new classes of higher-order TV, namely fractional-order total variation. The latter is based on fractional partial differentiation with respect to the coordinate axes. The partial fractional derivative of non-integral order α > 0 of a compactly supported function $u: \left.\right]a,b\left[\right.\to \mathbf{R}$ can, for instance, be defined as

where k ∈ N is such that k − 1 < α < k and, denoting by Γ the gamma function, i.e., ${\Gamma}\left(t\right)={\int }_{0}^{\infty }{s}^{t-1}{\mathrm{e}}^{-s}\enspace \mathrm{d}s$,

as well as

This fractional-order derivative corresponds to a central version of the Riemann–Liouville definition [140, 199]. However, one has to mention that there are also other possibilities to define fractional-order derivatives [146]. On a rectangular domain ${\Omega}=\left.\right]{a}_{1},{b}_{1}\left[\right.{\times}\cdots {\times} \left.\right]{a}_{d},{b}_{d}\left[\right.\subset {\mathbf{R}}^{d}$ and for test vector fields $\varphi \in {\mathcal{C}}_{\mathrm{c}}^{k}\left({\Omega},{\mathbf{R}}^{d}\right)$, the fractional divergence of order α can then be defined as ${\mathrm{d}\mathrm{i}\mathrm{v}}^{\alpha }\varphi ={\sum }_{i=1}^{d}\frac{{\partial }_{\left[{a}_{i},{b}_{i}\right]}^{\alpha }{\varphi }_{i}}{\partial {x}_{i}^{\alpha }}$ which is still a bounded function. Consequently, the fractional total variation of order α for uL1(Ω) is given as

It is easy to see that this defines a proper, convex and lower semi-continuous functional on each Lp (Ω) which makes the functional suitable as a regulariser for denoising [194, 199], typically for 1 < α < 2. The use of TVα for the regularisation of linear inverse problems, however, seems to be unexplored so far, and not many properties of the solutions appear to be known.
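Numerically, fractional derivatives of this kind are often approximated by Grünwald–Letnikov-type differences, a closely related but distinct discretisation from the central Riemann–Liouville form above; the following sketch (our own, one-sided and with ad-hoc parameters) illustrates the idea:

```python
import numpy as np

def gl_weights(alpha, n):
    """Grünwald–Letnikov weights w_j = (-1)^j * binom(alpha, j), j = 0, ..., n-1."""
    w = np.empty(n)
    w[0] = 1.0
    for j in range(1, n):
        w[j] = w[j - 1] * (j - 1 - alpha) / j
    return w

def frac_diff(u, alpha, h):
    """One-sided fractional difference of order alpha on a uniform grid with spacing h."""
    n = u.size
    w = gl_weights(alpha, n)
    return np.array([np.dot(w[:i + 1], u[i::-1]) for i in range(n)]) / h**alpha

# sanity check: for u(x) = x and alpha close to 1, the result is approximately
# u'(x) = 1 away from the left boundary; alpha = 1.5 gives a genuinely fractional order
x = np.linspace(0, 1, 200)
print(frac_diff(x, 0.99, x[1] - x[0])[100:105])
print(frac_diff(x, 1.5, x[1] - x[0])[100:105])
```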

4. Combined approaches

We have seen that employing higher-order total variation for regularisation yields well-posedness results for general linear inverse problems that are comparable to first-order TV regularisation, where the use of higher-order differentiation even weakens the continuity requirements on the forward operator. On the other hand, TVk regularisation, for k > 1, does not allow to recover jump discontinuities, as we have shown analytically and observed numerically (see remark 3.32). An interesting question in this context is how combinations of TV functionals with different orders behave with respect to these properties. Addressing this question, we consider, in the following, the combination of higher-order TV functionals via addition and infimal convolution. While these two approaches yield regularisation methods with rather different analytical properties, each of them has advantages and disadvantages with respect to continuity requirements on the forward operator and the possibility to recover non-smooth structures such as jumps. Also, a combination of two or more functionals introduces additional parameters. To account for that, we consider multi-parameter convergence results for each of the two approaches, which yield, in particular, an interpretation of the involved parameters and can be helpful for parameter choice in practice.

4.1. Additive multi-order regularisation

In this section, we consider the additive combination of total variation functionals with different orders. That is, we are interested in the following Tikhonov approach:

$\underset{u}{\mathrm{min}}\enspace {S}_{f}\left(Ku\right)+{\alpha }_{1}\enspace {\mathrm{T}\mathrm{V}}^{{k}_{1}}\left(u\right)+{\alpha }_{2}\enspace {\mathrm{T}\mathrm{V}}^{{k}_{2}}\left(u\right)\qquad \left(20\right)$

with αi > 0 for i = 1, 2 and 1 ⩽ k1 < k2. With k1 = 1, k2 = 2, such an approach has for instance been considered in [142] for the regularisation of linear inverse problems.

The following proposition summarises, in the general setting of seminorms, basic properties of the function spaces arising from the additive combination of two different regularisers. Its proof is straightforward.

Proposition 4.1. Let |⋅|1 and |⋅|2 be two lower semi-continuous seminorms on the Banach space X. Then,

  • (a)  
    The functional $\left\vert \cdot \right\vert =\vert \cdot {\vert }_{1}+\vert \cdot {\vert }_{2}$ is a seminorm on X.
  • (b)  
    We have
  • (c)  
    The seminorm $\left\vert \cdot \right\vert $ is lower semi-continuous and
    constitutes a Banach space.
  • (d)  
    With Yi the Banach spaces arising from the norms ||⋅||X + |⋅|i , i = 1, 2 (see lemma 3.15),

Setting $\vert \cdot {\vert }_{i}={\alpha }_{i}{\mathrm{T}\mathrm{V}}^{{k}_{i}}$ for i = 1, 2 shows in particular that the function space associated with the additive combination of the ${\mathrm{T}\mathrm{V}}^{{k}_{i}}$ is embedded in ${\mathrm{B}\mathrm{V}}^{{k}_{2}}\left({\Omega}\right)$, i.e., the BV space corresponding to the highest order. Hence non-trivial combinations of different ${\mathrm{T}\mathrm{V}}^{{k}_{i}}$ again do not allow the recovery of jumps and, as the following theorem shows, in fact even yield the same space as the single TV term with the highest order.

Theorem 4.2. Let 1 ⩽ k1 < k2, α1 > 0, α2 > 0 and Ω be a bounded Lipschitz domain. For X = L1(Ω) and the seminorm $\left\vert \cdot \right\vert ={\alpha }_{1}{\mathrm{T}\mathrm{V}}^{{k}_{1}}+{\alpha }_{2}{\mathrm{T}\mathrm{V}}^{{k}_{2}}$, let Y be the associated Banach space according to lemma 3.15. Then,

$Y={\mathrm{B}\mathrm{V}}^{{k}_{2}}\left({\Omega}\right)$

in the sense of Banach space equivalence, and for p ∈ [1, ∞], p ⩽ d/(d − k2) if k2 < d, $R:{L}^{p}\left({\Omega}\right)\to \mathrm{ker}\left({\mathrm{T}\mathrm{V}}^{{k}_{1}}\right)$ a continuous, linear projection, there is a C > 0 independent of u such that

for all u ∈ Lp (Ω).

Proof. For the claimed norm equivalence, one estimate is immediate, while the other one follows from theorem 3.19. Denoting by ${R}_{2}:{L}^{p}\left({\Omega}\right)\to \mathrm{ker}\left({\mathrm{T}\mathrm{V}}^{{k}_{2}}\right)$ a continuous, linear projection, the estimate on ||u − Ru||p follows from corollary 3.23 and norm equivalence in finite-dimensional spaces as

with C > 0 a generic constant. □

Tikhonov regularisation. For employing ${\alpha }_{1}{\mathrm{T}\mathrm{V}}^{{k}_{1}}+{\alpha }_{2}{\mathrm{T}\mathrm{V}}^{{k}_{2}}$ as regularisation in a Tikhonov setting, the coercivity estimate in theorem 4.2 is crucial since it allows the well-posedness result of theorem 3.26 to be transferred. Observe in particular that $\mathrm{ker}\left({\alpha }_{1}{\mathrm{T}\mathrm{V}}^{{k}_{1}}+{\alpha }_{2}{\mathrm{T}\mathrm{V}}^{{k}_{2}}\right)$ is finite-dimensional, such that assumption (a) in theorem 3.26 is satisfied.

Proposition 4.3. With X = Lp (Ω), $p\in \left.\right] 1,\infty \left[\right.$, Ω a bounded Lipschitz domain, Y a Banach space, K : X → Y linear and continuous, Sf : Y → [0, ∞] proper, convex, lower semi-continuous and coercive, 1 ⩽ k1 < k2, α1 > 0, α2 > 0, the Tikhonov minimisation problem

$\underset{u\in {L}^{p}\left({\Omega}\right)}{\mathrm{min}}\enspace {S}_{f}\left(Ku\right)+{\alpha }_{1}\enspace {\mathrm{T}\mathrm{V}}^{{k}_{1}}\left(u\right)+{\alpha }_{2}\enspace {\mathrm{T}\mathrm{V}}^{{k}_{2}}\left(u\right)\qquad \left(21\right)$

is well-posed in the sense of theorem 3.26 whenever p ⩽ d/(d − k2) if k2 < d.

It is interesting to note that the necessary coercivity estimate on ${\alpha }_{1}{\mathrm{T}\mathrm{V}}^{{k}_{1}}+{\alpha }_{2}{\mathrm{T}\mathrm{V}}^{{k}_{2}}$ uses a projection onto the smaller kernel of ${\mathrm{T}\mathrm{V}}^{{k}_{1}}$ and an Lp norm with a larger exponent corresponding to ${\mathrm{T}\mathrm{V}}^{{k}_{2}}$. Hence, in view of the assumptions in theorem 3.26, the additive combination of ${\mathrm{T}\mathrm{V}}^{{k}_{1}}$ and ${\mathrm{T}\mathrm{V}}^{{k}_{2}}$ inherits the best properties of the two summands, i.e., the ones that are the least restrictive for applications in an inverse problems context.
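In a discretised one-dimensional setting, the additive model can be written down directly; the following sketch (again with CVXPY and ad-hoc weights, purely for illustration) denoises a signal containing both a jump and a linear ramp with α1 TV + α2 TV2:

```python
import numpy as np
import cvxpy as cp

n = 120
x = np.linspace(0, 1, n)
u_true = np.where(x < 0.4, 0.2, x)                      # jump followed by a linear ramp
rng = np.random.default_rng(6)
f = u_true + 0.05 * rng.standard_normal(n)

D1 = np.diff(np.eye(n), n=1, axis=0)                    # first differences
D2 = np.diff(np.eye(n), n=2, axis=0)                    # second differences
u = cp.Variable(n)
alpha1, alpha2 = 2e-2, 2e-3
problem = cp.Problem(cp.Minimize(0.5 * cp.sum_squares(u - f)
                                 + alpha1 * cp.norm1(D1 @ u)
                                 + alpha2 * cp.norm1(D2 @ u)))
problem.solve()
print(np.linalg.norm(u.value - u_true) / np.linalg.norm(u_true))
```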

Regarding the convergence result of theorem 3.28 and the rates of proposition 3.29, a direct extension to regularisation with α1TVk1 + α2TVk2 can be obtained by regarding the weights α1, α2 as fixed and introducing an additional factor α > 0 for both terms, which then acts as the regularisation parameter. A more tailored approach, however, is to regard both α1, α2 as regularisation parameters and to study the limiting behaviour of the method as α1, α2 converge to zero in some sense. This is covered by the following theorem.

Theorem 4.4. In the situation of proposition 4.3, let for each δ > 0 the data fδ be given such that Sfδ(f†) ⩽ δ, let {Sfδ} be equi-coercive and converge to Sf† for some data f† ∈ Y in the sense of (4), with Sf†(v) = 0 if and only if v = f†.

Choose the positive parameters α = (α1, α2) in dependence of δ such that max{α1, α2} → 0 and δ/max{α1, α2} → 0 as δ → 0, and such that (α̃1, α̃2) = (α1, α2)/max{α1, α2} → (α1†, α2†) as δ → 0. Set k = k1 if α2† = 0 and k = k2 otherwise, assume p ⩽ d/(d − k) in case of k < d, and assume that there exists u0 ∈ BVk(Ω) such that Ku0 = f†. Then, up to shifts in ker(K) ∩ Pk1−1, any sequence {uα,δ}, with each uα,δ being a solution to (20) for parameters (α1, α2) and data fδ, has at least one Lp-weak accumulation point. Each Lp-weak accumulation point is a minimum-(α1†TVk1 + α2†TVk2)-solution u† of Ku = f† and limδ→0 (α̃1TVk1 + α̃2TVk2)(uα,δ) = (α1†TVk1 + α2†TVk2)(u†).

Proof. First note that, as a consequence of theorem 3.26 and the fact that u0 ∈ BVk(Ω) with Ku0 = f†, there exists a minimum-(α1†TVk1 + α2†TVk2)-solution to Ku = f†, which we denote by u†, such that TVk(u†) < ∞. Using optimality of uα,δ compared to u† gives

Since max{α1, α2} → 0 as δ → 0, we have that ${S}_{{f}^{\delta }}\left(K{u}^{\alpha ,\delta }\right)\to 0$ as δ → 0. Moreover, as also δ/max{α1, α2} → 0, it follows that

The choice of k allows us to conclude that {TVk(uα,δ)} is bounded which, in the case k = k1, means that {TVk1(uα,δ)} is bounded. We now show that {TVk1(uα,δ)} is bounded also in the other case, k = k2. To this end, denote by R : Lp(Ω) → ker(TVk2) and PZ : ker(TVk2) → Z linear, continuous projections, where Z is a complement of ker(K) ∩ ker(TVk2) in ker(TVk2), i.e., id − PZ projects ker(TVk2) onto ker(K) ∩ ker(TVk2). Then, by optimality and invariance of K and TVk2 on ker(K) ∩ ker(TVk2), we estimate

which, together with lemma 3.25, norm equivalence on finite-dimensional spaces and injectivity of K on the finite-dimensional space Z, yields

Now, the last expression is bounded due to boundedness of TVk2(uα,δ) and equi-coercivity of {Sfδ}. Hence, {TVk1(uα,δ)} is always bounded and, again using the equi-coercivity of {Sfδ} and the techniques in the proof of theorem 3.26, one sees that, after possible shifts in ker(K) ∩ Pk1−1, {uα,δ} is bounded in BVk(Ω). Therefore, by continuous embedding and reflexivity, it admits a weak accumulation point in Lp(Ω).

Next, let u* be an Lp-weak accumulation point associated with {δn}, δn → 0, as well as the corresponding parameters {αn} = {(α1,n, α2,n)}. Then, Sf†(Ku*) ⩽ lim infn→∞ Sfδn(Ku^{αn,δn}) = 0 by convergence of Sfδ to Sf†, so Ku* = f†. Moreover,

hence, u* is a minimum-$\left({\alpha }_{1}^{{\dagger}}{\mathrm{T}\mathrm{V}}^{{k}_{1}}+{\alpha }_{2}^{{\dagger}}{\mathrm{T}\mathrm{V}}^{{k}_{2}}\right)$-solution. In particular,

so

Finally, each sequence {δn} with δn → 0 contains a further subsequence (not relabelled) for which (α̃1,nTVk1 + α̃2,nTVk2)(u^{αn,δn}) → (α1†TVk1 + α2†TVk2)(u†) as n → ∞, so we have (α̃1TVk1 + α̃2TVk2)(uα,δ) → (α1†TVk1 + α2†TVk2)(u†) as δ → 0. □

Remark 4.5. 

  • Theorem 4.4 shows that, for the additive combination of higher-order TV functionals, the maximum of the parameters plays the role of the regularisation parameter. The regularity assumption on u0 such that Ku0 = f†, on the other hand, depends on whether some parameters converge to zero faster than the maximum or not, i.e., it depends on the ratio of the parameters. Assuming for instance that α2/max{α1, α2} → 0 leads to the weaker BVk1-regularity requirement for u0. In view of a practical parameter choice strategy, this motivates a parametrisation via (α1, α2) = α(λ1, λ2), where α > 0 is interpreted as the regularisation parameter and chosen in dependence of the noise level, and (λ1, λ2) with max{λ1, λ2} = 1 is interpreted as a model parameter, which is chosen (or learned) once for a particular class of image data and then fixed independently of the concrete forward model or noise level; see the short sketch after this remark.
  • Although (20) incorporates multiple orders, a solution is always contained in ${\mathrm{B}\mathrm{V}}^{{k}_{2}}\left({\Omega}\right)$. Since k2 ⩾ 2, this space is always contained in H1,1(Ω), so jump discontinuities cannot appear. One can observe that for numerical solutions, this is reflected in blurry reconstructions of edges while higher-order smoothness is usually captured quite well, see figure 2(c).
  • Naturally, it is also possible to consider the weighted sum of more than two TV-type functionals for regularisation, i.e.,
    α1TVk1(u) + α2TVk2(u) + ⋯ + αmTVkm(u)    (22)
    with orders k1, ..., km ⩾ 1 and weights α1, ..., αm > 0. Solutions then exist, for appropriate p, in the space BVk (Ω) for k = max{k1, ..., km }.
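
To make the parametrisation from the first item concrete, the following minimal Python sketch (referenced above) scales a fixed model parameter (λ1, λ2) with max{λ1, λ2} = 1 by a single regularisation parameter α chosen by a simple discrepancy-type sweep in the denoising case K = id. The routine solve_additive_tikhonov is a hypothetical placeholder for any solver of the additive multi-order Tikhonov problem; it is not part of the original text.

```python
import numpy as np

def choose_alpha(f_delta, delta, solve_additive_tikhonov, lam=(1.0, 0.25)):
    """Sweep the single regularisation parameter alpha for the additive model
    (alpha_1, alpha_2) = alpha * (lambda_1, lambda_2), max(lambda) = 1, and return
    the first (largest) alpha whose residual drops below the noise level delta.
    `solve_additive_tikhonov(f, a1, a2)` is assumed to return a minimiser of
    1/2*||u - f||^2 + a1*TV^{k1}(u) + a2*TV^{k2}(u)."""
    lam = np.asarray(lam, dtype=float)
    lam = lam / lam.max()                       # normalise the model parameter
    for alpha in np.logspace(1, -4, 30):        # from strong to weak regularisation
        u = solve_additive_tikhonov(f_delta, alpha * lam[0], alpha * lam[1])
        if 0.5 * np.sum((u - f_delta) ** 2) <= delta:   # discrepancy-type criterion
            break
    return alpha, u
```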

Optimality conditions. As for TVk , one can also consider optimality conditions for variational problems with ${\alpha }_{1}{\mathrm{T}\mathrm{V}}^{{k}_{1}}+{\alpha }_{2}{\mathrm{T}\mathrm{V}}^{{k}_{2}}$ as regularisation. Again, in the case that Y is a Hilbert space, q = 2 and ${S}_{f}\left(v\right)=\frac{1}{2}{{\Vert}v-f{\Vert}}_{Y}^{2}$, one can argue according to proposition 3.30 and obtain that u* is optimal for (20) if and only if

or, equivalently,

A difficulty with a further specification of these statements, however, is that it is neither immediate that the subdifferential is additive in this situation, nor that the Fenchel dual of the sum of the TVki equals the infimal convolution of the duals (see definition 4.6 in the next subsection for a definition of the infimal convolution). A possible remedy is to consider the original minimisation problem in the space BVk1(Ω) instead, such that TVk1 becomes continuous. This, however, yields subgradients in the dual of BVk1(Ω) instead of Lp*(Ω), making the optimality conditions again difficult to interpret.

A priori estimates. In order to obtain a bound on a solution u* for a quadratic Hilbert-norm discrepancy, i.e., Sf(v) = ½||v − f||Y² with Y a Hilbert space, one can proceed analogously to remark 3.31, provided that K is injective on the space Pk1−1. We then also arrive at (17), with α replaced by α1, C being the coercivity constant for TVk1 and c the inverse bound for K on Pk1−1. Of course, in case K is still injective on the larger space Pk2−1, the analogous bound can be obtained with α2 instead of α1 and respective constants C, c. In case of the Kullback–Leibler discrepancy, i.e., Sf(v) = KL(v, f), the analogous statements apply to the estimate (18).

Denoising performance. Figure 3 shows the effect of α1TV + α2TV2 regularisation compared to pure TV-regularisation. While staircase artefacts are slightly reduced, the overall image is more blurry than the one obtained with TV, see figures 3(b) and (c). This is expected as additive regularisation inherits the analytical properties of the stronger regularisation term, hence α1TV + α2TV2 is not able to recover jumps. The result is not much different when ${{\Vert}{\Delta}\cdot {\Vert}}_{\mathcal{M}}$ is used instead of TV2, see figure 3(d). Nevertheless, although not discussed in this paper, the issue of limited applicability of ${{\Vert}{\Delta}\cdot {\Vert}}_{\mathcal{M}}$ for regularisation of general inverse problems, as mentioned in remark 3.33, is overcome in an additive combination with TV since the properties of TV are sufficient to guarantee well-posedness results.
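
To give a concrete impression of the problem behind figure 3, the following Python sketch sets up a discrete version of the additive multi-order denoising problem. The anisotropic finite-difference discretisation and the use of the generic CVXPY solver are choices made only for this illustration; the tailored first-order algorithms developed in the numerical part of this review are better suited for realistic image sizes.

```python
import numpy as np
import cvxpy as cp

def dx(U):   # forward difference in the first image dimension
    return U[1:, :] - U[:-1, :]

def dy(U):   # forward difference in the second image dimension
    return U[:, 1:] - U[:, :-1]

def tv1(U):  # anisotropic discrete total variation
    return cp.sum(cp.abs(dx(U))) + cp.sum(cp.abs(dy(U)))

def tv2(U):  # anisotropic discrete second-order TV (mixed difference counted twice)
    return (cp.sum(cp.abs(dx(dx(U)))) + cp.sum(cp.abs(dy(dy(U))))
            + 2 * cp.sum(cp.abs(dy(dx(U)))))

def denoise_additive(f, alpha1=0.1, alpha2=0.1):
    """Discrete counterpart of  min_u 1/2||u - f||^2 + alpha1 TV(u) + alpha2 TV^2(u)."""
    U = cp.Variable(f.shape)
    objective = 0.5 * cp.sum_squares(U - f) + alpha1 * tv1(U) + alpha2 * tv2(U)
    cp.Problem(cp.Minimize(objective)).solve()
    return U.value

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    truth = np.zeros((48, 48))
    truth[12:36, 12:36] = 1.0                    # piecewise constant test image
    rec = denoise_additive(truth + 0.1 * rng.standard_normal(truth.shape))
```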


Figure 3. Additive multi-order denoising example. (a) Noisy image (PSNR: 13.9 dB), (b) regularisation with TV (PSNR: 29.3 dB), (c) regularisation with α1TV + α2TV2 (PSNR: 29.5 dB), (d) regularisation with ${\alpha }_{1}\mathrm{T}\mathrm{V}+{\alpha }_{2}{{\Vert}{\Delta}\cdot {\Vert}}_{\mathcal{M}}$ (PSNR: 29.4 dB). All parameters were manually tuned via grid search to give highest PSNR with respect to the ground truth (figure 1(a)).


4.2. Multi-order infimal convolution

In order to overcome the smoothing effect of total variation of order two and higher, and of additive combinations thereof, another idea is to model an image u as the sum of a first-order part and a second-order part, i.e.,

This was originally proposed in [59]; different variants have subsequently been analysed in [16] and considered in a discrete setting in [174, 175].

Obviously, such a decomposition exists for each u ∈ BV(Ω) but is, of course, not unique. The parts u1 and u2 are now regularised with first- and second-order total variation, associated with some weights α1 > 0, α2 > 0. The associated Tikhonov minimisation problem reads as

min_{u1, u2}  Sf(K(u1 + u2)) + α1TV(u1) + α2TV2(u2).

As we are only interested in u, we rewrite this problem as

min_{u}  Sf(Ku) + inf_{u = u1 + u2}  α1TV(u1) + α2TV2(u2).    (23)

This regularisation functional is called infimal convolution of α1TV and α2TV2.

Definition 4.6. Let F1, F2 : X → ]−∞, ∞]. Then,

(F1 △ F2)(u) = inf_{u1 + u2 = u}  F1(u1) + F2(u2),    u ∈ X,

is the infimal convolution of F1 and F2.

An infimal convolution is called exact if for each u ∈ X there is a pair u1, u2 ∈ X with u1 + u2 = u and (F1 △ F2)(u) = F1(u1) + F2(u2).
The infimal convolution may or may not be exact and may or may not be lower semi-continuous, even if both F1, F2 are lower semi-continuous. The next proposition, which should be compared to proposition 4.1 above, provides basic properties and the function spaces associated with infimal convolutions.
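
As a simple worked instance of definition 4.6 (a side remark not taken from the original text, and pairing a seminorm with a quadratic rather than with a second seminorm): for α, β > 0, the infimal convolution of α|⋅| and (β/2)|⋅|^2 on R is the classical Huber function,

(α|⋅| △ (β/2)|⋅|^2)(t) = min_{s ∈ R} α|t − s| + (β/2)s^2 = (β/2)t^2 if |t| ⩽ α/β,  and  α|t| − α^2/(2β) if |t| > α/β.

Here the minimum is attained for every t, so this infimal convolution is exact, and it is continuous, hence lower semi-continuous; the quadratic component acts for small arguments while the absolute value caps the growth of the penalty for large ones.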

Proposition 4.7. Let |⋅|1 and |⋅|2 be two lower semi-continuous seminorms on the Banach space X. Then,

  • (a)  
    The functional $\left\vert \cdot \right\vert =\vert \cdot {\vert }_{1}{\triangle}\vert \cdot {\vert }_{2}$ is a seminorm on X.
  • (b)  
    We have ker(|⋅|1) + ker(|⋅|2) ⊂ ker(|⋅|1△|⋅|2),
    with equality if |⋅|1△|⋅|2 is exact.
  • (c)  
    If |⋅|1△|⋅|2 is lower semi-continuous, then
    the space Y arising from the norm ||⋅||X + |⋅|1△|⋅|2 (see lemma 3.15) constitutes a Banach space.
  • (d)  
    With Yi the Banach spaces arising from the norms ||⋅||X + |⋅|i , i = 1, 2 (see lemma 3.15), both Y1 and Y2 embed continuously into Y.
  • (e)  
    It holds that (|⋅|1△|⋅|2)* = |⋅|1* + |⋅|2* and, if |⋅|1△|⋅|2 is exact, then ∂(|⋅|1* + |⋅|2*) = ∂|⋅|1* + ∂|⋅|2*.

Proof. The seminorm axioms can easily be verified for $\left\vert \cdot \right\vert $. If u = u1 + u2 for ui ∈ ker(|⋅|i ), i = 1, 2, then

The converse inclusion follows directly from the exactness.

The third statement is a direct consequence of lemma 3.15, while the fourth immediately follows from ||u||X + |u| ⩽ ||u||X + |u|i for i = 1, 2.

For the fifth statement, the assertion on the Fenchel dual follows by direct computation. Regarding equality of the subdifferentials, let w ∈ X* and u ∈ X be such that u ∈ ∂(|⋅|1* + |⋅|2*)(w). Then, the Fenchel identity yields

Equation (24)

for the minimising u1, u2 ∈ X with u1 + u2 = u. As, by the Fenchel inequality,

Equation (25)

the equation (24) can only be true when there is equality in (25). But this means, in turn, that ${u}_{1}\in \partial {\vert \cdot {\vert }_{1}}^{{\ast}}\left(w\right)$ and ${u}_{2}\in \partial {\vert \cdot {\vert }_{2}}^{{\ast}}\left(w\right)$. Hence, $\partial \left({\vert \cdot {\vert }_{1}}^{{\ast}}+{\vert \cdot {\vert }_{2}}^{{\ast}}\right)\subset \partial {\vert \cdot {\vert }_{1}}^{{\ast}}+\partial {\vert \cdot {\vert }_{2}}^{{\ast}}$. The other inclusion holds trivially. □

The statement (e) will be relevant for obtaining optimality conditions and we note that, as can be seen from the proof, it holds true for arbitrary convex functionals, not necessarily seminorms.

The previous proposition shows in particular that lower semi-continuity and exactness of the infimal convolution are important for obtaining an appropriate function space setting. Regarding the infimal convolution of TV functionals, this holds true on Lp -spaces as follows.

Proposition 4.8. Let Ω be a bounded Lipschitz domain, 1 ⩽ k1 < k2 and p ∈ [1, ∞] with p ⩽ d/(d − k1) if k1 < d. Then, for α = (α1, α2), α1 > 0, α2 > 0, the infimal convolution

Rα = α1TVk1 △ α2TVk2    (26)

is exact and lower semi-continuous in Lp (Ω).

Proof. By continuous embedding, we may assume without loss of generality that p < ∞. Take a sequence {un} converging to some u in Lp(Ω) for which lim infn→∞ Rα(un) < ∞. For each n, we can select u1n, u2n ∈ BVk1(Ω) such that un = u1n + u2n,

and u1n is in the complement of ker(TVk1) in the sense that Ru1n = 0 for R : Lp(Ω) → ker(TVk1) a linear and continuous projection. The latter condition can always be satisfied since both TVk1 and TVk2 are invariant on ker(TVk1). Now, by coercivity of TVk1 as in corollary 3.23, we get that {u1n} is bounded in BVk1(Ω). Hence, by the embedding of BVk1(Ω) into either L∞(Ω) or Ld/(d−k1)(Ω) in case of k1 < d as in theorem 3.20, the choice of p and convergence of {un} in Lp(Ω), we can extract (non-relabelled) subsequences of {u1n} and {u2n} converging weakly to some u1 and u2 in Lp(Ω), respectively, such that u = u1 + u2. Thus, lower semi-continuity of both TVk1 and TVk2 implies

such that lower semi-continuity holds. Finally, exactness for u ∈ Lp(Ω) with Rα(u) < ∞ follows from choosing {un} as the constant sequence un = u. □

Given this, the special case |⋅|i = αiTVki and X = L1(Ω) of proposition 4.7 shows that both BVk1(Ω) and BVk2(Ω) are embedded in the Banach space Y. Hence, in contrast to the sum of different TV terms, their infimal convolution allows jumps to be recovered whenever k1 = 1, independently of k2. In fact, as the following theorem shows, the space Y is even equivalent to the BV space corresponding to the lowest order, in particular to BV(Ω) for k1 = 1. Again, the result should be compared to theorem 4.2 above.

Theorem 4.9. Let 1 ⩽ k1 < k2, α1 > 0, α2 > 0, Ω be a bounded Lipschitz domain, and Y be the Banach space associated with X = L1(Ω) and total-variation infimal convolution according to (26). Then,

in the sense of Banach space equivalence, and for p ∈ [1, ∞], p ⩽ d/(d − k1) if k1 < d, and for R : Lp(Ω) → ker(TVk2) a linear, continuous projection, there exists a C > 0 such that

Equation (27)

for all uLp (Ω).

Proof. We first show the claimed norm equivalence. For this purpose, note that one estimate corresponds to the fourth statement in proposition 4.7.

For the converse estimate, let $u\in {\mathrm{B}\mathrm{V}}^{{k}_{1}}\left({\Omega}\right)$ and $R:{L}^{1}\left({\Omega}\right)\to \mathrm{ker}\left({\mathrm{T}\mathrm{V}}^{{k}_{2}}\right)$ be a projection. Then,

Equation (28)

for C independent of u, w ∈ BVk1(Ω). Indeed, suppose (28) fails; then there are sequences {un} and {wn} with TVk1(un) = 1 and ||un||1 → 0 as well as TVk1(un − Rwn) → 0, meaning that un → 0 in L1(Ω) and ∇k1(un − Rwn) → 0 in ℳ(Ω, Symk1(Rd)). The latter implies that {∇k1Rwn} is bounded in a finite-dimensional space, hence there is a convergent subsequence (not relabelled) with limit v ∈ ℳ(Ω, Symk1(Rd)). Then, ∇k1un → v in ℳ(Ω, Symk1(Rd)) and the closedness of ∇k1 yields v = 0, which contradicts TVk1(un) = 1 for all n.

Using this, together with the estimate

Equation (29)

from lemma 3.25, it holds for $u\in {\mathrm{B}\mathrm{V}}^{{k}_{1}}\left({\Omega}\right)$ and $w\in {\mathrm{B}\mathrm{V}}^{{k}_{2}}\left({\Omega}\right)$ that

Taking the infimum over all $w\in {\mathrm{B}\mathrm{V}}^{{k}_{2}}\left({\Omega}\right)$, adding ||u||1 on both sides as well as observing that ${\mathrm{T}\mathrm{V}}^{{k}_{1}}{\triangle}{\mathrm{T}\mathrm{V}}^{{k}_{2}}{\leqslant}\mathrm{min}{\left\{{\alpha }_{1},{\alpha }_{2}\right\}}^{-1}\left({\alpha }_{1}{\mathrm{T}\mathrm{V}}^{{k}_{1}}{\triangle}{\alpha }_{2}{\mathrm{T}\mathrm{V}}^{{k}_{2}}\right)$ yields

and, consequently, the desired norm estimate. Likewise, the estimate ${\Vert}u-Ru{{\Vert}}_{p}{\leqslant}C\left({\mathrm{T}\mathrm{V}}^{{k}_{1}}{\triangle}{\mathrm{T}\mathrm{V}}^{{k}_{2}}\right)\left(u\right)$ follows in analogy to proposition 3.22 and corollary 3.23, which immediately gives the claimed estimate for arbitrary α1 > 0, α2 > 0. □

Tikhonov regularisation. Again, the second estimate in theorem 4.9 is crucial, as it allows the well-posedness result of theorem 3.26 to be applied.

Proposition 4.10. With X = Lp(Ω), p ∈ ]1, ∞[, Ω being a bounded Lipschitz domain, Y a Banach space, K : X → Y linear and continuous, Sf : Y → [0, ∞] proper, convex, lower semi-continuous and coercive, 1 ⩽ k1 < k2, α1 > 0, α2 > 0, the Tikhonov minimisation problem

min_{u ∈ Lp(Ω)}  Sf(Ku) + (α1TVk1 △ α2TVk2)(u)    (30)

is well-posed in the sense of theorem 3.26 whenever p ⩽ d/(d − k1) if k1 < d.

Compared to the sum of different TV terms, we see that now the necessary coercivity estimate incorporates a projection to the larger kernel of ${\mathrm{T}\mathrm{V}}^{{k}_{2}}$ and an Lp norm with a smaller exponent corresponding to ${\mathrm{T}\mathrm{V}}^{{k}_{1}}$. Hence, in view of the assumptions of theorem 3.26, the infimal convolution of ${\mathrm{T}\mathrm{V}}^{{k}_{1}}$ and ${\mathrm{T}\mathrm{V}}^{{k}_{2}}$ inherits the worst properties of the two summands, i.e., the ones that are more restrictive for applications in an inverse problems context. Nevertheless, such a slightly more restrictive assumption on the continuity of the forward operator is compensated by the fact that the infimal convolution with k1 = 1 allows to reconstruct jumps. In addition, each solution u* of a Tikhonov functional admits an optimal decomposition ${u}^{{\ast}}={u}_{1}^{{\ast}}+{u}_{2}^{{\ast}}$ with ${u}_{i}^{{\ast}}\in {\mathrm{B}\mathrm{V}}^{{k}_{i}}\left({\Omega}\right)$, i = 1, 2, which follows from the exactness of the infimal convolution.

Regarding the convergence result of theorem 3.28 and the rates of proposition 3.29, again a direct extension to regularisation with α1TVk1 △ α2TVk2 can be obtained by regarding the weights α1, α2 as fixed and introducing an additional factor α > 0 for both terms, which then acts as the regularisation parameter. Considering the limiting behaviour for both weights converging to zero, a counterpart of theorem 4.4 can be obtained as follows. There, we also allow for infinite weights αi, i.e., for αi = ∞ we set αiTVki(u) = 0 if u ∈ ker(TVki) and αiTVki(u) = ∞ else. We first need a lower semi-continuity result.

Lemma 4.11. Let Ω be a bounded Lipschitz domain, p ∈ [1, ∞[ with 1 ⩽ p ⩽ d/(d − k1) if k1 < d, let {(α1,n, α2,n)} be a sequence of positive parameters converging to some (α1†, α2†) ∈ ]0, ∞]² and let {un} be a sequence in Lp(Ω) weakly converging to u* ∈ Lp(Ω). Then,

Proof. By moving to a subsequence, we can assume that $\left({\alpha }_{1,n}{\mathrm{T}\mathrm{V}}^{{k}_{1}}{\triangle}{\alpha }_{2,n}{\mathrm{T}\mathrm{V}}^{{k}_{2}}\right)\left({u}^{n}\right)$ converges to the limes inferior on the right-hand side of the claimed assertion and that the latter is finite. Choose $\left\{{u}_{1}^{n}\right\}$, $\left\{{u}_{2}^{n}\right\}$ sequences such that for each n, we have ${u}_{1}^{n}+{u}_{2}^{n}={u}^{n}$, $\left({\alpha }_{1,n}{\mathrm{T}\mathrm{V}}^{{k}_{1}}{\triangle}{\alpha }_{2,n}{\mathrm{T}\mathrm{V}}^{{k}_{2}}\right)\left({u}^{n}\right)={\alpha }_{1,n}{\mathrm{T}\mathrm{V}}^{{k}_{1}}\left({u}_{1}^{n}\right)+{\alpha }_{2,n}{\mathrm{T}\mathrm{V}}^{{k}_{2}}\left({u}_{2}^{n}\right)$ and ${u}_{1}^{n}$ being in a complement of $\mathrm{ker}\left({\mathrm{T}\mathrm{V}}^{{k}_{1}}\right)$ in the sense that ${u}_{1}^{n}\in \mathrm{ker}\left(R\right)$ for a linear, continuous projection $R:{L}^{p}\left({\Omega}\right)\to \mathrm{ker}\left({\mathrm{T}\mathrm{V}}^{{k}_{1}}\right)$. Setting ${\hat{\alpha }}_{i}=\mathrm{inf}\enspace \left\{{\alpha }_{i,n}\right\}{ >}0$, we obtain

In particular, for a constant C > 0 it holds that

which implies that $\left\{{u}_{1}^{n}\right\}$ is bounded in ${\mathrm{B}\mathrm{V}}^{{k}_{1}}\left({\Omega}\right)$. By the embedding of theorem 3.20 and since {un } is convergent, both $\left\{{u}_{1}^{n}\right\}$ and $\left\{{u}_{2}^{n}\right\}$ admit subsequences (not relabelled) weakly converging to some ${u}_{1}^{{\ast}}$ and ${u}_{2}^{{\ast}}$ in Lp (Ω), respectively. Now, in case ${\alpha }_{i}^{{\dagger}}{< }\infty $, we can conclude

Otherwise, we get by boundedness of ${\alpha }_{i,n}{\mathrm{T}\mathrm{V}}^{{k}_{i}}\left({u}_{i}^{n}\right)$ that ${\mathrm{T}\mathrm{V}}^{{k}_{i}}\left({u}_{i}^{n}\right)\to 0$ and by lower semi-continuity that ${\mathrm{T}\mathrm{V}}^{{k}_{i}}\left({u}_{i}^{{\ast}}\right)=0$. Together, this implies

This implies the desired statement. □

Theorem 4.12. In the situation of proposition 4.10 and for p ∈ ]1, ∞[ with p ⩽ d/(d − k1) in case of k1 < d, let for each δ > 0 the data fδ be given such that Sfδ(f†) ⩽ δ, let {Sfδ} be equi-coercive and converge to Sf† for some data f† ∈ Y in the sense of (4), and Sf†(v) = 0 if and only if v = f†.

Choose the parameters α = (α1, α2) in dependence of δ such that min{α1, α2} → 0 and δ/min{α1, α2} → 0 as δ → 0, and assume that (α̃1, α̃2) = (α1, α2)/min{α1, α2} → (α1†, α2†) ∈ ]0, ∞]² as δ → 0. Set k = k2 if α1† = ∞ and k = k1 otherwise, and assume that there exists u0 ∈ BVk(Ω) such that Ku0 = f†.

Then, up to shifts in ker(K) ∩ Pk2−1, any sequence {uα,δ}, with each uα,δ being a solution to (30) for parameters (α1, α2) and data fδ, has at least one Lp-weak accumulation point. Each Lp-weak accumulation point is a minimum-(α1†TVk1 △ α2†TVk2)-solution u† of Ku = f† and limδ→0 (α̃1TVk1 △ α̃2TVk2)(uα,δ) = (α1†TVk1 △ α2†TVk2)(u†).

Proof. First note that, with $R:{L}^{p}\left({\Omega}\right)\to \mathrm{ker}\left({\mathrm{T}\mathrm{V}}^{{k}_{2}}\right)$ a linear, continuous projection, for any uLp (Ω), we have

and by the choice of k as well as u0 ∈ BVk(Ω), that (α1†TVk1 △ α2†TVk2)(u0) < ∞. Hence, as a consequence of theorem 3.26, there exists a minimum-(α1†TVk1 △ α2†TVk2)-solution u† ∈ BVk(Ω) to Ku = f†. Using optimality of uα,δ compared to u† gives

Now since $\left({\alpha }_{1}{\mathrm{T}\mathrm{V}}^{{k}_{1}}{\triangle}{\alpha }_{2}{\mathrm{T}\mathrm{V}}^{{k}_{2}}\right)\left({u}^{{\dagger}}\right){\leqslant}{\mathrm{min}}_{i=1,2}\left\{{\alpha }_{i}{\mathrm{T}\mathrm{V}}^{{k}_{i}}\left({u}^{{\dagger}}\right)\right\}$ and min{α1, α2} → 0 as δ → 0, we have that ${S}_{{f}^{\delta }}\left(K{u}^{\alpha ,\delta }\right)\to 0$ as δ → 0. Moreover, as also δ/min{α1, α2} → 0, it follows that

for ɛ > 0 independent of uα,δ and letting ɛ → 0, we obtain

In particular, using (27), we can conclude that {uα,δ − Ruα,δ} is bounded in Lp(Ω). By introducing appropriate shifts in ker(K) ∩ Pk2−1 as done in theorem 3.26 and using the equi-coercivity of {Sfδ}, one can then achieve that {uα,δ} is bounded in Lp(Ω), such that by reflexivity, it admits an Lp-weak accumulation point.

Next, let u* be an Lp-weak accumulation point associated with {δn}, δn → 0, as well as the corresponding parameters {αn} = {(α1,n, α2,n)}. Then, Sf†(Ku*) ⩽ lim infn→∞ Sfδn(Ku^{αn,δn}) = 0 by convergence of Sfδ to Sf†, so Ku* = f†. Moreover, employing lemma 4.11, we get

hence, u* is a minimum-$\left({\alpha }_{1}^{{\dagger}}{\mathrm{T}\mathrm{V}}^{{k}_{1}}{\triangle}{\alpha }_{2}^{{\dagger}}{\mathrm{T}\mathrm{V}}^{{k}_{2}}\right)$-solution. The remaining assertions follow as in the proof of theorem 4.4 by replacing the sum with the infimal convolution. □

Remark 4.13. 

  • Compared to the convergence result for the sum of higher-order TV functionals as in theorem 4.4, we see that now the minimum of the parameters plays the role of the regularisation parameter, but again the ratio of the parameters determines the required regularity assumption on u0 such that Ku0 = f†. For a parameter choice in practice, this again motivates the choice (α1, α2) = α(λ1, λ2) as in remark 4.5, with α > 0 being interpreted as the regularisation parameter and (λ1, λ2) with min{λ1, λ2} = 1 being interpreted as a model parameter.
  • It is also possible to construct infimal convolutions of more than two TV-type functionals and, of course, of functionals other than TVk.
  • Introducing orders k1, ..., km ⩾ 1 and weights α1, ..., αm > 0, one can consider
    α1TVk1 △ α2TVk2 △ ⋯ △ αmTVkm.    (31)
    Solutions then exist, for appropriate p, in the space BVk (Ω) for k = min{k1, ..., km }.
  • The latter is in contrast to the multi-order TV regularisation (20) where the solution space is determined by the highest effective order of differentiation. Letting ki = 1 for some i, the solution space is then BV(Ω) which allows for discontinuities; a desirable property for image restoration.

Optimality conditions. Again, in the situation that Y is a Hilbert space, q = 2 and Sf(v) = ½||v − f||Y², we obtain some first-order optimality conditions. Noting that the dual of the infimal convolution of two functionals is the sum of the respective duals, and arguing according to proposition 3.30, a u* is optimal for (30) if and only if

By proposition 4.7, the subgradients are additive, so in terms of the normal cones introduced in proposition 3.30, the optimality condition reads as

Equation (32)

A priori estimates. Also here, in the above Hilbert space situation, i.e., ${S}_{f}\left(v\right)=\frac{1}{2}{{\Vert}v-f{\Vert}}_{Y}^{2}$ and Y Hilbert space, an a-priori bound of solutions u* can be derived thanks to the coercivity estimate (27). One indeed has ${{\Vert}u-Ru{\Vert}}_{p}{\leqslant}\frac{1}{2\mathrm{min}\left\{{\alpha }_{1},{\alpha }_{2}\right\}}C{{\Vert}f{\Vert}}_{Y}^{2}$ with R and C coming from (27). Hence, assuming that K is injective on ${\mathbf{P}}^{{k}_{2}-1}$, which leads to c||Ru||p ⩽ ||KRu||Y for all u and some c > 0, one proceeds analogously to remark 3.31 to obtain the bound (17) with α replaced by min{α1, α2}. By analogy, for Sf (v) = KL(v, f) being the Kullback–Leibler discrepancy, an a priori estimate of the type (18) follows.

Moreover, it is possible to control w* up to Pk1−1 whenever (α1TVk1 △ α2TVk2)(u*) = α1TVk1(u* − w*) + α2TVk2(w*). Let, in the following, Cf ⩾ 0 be an a priori estimate for the optimal functional value, for instance, Cf = ½||f||Y² in case of Sf(v) = ½||v − f||Y², and Cf = KL(Ku0, f) + (α1TVk1 △ α2TVk2)(u0) for a u0 ∈ BVk1(Ω) with KL(Ku0, f) < ∞ in case of Sf(v) = KL(v, f). Further, denoting by C̃ > 0 a constant such that TVk1(u) ⩽ C̃(||u||1 + min{α1, α2}^{−1}(α1TVk1 △ α2TVk2)(u)) for all u ∈ BVk1(Ω) (which exists by virtue of the norm equivalence in theorem 4.9), we see that TVk1(u*) ⩽ C̃(|Ω|^{1/p}||u*||p + min{α1, α2}^{−1}Cf), hence

Consequently, we obtain the bound

Equation (33)

which gives an a priori estimate when plugging in the already-obtained bound on ||u*||p. Moreover, this estimate implies a bound on ||w* − Rw*||p by the Poincaré–Wirtinger inequality. However, the norm of w* cannot fully be controlled, since adding an element of Pk1−1 = ker(TVk1) to w* would still realise the infimum in the infimal convolution. Thus, an estimate of the type (33) is the best one can expect in the considered setting.

Denoising performance. Figure 4 shows that it is indeed beneficial for denoising to regularise with α1TV△α2TV2 compared to pure TV-regularisation: higher-order features as well as edges are recognised by this image model. Nevertheless, staircase artefacts are still present, see figure 4(c). Essentially, this does not change when the second-order component of the infimal convolution is replaced, for instance by ${{\Vert}{\Delta}\cdot {\Vert}}_{\mathcal{M}}$ as in remark 3.33, see figure 4(d). (For the latter penalty functional, basically the same problems as the ones mentioned in remark 3.33 appear.)
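
Analogously to the additive sketch given after the discussion of figure 3, the following Python/CVXPY fragment illustrates the discrete infimal-convolution denoising problem by carrying the decomposition u = u1 + u2 explicitly; the anisotropic discretisation and the generic solver call are again purely illustrative choices.

```python
import cvxpy as cp

def denoise_infconv(f, alpha1=0.1, alpha2=0.1):
    """Discrete sketch of  min_{u1,u2} 1/2||u1 + u2 - f||^2 + alpha1 TV(u1) + alpha2 TV^2(u2),
    i.e. denoising with the infimal convolution of alpha1 TV and alpha2 TV^2."""
    dx = lambda U: U[1:, :] - U[:-1, :]          # forward differences
    dy = lambda U: U[:, 1:] - U[:, :-1]
    U1, U2 = cp.Variable(f.shape), cp.Variable(f.shape)
    tv_first = cp.sum(cp.abs(dx(U1))) + cp.sum(cp.abs(dy(U1)))
    tv_second = (cp.sum(cp.abs(dx(dx(U2)))) + cp.sum(cp.abs(dy(dy(U2))))
                 + 2 * cp.sum(cp.abs(dy(dx(U2)))))
    objective = 0.5 * cp.sum_squares(U1 + U2 - f) + alpha1 * tv_first + alpha2 * tv_second
    cp.Problem(cp.Minimize(objective)).solve()
    return U1.value + U2.value, U1.value, U2.value
```

Shifting a constant between U1 and U2 leaves the objective unchanged, which reflects the non-uniqueness of the decomposition noted at the beginning of this subsection.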


Figure 4. Infimal-convolution denoising example. (a) Noisy image (PSNR: 13.9 dB), (b) regularisation with TV (PSNR: 29.3 dB), (c) regularisation with α1TV△α2TV2 (PSNR: 28.9 dB), (d) regularisation with ${\alpha }_{1}\mathrm{T}\mathrm{V}{\triangle}{\alpha }_{2}{{\Vert}{\Delta}\cdot {\Vert}}_{\mathcal{M}}$ (PSNR: 28.6 dB). All parameters were manually tuned via grid search to give highest PSNR with respect to the ground truth (figure 1(a)).


5. Total generalised variation (TGV)

5.1. Basic concepts

As a motivation for TGV, consider the formal predual ball associated with the infimal convolution ${\mathcal{R}}_{\alpha }={\alpha }_{1}\mathrm{T}\mathrm{V}{\triangle}{\alpha }_{0}{\mathrm{T}\mathrm{V}}^{2}$ for α = (α0, α1), α0, α1 > 0. Then

and

Neglecting the closure for a moment, this leads to the predual ball according to

Equation (34)

Each φ ∈ B possesses a representation as an L∞-bounded first- and second-order divergence of some φ1 and φ2. However, as the kernel of the divergence is non-trivial (and even infinite-dimensional for d ⩾ 2), we can only conclude that φ1 = div φ2 + η for some η with div η = 0. Enforcing η = 0 thus gives the set

which leads, interpreted as a predual ball, to a seminorm which also incorporates first- and second-order derivatives but is different from infimal convolution: the total generalised variation [37].

There is also a primal version of this motivation via the (TV–TV2)-infimal convolution, which reads as follows: writing

(α1TV △ α0TV2)(u) = inf_{v ∈ BV2(Ω)}  α1‖∇u − ∇v‖ℳ + α0‖ℰ(∇v)‖ℳ,

we see that the infimal convolution allows one to subtract a vector field w = ∇v from the derivative of u at the cost of penalising its derivative ∇w = ℰw, where the equality is due to the symmetry of the weak Hessian ∇2v. While, by the embedding BD(Ω, Rd) ↪ Ld/(d−1)(Ω, Rd), necessarily w ∈ BD(Ω, Rd), it is not arbitrary among such functions but still restricted to be the gradient of some v ∈ BV2(Ω). Omitting this additional constraint (in the predual version above, this corresponds to enforcing η = 0), we arrive at

u ↦ min_{w ∈ BD(Ω, Rd)}  α1‖∇u − w‖ℳ + α0‖ℰw‖ℳ,    (35)

which, as will be shown in this section, is an equivalent formulation of the TGV functional.

Definition 5.1. Let Ω ⊂ Rd be a domain, k ⩾ 1 and α0, ..., αk−1 > 0. Then, the total generalised variation of order k with weight α for $u\in {L}_{\mathrm{l}\mathrm{o}\mathrm{c}}^{1}\left({\Omega}\right)$ is defined as the value of the functional

TGVαk(u) = sup { ∫Ω u divk φ dx : φ ∈ Cck(Ω, Symk(Rd)), ‖divi φ‖∞ ⩽ αi for i = 0, ..., k − 1 },    (36)

which takes the value ∞ in case the respective set is unbounded from above.

For symmetric tensors $u\in {L}_{\mathrm{l}\mathrm{o}\mathrm{c}}^{1}\left({\Omega},{\mathrm{S}\mathrm{y}\mathrm{m}}^{l}\left({\mathbf{R}}^{d}\right)\right)$ of order l ⩾ 0, the total generalised variation is given by

TGVαk,l(u) = sup { ∫Ω u · divk φ dx : φ ∈ Cck(Ω, Symk+l(Rd)), ‖divi φ‖∞ ⩽ αi for i = 0, ..., k − 1 }.    (37)

The space

BGVαk(Ω, Syml(Rd)) = { u ∈ L1(Ω, Syml(Rd)) : TGVαk,l(u) < ∞ }

is called the space of symmetric tensor fields of bounded generalised variation of order k with weight α. The special case l = 0 is denoted by BGVαk(Ω).

Remark 5.2. For k = 1 and α > 0, the definition coincides, up to a factor, with the total deformation of symmetric tensor fields of order l, i.e., ${\mathrm{T}\mathrm{G}\mathrm{V}}_{\alpha }^{1,l}=\alpha \mathrm{T}\mathrm{D}$, in particular ${\mathrm{T}\mathrm{G}\mathrm{V}}_{\alpha }^{1,0}={\mathrm{T}\mathrm{G}\mathrm{V}}_{\alpha }^{1}=\alpha \mathrm{T}\mathrm{V}$. Hence, we can identify the spaces ${\mathrm{B}\mathrm{G}\mathrm{V}}_{\alpha }^{1}\left({\Omega},{\mathrm{S}\mathrm{y}\mathrm{m}}^{l}\left({\mathbf{R}}^{d}\right)\right)=\mathrm{B}\mathrm{D}\left({\Omega},{\mathrm{S}\mathrm{y}\mathrm{m}}^{l}\left({\mathbf{R}}^{d}\right)\right)$. In particular, ${\mathrm{B}\mathrm{G}\mathrm{V}}_{\alpha }^{1}\left({\Omega}\right)=\mathrm{B}\mathrm{V}\left({\Omega}\right)$.

In the following, we will derive some basic properties of the total generalised variation.

Proposition 5.3. The following basic properties hold:

  • (a)  
    TGVαk,l is a lower semi-continuous seminorm on Lp(Ω, Syml(Rd)) for each p ∈ [1, ∞].
  • (b)  
    The kernel satisfies $\mathrm{ker}\left({\mathrm{T}\mathrm{G}\mathrm{V}}_{\alpha }^{k,l}\right)=\mathrm{ker}\left({\mathrm{T}\mathrm{D}}^{k}\right)$ for the kth order total deformation for symmetric tensor fields of order l. In particular, $\mathrm{ker}\left({\mathrm{T}\mathrm{G}\mathrm{V}}_{\alpha }^{k,l}\right)$ is a finite-dimensional subspace of polynomials of order less than k + l. For l = 0, we have $\mathrm{ker}\left({\mathrm{T}\mathrm{G}\mathrm{V}}_{\alpha }^{k}\right)={\mathbf{P}}^{k-1}$.
  • (c)  
    ${\mathrm{B}\mathrm{G}\mathrm{V}}_{\alpha }^{k}\left({\Omega},{\mathrm{S}\mathrm{y}\mathrm{m}}^{l}\left({\mathbf{R}}^{d}\right)\right)$ is a Banach space independent of α.

Proof. Observe that ${\mathrm{T}\mathrm{G}\mathrm{V}}_{\alpha }^{k,l}$ is the seminorm associated with the predual ball

Equation (38)

By definition, each element of ${\mathcal{B}}_{{\mathrm{T}\mathrm{G}\mathrm{V}}_{\alpha }^{k,l}}$ can be associated with an element of the dual space of Lp (Ω, Syml (Rd )), so ${\mathrm{T}\mathrm{G}\mathrm{V}}_{\alpha }^{k,l}$ is convex and lower semi-continuous as pointwise supremum over a set of linear and continuous functionals. The positive homogeneity finally follows from $\lambda {\mathcal{B}}_{{\mathrm{T}\mathrm{G}\mathrm{V}}_{\alpha }^{k,l}}\subset {\mathcal{B}}_{{\mathrm{T}\mathrm{G}\mathrm{V}}_{\alpha }^{k,l}}$ for each $\left\vert \lambda \right\vert {\leqslant}1$.

The statement about the kernel of ${\mathrm{T}\mathrm{G}\mathrm{V}}_{\alpha }^{k,l}$ is a consequence of ${\mathrm{T}\mathrm{G}\mathrm{V}}_{\alpha }^{k,l}\left(u\right)=0$ if and only if ⟨u, divk φ⟩ = 0 for each $\varphi \in {\mathcal{C}}_{\mathrm{c}}^{k}\left({\Omega},{\mathrm{S}\mathrm{y}\mathrm{m}}^{k+l}\left({\mathbf{R}}^{d}\right)\right)$ (compare with proposition 3.21).

Finally, ${\mathrm{B}\mathrm{G}\mathrm{V}}_{\alpha }^{k}\left({\Omega},{\mathrm{S}\mathrm{y}\mathrm{m}}^{l}\left({\mathbf{R}}^{d}\right)\right)$ is a Banach space by lemma 3.15. The equivalence for parameter sets α0, ..., αk−1 > 0 and ${\tilde {\alpha }}_{0},\dots ,{\tilde {\alpha }}_{k-1}{ >}0$ can be seen as follows. Choosing C > 0 large enough, we can achieve that

This implies

Interchanging roles we get ${\mathrm{T}\mathrm{G}\mathrm{V}}_{\tilde {\alpha }}^{k,l}{\leqslant}C{\mathrm{T}\mathrm{G}\mathrm{V}}_{\alpha }^{k,l}$, so the spaces ${\mathrm{B}\mathrm{G}\mathrm{V}}_{\alpha }^{k}\left({\Omega},{\mathrm{S}\mathrm{y}\mathrm{m}}^{l}\left({\mathbf{R}}^{d}\right)\right)$ and ${\mathrm{B}\mathrm{G}\mathrm{V}}_{\tilde {\alpha }}^{k}\left({\Omega},{\mathrm{S}\mathrm{y}\mathrm{m}}^{l}\left({\mathbf{R}}^{d}\right)\right)$ have equivalent norms. □

Remark 5.4. As ${\mathrm{B}\mathrm{G}\mathrm{V}}_{\alpha }^{k}\left({\Omega},{\mathrm{S}\mathrm{y}\mathrm{m}}^{l}\left({\mathbf{R}}^{d}\right)\right)$ are all equivalent for different α, we will drop, in the following, the subscript α.

Proposition 5.5. The scalar total generalised variation, i.e., ${\mathrm{T}\mathrm{G}\mathrm{V}}_{\alpha }^{k}$ possesses the following invariance and scaling properties:

  • (a)  
    TGVαk is translation invariant, i.e. for x0 ∈ Rd and u ∈ BGVk(Ω) we have that ũ given by ũ(x) = u(x + x0) is in BGVk(Ω − x0) and TGVαk(ũ) = TGVαk(u),
  • (b)  
    TGVαk is rotationally invariant, i.e. for each orthonormal matrix O ∈ Rd×d and u ∈ BGVk(Ω) we have, defining ũ(x) = u(Ox), that ũ ∈ BGVk(OTΩ) with TGVαk(ũ) = TGVαk(u),
  • (c)  
    For r > 0 and u ∈ BGVk(Ω), we have, defining ũ(x) = u(rx), that ũ ∈ BGVk(r^{−1}Ω) with

Proof. See [37]. □

The derivative versus the symmetrised derivative. In both ways to motivate the second-order TGV functional as presented at the beginning of this section, we see that symmetric tensor fields and a symmetrised derivative appear naturally in the penalisation of higher-order derivatives. Indeed, in the motivation via the predual ball B of the infimal convolution, symmetric tensor fields (resulting in a symmetrised derivative in the primal version) appear as the most economic way to write the predual ball, since for φ ∈ Cc2(Ω, T2(Rd)) one has div2 φ = div2(sym φ), i.e., only the symmetric part of φ contributes to the second-order divergence. In the primal version, the symmetrised derivative results from writing ∇2v = ℰ∇v and then relaxing ∇v to be an arbitrary vector field w. Nevertheless, also non-symmetric tensor fields and the equality ∇2v = ∇(∇v) could have been used in these motivations. For the TGV functional, this would have resulted in a primal version of second-order TGV according to

u ↦ min_{w ∈ BV(Ω, Rd)}  α1‖∇u − w‖ℳ + α0‖∇w‖ℳ,

which is genuinely different from the definition in (35). The following example provides some insight into the differences between using the derivative and the symmetrised derivative of vector fields of bounded variation in a Radon-norm penalty.

Example 5.6. On ${\Omega}=\left\{\left({x}_{1},{x}_{2}\right)\in {\mathbf{R}}^{2}\enspace \vert \enspace {x}_{1}^{2}+{x}_{2}^{2}{< }\frac{1}{4}\right\}$ define, for given $\nu ,n\in {\mathcal{S}}^{1}$ (the unit sphere in R2),

where n^⊥ = (n2, −n1). Then, w ∈ BV(Ω, R2) and, with L = {λn^⊥ | λ ∈ ]−1/2, 1/2[},

A direct computation shows that

thus, the symmetrised derivative depends on the angle of the vector field relative to the jump set, while the derivative does not. In particular, whenever ν2 = 0 such that the vector field can be written as the gradient of a function in BV2(Ω), the two notions coincide. See figure 5 for a visualisation of w for different values of ν.
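
To make the comparison quantitative, here is a hedged reconstruction of the computation under the assumption that w equals ν on the part of Ω on one side of L and vanishes on the other (the normalisation in the original example may differ). Both distributional derivatives then concentrate on L, with densities ν ⊗ n and (ν ⊗ n + n ⊗ ν)/2 (up to sign) with respect to H1⌊L, so that, using H1(L) = 1,

‖∇w‖ℳ = |ν ⊗ n| H1(L) = 1,    ‖ℰw‖ℳ = |(ν ⊗ n + n ⊗ ν)/2| H1(L) = √((1 + (ν·n)^2)/2).

The first value is independent of ν, whereas the second equals 1 when ν is parallel to n, i.e. when w is a gradient field, and drops to 1/√2 when ν ⊥ n, in line with the angle dependence stated above.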


Figure 5. Visualisation of the function w of example 5.6 and values of ${\Vert}\nabla w{{\Vert}}_{\mathcal{M}}$ and ${\Vert}\mathcal{E}w{{\Vert}}_{\mathcal{M}}$ for different choices of ν. The blue lines show the level lines of a function v such that w = ∇v.


5.2. Functional analytic and regularisation properties

We would like to characterise TGVαk,l in terms of a minimisation problem. This characterisation will be based on Fenchel–Rockafellar duality. Here, the following theorem from [8] is employed. Recall that the domain of a function F : X → ]−∞, ∞] is defined as dom(F) = {x ∈ X | F(x) < ∞}.

Theorem 5.7. Let X, Y be Banach spaces and Λ : XY linear and continuous. Let $F:X\to \left.\right]- \infty ,\infty \left.\right]$ and $G:Y\to \left.\right]- \infty ,\infty \left.\right]$ be proper, convex and lower semi-continuous. Assume that

Equation (39)

Then,

Equation (40)

In particular, the maximum on the right-hand side is attained.

As a preparation for employing this duality result, we note:

Lemma 5.8. Let l ⩾ 0, i ⩾ 1 and ${w}_{i-1}\in {\mathcal{C}}_{0}^{i-1}{\left({\Omega},{\mathrm{S}\mathrm{y}\mathrm{m}}^{l+i-1}\left({\mathbf{R}}^{d}\right)\right)}^{{\ast}}$, ${w}_{i}\in {\mathcal{C}}_{0}^{i}{\left({\Omega},{\mathrm{S}\mathrm{y}\mathrm{m}}^{l+i}\left({\mathbf{R}}^{d}\right)\right)}^{{\ast}}$ be distributions of order i − 1 and i, respectively. Then,

Equation (41)

with the right-hand side being finite if and only if $\mathcal{E}{w}_{i-1}-{w}_{i}\in \mathcal{M}\left({\Omega},{\mathrm{S}\mathrm{y}\mathrm{m}}^{l+i}\left({\mathbf{R}}^{d}\right)\right)$ in the distributional sense.

Proof. Note that in the distributional sense, $\langle {w}_{i}-\mathcal{E}{w}_{i-1},\enspace \varphi \rangle =\langle {w}_{i-1},\enspace \mathrm{d}\mathrm{i}\mathrm{v}\enspace \varphi \rangle +\langle {w}_{i},\enspace \varphi \rangle $ for all $\varphi \in {\mathcal{C}}_{\mathrm{c}}^{\infty }\left({\Omega},{\mathrm{S}\mathrm{y}\mathrm{m}}^{l+i}\left({\mathbf{R}}^{d}\right)\right)$. Since ${\mathcal{C}}_{\mathrm{c}}^{\infty }\left({\Omega},{\mathrm{S}\mathrm{y}\mathrm{m}}^{l+i}\left({\mathbf{R}}^{d}\right)\right)$ is dense in ${\mathcal{C}}_{0}\left({\Omega},{\mathrm{S}\mathrm{y}\mathrm{m}}^{l+i}\left({\mathbf{R}}^{d}\right)\right)$, the distribution ${w}_{i}-\mathcal{E}{w}_{i-1}$ can be extended to an element in ${\mathcal{C}}_{0}{\left({\Omega},{\mathrm{S}\mathrm{y}\mathrm{m}}^{l+i}\left({\mathbf{R}}^{d}\right)\right)}^{{\ast}}=\mathcal{M}\left({\Omega},{\mathrm{S}\mathrm{y}\mathrm{m}}^{l+i}\left({\mathbf{R}}^{d}\right)\right)$ if and only if the supremum in (41) is finite. In case of finiteness, it coincides with the Radon norm by definition. □

This enables us to derive the problem which is dual to the maximisation problem in (37). We will refer to the resulting problem as the minimum representation of ${\mathrm{T}\mathrm{G}\mathrm{V}}_{\alpha }^{k,l}$.

Theorem 5.9. For k ⩾ 1, l ⩾ 0, Ω a bounded Lipschitz domain and ${\mathrm{T}\mathrm{G}\mathrm{V}}_{\alpha }^{k,l}$ according to (37), we have for each uL1(Ω, Syml (Rd )):

TGVαk,l(u) = min { Σ_{i=1}^{k} αk−i ‖ℰwi−1 − wi‖ℳ : wi ∈ BD(Ω, Syml+i(Rd)) for i = 1, ..., k − 1, w0 = u, wk = 0 }    (42)

with the minimum being finite if and only if u ∈ BD(Ω, Syml ( R d )) and attained for some w0, ..., wk where wi ∈ BD(Ω, Syml+i ( R d )) for i = 0, ..., k and w0 = u as well as wk = 0 in case of u ∈ BD(Ω, Syml ( R d )).

Proof. First, take $u\in {L}_{\mathrm{l}\mathrm{o}\mathrm{c}}^{1}\left({\Omega},{\mathrm{S}\mathrm{y}\mathrm{m}}^{l}\left({\mathbf{R}}^{d}\right)\right)$ such that ${\mathrm{T}\mathrm{G}\mathrm{V}}_{\alpha }^{k,l}\left(u\right){< }\infty $. We will employ Fenchel–Rockafellar duality. For this purpose, introduce the Banach spaces

the linear operator

and the proper, convex and lower semi-continuous functionals

With these choices, the identity

follows from the definition in (37).

In order to show the representation of ${\mathrm{T}\mathrm{G}\mathrm{V}}_{\alpha }^{k,l}\left(u\right)$ as in (42), we would like to obtain

Equation (43)

This follows as soon as (39) is verified. For the purpose of showing (39), let ψ ∈ Y and define backwards recursively: φk = 0 ∈ C0k(Ω, Symk+l(Rd)), φi = ψi − div φi+1 ∈ C0i(Ω, Symi+l(Rd)) for i = k − 1, ..., 1. Hence, φ ∈ X and −Λφ = ψ. Moreover, choosing λ > 0 large enough, one can achieve that ‖λ^{−1}φi‖∞ ⩽ αk−i for all i = 1, ..., k, so λ^{−1}φ ∈ dom(F) and, since 0 ∈ dom(G), we get the representation ψ = λ(0 − Λλ^{−1}φ). Thus, the identity (43) holds and the minimum is attained in Y*. Now, Y* can be written as

with elements w = (w1, ..., wk−1), ${w}_{i}\in {\mathcal{C}}_{0}^{i}{\left({\Omega},{\mathrm{S}\mathrm{y}\mathrm{m}}^{i+l}\left({\mathbf{R}}^{d}\right)\right)}^{{\ast}}$, for 1 ⩽ ik − 1. Therefore, with w0 = u and wk = 0 we get, as G* = 0, that

From lemma 5.8 we obtain that each supremum is finite and coincides with ‖ℰwi−1 − wi‖ℳ if and only if ℰwi−1 − wi ∈ ℳ(Ω, Syml+i(Rd)) for i = 1, ..., k. Then, as wk = 0, according to theorem 3.16, this already yields wk−1 ∈ BD(Ω, Symk+l−1(Rd)), in particular wk−1 ∈ ℳ(Ω, Symk+l−1(Rd)). Proceeding inductively, we see that wi ∈ BD(Ω, Syml+i(Rd)) for each i = 0, ..., k. Hence, it suffices to take the minimum in (43) over all BD-tensor fields, which gives (42).

In addition, the minimum in (42) is finite if u ∈ BD(Ω, Syml(Rd)). Conversely, if TD(u) = ∞, then also ‖ℰw0 − w1‖ℳ = ∞ for all w1 ∈ BD(Ω, Syml+1(Rd)). Hence, the minimum in (42) has to be ∞. □

Remark 5.10. In the scalar case, i.e., l = 0 it holds that

Equation (44)

Remark 5.11. The minimum representation also allows TGVαk,l to be defined recursively:

TGVαk+1,l(u) = min_{w ∈ BD(Ω, Syml+1(Rd))}  αk‖ℰu − w‖ℳ + TGVα'k,l+1(w),    (45)

where α' = (α0, ..., αk−1) if α = (α0, ..., αk).

Remark 5.12. For the scalar TGVα2, the minimum representation reads as

TGVα2(u) = min_{w ∈ BD(Ω, Rd)}  α1‖∇u − w‖ℳ + α0‖ℰw‖ℳ.

This can be interpreted as follows. For u ∈ BV(Ω), ∇u is a measure which can be decomposed into a regular and a singular component with respect to the Lebesgue measure. The singular part is always penalised with the Radon norm, while from the regular part an optimal vector field w of bounded deformation is extracted. This vector field is penalised by TD which, like TV, implies certain regularity but also allows for jumps. Thus, ℰw essentially contains the second-order derivative information of u.

Provided that w is optimal, the total generalised variation of second order then penalises the first-order remainder ∇u − w, which essentially contains the jumps of u, as well as the second-order information ℰw.
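
A hedged discrete illustration of this minimum representation (again Python/CVXPY with anisotropic forward differences, chosen for brevity rather than for fidelity to the algorithms discussed later in this review): the image u and the vector field w = (w1, w2) are optimised jointly, with ∇u − w weighted by α1 and the symmetrised Jacobian ℰw by α0.

```python
import cvxpy as cp

def denoise_tgv2(f, alpha1=0.1, alpha0=0.2):
    """Discrete sketch of  min_{u,w} 1/2||u - f||^2 + alpha1 ||grad u - w||_1 + alpha0 ||E w||_1,
    an anisotropic finite-difference counterpart of second-order TGV denoising."""
    dx = lambda U: U[1:, :-1] - U[:-1, :-1]      # forward differences restricted to a
    dy = lambda U: U[:-1, 1:] - U[:-1, :-1]      # common (n-1) x (m-1) grid
    n, m = f.shape
    U = cp.Variable((n, m))
    W1 = cp.Variable((n - 1, m - 1))             # first component of the field w
    W2 = cp.Variable((n - 1, m - 1))             # second component of the field w
    first_order = cp.sum(cp.abs(dx(U) - W1)) + cp.sum(cp.abs(dy(U) - W2))
    # symmetrised Jacobian of w: two diagonal entries plus twice the symmetrised off-diagonal
    sym_grad = (cp.sum(cp.abs(dx(W1))) + cp.sum(cp.abs(dy(W2)))
                + 2 * cp.sum(cp.abs(0.5 * (dy(W1) + dx(W2)))))
    objective = 0.5 * cp.sum_squares(U - f) + alpha1 * first_order + alpha0 * sym_grad
    cp.Problem(cp.Minimize(objective)).solve()
    return U.value
```

Fixing w = 0 recovers plain TV denoising with weight α1; letting the solver choose w jointly with u realises the decomposition of ∇u described in the remark.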

The next step is to examine the spaces BGVk(Ω, Syml(Rd)). Our aim is to prove that these spaces coincide with BD(Ω, Syml(Rd)) for fixed l ⩾ 0 and all k ⩾ 1. We will proceed inductively with respect to k and hence vary k and l but leave Ω fixed, assuming throughout that Ω is a bounded Lipschitz domain. For what follows, we choose a family of projection operators onto the kernel of TGVαk,l, which equals ker(ℰk) (see proposition 5.3).

Definition 5.13. For each k ⩾ 1 and l ⩾ 0, denote by ${R}_{k,l}:{L}^{d/\left(d-1\right)}\left({\Omega},{\mathrm{S}\mathrm{y}\mathrm{m}}^{l}\left({\mathbf{R}}^{d}\right)\right)\to \mathrm{ker}\left({\mathcal{E}}^{k}\right)$ a linear and continuous projection.

As $\mathrm{ker}\left({\mathcal{E}}^{k}\right)$ (on Ω and for symmetric tensor fields of order l) is finite-dimensional, such a Rk,l always exists but is not necessarily unique. A coercivity estimate in Ld/(d−1)(Ω, Syml (Rd )) for ${\mathrm{T}\mathrm{G}\mathrm{V}}_{\alpha }^{k,l}$ will next be formulated and proven in terms of these projections. As we will see, the induction step in the proof requires an intermediate estimate as follows.

Lemma 5.14. For each k ⩾ 1 and l ⩾ 0 there exists a constant C > 0, only depending on Ω, k and l such that for each u ∈ BD(Ω, Syml (Rd )) and wLd/(d−1)(Ω, Syml+1(Rd )),

Proof. If this is not true for some k and l, then there exist {un } in BD(Ω, Syml (Rd )) and {wn } in Ld/(d−1)(Ω, Syml+1(Rd )) such that

This implies that {Rk,l+1wn} is bounded in terms of ‖⋅‖ℳ in the finite-dimensional space ker(TGVαk,l+1) = ker(ℰk), see proposition 5.3. Consequently, there exists a subsequence, again denoted by {wn}, such that Rk,l+1wn → w as n → ∞ with respect to ||⋅||1. Hence, ℰun → w as n → ∞. Further, we have that un → 0 as n → ∞ and thus, by closedness of the weak symmetrised gradient, ℰun → 0 as n → ∞ in ℳ(Ω, Syml+1(Rd)), which contradicts ‖ℰun‖ℳ = 1 for all n. □

Proposition 5.15. For each k ⩾ 1 and l ⩾ 0, there exists a constant C > 0 such that

Equation (46)

Equation (47)

for all u ∈ BD(Ω, Syml (Rd )).

Proof. We prove the result by induction with respect to k. In the case k = 1 and l ⩾ 0 arbitrary, the first inequality is immediate while the second is equivalent to the Sobolev–Korn inequality in BD(Ω, Syml (Rd )), see theorem 3.18.

Now assume that both inequalities hold for a fixed k and each l ⩾ 0 and perform an induction step with respect to k, i.e., we fix l ∈ N, α = (α0, ..., αk) with αi > 0 for i = 0, ..., k. We assume that assertion (46) holds for α' = (α0, ..., αk−1) and any l' ∈ N.

We will first show the uniform estimate for ${\Vert}\mathcal{E}u{{\Vert}}_{\mathcal{M}}$ for which it suffices to consider u ∈ BD(Ω, Syml (Rd )), as otherwise, according to theorem 5.9, ${\mathrm{T}\mathrm{G}\mathrm{V}}_{\alpha }^{k+1,l}\left(u\right)=\infty $. Then, with the projection Rk,l+1, the help of lemma 5.14, the continuous embeddings

and the induction hypothesis, we can estimate for arbitrary w ∈ BD(Ω, Syml+1(Rd )),

for C > 0 suitable generic constants. Taking the minimum over all such w ∈ BD(Ω, Syml+1(Rd )) then yields

by virtue of the recursive minimum representation (45).

The coercivity estimate can be shown analogously to proposition 3.22 and corollary 3.23. First, assume that the inequality does not hold true for α = (1, ..., 1). Then, there is a sequence {un } in Ld/(d−1)(Ω, Syml (Rd )) such that

By $\mathrm{ker}\left({\mathrm{T}\mathrm{G}\mathrm{V}}_{\alpha }^{k+1,l}\right)=\mathrm{r}\mathrm{g}\left({R}_{k+1,l}\right)$, we have ${\mathrm{T}\mathrm{G}\mathrm{V}}_{\alpha }^{k+1,l}\left({u}^{n}-{R}_{k+1,l}{u}^{n}\right)={\mathrm{T}\mathrm{G}\mathrm{V}}_{\alpha }^{k+1,l}\left({u}^{n}\right)$ for each n. Thus, since we already know the first estimate in (46) to hold,

Equation (48)

implying, by continuous embedding, that {un − Rk+1,l un} is bounded in BD(Ω, Syml(Rd)). By compact embedding (see theorem 3.17), we may therefore conclude that un − Rk+1,l un → u in L1(Ω, Syml(Rd)) for some subsequence (not relabelled). Moreover, as Rk+1,l(un − Rk+1,l un) = 0 for all n, the limit has to satisfy Rk+1,l u = 0. On the other hand, by lower semi-continuity (see proposition 5.3),

hence u ∈ ker(ℰk+1) = rg(Rk+1,l). Consequently, limn→∞ un − Rk+1,l un = u = Rk+1,l u = 0. From (48) it follows that also ℰ(un − Rk+1,l un) → 0 in ℳ(Ω, Syml+1(Rd)), so un − Rk+1,l un → 0 in BD(Ω, Syml(Rd)) and by continuous embedding also in Ld/(d−1)(Ω, Syml(Rd)). However, this contradicts ||un − Rk+1,l un||d/(d−1) = 1 for all n, and thus, the claimed coercivity holds for the particular choice α = (1, ..., 1). The result for general α then follows from monotonicity of TGVαk+1,l with respect to each component of α. □

Corollary 5.16. For k ⩾ 1 and l ⩾ 0 there exist C, c > 0 such that for all u ∈ BD(Ω, Syml (Rd )) we have

Equation (49)

In particular, BGVk (Ω, Syml (Rd )) = BD(Ω, Syml (Rd )) in the sense of Banach space isomorphy.

Proof. The estimate on the right is a consequence of (46) while the estimate on the left follows by the minimum representation (42) which gives ${\mathrm{T}\mathrm{G}\mathrm{V}}_{\alpha }^{k,l}{\leqslant}{\alpha }_{k-1}\mathrm{T}\mathrm{D}$. □

Tikhonov regularisation. Once again, the second estimate in proposition 5.15 is crucial to transfer the well-posedness result of theorem 3.26 as follows.

Proposition 5.17. With X = Lp (Ω), $p\in \left.\right]1,\infty \left[\right.$, Ω being a bounded Lipschitz domain, Y a Banach space, K : X → Y linear and continuous, Sf : Y → [0, ∞] proper, convex, lower semi-continuous and coercive, k ⩾ 1, α = (α0, ..., αk−1) with αi > 0 for i = 0, ..., k − 1, the Tikhonov minimisation problem

Equation (50)

is well-posed in the sense of theorem 3.26 whenever p ⩽ d/(d − 1) if d > 1.

Regarding the assumptions of theorem 3.26 on the kernel of the seminorm and the constraint on the exponent p in the underlying Lp -space, we see that, as one would expect, TGVk resembles the situation of the infimal convolution of TV-type functionals rather than their sum; in particular, the constraint p ⩽ d/(d − 1) is the same as with first-order TV regularisation.

This is also true for the following convergence result, which should be compared to the results of theorems 4.4 and 4.12 for the sum and the infimal convolution of higher-order TV functionals, respectively. Here, similar as with the infimal convolution, we extend ${\mathrm{T}\mathrm{G}\mathrm{V}}_{\alpha }^{k}$ to weights in $ \left.\right]0,\infty \left.\right]$ by using the minimum representation and defining ${\alpha }_{i}{{\Vert}\cdot {\Vert}}_{\mathcal{M}}={\mathcal{I}}_{\left\{0\right\}}$ for αi = .

Theorem 5.18. In the situation of proposition 5.17 and $p\in \left.\right]1,\infty \left[\right.$  with p ⩽ d/(d − 1) if d > 1, let for each δ > 0 the data fδ be such that ${S}_{{f}^{\delta }}\left({f}^{{\dagger}}\right){\leqslant}\delta $, and let the discrepancy functionals $\left\{{S}_{{f}^{\delta }}\right\}$ be equi-coercive and converge to ${S}_{{f}^{{\dagger}}}$ for some data f† in Y in the sense of (4) and ${S}_{{f}^{{\dagger}}}\left(v\right)=0$ if and only if v = f†.

Choose the parameters α = (α0, ..., αk−1) in dependence of δ such that

and assume that $\left({\tilde {\alpha }}_{0},\dots ,{\tilde {\alpha }}_{k-1}\right)=\left({\alpha }_{0},\dots ,{\alpha }_{k-1}\right)/\mathrm{min}\left\{{\alpha }_{0},\dots ,{\alpha }_{k-1}\right\}\to \left({\alpha }_{0}^{{\dagger}},\dots ,{\alpha }_{k-1}^{{\dagger}}\right)\in \left.\right]0,\infty \left.\right]^{k}$ as δ → 0. Set

and assume that there exists u0 ∈ BVm (Ω) such that Ku0 = f†.

Then, up to shifts in ker(K) ∩ Pk−1, any sequence {uα,δ }, with each uα,δ being a solution to (50) with parameters (α0, ..., αk−1) and data fδ , has at least one Lp -weak accumulation point. Each Lp -weak accumulation point is a minimum-${\mathrm{T}\mathrm{G}\mathrm{V}}_{{\alpha }^{{\dagger}}}^{k}$-solution of Ku = f† and ${\mathrm{lim}}_{\delta \to 0}{\mathrm{T}\mathrm{G}\mathrm{V}}_{\tilde {\alpha }}^{k}\left({u}^{\alpha ,\delta }\right)={\mathrm{T}\mathrm{G}\mathrm{V}}_{{\alpha }^{{\dagger}}}^{k}\left({u}^{{\dagger}}\right)$.

Proof. The proof is analogous to the one of [32, theorem 4.8], which considers the case ${S}_{f}\left(v\right)=\frac{1}{q}{\Vert}v-f{{\Vert}}_{Y}^{q}$ for $q\in \left[\right.1,\infty \left[\right. $. Alternatively, one can proceed along the lines of the proof of theorem 4.12 with the infimal convolution replaced by TGV to obtain the result. □

Remark 5.19. (parameter choice). Similar to the infimal convolution of TV functionals, also for TGV, the minimum of the involved parameters determines the overall regularisation, while the ratios between the minimum and the individual parameters reflect the regularity assumption on the ground-truth data. In practice, this suggests a parameter choice as (α0, ..., αk−1) = α(λ0, ..., λk−1), where again α > 0 is chosen in dependence of the noise level and (λ0, ..., λk−1) with min{λ0, ..., λk−1} = 1 is fixed once for a given class of image data and then left constant, independently of the concrete forward model or the noise level.

A priori estimates. In case of Hilbert-space data and quadratic norm discrepancy, i.e., ${S}_{f}\left(v\right)=\frac{1}{2}{{\Vert}v-f{\Vert}}_{Y}^{2}$ for Y a Hilbert space, one can, in the situation of proposition 5.17, once again find an a priori bound thanks to the coercivity estimate (47). Let C > 0 be a constant such that ${{\Vert}u-Ru{\Vert}}_{p}{\leqslant}C\mathrm{min}{\left\{{\alpha }_{0},\dots ,{\alpha }_{k-1}\right\}}^{-1}{\mathrm{T}\mathrm{G}\mathrm{V}}_{\alpha }^{k}\left(u\right)$ for a linear and continuous projection operator R onto Pk−1 for all u ∈ BV(Ω). Further, assume that K is injective on Pk−1 and c > 0 is chosen such that c||Ru||p ⩽ ||KRu||Y for all u ∈ Lp (Ω). Then, for a solution u* of the minimisation problem

the norm ${{\Vert}{u}^{{\ast}}{\Vert}}_{p}$ obeys the a priori estimate (17) with α replaced by min{α0, ..., αk−1}. Also here, if the discrepancy is replaced by the Kullback–Leibler discrepancy Sf (v) = KL(v, f), then ${{\Vert}{u}^{{\ast}}{\Vert}}_{p}$ can be estimated analogously in terms of (18). Let, again, Cf ⩾ 0 be an a priori estimate for the optimal functional value, analogous to the Cf that leads to (33). Moreover, analogous to the multi-order infimal-convolution case in subsection 4.2, it is possible to estimate each tuple $\left({w}_{1}^{{\ast}},\dots ,{w}_{k-1}^{{\ast}}\right)$ that realises the minimum in the primal representation (42) of ${\mathrm{T}\mathrm{G}\mathrm{V}}_{\alpha }^{k}\left({u}^{{\ast}}\right)$. Now, in order to estimate, for instance, ${{\Vert}{w}_{1}^{{\ast}}{\Vert}}_{1}$, set ${w}_{0}^{{\ast}}={u}^{{\ast}}$ and note that we already have the bound ${{\Vert}{w}_{0}^{{\ast}}{\Vert}}_{1}{\leqslant}{\left\vert {\Omega}\right\vert }^{1/p}{{\Vert}{u}^{{\ast}}{\Vert}}_{p}$ where (17) or (18) provides an a-priori estimate of the right-hand side. Choosing a C1 > 0 such that ${{\Vert}\mathcal{E}{w}_{0}{\Vert}}_{\mathcal{M}}{\leqslant}{C}_{1}\left({{\Vert}{w}_{0}{\Vert}}_{1}+\mathrm{min}{\left\{{\alpha }_{0},\dots ,{\alpha }_{k-1}\right\}}^{-1}{\mathrm{T}\mathrm{G}\mathrm{V}}_{\alpha }^{k}\left({w}_{0}\right)\right)$ for all w0 ∈ BD(Ω, Sym0(Rd )) = BV(Ω), we obtain ${{\Vert}\mathcal{E}{w}_{0}^{{\ast}}{\Vert}}_{\mathcal{M}}{\leqslant}{C}_{1}\left({{\Vert}{w}_{0}^{{\ast}}{\Vert}}_{1}+\mathrm{min}{\left\{{\alpha }_{0},\dots ,{\alpha }_{k-1}\right\}}^{-1}{C}_{f}\right)$ and, consequently,

We thus obtain the bound

Equation (51)

which is similar to (33), but involves a norm and not a seminorm due to the structure of ${\mathrm{T}\mathrm{G}\mathrm{V}}_{\alpha }^{k}$. Using this line of argumentation, one can now inductively obtain bounds on w2, ..., wk−1 according to

Equation (52)

for i = 1, ..., k − 1, where each Ci > 0 is a constant such that ${{\Vert}\mathcal{E}{w}_{i-1}{\Vert}}_{\mathcal{M}}{\leqslant}{C}_{i}\left({{\Vert}{w}_{i-1}{\Vert}}_{1}+\mathrm{min}{\left\{{\alpha }_{0},\dots ,{\alpha }_{k-i}\right\}}^{-1}{\mathrm{T}\mathrm{G}\mathrm{V}}_{\left({\alpha }_{0},\dots ,{\alpha }_{k-i}\right)}^{k-i+1}\left({w}_{i-1}\right)\right)$ for all wi−1 ∈ BD(Ω, Symi−1(Rd )), whose existence is guaranteed by proposition 5.15. This provides an a-priori estimate for u* and ${w}_{1}^{{\ast}},\dots ,{w}_{k-1}^{{\ast}}$.

Denoising performance. In figure 6, one can see how second-order TGV regularisation (figure 6(d)) performs in comparison to first-order TV (figure 6(b)) and α1TV△α2TV2 (figure 6(c)) as regulariser for image denoising. It is apparent that TGV covers higher-order features more accurately than the associated infimal-convolution regulariser with the staircase effect being absent, while at the same time, jump discontinuities are preserved as for first-order TV. This is in particular reflected in the underlying function space for TGV being BV(Ω), see corollary 5.16. In conclusion, the total generalised variation can be seen as an adequate model for piecewise smooth images and will, in the following, be the preferred regulariser for this class of functions.


Figure 6. Total generalised variation denoising example. (a) Noisy image (PSNR: 13.9 dB), (b) regularisation with TV (PSNR: 29.3 dB), (c) regularisation with α1TV△α2TV2 (PSNR: 28.9 dB), (d) regularisation with ${\mathrm{T}\mathrm{G}\mathrm{V}}_{\alpha }^{2}$ (PSNR: 30.4 dB). All parameters were manually tuned via grid search to give highest PSNR with respect to the ground truth (figure 1(a)).


5.3. Extensions

TGV for multichannel images. Again, in analogy to TV and higher-order TV, TGV can also be extended to colour and multichannel images represented by functions mapping into the vector space Rm by testing with ${\mathrm{S}\mathrm{y}\mathrm{m}}^{k}{\left({\mathbf{R}}^{d}\right)}^{m}$-valued tensor fields. This requires defining pointwise norms on ${\mathrm{S}\mathrm{y}\mathrm{m}}^{l}{\left({\mathbf{R}}^{d}\right)}^{m}$ for l = 1, ..., k where, apart from the standard Frobenius norm, one can take any norm $\vert \cdot {\vert }_{{{\circ}}_{l}}$ on ${\mathrm{S}\mathrm{y}\mathrm{m}}^{l}{\left({\mathbf{R}}^{d}\right)}^{m}$, noting that different norms imply different types of coupling of the multiple channels. With each $\vert \cdot {\vert }_{{{\ast}}_{l}}$ denoting the dual norm of $\vert \cdot {\vert }_{{{\circ}}_{l}}$, ${\mathrm{T}\mathrm{G}\mathrm{V}}_{\alpha }^{k}$ can be extended to functions $u\in {L}_{\mathrm{l}\mathrm{o}\mathrm{c}}^{1}\left({\Omega},{\mathbf{R}}^{m}\right)$ as

Equation (53)

where ${{\Vert}\psi {\Vert}}_{\infty ,{{\ast}}_{l}}$ is the pointwise supremum of the scalar function $x{\mapsto}\vert \psi \left(x\right){\vert }_{{{\ast}}_{l}}$ on Ω for $\psi \in {\mathcal{C}}_{\mathrm{c}}\left({\Omega},{\mathrm{S}\mathrm{y}\mathrm{m}}^{l}\left({\mathbf{R}}^{d}\right)\right)$. As before, by equivalence of norms in finite dimensions, the functional-analytic and regularisation properties of TGV transfer to its multichannel extension, see e.g. [27, 33]. Rotational invariance holds whenever all tensor norms $\vert \cdot {\vert }_{{{\ast}}_{l}}$ are unitarily invariant. For k = 2, particular instances that are unitarily invariant can be constructed by choosing ${\left\vert \cdot \right\vert }_{{{\ast}}_{1}}$ as a unitarily invariant matrix norm and ${\left\vert \cdot \right\vert }_{{{\ast}}_{2}}$ as either the Frobenius tensor norm or ${\left\vert \xi \right\vert }_{{{\ast}}_{2}}={\sum }_{i=1}^{m}{\left\vert {\xi }_{i}\right\vert }_{{{\ast}}_{1}}$, i.e., a decoupled norm. This allows, for instance, to penalise the nuclear norm of first-order derivatives and the Frobenius tensor norm of the second-order component, as it was done, e.g., in [123].
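To make the effect of the pointwise coupling concrete, the following Python sketch (our own minimal illustration and not code from [123]; the simple forward-difference discretisation and all names are our choices) evaluates a discretised multichannel first-order term with the nuclear norm as pointwise coupling, i.e., the sum over all pixels of the nuclear norm of the m × 2 Jacobian. Replacing the singular-value sum by the Frobenius norm or by a channel-wise sum recovers the other couplings mentioned above.

```python
import numpy as np

def forward_diff(u):
    """Forward differences with zero extension at the boundary.
    u: (m, N1, N2) multichannel image; returns the (m, 2, N1, N2) Jacobian."""
    du = np.zeros(u.shape[:1] + (2,) + u.shape[1:])
    du[:, 0, :-1, :] = u[:, 1:, :] - u[:, :-1, :]
    du[:, 1, :, :-1] = u[:, :, 1:] - u[:, :, :-1]
    return du

def nuclear_norm_tv(u):
    """Sum over all pixels of the nuclear norm of the (m x 2) Jacobian matrix."""
    J = forward_diff(u)
    m, d, N1, N2 = J.shape
    A = J.reshape(m, d, -1).transpose(2, 0, 1)      # one (m x 2) matrix per pixel
    s = np.linalg.svd(A, compute_uv=False)          # singular values per pixel
    return s.sum()

u = np.random.rand(3, 64, 64)                       # a random RGB test image
print(nuclear_norm_tv(u))
```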

Infimal-convolution TGV. Beyond the realisation of different couplings of multiple colour channels, the extension to arbitrary pointwise tensor norms in the definition of TGV can also be beneficial in the context of scalar-valued functions. In [109], the infimal convolution of different TGV functionals with different, anisotropic norms was considered in the context of dynamic data as well as anisotropic regularisation for still images. With ${\mathrm{T}\mathrm{G}\mathrm{V}}_{{\beta }_{i}}^{{k}_{i}}$ for i = 1, ..., n denoting TGV functionals according to (53) for m = 1 of order ki and each βi denoting a tuple of pointwise norms, the functional ${\mathrm{I}\mathrm{C}\mathrm{T}\mathrm{G}\mathrm{V}}_{\beta }^{n}$ can be defined for $u\in {L}_{\mathrm{l}\mathrm{o}\mathrm{c}}^{1}\left({\Omega}\right)$ as

As shown in [109], this functional is equivalent to ${\mathrm{T}\mathrm{G}\mathrm{V}}_{\alpha }^{k}$ for k = max{ki } and α any parameter vector, and, in case ${\mathrm{I}\mathrm{C}\mathrm{T}\mathrm{G}\mathrm{V}}_{\beta }^{n}\left(u\right){< }\infty $, the minimum is attained for ui ∈ Ld/(d−1)(Ω) for i = 1, ..., n − 1. Hence, the coercivity estimate on ${\mathrm{T}\mathrm{G}\mathrm{V}}_{\alpha }^{k}$ transfers to ${\mathrm{I}\mathrm{C}\mathrm{T}\mathrm{G}\mathrm{V}}_{\beta }^{n}$ and again, all results in the context of Tikhonov regularisation apply.

For applications in the context of dynamic data, the norms for the different ${\mathrm{T}\mathrm{G}\mathrm{V}}_{{\beta }_{i}}^{{k}_{i}}$ can be chosen to realise different weightings of spatial and temporal derivatives. This allows, in a convex setting, for an adaptive regularisation of video data via a motion-dependent separation into different components, see, for instance, figure 7.


Figure 7. Frame of an image sequence showing a juggler (left), and three frames of a decomposition into components capturing slow (top right images) and fast (bottom right images) motion that was achieved with ICTGV regularisation. Reprinted from [35] by permission from Springer Nature Customer Service Centre GmbH: Springer © 2015.


Similarly, for still image regularisation, one can choose ${\mathrm{T}\mathrm{G}\mathrm{V}}_{{\beta }_{1}}^{{k}_{1}}$ to employ isotropic norms and correspond to the usual total generalised variation, and each ${\mathrm{T}\mathrm{G}\mathrm{V}}_{{\beta }_{i}}^{{k}_{i}}$ for i = 2, ..., n to employ different anisotropic norms that favour one particular direction. This yields again an adaptive regularisation of image data via a decomposition into an isotropic and several anisotropic parts and can be employed, for instance, to recover certain line structures for denoising [109] or applications in CT imaging [125].

Oscillation TGV and its infimal convolution. The total generalised variation model can also be extended to account for functions with piecewise oscillatory behaviour, which is, for instance, useful to model texture in images [90]. The basic idea to include oscillations is to fix a direction ω ∈ Rd , ω ≠ 0 and to modify the definition of second-order TGV such that its kernel corresponds to oscillatory functions in the $\omega /\left\vert \omega \right\vert $-direction with frequency $\left\vert \omega \right\vert $:

where, as before, α = (α0, α1), α0, α1 > 0. Indeed, the kernel of ${\mathrm{T}\mathrm{G}\mathrm{V}}_{\alpha ,\omega }^{\mathrm{o}\mathrm{s}\mathrm{c}\mathrm{i}}$ is spanned by the functions x ↦ sin(x · ω) and x ↦ cos(x · ω). Further, the functional is proper, convex and lower semi-continuous in each Lp (Ω), and admits the minimum representation

With ${R}_{\omega }:{L}^{d/\left(d-1\right)}\left({\Omega}\right)\to \mathrm{ker}\left({\mathrm{T}\mathrm{G}\mathrm{V}}_{\alpha ,\omega }^{\mathrm{o}\mathrm{s}\mathrm{c}\mathrm{i}}\right)$ a linear and continuous projection, a coercivity estimate holds as follows:

for all u ∈ BV(Ω), see [90]. The functional can therefore be used as a regulariser in all cases where TV is applicable.

In order to obtain a texture-aware image model, one can now take the infimal convolution of ${\mathrm{T}\mathrm{G}\mathrm{V}}_{{\alpha }_{0}}^{2}$ with parameter vector ${\alpha }_{0}\in { \left.\right]0,\infty \left[\right. }^{2}$ and ${\mathrm{T}\mathrm{G}\mathrm{V}}_{{\alpha }_{i},{\omega }_{i}}^{\mathrm{o}\mathrm{s}\mathrm{c}\mathrm{i}}$ for parameter vectors ${\alpha }_{1},\dots ,{\alpha }_{n}\in {\left.\right]0,\infty \left[\right. }^{2}$ and directions ω1, ..., ωn ∈ Rd with ωi ≠ 0 for i = 1, ..., n, i.e.,

which again yields a proper, convex and lower semi-continuous regulariser on each Lp (Ω) which is coercive in the sense that ${{\Vert}u-Ru{\Vert}}_{d/\left(d-1\right)}{\leqslant}C{\mathrm{I}\mathrm{C}\mathrm{T}\mathrm{G}\mathrm{V}}_{\alpha ,\omega }^{\mathrm{o}\mathrm{s}\mathrm{c}\mathrm{i}}\left(u\right)$ for a linear and continuous projection $R:{L}^{d/\left(d-1\right)}\left({\Omega}\right)\to \mathrm{ker}\left({\mathrm{I}\mathrm{C}\mathrm{T}\mathrm{G}\mathrm{V}}_{\alpha ,\omega }^{\mathrm{o}\mathrm{s}\mathrm{c}\mathrm{i}}\right)=\mathrm{ker}\left({\mathrm{T}\mathrm{G}\mathrm{V}}_{{\alpha }_{0}}^{2}\right)+\mathrm{ker}\left({\mathrm{T}\mathrm{G}\mathrm{V}}_{{\alpha }_{1},{\omega }_{1}}^{\mathrm{o}\mathrm{s}\mathrm{c}\mathrm{i}}\right)+\cdots +\mathrm{ker}\left({\mathrm{T}\mathrm{G}\mathrm{V}}_{{\alpha }_{n},{\omega }_{n}}^{\mathrm{o}\mathrm{s}\mathrm{c}\mathrm{i}}\right)$, see again [90]. It is therefore again applicable as a regulariser for inverse problems whenever TV is applicable. See figure 8 for an example of ICTGVosci-based denoising and its benefits for capturing and reconstructing textured regions.


Figure 8. Example of ICTGVosci denoising. In the top row, the whole image is depicted, while a closeup of the respective marked region is shown in the bottom row. (a) A noisy image (PSNR: 26.0 dB). (b) Results of TGV2-denoising (PSNR: 34.8 dB). (c) Results of ICTGVosci denoising (PSNR: 36.6 dB). Parameters were manually optimised via grid search towards best peak signal-to-noise ratio (PSNR) with respect to the ground truth (not shown). Figure taken from [90]. Copyright © 2018 Society for Industrial and Applied Mathematics. Reprinted with permission. All rights reserved.


TGV for manifold-valued data. In different applications in inverse problems and imaging, the data of interest takes values not in a vector space but rather in a non-linear space such as a manifold. Examples are sphere-valued data in synthetic aperture radar (SAR) imaging or data with values in the space of positive matrices, equipped with the Fisher–Rao metric, which is used in diffusion tensor imaging. Motivated by such applications, TV regularisation has been extended to cope with manifold-valued data, using different approaches and numerical algorithms [68, 98, 129, 192]. A rather simple extension of TV for discrete and finite, univariate signals ${\left({u}_{i}\right)}_{i}$ living in a complete Riemannian manifold $\mathcal{M}\subset {\mathbf{R}}^{d}$ with metric ${d}_{\mathcal{M}}$ is given as

For this setting, and an extension to bivariate signals, the work [192] provides simple numerical algorithms which yield, in case $\mathcal{M}$ is a Hadamard space, globally optimal solutions of variational TV denoising for manifold-valued data. While this allows in particular to extend edge-preserving regularisation to non-linear geometric data, it can again be observed that TV regularisation has a tendency towards piecewise constant solutions with artificial jump discontinuities. To overcome this, different works have proposed extensions of this approach to higher-order TV [10], the (TV–TV2)-infimal convolution [14, 15] and second-order TGV [15, 36]. Here we briefly sketch the main underlying ideas, presented in [36], for an extension of TGV to manifold-valued data. For simplicity, we consider only the case of univariate signals ${\left({u}_{i}\right)}_{i}$ and assume that length-minimising geodesics are unique (see [36] for the general case and details on the involved differential-geometric concepts).
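As a concrete illustration, the following Python sketch (our own example, not code from [192]) evaluates this discrete TV for a univariate signal with values on the unit sphere ${\mathcal{S}}^{2}$, using the geodesic distance ${d}_{\mathcal{M}}\left(u,v\right)=\mathrm{arccos}\left(\langle u,v\rangle \right)$.

```python
import numpy as np

def sphere_dist(u, v):
    """Geodesic (arc-length) distance between points on the unit sphere S^2."""
    return np.arccos(np.clip(np.sum(u * v, axis=-1), -1.0, 1.0))

def manifold_tv(u):
    """Discrete TV of a univariate S^2-valued signal u of shape (n, 3):
    the sum of geodesic distances between consecutive samples."""
    return np.sum(sphere_dist(u[:-1], u[1:]))

# a short test signal: points moving along a great circle
t = np.linspace(0.0, np.pi / 2, 20)
u = np.stack([np.cos(t), np.sin(t), np.zeros_like(t)], axis=-1)
print(manifold_tv(u))   # equals pi/2 for this path along a great circle
```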

From the continuous perspective, a natural approach to extend TGV for manifold-valued data, at least in a smooth setting, would be to use tangent spaces for first-order derivatives and, for the second-order term, invoke a connection on the manifold for the differentiation of vector fields. In contrast to that, the motivation for the definition of TGV as in [36] was to exploit a discrete setting in order to avoid high-level differential-geometric concepts but rather to come up with a definition of TGV that can be written only in terms of the distance function on the manifold. To this aim, we identify tangential vectors $v\in {T}_{a}\mathcal{M}$, with ${T}_{a}\mathcal{M}$ denoting the tangent space at a point $a\in \mathcal{M}$, with point tuples [a, b] via the exponential map b = expa (v). A discrete gradient operator then maps a signal ${\left({u}_{i}\right)}_{i}$ to a sequence of point-tuples ${\left(\left[{u}_{i},{u}_{i+1}\right]\right)}_{i}$, where we regard (∇u)i = [ui , ui+1], which generalises first-order differences in vector spaces, since in this case, ${\mathrm{exp}}_{{u}_{i}}\left({u}_{i+1}-{u}_{i}\right)={u}_{i+1}$. Vector fields whose base points are ${\left({u}_{i}\right)}_{i}$ can then be identified with a sequence ${\left(\left[{u}_{i},{y}_{i}\right]\right)}_{i}$ with each ${y}_{i}\in \mathcal{M}$ and, assuming $D:{\mathcal{M}}^{2}{\times}{\mathcal{M}}^{2}\to \left[\right.0,\infty \left[\right.$ to be an appropriate distance-type function for such tuples, an extension of second-order TGV can be given as

The difficulty here is in particular how to define D for two point tuples with different base points, as those represent vectors in different tangent spaces. To overcome this, a variant for D as proposed in [36] uses the Schild's ladder [119] construction as a discrete approximation of the parallel transport of vector fields between different tangent spaces. In order to describe this construction, denote by [u,v]t for $u,v\in \mathcal{M}$ and t ∈ R the point reached at time t after travelling on the geodesic from u to v, i.e., [u,v]t = expu (t logu (v)), where log is the inverse exponential map. Then, the parallel transport of [u, v] (which represents ${\mathrm{log}}_{u}\left(v\right)\in {T}_{u}\mathcal{M}$) to the base point $x\in \mathcal{M}$ is approximated by [x, y'] where ${y}^{\prime }={\left[u,{\left[x,v\right]}_{\frac{1}{2}}\right]}_{2}$ (which represents ${\mathrm{log}}_{x}\left({y}^{\prime }\right)\in {T}_{x}\mathcal{M}$). Using this, a distance on point tuples, denoted by DS , can be given as

Exploiting the fact that ${D}_{S}\left(\left[u,v\right],\left[u,w\right]\right)={d}_{\mathcal{M}}\left(v,w\right)$ for tuples having the same base point, a concrete realisation of discrete second order TGV for manifold-valued data is then given as

The S-TGV denoising problem for ${\left({f}_{i}\right)}_{i}$ some given data with ${f}_{i}\in \mathcal{M}$ then reads as

and a numerical solution (which can only be guaranteed to deliver stationary points due to non-convexity) can be obtained, for instance, using the cyclic proximal point algorithm [9, 36]. Figure 9 shows the results for this setting using both TV and second-order TGV regularisation for the denoising of ${\mathcal{S}}^{2}$-valued image data, which is composed of different blocks of smooth data with sharp interfaces. It can be seen that both TV and TGV are able to recover the sharp interfaces, but TV suffers from piecewise-constancy artefacts which are not present with TGV.
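To make the geodesic operations used in this construction concrete, the following Python sketch (our own illustration for the unit sphere ${\mathcal{S}}^{2}$ under the stated uniqueness assumptions, not code from [36]) implements the exponential and inverse exponential maps, the geodesic points [u, v]t and the Schild's-ladder approximation y' = [u, [x, v]1/2]2 of parallel transport.

```python
import numpy as np

def sphere_exp(a, v):
    """Exponential map on S^2: follow the geodesic from a with tangent vector v (v orthogonal to a)."""
    nv = np.linalg.norm(v)
    return a if nv < 1e-12 else np.cos(nv) * a + np.sin(nv) * (v / nv)

def sphere_log(a, b):
    """Inverse exponential map: the tangent vector at a pointing towards b."""
    p = b - np.dot(a, b) * a                 # project b onto the tangent space at a
    norm_p = np.linalg.norm(p)
    if norm_p < 1e-12:
        return np.zeros(3)
    return np.arccos(np.clip(np.dot(a, b), -1.0, 1.0)) * (p / norm_p)

def geo(u, v, t):
    """The point [u, v]_t = exp_u(t * log_u(v)) on the geodesic from u to v."""
    return sphere_exp(u, t * sphere_log(u, v))

def schilds_ladder(u, v, x):
    """Schild's ladder: y' = [u, [x, v]_{1/2}]_2 approximates the parallel transport
    of the tangent vector represented by [u, v] to the base point x."""
    return geo(u, geo(x, v, 0.5), 2.0)

u = np.array([1.0, 0.0, 0.0])
v = sphere_exp(u, np.array([0.0, 0.2, 0.0]))     # a point close to u
x = sphere_exp(u, np.array([0.0, 0.0, 0.5]))     # base point to transport to
print(sphere_log(x, schilds_ladder(u, v, x)))    # approximately (0, 0.2, 0)
```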


Figure 9. Example of variational denoising for manifold-valued data. The images show noisy ${\mathcal{S}}^{2}$-valued data (left) which is denoised with TV-regulariser (middle) and TGV-regulariser (right). The sphere ${\mathcal{S}}^{2}$ is colour-coded with hue and value representing the longitude and latitude, respectively. All parameters were optimised via grid search towards the best result with respect to a suitable distance measure for manifold-valued data, see [36] for details.


Image-driven TGV. In case of denoising problems, i.e., f ∈ Lp (Ω), second-order TGV can be modified to incorporate directional information obtained from f, resulting in image-driven TGV (ITGV) [152]. The latter is defined by introducing a diffusion tensor field into the functional:

where $D:\overline{{\Omega}}\to {\mathrm{S}\mathrm{y}\mathrm{m}}^{2}\left({\mathbf{R}}^{d}\right)$ is assumed to be continuous and positive semi-definite in each point. Denoting by fσ = f*Gσ a smoothed version of the data f obtained by convolution with a Gaussian kernel Gσ of variance σ > 0 and suitable extension outside of Ω, the diffusion tensor field D may be chosen according to

with parameters γ > 0 and β > 0. If the smallest eigenvalue of D is uniformly bounded away from 0 in $\overline{{\Omega}}$, then ITGV admits the same functional-analytic and regularisation properties as second-order TGV. We refer to [152] for an application and numerical results regarding this regularisation approach in stereo estimation.

Non-local TGV. The concept of non-local total variation (NLTV) [94] can also be transferred to the total generalised variation. Recall that instead of taking the derivative, non-local total variation penalises the differences of the function values of u for each pair of points by virtue of a weight function:

where the weight function a : Ω × Ω → [0, ∞] is measurable and a.e. bounded from below by a positive constant. We note that, alternatively, the weight function a may also be chosen as a(x, y) = |x − y|−(θ+d) with θ ∈ ]0, 1[ such that low-order Sobolev–Slobodeckij seminorms can be realised [76]. In the context of non-local total variation, a allows one to incorporate a priori information for the image to reconstruct. For instance, if one already knows disjoint segments Ω1, ..., Ωn where the solution is piecewise constant, one can set

where c1 ≫ c0 > 0. This way, the difference between two function values of u in Ωi is forced to 0, meaning u is constant in Ωi .
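As a small illustration of this construction (a minimal sketch under our own simplified assumptions: a flattened, one-dimensional pixel set and densely stored weights, which is only feasible for small problems), the following Python snippet builds such segment-based weights and evaluates the resulting discrete non-local TV.

```python
import numpy as np

def segment_weights(labels, c0, c1):
    """Weight a(x, y): c1 if x and y belong to the same segment, c0 otherwise.
    labels: (N,) integer segment labels of the flattened pixels."""
    same = labels[:, None] == labels[None, :]
    return np.where(same, c1, c0)

def nltv(u, a):
    """Discrete non-local TV: weighted sum of all pairwise differences of u."""
    return np.sum(a * np.abs(u[:, None] - u[None, :]))

labels = np.array([0, 0, 0, 1, 1])
u = np.array([1.0, 1.1, 0.9, 5.0, 5.2])
print(nltv(u, segment_weights(labels, c0=0.01, c1=10.0)))
```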

Non-local total generalised variation now gives the possibility to enforce piecewise linearity of u in the segments by incorporating the vector field w corresponding to the slope of the linear part in a non-local cascade. This results in

with two weight functions a0, a1 : Ω × Ω → [0, ∞], again measurable and bounded a.e. away from zero [154]. In analogy to NLTV, a priori information on, for instance, disjoint segments where the sought solution is piecewise linear, allows one to choose weight functions such that the associated NLTGV2 regulariser properly reflects this information. See figure 10 for a denoising example where non-local TGV turns out to be beneficial, in particular in the regions near the jump discontinuities of the sought solution.


Figure 10. Example for non-local total generalised variation denoising. (a) A noisy piecewise linear image. (b) Results of ${\mathrm{T}\mathrm{G}\mathrm{V}}_{\alpha }^{2}$-denoising together with a surface plot of its graph. (c) Results of non-local TGV-denoising together with a surface plot of its graph. Images taken from [154]. Reprinted by permission from Springer Nature.


6. Numerical algorithms

Tikhonov regularisation with higher-order total variation, its combination via addition or infimal convolution, as well as total generalised variation poses a non-smooth optimisation problem in an appropriate Lebesgue function space. In practice, these minimisation problems are discretised and solved by optimisation algorithms that exploit the structure of the discrete problem. While there are many possibilities for a discretisation of the considered regularisation functionals as well as for numerical optimisation, most of the algorithms that can be found in the literature are based on finite-difference discretisations and first-order proximal optimisation methods. In the following, we provide an overview of the building blocks necessary to solve the considered Tikhonov functional minimisation problems numerically. We will exemplarily discuss the derivation of respective algorithms on the basis of the popular primal-dual algorithm with extragradient [60] and, as an alternative, briefly address implicit and preconditioned optimisation methods.

6.1. Discretisation of higher-order TV functionals

We discretise the discussed functionals in 2D; higher dimensions follow by analogy. Moreover, for the sake of simplicity, we assume a rectangular domain, i.e., ${\Omega}=\left.\right]0,{N}_{1}\left[\right.{\times}\left.\right]0,{N}_{2}\left[\right.\subset {\mathbf{R}}^{2}$ for some positive N1, N2 ∈ N. A generalisation to non-rectangular domains will be straightforward.

Following essentially the presentation in [37] we first replace Ω by the discretised grid

One consistent way of discretising higher-order derivatives is to define partial derivatives as follows: a discrete partial derivative takes the difference between two neighbouring elements in the grid with respect to a specified axis. This difference is associated with the midpoint between the two grid elements, resulting in staggered grids. For a finite sequence of directions p ∈ ⋃k⩾0{1,2}k , this results, on the one hand, in the recursively defined grids

Equation (54)

Note that ${{\Omega}}_{h}^{p}$ does not depend on the order of the pi and one could use multiindices in N2 instead. Likewise, the discrete partial derivatives recursively given by

Equation (55)

yield well-defined functions ${\partial }_{h}^{p}u:{{\Omega}}_{h}^{p}\to \mathbf{R}$ for u : Ωh → R which do not depend on the order of the entries in p. The discrete gradient ${\nabla }_{h}^{k}u$ of order k ⩾ 1 for u : Ωh → R is then the tuple that collects all the partial derivatives of order k:

Note that due to this construction, the partial derivatives ${\partial }_{h}^{p}u$ are generally defined on different grids. However, in order to define the Frobenius norm of ${\nabla }_{h}^{k}$, and, consequently, an ℓ1-type norm, a common grid is needed. There are several possibilities for this task (such as interpolation) which have been studied mainly for the first-order total variation. Here, we discuss a strategy that results in a simple definition of a discrete higher-order total variation. It is based on collecting, for (i, j) ∈ Z2, all the nearby points in the different ${{\Omega}}_{h}^{p}$. This can, for instance, be done by moving half-steps forward and backward in the directions indicated by p ∈ {1,2}k :

Equation (56)

where e1, e2 are the unit vectors in R2. The Frobenius norm in a point (i, j) ∈ Z2 is then given by

Equation (57)

where ${\partial }_{h}^{p}u$ is extended by zero outside of ${{\Omega}}_{h}^{p}$. Note that here, although ${\partial }_{h}^{p}$ does not depend on the order of discrete differentiation, the point (ip , jp ) does. Thus, a different Frobenius norm for the kth discrete derivative would be constituted by symmetrisation, which means symmetrising ${\nabla }_{h}^{k}u$ and taking the Frobenius norm afterwards. In this context, it makes sense to average over the grid points as follows. Denoting by α(p) ∈ N2 the multiindex associated with p ∈ {1,2}k , i.e., α(p)i = #{m|pm = i}, we define

Equation (58)

for (i, j) ∈ Z2 and α ∈ N2, where $\left(\genfrac{}{}{0pt}{}{\left\vert \alpha \right\vert }{\alpha }\right)=\frac{\left({\alpha }_{1}+{\alpha }_{2}\right)!}{{\alpha }_{1}!{\alpha }_{2}!}$. Then, the grid associated with an α ∈ N2 reads as

while the α-component of the symmetrised gradient is given by

where (i, j) ∈ Z2 is chosen such that $\left({i}_{\alpha },{j}_{\alpha }\right)\in {{\Omega}}_{h}^{\alpha }$ and ${\left({\partial }_{h}^{p}u\right)}_{{i}_{p},{j}_{p}}$ is zero for points outside of ${{\Omega}}_{h}^{p}$. This results in the symmetrised derivative as follows:

The Frobenius norm of ${\mathcal{E}}_{h}^{k}u$ in a point (i, j) ∈ Z2 can finally be obtained by

Equation (59)

Remark 6.1. For α ∈ N2 with $\left\vert \alpha \right\vert $ even, we have (iα , jα ) = (i, j) for each (i, j) ∈ Z2. Indeed, for $p\in {\left\{1,2\right\}}^{\left\vert \alpha \right\vert }$ and the reversed tuple $\overline{p}=\left({p}_{\left\vert \alpha \right\vert },\dots ,{p}_{1}\right)$ it holds $\alpha \left(p\right)=\alpha \left(\overline{p}\right)$. Further, either $p=\overline{p}$ leading to (ip , jp ) = (i, j) or $p\ne \overline{p}$ leading to $\left({i}_{p},{j}_{p}\right)+\left({i}_{\overline{p}},{j}_{\overline{p}}\right)=2\left(i,j\right)$. Consequently, (iα , jα ) = (i, j) according to the definition. In other words, the symmetrisation of the discrete gradient is a natural way of aligning the different grids ${{\Omega}}_{h}^{p}$ to a common grid in this case.

For $\left\vert \alpha \right\vert $ odd, the grid points still do not align. However, we can say that for (i, j), the point (iα , jα ) lies on the line connecting $\left(i+\frac{1}{2},j\right)$ and $\left(i,j+\frac{1}{2}\right)$. Indeed, for $p\in {\left\{1,2\right\}}^{\left\vert \alpha \right\vert }$ with α(p) = α we can consider $\overline{p}=\left({p}_{\left\vert \alpha \right\vert -1},\dots ,{p}_{1},{p}_{\left\vert \alpha \right\vert }\right)$. If $p=\overline{p}$, then (ip , jp ) is either $\left(i+\frac{1}{2},j\right)$ or $\left(i,j+\frac{1}{2}\right)$. In the case $p\ne \overline{p}$, the point $\frac{1}{2}\left({i}_{p},{j}_{p}\right)+\frac{1}{2}\left({i}_{\overline{p}},{j}_{\overline{p}}\right)$ is either $\left(i+\frac{1}{2},j\right)$ or $\left(i,j+\frac{1}{2}\right)$. As (iα , jα ) is a convex combination of such points, it lies on the line connecting $\left(i+\frac{1}{2},j\right)$ and $\left(i,j+\frac{1}{2}\right)$. Hence, the symmetrisation of the gradient leads to more localised grid points.

We now have everything at hand to define two versions of a discrete total variation of arbitrary order.

Definition 6.2. Let k ∈ N, k ⩾ 1, be a differentiation order. Then, for u : Ωh → R, the discrete total variation is defined as

with ${\left\vert {\nabla }_{h}^{k}u\right\vert }_{i,j}$ according to (57), and discrete total variation for the symmetrised gradient is defined as

with ${\left\vert {\mathcal{E}}_{h}^{k}u\right\vert }_{i,j}$ according to (59).
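As a minimal illustration (our own simplified sketch: all partial derivatives are computed by forward differences, extended by zero and, deviating from the staggered-grid bookkeeping above, simply evaluated on the common grid Ωh , so the grid-alignment and symmetrisation subtleties are ignored), the discrete kth-order total variation can be evaluated as follows.

```python
import numpy as np

def partial_h(u, axis):
    """Forward difference along the given axis; the last slice is set to zero,
    mimicking the zero extension of the derivative outside its grid."""
    du = np.zeros_like(u)
    lo = [slice(None)] * u.ndim
    hi = [slice(None)] * u.ndim
    lo[axis], hi[axis] = slice(0, -1), slice(1, None)
    du[tuple(lo)] = u[tuple(hi)] - u[tuple(lo)]
    return du

def grad_k(u, k):
    """All k-th order partial derivatives of u (shape (N1, N2)),
    stacked along a leading axis of length 2**k."""
    comps = [u]
    for _ in range(k):
        comps = [partial_h(c, ax) for c in comps for ax in (0, 1)]
    return np.stack(comps)

def tv_k(u, k):
    """Discrete k-th order TV: l1-norm of the pointwise Frobenius norm."""
    return np.sum(np.sqrt(np.sum(grad_k(u, k) ** 2, axis=0)))

u = np.random.rand(64, 64)
print(tv_k(u, 1), tv_k(u, 2))   # discrete TV and TV^2 of a random test image
```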

In order to define a discrete version of the total generalised variation, we still need to discuss the discretisation of the total deformation for discrete symmetric tensor fields. For this purpose, we say that the components of a discrete symmetric tensor field of order l ∈ N live on the grids ${{\Omega}}_{h}^{\alpha }$, resulting in

realising a discrete symmetric tensor field of order l. Its Frobenius norm is given in the points (i, j) ∈ Z2 according to

Equation (60)

which is compatible with (59) if one plugs in ${\mathcal{E}}_{h}^{l}u$ for some u : Ωh → R. The partial derivative of order k described by p ∈ {1,2}k applied to uα is then also given by (55), but acts on the grid ${{\Omega}}_{h}^{\alpha }$ and results in a discrete function on the grid ${{\Omega}}_{h}^{\alpha ,p}$ which is given in analogy to (54) by replacing Ωh with ${{\Omega}}_{h}^{\alpha }$. The symmetrised derivative ${\mathcal{E}}_{h}^{k}u$, whose components are indexed by β ∈ N2, $\left\vert \beta \right\vert =k+l$, is then defined in a point $\left({i}_{\beta },{j}_{\beta }\right)\in {{\Omega}}_{h}^{\beta }$ where (i, j) ∈ Z2 by

Equation (61)

where

This is sufficient to define a discrete total deformation.

Definition 6.3. Let k, l ∈ N, k ⩾ 1 and l ⩾ 0. Then, for $u={\left({u}_{\alpha }\right)}_{\alpha \in {\mathbf{N}}^{2},\left\vert \alpha \right\vert =l}$, the discrete total deformation of order k is defined as

with ${\mathcal{E}}_{h}^{k}u$ according to (61) and ${\left\vert \cdot \right\vert }_{i,j}$ according to (60).

For the sake of completeness, the respective definitions for non-symmetric tensor fields of order l read as

Equation (62)

and the pth component, p ∈ {1,2}k+l , of the discrete gradient of order k is given as

Equation (63)

For numerical algorithms, it is necessary to write the discrete total variation and total deformation as the one-norm of a (symmetric) tensor with respect to the respective discrete differentiation operator. We therefore introduce the underlying spaces.

Definition 6.4. Let l ∈ N and q ∈ [1, ∞]. The ℓq -space of discrete l-tensors on Ωh is given by

with ${{\Omega}}_{h}^{p}$ according to (54), and norm

with pointwise norm according to (62). The space ${\ell }^{2}\left({{\Omega}}_{h},{\mathcal{T}}^{l}\left({\mathbf{R}}^{2}\right)\right)$ is equipped with the scalar product

Analogously, the ℓq -space of discrete symmetric l-tensors on Ωh is defined as

with an analogous norm using (60) as pointwise norm. The scalar product on ℓ2(Ωh , Syml (R2)) is given by

For k ∈ N, equation (61) then defines a linear operator mapping

and (63) induces a linear operator mapping

The norm of these operators can easily be estimated:

Lemma 6.5. We have ${\Vert}{\nabla }_{h}^{k}{\Vert}{\leqslant}{8}^{k/2}$ and ${\Vert}{\mathcal{E}}_{h}^{k}{\Vert}{\leqslant}{8}^{k/2}$ independent of l.

Proof. As ${\nabla }_{h}^{k}={\nabla }_{h}\cdots {\nabla }_{h}$ on the respective discrete tensor fields, it is sufficient to prove the statement for k = 1 and l arbitrary. For this purpose, observe that for $u:{{\Omega}}_{h}^{p}\to \mathbf{R}$, p ∈ {1,2}l , we have

and an analogous estimate for ${{\Vert}{\partial }_{h}^{\left(2,{p}_{l},\dots ,{p}_{1}\right)}u{\Vert}}_{2}^{2}$. Consequently, ${{\Vert}{\nabla }_{h}u{\Vert}}_{2}^{2}{\leqslant}8{{\Vert}u{\Vert}}_{2}^{2}$ for such u. If $u\in {\ell }^{2}\left({{\Omega}}_{h},{\mathcal{T}}^{l}\left({\mathbf{R}}^{2}\right)\right)$, then

so the claim follows.

For the symmetrised gradient, it is possible to pursue the same strategy since ${\mathcal{E}}_{h}^{k}={\mathcal{E}}_{h}\cdots {\mathcal{E}}_{h}$ on the respective discrete symmetric tensor fields. Indeed, with the Cauchy–Schwarz inequality and Vandermonde's identity (which reduces to the standard recurrence relation for binomial coefficients in most cases), one obtains

It is then easy to see that ${{\Vert}{\mathcal{E}}_{h}u{\Vert}}_{2}^{2}{\leqslant}{\sum }_{p=1}^{2}{{\Vert}{\partial }_{h}^{p}u{\Vert}}_{2}^{2}{\leqslant}8{{\Vert}u{\Vert}}_{2}^{2}$. □
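The bound of lemma 6.5 can also be checked numerically. The following Python sketch (our own illustration) implements the first-order forward-difference gradient with zero extension together with its negative adjoint, a discrete divergence, and estimates ${\Vert}{\nabla }_{h}{\Vert}$ by power iteration applied to $-{\mathrm{d}\mathrm{i}\mathrm{v}}_{h}{\nabla }_{h}$; on a finite grid the estimate stays below $\sqrt{8}\approx 2.83$ and approaches this value as the grid is refined.

```python
import numpy as np

def grad(u):
    """Forward-difference gradient with zero extension at the boundary."""
    du = np.zeros((2,) + u.shape)
    du[0, :-1, :] = u[1:, :] - u[:-1, :]
    du[1, :, :-1] = u[:, 1:] - u[:, :-1]
    return du

def div(p):
    """Discrete divergence, the negative adjoint of grad: <grad u, p> = -<u, div p>."""
    dv = np.zeros(p.shape[1:])
    dv[:-1, :] += p[0, :-1, :]; dv[1:, :] -= p[0, :-1, :]
    dv[:, :-1] += p[1, :, :-1]; dv[:, 1:] -= p[1, :, :-1]
    return dv

# power iteration on -div(grad(.)) to estimate the largest singular value of grad
u = np.random.rand(64, 64)
for _ in range(500):
    v = -div(grad(u))
    u = v / np.linalg.norm(v)
print(np.sqrt(np.sum(grad(u) ** 2)))   # estimate of ||grad_h||, below sqrt(8) ~ 2.83
```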

Remark 6.6. For p ∈ {1,2}l and p0 ∈ {1, 2}, consider the discrete partial derivative ${\partial }_{h}^{{p}_{0}}:{{\Omega}}_{h}^{p}\to {{\Omega}}_{h}^{\left({p}_{0},p\right)}$ and its negative adjoint ${\partial }_{h,0}^{{p}_{0}}$, i.e., $\langle {\partial }_{h}^{{p}_{0}}u,\enspace v\rangle =-\langle u,\enspace {\partial }_{h,0}^{{p}_{0}}v\rangle $ for $u:{{\Omega}}_{h}^{p}\to \mathbf{R}$, $v:{{\Omega}}_{h}^{\left({p}_{0},p\right)}\to \mathbf{R}$. For $u:{{\Omega}}_{h}^{\left({p}_{0},p\right)}\to \mathbf{R}$, this results in

for $\left({i}_{p},{j}_{p}\right)\in {{\Omega}}_{h}^{p}$, where u is extended by 0 outside of ${{\Omega}}_{h}^{\left({p}_{0},p\right)}$. (In contrast, ${\partial }_{h}^{{p}_{0}}u$ is only defined for $\left({i}_{p},{j}_{p}\right)\in {{\Omega}}_{h}^{\left({p}_{0},{p}_{0},p\right)}$.)

Consequently, the negative adjoint of the discrete gradient induces a divergence for discrete tensor fields $u\in {\ell }^{2}\left({{\Omega}}_{h},{\mathcal{T}}^{l+1}\left({\mathbf{R}}^{2}\right)\right)$ such that ${\mathrm{d}\mathrm{i}\mathrm{v}}_{h}u\in {\ell }^{2}\left({{\Omega}}_{h},{\mathcal{T}}^{l}\left({\mathbf{R}}^{2}\right)\right)$ and

for p ∈ {1,2}l . For the discrete divergence that arises as the negative adjoint of the symmetrised gradient on ℓ2(Ωh , Syml (Rd )), one has to take the symmetrisation into account: for u ∈ ℓ2(Ωh , Syml+1(R2)), we have

Here, the operators ${\partial }_{h,0}^{{p}_{0}}$ act on functions on the grid ${{\Omega}}_{h}^{\alpha +\alpha \left({p}_{0}\right)}$ and yield functions on the grid ${{\Omega}}_{h}^{\alpha }$. Note that in the grid point (iα , jα ), these partial derivatives have to be evaluated in the grid points $\left({i}_{\alpha +{e}_{1}}+\frac{1}{2}{\left(-1\right)}^{l},{j}_{\alpha +{e}_{1}}\right)$ and $\left({i}_{\alpha +{e}_{2}},{j}_{\alpha +{e}_{2}}+\frac{1}{2}{\left(-1\right)}^{l}\right)$, respectively. This way, the discrete divergence operator is consistently defined.

Remark 6.7. While the above approach provides a way of discretising higher-order regularisation approaches with finite differences up to an arbitrary order of differentiation, fundamental questions of numerical analysis such as consistency and stability of such discretisations, and consequently, of the existence of error bounds that converge to zero as the discretisation level becomes finer, were not addressed. In view of the non-smooth minimisation problems associated with regularisation approaches in imaging, a first approach to answer such questions is typically to consider convergence of minimisers of the discrete energies to minimisers of the continuous counterpart, e.g., by ensuring gamma-convergence. While for first-order TV regularisation convergence of minimisers is known [57], an extension of such results to higher-order approaches still seems to be open. With respect to error bounds, the situation is similar and we refer to [191] for results on finite-difference discretisations of first-order TV in certain Lipschitz spaces.

6.2. A general saddle-point framework

Having appropriately discretised versions of higher-order regularisation functionals available, we now deal with the numerical solution of corresponding Tikhonov approaches for inverse problems. To this aim, we first consider a general framework and then derive concrete realisations for different regularisation approaches.

Let Ωh be the discretised grid of subsection 6.1 and define Uh = ℓ2(Ωh ). We assume a discrete linear forward operator Kh : Uh → Yh , with $\left({Y}_{h},{\Vert}\cdot {{\Vert}}_{{Y}_{h}}\right)$ a finite-dimensional Hilbert space, and a proper, convex, lower semi-continuous and coercive discrepancy term ${S}_{{f}_{h}}:{Y}_{h}\to \left[0,\infty \right]$ with corresponding discrete data fh to be given. Further, we define ${\mathcal{R}}_{\alpha }:{U}_{h}\to \left[0,\infty \right]$ to be a regularisation functional given in a general form as ${\mathcal{R}}_{\alpha }\left(u\right)={\mathrm{min}}_{w\in {W}_{h}}{\Vert}{D}_{h}\left(u,w\right){{\Vert}}_{1,\alpha }$, with Dh : Uh × Wh → Vh , $\left(u,w\right){\mapsto}{D}_{h}^{1}u+{D}_{h}^{2}w$ a discrete differential operator and Wh , Vh finite-dimensional Hilbert spaces. The expression ||⋅||1,α here denotes an appropriate ℓ1-type norm weighted using the parameters α and will be specified later for concrete examples. Its dual norm is denoted by ${{\Vert}\cdot {\Vert}}_{\infty ,{\alpha }^{-1}}$. We consider the general minimisation problem

Equation (64)

for which we will numerically solve the equivalent reformulation

Equation (65)

Remark 6.8. Note that here, the auxiliary variable w and the space Wh are used to include balancing-type regularisation approaches such as the infimal convolution of two functionals. Setting, for example, Wh = Uh , ${V}_{h}={\ell }^{2}\left({\Omega},{\mathcal{T}}^{1}\left({\mathbf{R}}^{2}\right)\right){\times}{\ell }^{2}\left({\Omega},{\mathcal{T}}^{2}\left({\mathbf{R}}^{2}\right)\right)$, ${D}_{h}\left(u,w\right)=\left({\nabla }_{h}u-{\nabla }_{h}w,{\nabla }_{h}^{2}w\right)$ and ||(v1, v2)||1,α = α1||v1||1 + α2||v2||1 for positive α = (α1, α2) yields

Total-variation regularisation can, on the other hand, be obtained by choosing Wh = {0}, ${V}_{h}={\ell }^{2}\left({{\Omega}}_{h},{\mathcal{T}}^{1}\left({\mathbf{R}}^{2}\right)\right)$, Dh (u, 0) = ∇h u and ||v||1,α = α||v||1 for α > 0.
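The two instances of remark 6.8 can be spelled out directly. The following Python sketch (our own minimal illustration, again using forward differences with zero extension) implements Dh (u, w) = (∇h u − ∇h w, ∇h2 w) and the weighted norm ||(v1, v2)||1,α = α1||v1||1 + α2||v2||1 underlying the TV–TV2 infimal-convolution model; the TV case corresponds to fixing w = 0 and keeping only the first component.

```python
import numpy as np

def grad(u):
    """Forward-difference gradient with zero extension; u has shape (N1, N2)."""
    du = np.zeros((2,) + u.shape)
    du[0, :-1, :] = u[1:, :] - u[:-1, :]
    du[1, :, :-1] = u[:, 1:] - u[:, :-1]
    return du

def D_h(u, w):
    """D_h(u, w) = (grad u - grad w, grad^2 w) for the TV-TV^2 infimal convolution."""
    return grad(u) - grad(w), np.stack([grad(g) for g in grad(w)])

def norm_1_alpha(v, alpha):
    """||(v1, v2)||_{1,alpha} with pointwise Frobenius norms, cf. definition 6.4."""
    v1, v2 = v
    n1 = np.sum(np.sqrt(np.sum(v1 ** 2, axis=0)))
    n2 = np.sum(np.sqrt(np.sum(v2 ** 2, axis=(0, 1))))
    return alpha[0] * n1 + alpha[1] * n2

u, w = np.random.rand(32, 32), np.random.rand(32, 32)
print(norm_1_alpha(D_h(u, w), alpha=(1.0, 2.0)))
```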

Remark 6.9. The dual norm ${{\Vert}\cdot {\Vert}}_{\infty ,{\alpha }^{-1}}$ will become relevant in the context of primal-dual algorithms via its Fenchel dual. Indeed, we have the identity ${\left({{\Vert}\cdot {\Vert}}_{X}\right)}^{{\ast}}={\mathcal{I}}_{\left\{{{\Vert}\cdot {\Vert}}_{{X}^{{\ast}}}{\leqslant}1\right\}}$ for ||⋅||X the norm of a general Banach space X and ${{\Vert}\cdot {\Vert}}_{{X}^{{\ast}}}$ its dual norm on X*. This is a consequence of $\langle w,\enspace u\rangle -{{\Vert}w{\Vert}}_{{X}^{{\ast}}}{{\Vert}u{\Vert}}_{X}{\leqslant}0$ for all u ∈ X and w ∈ X*, so ${\left({{\Vert}\cdot {\Vert}}_{X}\right)}^{{\ast}}\left(w\right)=0$ for ${{\Vert}w{\Vert}}_{{X}^{{\ast}}}{\leqslant}1$. For ${{\Vert}w{\Vert}}_{{X}^{{\ast}}}{ >}1$ one can find a u ∈ X, ||u||X ⩽ 1 such that, for a c > 0, ⟨w, u⟩ ⩾ 1 + c ⩾ ||u||X + c. For each t > 0, we get ⟨w, tu⟩ − ||tu||X ⩾ tc, hence ${\left({{\Vert}\cdot {\Vert}}_{X}\right)}^{{\ast}}\left(w\right)=\infty $.

Remark 6.10. While the setting of (64) allows to include rather general forward operators Kh and discrepancy terms ${S}_{{f}_{h}}$, it will still not capture all applications of higher-order regularisation that we consider later in subsections 7 and 8. It rather comprises a balance between general applicability and uniform presentation, and we will comment on possible extensions later on such that the interested reader should be able to adapt the setting presented here to the concrete problem setting at hand.

From a more general perspective, the reformulation (65) of (64) constitutes a non-smooth, convex optimisation problem of the form

Equation (66)

with $\mathcal{X},\mathcal{Y}$ Hilbert spaces, $\mathcal{F}:\mathcal{Y}\to \left[0,\infty \right]$, $\mathcal{G}:\mathcal{X}\to \left[0,\infty \right]$ proper, convex and lower semi-continuous functionals and $\mathcal{K}:\mathcal{X}\to \mathcal{Y}$ linear and continuous. For this class of problems, duality-based first-order optimisation algorithms of ascent/descent-type have become very popular in the past years as they are rather generally applicable and yield algorithms for the solution of (66) that provably converge to a global optimum, while allowing a simple implementation and practical stepsize choices, such as constant stepsizes. The algorithm of [56], for instance, constitutes a relatively early step in this direction, as it solves the TV-denoising problem with constant stepsizes in terms of a dual problem.

For problems of the type (66), it is often beneficial to consider a primal-dual saddle-point reformulation instead of the dual problem alone, in particular in view of general applicability. This is given as

Equation (67)

with ${\langle \cdot ,\cdot \rangle }_{\mathcal{Y}}$ denoting the inner product in $\mathcal{Y}$. By interchanging minimum and maximum and minimising with respect to x, one further arrives at the dual problem which reads as

Equation (68)

Under certain conditions, the minimum in (66) and maximum in (68) admit the same value and primal/dual solution pairs for (66) and (68) correspond to solutions of the saddle-point problem (67), see below.

Now, indeed, many different algorithmic approaches for solving (67) are nowadays available (see for instance [41, 42, 60, 61, 124]) and which one of them delivers the best performance typically depends on the concrete problem instance. Here, as exemplary algorithmic framework, we consider the popular primal-dual algorithm of [60] (see also [144, 203]), which has the advantage of being simple and yet rather generally applicable.

Conceptually, the algorithm of [60] solves the saddle-point problem (67) via implicit gradient descent and ascent steps with respect to the primal and dual variables, respectively. With $\mathcal{L}\left(x,y\right)={\langle \mathcal{K}x,y\rangle }_{\mathcal{Y}}+\mathcal{G}\left(x\right)-{\mathcal{F}}^{{\ast}}\left(y\right)$, carrying out these implicit steps simultaneously in both variables would correspond to computing the iterates {(xn , yn )} via

Equation (69)

where ∂x and ∂y denote the subgradients with respect to the first and second variable, respectively, and σ, τ are positive constants. To obtain computationally feasible iterations, the implicit step for yn+1 in the primal-dual algorithm uses an extrapolation ${\overline{x}}^{n}=2{x}^{n}-{x}^{n-1}$ of the previous iterate instead of xn+1, such that the descent and ascent steps decouple and can be re-written as

Equation (70)

The mappings ${\left(\mathrm{i}\mathrm{d}+\sigma \partial {\mathcal{F}}^{{\ast}}\right)}^{-1}$ and ${\left(\mathrm{i}\mathrm{d}+\tau \partial \mathcal{G}\right)}^{-1}$ used here are so-called proximal mappings of ${\mathcal{F}}^{{\ast}}$ and $\mathcal{G}$, respectively, which, as noted in proposition 6.11 below, are well-defined and single-valued whenever $\mathcal{G},{\mathcal{F}}^{{\ast}}$ are proper, convex and lower semi-continuous. The resulting algorithm can then be interpreted as a proximal-point algorithm (see [102, 161]) and weak convergence in the sense that (xn , yn ) ⇀ (x*, y*) for (x*, y*) being a solution to the saddle-point problem (67) can be ensured for positive stepsize choices σ, τ such that $\sigma \tau {\Vert}\mathcal{K}{{\Vert}}^{2}{< }1$, see for instance [61, 143], or [60] for the finite-dimensional case. In contrast, explicit methods for non-smooth optimisation problems such as subgradient descent, for instance, usually require stepsizes that converge to zero [138] and could stagnate numerically.
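To illustrate the iteration (70) in its simplest instance, the following Python sketch (our own minimal example) applies it to TV denoising, i.e., to minimising $u{\mapsto}\alpha {\Vert}{\nabla }_{h}u{{\Vert}}_{1}+\frac{1}{2}{\Vert}u-f{{\Vert}}_{2}^{2}$: the dual update uses the pointwise projection onto the α-ball (see lemma 6.16 below), the primal update uses the proximal mapping of the quadratic discrepancy (see lemma 6.15(a) below), and the stepsizes are chosen such that $\sigma \tau {\Vert}{\nabla }_{h}{{\Vert}}^{2}{< }1$ by lemma 6.5.

```python
import numpy as np

def grad(u):
    """Forward-difference gradient with zero extension."""
    du = np.zeros((2,) + u.shape)
    du[0, :-1, :] = u[1:, :] - u[:-1, :]
    du[1, :, :-1] = u[:, 1:] - u[:, :-1]
    return du

def div(p):
    """Discrete divergence, the negative adjoint of grad."""
    dv = np.zeros(p.shape[1:])
    dv[:-1, :] += p[0, :-1, :]; dv[1:, :] -= p[0, :-1, :]
    dv[:, :-1] += p[1, :, :-1]; dv[:, 1:] -= p[1, :, :-1]
    return dv

def tv_denoise(f, alpha, n_iter=300):
    """Primal-dual iteration (70) for min_u alpha*||grad u||_1 + 0.5*||u - f||_2^2."""
    sigma = tau = 0.35                      # sigma * tau * 8 < 1, cf. lemma 6.5
    u, u_bar = f.copy(), f.copy()
    p = np.zeros((2,) + f.shape)
    for _ in range(n_iter):
        # dual ascent step: projection onto {||.||_{infty,alpha^{-1}} <= 1}
        p = p + sigma * grad(u_bar)
        p = p / np.maximum(1.0, np.sqrt(np.sum(p ** 2, axis=0)) / alpha)
        # primal descent step: prox of the quadratic discrepancy, then extrapolation
        u_old = u
        u = (u + tau * (div(p) + f)) / (1.0 + tau)
        u_bar = 2.0 * u - u_old
    return u

f = np.random.rand(64, 64)                  # some noisy test data
print(np.round(tv_denoise(f, alpha=0.2).std(), 4))
```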

Overall, the efficiency of the iteration steps in (70) crucially depends on the ability to evaluate $\mathcal{K}$ and ${\mathcal{K}}^{{\ast}}$ and to compute the proximal mappings efficiently. Regarding the latter, this is possible for a large class of functionals, in particular for many functionals that are defined pointwise, which is one of the reasons for the high popularity of these kinds of algorithms. We now consider proximal mappings in more detail and provide concrete examples later on.

Proposition 6.11. Let H be a Hilbert space, $F:H\to \left.\right]- \infty ,\infty \left.\right]$ proper, convex and lower semi-continuous, and σ > 0.

  • (a)  
    Then, the mapping
    Equation (71)
    is well-defined.
  • (b)  
    For u ∈ H, u* = proxσF (u) solves the inclusion relation
    i.e., proxσF = (id + σF)−1.
  • (c)  
    The mapping proxσF is Lipschitz-continuous with constant not exceeding 1.

Proof. See, for instance, [178, proposition IV.1.5, corollary IV.1.3], or [13, proposition 12.15, example 23.3, corollary 23.10]. □

In general, the computation of proximal mappings can be as difficult as solving the original optimisation problem itself. However, if, for instance, the corresponding functional can be 'well separated' into simple building blocks, then proximal mappings can be reduced to some basic ones which are simple and easy to compute.

Lemma 6.12. Let H = H1 ⊥⋯⊥ Hn with closed subspaces H1, ..., Hn , the mappings P1, ..., Pn their orthogonal projectors,

with each ${F}_{i}:{H}_{i}\to \left.\right]- \infty ,\infty \left.\right]$ proper, convex and lower semi-continuous. Then,

Proof. This is immediate since the corresponding minimisation problem decouples. □

Furthermore, Moreau's identity (see [160], for instance) provides a relation between the proximal mapping of a function F and the proximal mapping of its dual F* according to

Equation (72)

This immediately implies that for general σ > 0, the computation of (id + σF)−1 is essentially as difficult as the computation of ${\left(\mathrm{i}\mathrm{d}+\sigma \partial {F}^{{\ast}}\right)}^{-1}$, in particular the latter can be obtained from the former as follows.

Lemma 6.13. Let H be a Hilbert space and $F:H\to \left.\right]- \infty ,\infty \left.\right]$ be proper, convex and lower semi-continuous. Then, for σ > 0,

In some situations, the computation of the proximal mappings of the sum of two functions decouples into the composition of two mappings.

Lemma 6.14. Let H be a Hilbert space, $F:H\to \left.\right]- \infty ,\infty \left.\right]$ be proper, convex and lower semi-continuous and σ > 0.

  • (a)  
    If $F\left(u\right)=G\left(u\right)+\frac{\alpha }{2}{\Vert}u-{u}_{0}{{\Vert}}_{H}^{2}$ with $G:H\to \left.\right]- \infty ,\infty \left.\right]$ proper, convex and lower semi-continuous, u0 ∈ H and α > 0, then
  • (b)  
    If H = ${\mathbf{R}}^{M}$ equipped with the Euclidean norm and $F\left(u\right)={\sum }_{m=1}^{M}{\mathcal{I}}_{\left[{a}_{m},{b}_{m}\right]}\left({u}_{m}\right)+{F}_{m}\left({u}_{m}\right)$ with dom(Fm ) ∩ [am , bm ] ≠ $\mathrm{\varnothing}$ for each m = 1, ..., M, then
    for m = 1, ..., M, where
    is the projection to [am , bm ] in $\mathbf{R}$.

Proof. Regarding (a), we note that by first-order optimality conditions (additivity of the subdifferential follows from [83, proposition I.5.6]), we have the following equivalences:

which proves the explicit form of proxσF . The intermediate equality follows from ${\mathrm{p}\mathrm{r}\mathrm{o}\mathrm{x}}_{\sigma \frac{\alpha }{2}{\Vert}\cdot -{u}_{0}{{\Vert}}_{H}^{2}}\left(u\right)=\frac{u+\sigma \alpha {u}_{0}}{1+\sigma \alpha }$, which can again be seen from the optimality conditions.

In order to show (b), first note that, using lemma 6.12, it suffices to show the assertion for $F\left(u\right)={\mathcal{I}}_{\left[a,b\right]}\left(u\right)+f\left(u\right)$ with u ∈ R, a ⩽ b and $f:\mathbf{R}\to \left.\right]- \infty ,\infty \left.\right]$ proper, convex and lower semi-continuous such that dom(f) ∩ [a, b] ≠ Ø. Also, the identity ${\mathrm{p}\mathrm{r}\mathrm{o}\mathrm{x}}_{\sigma {\mathcal{I}}_{\left[a,b\right]}}={\mathrm{p}\mathrm{r}\mathrm{o}\mathrm{j}}_{\left[a,b\right]}$ as well as the explicit form of the projection is immediate. Now, set u* = proj[a,b]◦proxσf (u) and write it as u* = νuF + (1 − ν)uf with uF = proxσF (u) and uf = proxσf (u) and ν ∈ [0, 1]. To see that this is possible, note that in case of uf ∈ [a, b], u* = uf , in case uf < a we have that uf < a = u* ⩽ uF and similarly in case of uf > b. But, with $E\left(\bar{u}\right)=\frac{\vert \bar{u}-u{\vert }^{2}}{2}+\sigma f\left(\bar{u}\right)$, convexity and minimality of uf imply that

Since both u* and uF are in [a, b], the result follows from uniqueness of minimisers. □

In the following we provide some examples of explicit proximal mappings for some particular choices of F that are relevant for applications. For additional examples and further, general results on proximal mappings, we refer to [13, 66].

Lemma 6.15. Let H be a Hilbert space, σ > 0 and $F:H\to \left.\right]- \infty ,\infty \left.\right]$. Then, the following identities hold.

  • (a)  
    For $F\left(u\right)=\frac{\alpha }{2}{{\Vert}u-f{\Vert}}_{H}^{2}$ with f ∈ H, α > 0,
  • (b)  
    For $F={\mathcal{I}}_{C}$ for some non-empty, convex and closed set C ⊆ H,
    with projC denoting the orthogonal projection onto C.
  • (c)  
    For F(u) = G(u − u0) with $G:H\to \left.\right]- \infty ,\infty \left.\right]$ and u0 ∈ H,
  • (d)  
    For $H={\mathcal{H}}_{1}{\times}\cdots {\times}{\mathcal{H}}_{M}$ the product of the Hilbert spaces ${\mathcal{H}}_{1},\dots ,{\mathcal{H}}_{M}$, ${{\Vert}u{\Vert}}_{H}^{2}={\sum }_{m=1}^{M}\vert {u}_{m}{\vert }_{{\mathcal{H}}_{m}}^{2}$ with each $\vert \cdot {\vert }_{{\mathcal{H}}_{m}}$ denoting the norm on ${\mathcal{H}}_{m}$, $\alpha \in {\left.\right]0,\infty \left[\right.}^{ M}$, and $F={\mathcal{I}}_{\left\{{\Vert}\cdot {{\Vert}}_{\infty ,{\alpha }^{-1}}{\leqslant}1\right\}}$ where ${\Vert}u{{\Vert}}_{\infty ,{\alpha }^{-1}}={\mathrm{max}}_{m=1,\dots ,M}{\alpha }_{m}^{-1}\vert {u}_{m}{\vert }_{{\mathcal{H}}_{m}}$,
  • (e)  
    In the situation of (d) and $F\left(u\right)={\sum }_{m=1}^{M}{\alpha }_{m}\vert {u}_{m}{\vert }_{{\mathcal{H}}_{m}}$,
    and
    with ${\mathrm{p}\mathrm{r}\mathrm{o}\mathrm{j}}_{\left\{\vert \cdot {\vert }_{{\mathcal{H}}_{m}}{\leqslant}{\alpha }_{m}\right\}}$ as in (d).

Proof. The assertion on proxσF in (a) follows from first-order optimality conditions as in lemma 6.14, the assertion on ${\mathrm{p}\mathrm{r}\mathrm{o}\mathrm{x}}_{\sigma {F}^{{\ast}}}$ is a consequence of lemma 6.13. Assertion (b) is immediate from the definition of the orthogonal projection in Hilbert spaces and (c) follows from a simple change of variables for the minimisation problem in the definition of the proximal mapping. Regarding (d), using lemma 6.12 and noting that ${{\Vert}u{\Vert}}_{\infty ,{\alpha }^{-1}}{\leqslant}1$ if and only if $\vert {u}_{m}{\vert }_{{\mathcal{H}}_{m}}{\leqslant}{\alpha }_{m}$ for m = 1, ..., M, it suffices to show that for each ${u}_{m}\in {\mathcal{H}}_{m}$ and m = 1, ..., M,

To this aim, observe that by definition of the projection in Hilbert spaces,

From this, it is easy to see that the minimum in the last line is achieved for t = 1 in case $\vert {u}_{m}{\vert }_{{\mathcal{H}}_{m}}{\leqslant}{\alpha }_{m}$ and $t={\alpha }_{m}/\vert {u}_{m}{\vert }_{{\mathcal{H}}_{m}}$ otherwise, from which the explicit form of the projection follows. Considering assertion (e), we have by remark 6.9 that

so the statements follow by lemma 6.13 and assertion (d). □
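As a complement, the following Python sketch (our own illustration) collects the proximal mappings of lemma 6.15(a) and (d) for a single vector and combines them via Moreau's identity (72) into the proximal mapping of $u{\mapsto}\alpha \vert u\vert $, i.e., soft shrinkage of the norm as in assertion (e).

```python
import numpy as np

def prox_quadratic(u, f, alpha, sigma):
    """prox of F(u) = (alpha/2) * |u - f|^2, cf. lemma 6.15(a)."""
    return (u + sigma * alpha * f) / (1.0 + sigma * alpha)

def proj_ball(u, alpha):
    """Projection onto the closed ball {|u| <= alpha}, cf. lemma 6.15(d)."""
    return u / max(1.0, np.linalg.norm(u) / alpha)

def prox_norm(u, alpha, sigma):
    """prox of F(u) = alpha * |u| via Moreau's identity (72):
    shrinks |u| by sigma*alpha (to zero if |u| <= sigma*alpha)."""
    return u - sigma * proj_ball(u / sigma, alpha)

u = np.array([3.0, -4.0])                      # |u| = 5
print(prox_norm(u, alpha=1.0, sigma=2.0))      # [1.8, -2.4], norm shrunk from 5 to 3
print(prox_quadratic(u, f=np.zeros(2), alpha=1.0, sigma=2.0))
```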

Now considering the minimisation problem (65), our approach is to rewrite it as a saddle-point problem such that, when applying the iteration (70), the involved proximal mappings decouple into simple and explicit mappings. To achieve this while allowing for general forward operators Kh , we dualise both the regularisation and the data-fidelity term and arrive at the following saddle-point reformulation of (65):

Equation (73)

recalling that the dual norm of ||⋅||1,α is denoted by ${{\Vert}\cdot {\Vert}}_{\infty ,{\alpha }^{-1}}$ and that ${\left({\Vert}\cdot {{\Vert}}_{1,\alpha }\right)}^{{\ast}}={\mathcal{I}}_{\left\{{\Vert}\cdot {{\Vert}}_{\infty ,{\alpha }^{-1}}{\leqslant}1\right\}}$, see remark 6.9. The following lemma provides some instances of ||⋅||1,α that arise in the context of higher-order TV regularisers and their generalisations. In particular, for these instances, the corresponding dual norm ${{\Vert}\cdot {\Vert}}_{\infty ,{\alpha }^{-1}}$ and the proximal mappings ${\mathrm{p}\mathrm{r}\mathrm{o}\mathrm{x}}_{\sigma {\mathcal{I}}_{\left\{{{\Vert}\cdot {\Vert}}_{\infty ,{\alpha }^{-1}}{\leqslant}1\right\}}}={\mathrm{p}\mathrm{r}\mathrm{o}\mathrm{j}}_{\left\{{{\Vert}\cdot {\Vert}}_{\infty ,{\alpha }^{-1}}{\leqslant}1\right\}}$ will be provided. Concrete examples will be discussed in example 6.19 below.

Lemma 6.16. With MN, $\alpha \in {\left.\right]0,\infty \left[\right.}^{ M}$, l1, ..., lM N, and ${\mathcal{H}}_{m}\in \left\{{\mathcal{T}}^{{l}_{m}}\left({\mathbf{R}}^{2}\right),{\mathrm{S}\mathrm{y}\mathrm{m}}^{{l}_{m}}\left({\mathbf{R}}^{2}\right)\right\}$ for m = 1, ..., M, let

for v = (v1, ..., vM ) ∈ Vh , where Vh is equipped with the induced inner product ${\langle u,v\rangle }_{{V}_{h}}={\sum }_{m=1}^{M}{\langle {u}_{m},{v}_{m}\rangle }_{{\ell }^{2}\left({{\Omega}}_{h},{\mathcal{H}}_{m}\right)}$ and norm, and the one-norm on each ${\ell }^{2}\left({\Omega},{\mathcal{H}}_{M}\right)$ is given in definition 6.4. Then, the dual norm ${{\Vert}\cdot {\Vert}}_{\infty ,{\alpha }^{-1}}$ satisfies

with the -norm again according to definition 6.4. Further, we have, for m = 1, ..., M, that

where the right-hand side has to be interpreted in the pointwise sense, i.e., for $u\in {\ell }^{2}\left({{\Omega}}_{h},{\mathcal{H}}_{m}\right)$ and (i, j) ∈ Ωh , it holds that ${\left(\mathrm{max}{\left\{1,{\alpha }_{m}^{-1}\vert u{\vert }_{{\mathcal{H}}_{m}}\right\}}^{-1}u\right)}_{i,j}=\mathrm{max}{\left\{1,{\alpha }_{m}^{-1}\vert {u}_{i,j}{\vert }_{{\mathcal{H}}_{m}}\right\}}^{-1}{u}_{i,j}$.

Proof. By definition, we have for u, vVh with ||u||1,α ⩽ 1 that

hence, ${{\Vert}v{\Vert}}_{\infty ,{\alpha }^{-1}}{\leqslant}{\mathrm{max}}_{m=1,\dots ,M}{\alpha }_{m}^{-1}{{\Vert}{v}_{m}{\Vert}}_{\infty }$. With (m, i, j) a maximising argument of the right-hand side above, equality follows, in case of v ≠ 0, from choosing u according to ${\left({u}_{m}\right)}_{i,j}={\alpha }_{m}^{-1}{\left({v}_{m}\right)}_{i,j}/\vert {\left({v}_{m}\right)}_{i,j}{\vert }_{{\mathcal{H}}_{m}}$ and 0 everywhere else. The case v = 0 is trivial. Also, since Ωh = {1, ..., N1} × {1, ..., N2} is finite, one can interpret each ${\ell }^{2}\left({\Omega},{\mathcal{H}}_{m}\right)$ as ${\mathcal{H}}_{m}^{{N}_{1}{N}_{2}}$, such that lemma 6.15(e) applied to $H={V}_{h}={\mathcal{H}}_{1}^{{N}_{1}{N}_{2}}{\times}\cdots {\times}{\mathcal{H}}_{M}^{{N}_{1}{N}_{2}}$ immediately yields the stated pointwise identity for the proximal mapping/projection. □

Under mild assumptions on ${S}_{{f}_{h}}$, equivalence of the primal problem (65) and the saddle-point problem (73) indeed holds and existence of a solution to both (as well as a corresponding dual problem) can be ensured.

Proposition 6.17. Under the assumptions stated for problem (65), there exists a solution. Further, if ${S}_{{f}_{h}}$ is such that ${Y}_{h}={\bigcup }_{t{\geqslant}0}t\left(\mathrm{d}\mathrm{o}\mathrm{m}\left({S}_{{f}_{h}}\right)+\mathrm{r}\mathrm{g}\left({K}_{h}\right)\right)$, then there exists a solution to the dual problem

Equation (74)

and to the saddle-point problem (73). Further, strong duality holds and the problems are equivalent in the sense that ((u, w), (v, λ)) is a solution to (73) if and only if (u, w) solves (65) and (v, λ) solves (74).

Proof. At first note that existence for (65) can be shown as in theorem 3.26.

By virtue of theorem 5.7, choosing X = Uh × Wh , Y = Vh × Yh , $F:X\to \left.\right]- \infty ,\infty \left.\right]$ as F = 0, $G:Y\to \left.\right]- \infty ,\infty \left.\right]$ as $G\left(v,\lambda \right)={{\Vert}v{\Vert}}_{1,\alpha }+{S}_{{f}_{h}}\left(\lambda \right)$, and Λ : XY as ${\Lambda}\left(u,w\right)=\left({D}_{h}\left(u,w\right),{K}_{h}u\right)$, we only need to verify (39) to obtain existence of dual solutions and strong duality. But since dom(F) = X and $\mathrm{d}\mathrm{o}\mathrm{m}\left(G\right)={V}_{h}{\times}\mathrm{d}\mathrm{o}\mathrm{m}\left({S}_{{f}_{h}}\right)$, the latter condition is equivalent to ${Y}_{h}={\bigcup }_{t{\geqslant}0}t\left(\mathrm{d}\mathrm{o}\mathrm{m}\left({S}_{{f}_{h}}\right)+\mathrm{r}\mathrm{g}\left({K}_{h}\right)\right)$. Also, since ${F}^{{\ast}}={\mathcal{I}}_{\left\{0\right\}}$ and ${\left({{\Vert}\cdot {\Vert}}_{1,\alpha }\right)}^{{\ast}}={\mathcal{I}}_{\left\{{{\Vert}\cdot {\Vert}}_{\infty ,{\alpha }^{-1}}{\leqslant}1\right\}}$, see remark 6.9, the maximisation problem in (40) corresponds to (74). Finally, the equivalence of the saddle-point problem (73) to (65) and (74) then follows from [83, proposition III.3.1]. □

Remark 6.18. In some applications, it is beneficial to add an additional penalty term on u in form of Φ : Uh → [0, ] proper, convex and lower semi-continuous to the energy of (65), whereas in other situations when $u{\mapsto}{S}_{{f}_{h}}\left({K}_{h}u\right)$ has a suitable structure, a dualisation of the data term is not necessary, see the discussion below. Regarding the former, the differences when extending proposition 6.17 is that existence for the primal problem needs to be shown differently and that the domain of Φ needs to be taken into account for obtaining strong duality. Existence can, for instance, be proved when assuming that either Φ is the indicator function of a polyhedral set (see [35, proposition 1]), or that ker(Kh ) ∩ ker(Dh ) = {0}. Duality is further obtained when ${Y}_{h}={\bigcup }_{t{\geqslant}0}t\left(\mathrm{d}\mathrm{o}\mathrm{m}\left({S}_{{f}_{h}}\right)-{K}_{h}\enspace \mathrm{d}\mathrm{o}\mathrm{m}\left({\Phi}\right)\right)$. Regarding the latter, a version of proposition 6.17 without the dualisation of the data term ${S}_{{f}_{h}}\left({K}_{h}u\right)$ holds even without the assumption on the domain of ${S}_{{f}_{h}}$, however, with a different associated dual problem and saddle-point problem.

In particular, not dualising the data term has impact on the primal-dual optimisation algorithms. In view of the iteration (70), the evaluation of the proximal mapping for $u{\mapsto}{S}_{{f}_{h}}\left({K}_{h}u\right)$ then becomes necessary, so this dualisation strategy is only practical if the latter proximal mapping can easily be computed. Furthermore, in case of a sufficiently smooth data term, dualisation of ${S}_{{f}_{h}}$ can also be avoided by using explicit descent steps for ${S}_{{f}_{h}}$ instead of proximal mappings, where the Lipschitz constant of the derivative of ${S}_{{f}_{h}}$ usually enters in the stepsize bound. See [61] for an extension of the primal-dual algorithm in that direction.

In view of proposition 6.17, we now address the numerical solution of the saddle-point problem (73). Applying the iteration (70), this results in algorithm 1, which is given in a general form. A concrete implementation still requires an explicit form of the proximal mapping ${\mathrm{p}\mathrm{r}\mathrm{o}\mathrm{x}}_{\sigma {S}_{{f}_{h}}^{{\ast}}}$, a concrete choice of Vh , Wh and Dh as well as an estimate on ||(Dh , Kh )|| for the stepsize choice and a suitable stopping criterion. These building blocks will now be provided for different choices of ${\mathcal{R}}_{\alpha }$ and ${S}_{{f}_{\alpha }}$ in a way that they can be combined to an arrive at a concrete, application-specific algorithm. After that, two examples will be discussed.

Algorithm 1. Primal-dual scheme for the numerical solution of (73).

1: function Tikhonov(Kh , fh , α)
2:  $\left(u,w,\overline{u},\overline{w}\right){\leftarrow}\left(0,0,0,0\right),\left(v,\lambda \right){\leftarrow}\left(0,0\right)$
3:  Choose σ, τ > 0 such that $\sigma \tau {{\Vert}\left(\begin{matrix}\hfill {D}_{h}^{1}\hfill & \hfill {D}_{h}^{2}\hfill \\ \hfill {K}_{h}\hfill & \hfill 0\hfill \end{matrix}\right){\Vert}}^{2}{< }1$
4:  repeat
5:     Dual updates
6:     $v{\leftarrow}{\mathrm{p}\mathrm{r}\mathrm{o}\mathrm{j}}_{\left\{{\Vert}\cdot {{\Vert}}_{\infty ,{\alpha }^{-1}}{\leqslant}1\right\}}\left(v+\sigma \left({D}_{h}^{1}\overline{u}+{D}_{h}^{2}\overline{w}\right)\right)$
7:     $\lambda {\leftarrow}{\mathrm{p}\mathrm{r}\mathrm{o}\mathrm{x}}_{\sigma {S}_{{f}_{h}}^{{\ast}}}\left(\lambda +\sigma {K}_{h}\overline{u}\right)$
8:     Primal updates
9:     ${u}_{+}{\leftarrow}u-\tau \left({\left({D}_{h}^{1}\right)}^{{\ast}}v+{K}_{h}^{{\ast}}\lambda \right)$
10:   ${w}_{+}{\leftarrow}w-\tau \left({\left({D}_{h}^{2}\right)}^{{\ast}}v\right)$
11:   Extrapolation and update
12:   $\left(\overline{u},\overline{w}\right){\leftarrow}2\left({u}_{+},{w}_{+}\right)-\left(u,w\right)$
13:   (u, w) ← (u+, w+)
14:   until stopping criterion fulfiled
15:   return u
16: end function

Proximal mapping of ${S}_{{f}_{h}}^{{\ast}}$. Depending on the application of interest, and in particular on the assumption on the underlying measurement noise, different choices of ${S}_{{f}_{h}}$ are reasonable. The one which is probably most relevant in practice is

which, from a statistical perspective, is the right choice under the assumption of Gaussian noise. In this case, as discussed in lemma 6.15, the proximal mapping of the dual is given as

A second, practically relevant choice is the Kullback–Leibler divergence as in (2). For discrete data ${\left({\left({f}_{h}\right)}_{i}\right)}_{i}$ satisfying ${\left({f}_{h}\right)}_{i}{\geqslant}0$ for each i, and a corresponding discrete signal ${\left({\lambda }_{i}\right)}_{i}$, this corresponds to

Equation (75)

where we again use the convention ${\left({f}_{h}\right)}_{i}\enspace \mathrm{log}\left(0\right)=- \infty $ for ${\left({f}_{h}\right)}_{i}{ >}0$ and $0\enspace \mathrm{log}\left(\frac{{\lambda }_{i}}{0}\right)=0$ whenever λi ⩾ 0. A direct computation (see for instance [122]) shows that in this case

Another choice that is relevant in the presence of strong data outliers (e.g., due to transmission errors) is

in which case

can be obtained from lemmas 6.13 and 6.15.

As already mentioned in remark 6.18, in case the discrepancies term is not dualised, a corresponding version of the algorithm of [60] requires the proximal mappings of $\tau {S}_{{f}_{h}}$ which can either be computed directly or obtained from ${\mathrm{p}\mathrm{r}\mathrm{o}\mathrm{x}}_{\sigma {S}_{{f}_{h}}^{{\ast}}}$ using Moreau's identity as in lemma 6.13. Further, there are many other choices of ${S}_{{f}_{h}}$ for which the ${\mathrm{p}\mathrm{r}\mathrm{o}\mathrm{x}}_{\sigma {S}_{{f}_{h}}^{{\ast}}}$ is simple and explicit, such as, for instance, equality constraints on a subdomain in the case of image inpainting or box constraints in case of dequantisation or image decompression.

Choice of ${\mathcal{R}}_{\alpha }$ and proximal mapping. As we show now, the general form ${\mathcal{R}}_{\alpha }\left(u\right)={\mathrm{min}}_{w\in {W}_{h}}{\Vert}{D}_{h}\left(u,w\right){{\Vert}}_{1,\alpha }$ covers all higher-order regularisation approaches discussed in the previous sections.

Example 6.19. 

  •   
    Higher-order total variation. The choice ${\mathcal{R}}_{\alpha }\left(u\right)=\alpha {\Vert}{\nabla }^{k}u{{\Vert}}_{1}$, with k ⩾ 1 the order of differentiation, can be realised with
    which yields, according to lemma 6.16, for (i, j) ∈ Ωh that
    Equation (76)
    Here, we used that whenever Wh = {0}, one can ignore the second argument of Dh : Uh × Wh Vh and regard it as operator Dh : Uh Vh .
  •   
    Sum of higher-order TV functionals. The choice ${\mathcal{R}}_{\alpha }\left(u\right)={\alpha }_{1}{\Vert}{\nabla }_{h}^{{k}_{1}}u{{\Vert}}_{1}+{\alpha }_{2}{\Vert}{\nabla }_{h}^{{k}_{2}}u{{\Vert}}_{1}$, with k2 > k1 ⩾ 1 and αi > 0 for i = 1, 2 differentiation orders and weighting parameters, respectively, can be realised with
    and yields, according to lemma 6.16,
    Equation (77)
    with ${\mathrm{p}\mathrm{r}\mathrm{o}\mathrm{j}}_{\left\{{\Vert}\cdot {{\Vert}}_{\infty }{\leqslant}{\alpha }_{i}\right\}}$ as in (76).
  •   
    Infimal convolution of higher-order TV functionals. The infimal convolution
    can be realised via
    where ${\mathrm{p}\mathrm{r}\mathrm{o}\mathrm{j}}_{\left\{{\Vert}\cdot {{\Vert}}_{\infty ,{\alpha }^{-1}}{\leqslant}1\right\}}$ is given as in (77).
  •   
    Second-order total generalised variation. Let α0, α1 > 0. The choice
    can be realised via
    where ${\mathrm{p}\mathrm{r}\mathrm{o}\mathrm{j}}_{\left\{{\Vert}\cdot {{\Vert}}_{\infty ,{\alpha }^{-1}}{\leqslant}1\right\}}$ is given again as in (77) with α2 replaced by α0.
  •   
    Total generalised variation of order k . The total generalised variation functional of arbitrary order kN, k ⩾ 1, and weights $\alpha =\left({\alpha }_{0},\dots ,{\alpha }_{k-1}\right)\in {\left.\right]0,\infty \left[\right.}^{k}$, i.e.,
    can be realised via
    In this case,
    where ${\mathrm{p}\mathrm{r}\mathrm{o}\mathrm{j}}_{\left\{{\Vert}\cdot {{\Vert}}_{\infty }{\leqslant}{\alpha }_{m}\right\}}$ is given as in (76).

Stepsize choice and stopping rule. Algorithm 1 requires to choose stepsizes σ, τ > 0 such that $\sigma \tau {\Vert}\mathcal{K}{{\Vert}}^{2}{< }1$ where $\mathcal{K}=\left(\begin{matrix}\hfill {D}_{h}^{1}\hfill & \hfill {D}_{h}^{2}\hfill \\ \hfill {K}_{h}\hfill & \hfill 0\hfill \end{matrix}\right)$. This, in turn, requires to estimate ${\Vert}\mathcal{K}{\Vert}$ which we discuss on the following. The operator Kh is application-dependent and we assume an upper bound for its norm to be given. Regarding the differential operator ${D}_{h}=\left({D}_{h}^{1},{D}_{h}^{2}\right)$, an estimate on the norm of its building blocks ${\nabla }_{h}^{k}$, ${\mathcal{E}}_{h}^{k}$ is provided in lemma 6.5. As the following proposition shows, an upper bound on ${\Vert}\mathcal{K}{\Vert}$ as well as on the norm of more general block-operators, can then be obtained by computing a simple singular value decomposition of a usually low-dimensional matrix.

Lemma 6.20. Assume that $\mathcal{K}:\mathcal{X}\to \mathcal{Y}$ with $\mathcal{X}={\mathcal{X}}_{1}{\times}\cdots {\times}{\mathcal{X}}_{N}$, $\mathcal{Y}={\mathcal{Y}}_{1}{\times}\cdots {\times}{\mathcal{Y}}_{M}$ is given as

and that ${\Vert}{\mathcal{K}}_{m,n}{\Vert}{\leqslant}{L}_{m,n}$ for each m = 1, ..., M, n = 1, ..., N. Then,

where σmax denotes the largest singular value of a matrix.

Proof. For $x=\left({x}_{1},\dots ,{x}_{N}\right)\in \mathcal{X}$ we estimate

from which the claimed assertion follows since the matrix norm induced by the two-norm corresponds to the largest singular value. □

This result can be applied in the setting (73), i.e., $\mathcal{K}=\left(\begin{matrix}\hfill {D}_{h}^{1}\hfill & \hfill {D}_{h}^{2}\hfill \\ \hfill {K}_{h}\hfill & \hfill 0\hfill \end{matrix}\right)$, leading to

Alternatively, one could use the result when ${D}_{h}^{1}$ or ${D}_{h}^{2}$ have block structures and a norm estimate is known for each block in addition to an estimate on ||Kh ||. Two concrete examples will be provided at the end of this section below.

Remark 6.21. In practice, provided that Lm,n is a good upper bound for ${\Vert}{\mathcal{K}}_{m,n}{\Vert}$, the norm estimate of lemma 6.20 is rather tight such that, depending on ${\Vert}\mathcal{K}{\Vert}$, the admissible stepsizes can be sufficiently large. Furthermore, the constraint $\sigma \tau {\Vert}\mathcal{K}{{\Vert}}^{2}{< }1$ still allows to choose an arbitrary positive ratio θ = σ/τ and, in our experience, often a choice θ ≪ 1 or θ ≫ 1 can accelerate convergence significantly. Finally, we also note that in case no estimate on ${\Vert}\mathcal{K}{\Vert}$ can be obtained, or in case an explicit estimate only allows for prohibitively small stepsizes, also an adaptive stepsize choice without prior knowledge of ${\Vert}\mathcal{K}{\Vert}$ is possible, see for instance [34].

Remark 6.22. It is worth mentioning that, in case of a uniformly convex functional in the saddle-point formulation (73) (which is not the case in the setting considered here), a further acceleration can be achieved by adaptive stepsize choices, see for instance [60].

Remark 6.23. Regarding a suitable stopping criterion, we note that often, the primal-dual gap, i.e., the gap between the energy of the primal and dual problem (66) and (68) evaluated at the current iterates, provides a good measure for optimality. Indeed, with

$\mathfrak{G}\left(x,y\right){\geqslant}0$ and $\mathfrak{G}\left(x,y\right)=0$ if and only if (x, y) is optimal such that, in principle, the condition $\mathfrak{G}\left({x}^{n},{y}^{n}\right){< }\varepsilon $ with (xn , yn ) the iterates of (70) can be used as stopping criterion. In case this condition is met, xn as well as yn are both optimal up to an ɛ-tolerance in terms of the objective functionals for the primal and dual problem, respectively.

In the present situation of (73), however, the primal and dual problem (65) and (74) yield

While for the iterates (un , wn , vn , λn ), we always have ${{\Vert}{v}^{n}{\Vert}}_{\infty ,{\alpha }^{-1}}{\leqslant}1$ as well as ${\lambda }^{n}\in \mathrm{d}\mathrm{o}\mathrm{m}\left({S}_{{f}_{h}}^{{\ast}}\right)$, algorithm 1 does not guarantee that ${K}_{h}{u}^{n}\in \mathrm{d}\mathrm{o}\mathrm{m}\left({S}_{{f}_{h}}\right)$ and $\left({D}_{h}^{1}\right){v}^{n}+{K}_{h}^{{\ast}}{\lambda }^{n}=0$ as well as ${\left({D}_{h}^{2}\right)}^{{\ast}}{v}^{n}=0$, such that the primal-dual gap is always infinite in practice and the stopping criterion is never met. With some adaptations, however, it is sometimes still possible to obtain a primal-dual gap that converges to zero and hence, to deduce a stopping criterion with optimality guarantees. There are several possibilities for achieving this. Let us, for simplicity, assume that both ${S}_{{f}_{h}}$ and ${S}_{{f}_{h}}^{{\ast}}$ are finite everywhere and hence, continuous. This is, for example, the case for ${S}_{{f}_{h}}=\frac{1}{2}{{\Vert}\cdot -{f}_{h}{\Vert}}^{2}$. Next, assume that a priori norm estimates are available for all solution pairs (u*, w*), say ${{\Vert}{u}^{{\ast}}{\Vert}}_{{\tilde{U}}_{h}}{\leqslant}{C}_{u}$ and ${{\Vert}{w}^{{\ast}}{\Vert}}_{{\tilde {W}}_{h}}{\leqslant}{C}_{w}$ for norms ${{\Vert}\cdot {\Vert}}_{{\tilde{U}}_{h}}$, ${{\Vert}\cdot {\Vert}}_{{\tilde {W}}_{h}}$ on Uh , Wh that do not necessarily correspond to the Hilbert space norms. Such estimates may, for instance, be obtained from the observation that ${S}_{{f}_{h}}\left({u}^{{\ast}}\right)+{{\Vert}{D}_{h}\left({u}^{{\ast}},{w}^{{\ast}}\right){\Vert}}_{1,\alpha }{\leqslant}{S}_{{f}_{h}}\left(0\right)$ and suitable coercivity estimates, as discussed in sections 35. Then, the primal problem can, for instance, be replaced by

where (t)+ = max{0, t} for tR, which has, by construction, the same minimisers as the original problem (65), but a dual problem that reads as

where ${{\Vert}\cdot {\Vert}}_{{\tilde{U}}_{h}^{{\ast}}}$ and ${{\Vert}\cdot {\Vert}}_{{\tilde {W}}_{h}^{{\ast}}}$ denote the respective dual norms. By duality and since the minimum of the primal problem did not change, the modified dual problem also has the same solutions as the original dual problem. Now, as the iterates (vn , λn ) satisfy ${{\Vert}{v}^{n}{\Vert}}_{\infty ,{\alpha }^{-1}}{\leqslant}1$ and ${\lambda }^{n}\in \mathrm{d}\mathrm{o}\mathrm{m}\enspace {S}_{{f}_{h}}^{{\ast}}$, the dual objective is finite for the iterates and converges to the maximum as n. Analogously, plugging in the sequence (un , wn ) into the modified primal problem yields convergence to the minimum, hence, the respective primal-dual gap converges to zero for the primal-dual iterates (un , wn , vn , λn ). In summary, the functional

yields the stopping criterion $\tilde {\mathfrak{G}}\left({u}^{n},{w}^{n},{v}^{n},{\lambda }^{n}\right){< }\varepsilon $ which will be met for some n and gives ɛ-optimality of (un , wn ) for the original primal problem (65).

The examples below show how this primal-dual gap reads for specific applications. For other strategies of modifying the primal-dual gap to a functional that is positive and finite, converges to zero and possibly provides an upper bound on optimality of the iterates in terms of the objective functional, see, for instance [31, 34, 41].

Concrete examples.

Example 6.24. As first example, we consider the minimisation problem

Equation (78)

i.e., second order-TV regularisation for a linear inverse problem with Gaussian measurement noise. In this setting, we choose Wh = {0}, Vh = 2h , Sym2(R2)), ${D}_{h}={\nabla }_{h}^{2}$ and ||v||1,α = α||v||1. Assuming that ||Kh || ⩽ 1 (after possible scaling of Kh ), lemma 6.20 together with the estimate ${\Vert}{\nabla }_{h}^{2}{\Vert}{\leqslant}8$ from lemma 6.5 yields

The resulting concrete realisation of algorithm 1 can be found in algorithm 2. Here, ${\mathrm{p}\mathrm{r}\mathrm{o}\mathrm{j}}_{\left\{{\Vert}\cdot {{\Vert}}_{\infty }{\leqslant}\alpha \right\}}$ is given explicitly in (76), ${\mathrm{d}\mathrm{i}\mathrm{v}}_{h}^{2}={\mathrm{d}\mathrm{i}\mathrm{v}}_{h}{\mathrm{d}\mathrm{i}\mathrm{v}}_{h}$ is the adjoint of ${\nabla }_{h}^{2}$ and the modified primal-dual gap $\tilde {\mathfrak{G}}$ evaluated on the iterates (u, v, λ) of the algorithm reduces to

where Cu > 0 is an a priori bound on ${{\Vert}{u}^{{\ast}}{\Vert}}_{2}$ for solutions u* according to (17), for instance.

Algorithm 2. Implementation for solving the L2–TV2 problem (78).

1: function L2–TV2-Tikhonov(Kh , fh , α)⊳ Requirement: ||Kh || ⩽ 1
2:  $\left(u,\overline{u}\right){\leftarrow}\left(0,0\right),\left(v,\lambda \right){\leftarrow}\left(0,0\right)$  
3:  Choose σ, τ > 0 such that $\sigma \tau {< }\frac{1}{65}$  
4:  repeat  
5:     Dual updates  
6:     $v{\leftarrow}{\mathrm{p}\mathrm{r}\mathrm{o}\mathrm{j}}_{\left\{{\Vert}\cdot {{\Vert}}_{\infty }{\leqslant}\alpha \right\}}\left(v+\sigma {\nabla }_{h}^{2}\overline{u}\right)$  
7:     $\lambda {\leftarrow}\left(\lambda +\sigma \left({K}_{h}\overline{u}-{f}_{h}\right)\right)/\left(1+\sigma \right)$  
8:     Primal updates  
9:     ${u}_{+}{\leftarrow}u-\tau \left({\mathrm{d}\mathrm{i}\mathrm{v}}_{h}^{2}v+{K}_{h}^{{\ast}}\lambda \right)$  
10:   Extrapolation and update  
11:   $\overline{u}{\leftarrow}2{u}_{+}-u$  
12:   uu+  
13:   until stopping criterion fulfiled 
14:   return u  
15: end function  

Example 6.25. As second example, we consider ${\mathrm{T}\mathrm{G}\mathrm{V}}_{\alpha }^{2}$ regularisation for an inverse problem with Poisson noise and discrete non-negative data ${\left({\left({f}_{h}\right)}_{i}\right)}_{i}$, which corresponds to solving

Equation (79)

with KL being the discrete Kullback–Leibler divergence as in (75).

In this setting, we choose Wh = 2h , Sym1(R2)), Vh = 2h , Sym1(R2)) × 2(Ω, Sym2(R2)), ${D}_{h}=\left(\begin{matrix}\hfill {\nabla }_{h}\hfill & \hfill -\mathrm{i}\mathrm{d}\hfill \\ 0\hfill & \hfill {\mathcal{E}}_{h}\hfill \end{matrix}\right)$ and ||(v1, v2)||1,α = α1||v1||1 + α0||v2||1. Setting

and again assuming ||Kh || ⩽ 1, lemma 6.20 together with the estimates ${\Vert}{\nabla }_{h}{\Vert}{\leqslant}\sqrt{8}$ and ${\Vert}{\mathcal{E}}_{h}{\Vert}{\leqslant}\sqrt{8}$ from lemma 6.5 yields

The resulting, concrete implementation of algorithm 1 can be found in algorithm 3. Here, again ${\mathrm{p}\mathrm{r}\mathrm{o}\mathrm{j}}_{\left\{{\Vert}\cdot {{\Vert}}_{\infty }{\leqslant}{\alpha }_{i}\right\}}$ is given explicitly in (76), and, abusing notation, divh is both the negative adjoint of ∇h and ${\mathcal{E}}_{h}$, depending on the input. The modified primal-dual gap $\tilde {\mathfrak{G}}$ evaluated on the iterates (u, w, v1, v2, λ) of the algorithm reduces to

where ${\mathrm{K}\mathrm{L}}^{{\ast}}\left(\lambda ,{f}_{h}\right)=-{\sum }_{i}{\left({f}_{h}\right)}_{i}\enspace \mathrm{log}\left(1-{\lambda }_{i}\right)$ whenever λi ⩽ 1 for each i where ${\left({f}_{h}\right)}_{i}\enspace \mathrm{log}\left(0\right)=\infty $ for ${\left({f}_{h}\right)}_{i}{ >}0$, 0  log(0) = 0, and KL*(λ, fh ) = else. Further, Cu is an a priori bound on the two-norm of ${u}^{{\ast}}={w}_{0}^{{\ast}}$ analogous to (18) while Cw is an a priori bound on the one-norm of ${w}^{{\ast}}={w}_{1}^{{\ast}}$ according to (51).

Algorithm 3. Implementation for solving the KL–${\mathrm{T}\mathrm{G}\mathrm{V}}_{\alpha }^{2}$ problem (79).

1: function KL–${\mathrm{T}\mathrm{G}\mathrm{V}}_{\alpha }^{2}$-Tikhonov(Kh , fh , α)⊳ Requirement: ||Kh || ⩽ 1
2:  $\left(u,w,\overline{u},\overline{w}\right){\leftarrow}\left(0,0,0,0\right),\left({v}_{1},{v}_{2},\lambda \right){\leftarrow}\left(0,0,0\right)$  
3:  Choose σ, τ > 0 such that $\sigma \tau {\leqslant}\frac{6}{71}$  
4:  repeat  
5:     Dual updates  
6:     ${v}_{1}{\leftarrow}{\mathrm{p}\mathrm{r}\mathrm{o}\mathrm{j}}_{\left\{{\Vert}\cdot {{\Vert}}_{\infty }{\leqslant}{\alpha }_{1}\right\}}\left({v}_{1}+\sigma \left({\nabla }_{h}\overline{u}-\overline{w}\right)\right)$  
7:     ${v}_{2}{\leftarrow}{\mathrm{p}\mathrm{r}\mathrm{o}\mathrm{j}}_{\left\{{\Vert}\cdot {{\Vert}}_{\infty }{\leqslant}{\alpha }_{0}\right\}}\left({v}_{2}+\sigma {\mathcal{E}}_{h}\overline{w}\right)$  
8:     $\lambda {\leftarrow}\lambda +\sigma {K}_{h}\overline{u}$  
9:     $\lambda {\leftarrow}\lambda -\frac{\lambda -1+\sqrt{{\left(\lambda -1\right)}^{2}+4\sigma {f}_{h}}}{2}$  
10:   Primal updates  
11:   ${u}_{+}{\leftarrow}u+\tau \left({\mathrm{d}\mathrm{i}\mathrm{v}}_{h}{v}_{1}-{K}_{h}^{{\ast}}\lambda \right)$  
12:   w+w + τ(v1 + divh v2) 
13:   Extrapolation and update  
14:   $\left(\overline{u},\overline{w}\right){\leftarrow}2\left({u}_{+},{w}_{+}\right)-\left(u,w\right)$  
15:   (u, w) ← (u+, w+) 
16:   until stopping criterion fulfiled 
17:   return u  
18: end function  

We refer to, e.g., [27] for more examples of primal-dual-based algorithms for TGV regularisation.

6.3. Implicit and preconditioned optimisation methods

Let us shortly discuss other proximal algorithms for the solution of (65). One popular method is the alternating direction method of multipliers (ADMM) [89, 95] which bases on augmented Lagrangian formulations for (65), for instance,

Equation (80)

which results in the augmented Lagrangian

where τ > 0. For (80), the ADMM algorithm amounts to

Here, the first subproblem amounts to solving a least-squares problem and the associated normal equation is usually stably solvable since ${D}_{h}^{1}$ and ${D}_{h}^{2}$ involve discrete differential operators and hence, the normal equation essentially corresponds to the solution of a discrete elliptic equation that is perturbed by ${K}_{h}^{{\ast}}{K}_{h}$. For this reason, ADMM is usually considered an implicit method. The second step turns out to be the application of the proximal mappings for ${S}_{{f}_{h}}$ and ||⋅||1,α while the last update steps have an explicit form, see algorithm 4. By virtue of Moreau's identity (72) (also see lemma 6.13), the operators ${\mathrm{p}\mathrm{r}\mathrm{o}\mathrm{x}}_{\tau {{\Vert}\cdot {\Vert}}_{1,\alpha }}$ and ${\mathrm{p}\mathrm{r}\mathrm{o}\mathrm{x}}_{\tau {S}_{{f}_{h}}}$ can easily be computed knowing ${\mathrm{p}\mathrm{r}\mathrm{o}\mathrm{j}}_{\left\{{\Vert}\cdot {{\Vert}}_{\infty ,{\alpha }^{-1}}{\leqslant}1\right\}}$ and ${\mathrm{p}\mathrm{r}\mathrm{o}\mathrm{x}}_{{\tau }^{-1}{S}_{{f}_{h}}^{{\ast}}}$. We have, for instance,

where the projection operator usually has an explicit representation, see lemma 6.16 and example 6.19. Further, for the discrepancies discussed in subsection 6.2, it holds that

Algorithm 4. ADMM scheme for the numerical solution of (80).

1: function Tikhonov-ADMM(Kh , fh , α)
2:  $\left(u,w\right){\leftarrow}\left(0,0\right),\left(v,\lambda \right){\leftarrow}\left(0,0\right),\left(\overline{v},\overline{\lambda }\right){\leftarrow}\left(0,0\right)$
3:  Choose τ > 0
4:  repeat
5:     Linear subproblem
6:     (u, w) ← solution of
      $\left(\begin{matrix}\hfill {K}_{h}^{{\ast}}{K}_{h}+{\left({D}_{h}^{1}\right)}^{{\ast}}{D}_{h}^{1}\hfill & \hfill {\left({D}_{h}^{1}\right)}^{{\ast}}{D}_{h}^{2}\hfill \\ \hfill {\left({D}_{h}^{2}\right)}^{{\ast}}{D}_{h}^{1}\hfill & \hfill {\left({D}_{h}^{2}\right)}^{{\ast}}{D}_{h}^{2}\hfill \end{matrix}\right)\left(\begin{matrix}\hfill u\hfill \\ \hfill w\hfill \end{matrix}\right)=\left(\begin{matrix}\hfill {K}_{h}^{{\ast}}\left(\lambda -\tau \overline{\lambda }\right)+{\left({D}_{h}^{1}\right)}^{{\ast}}\left(v-\tau \overline{v}\right)\hfill \\ \hfill {\left({D}_{h}^{2}\right)}^{{\ast}}\left(v-\tau \overline{v}\right)\hfill \end{matrix}\right)$
7:     Proximal subproblem
8:     $v{\leftarrow}{\mathrm{p}\mathrm{r}\mathrm{o}\mathrm{x}}_{\tau {{\Vert}\cdot {\Vert}}_{1,\alpha }}\left({D}_{h}^{1}u+{D}_{h}^{2}w+\tau \overline{v}\right)$
9:     $\lambda {\leftarrow}{\mathrm{p}\mathrm{r}\mathrm{o}\mathrm{x}}_{\tau {S}_{{f}_{h}}}\left({K}_{h}u+\tau \overline{\lambda }\right)$
10:   Lagrange multiplier
11:   $\overline{v}{\leftarrow}\overline{v}+\frac{1}{\tau }\left({D}_{h}^{1}u+{D}_{h}^{2}w-v\right)$
12:   $\overline{\lambda }{\leftarrow}\overline{\lambda }+\frac{1}{\tau }\left({K}_{h}u-\lambda \right)$
13:   until stopping criterion fulfiled
14:   return u
15: end function

While ADMM has the advantage of converging for arbitrary stepsizes τ > 0 (see, e.g. [25]), the main drawback is often considered the linear update step which amounts to solving a linear equation (or, alternatively, a least-squares problem) which can be computationally expensive. The latter can be avoided, for instance, with preconditioning techniques [43, 74]. Denoting again by $\mathcal{K}=\left(\begin{matrix}\hfill {D}_{h}^{1}\hfill & \hfill {D}_{h}^{2}\hfill \\ \hfill {K}_{h}\hfill & \hfill 0\hfill \end{matrix}\right)$, the linear solution step amounts to solving ${\mathcal{K}}^{{\ast}}\mathcal{K}\left(\begin{matrix}\hfill u\hfill \\ \hfill w\hfill \end{matrix}\right)={\mathcal{K}}^{{\ast}}\left(\begin{matrix}\hfill v-\tau \overline{v}\hfill \\ \hfill \lambda -\tau \bar{\lambda }\hfill \end{matrix}\right)$. Introducing the additional variables (u', w') ∈ Uh × Wh as well as the constraint $\left({u}^{\prime },{w}^{\prime }\right)={\left(\rho \enspace \mathrm{i}\mathrm{d}-{\mathcal{K}}^{{\ast}}\mathcal{K}\right)}^{1/2}\left(u,w\right)$ for $\rho { >}{{\Vert}\mathcal{K}{\Vert}}^{2}$, we can consider the problem

which is equivalent to (80). The associated ADMM procedure, however, simplifies. In particular, the linear subproblem only involves ρ  id whose solution is trivial. Also, the Lagrange multipliers of the additional constraint are always zero within the iteration and the evaluation of the square root ${\left(\rho \enspace \mathrm{i}\mathrm{d}-{\mathcal{K}}^{{\ast}}\mathcal{K}\right)}^{1/2}$ can be avoided. This leads to the linear subproblem of algorithm 4 being replaced by the linear update step

Also, the procedure then requires, in each iteration, only one evaluation of Kh , ${D}_{h}^{1}$, ${D}_{h}^{2}$ and their respective adjoints as well as the evaluation of proximal mappings, such that the computational effort is comparable to algorithm 1. As a special variant of the general ADMM algorithm, the above preconditioned version converges for τ > 0 if $\rho { >}{{\Vert}\mathcal{K}{\Vert}}^{2}$ is satisfied. Thus, an estimate for ${\Vert}\mathcal{K}{\Vert}$ is required which can, e.g., be obtained by lemma 6.20 (also confer the concrete examples in subsection 6.2). While this is the most common preconditioning strategy for ADMM, there are many other possibilities for transforming the original linear subproblem into a simpler one such that, e.g., the preconditioned problem amounts to the application of one or more steps of a symmetric Gauss–Seidel iteration or a symmetric successive over-relaxation (SSOR) procedure [43].

Another class of methods for solving (65) is given by the Douglas–Rachford iteration [80, 130], which is an iterative procedure for solving monotone inclusion problems of the type

in Hilbert space, where A, B are maximally monotone operators. It proceeds as follows:

where σ > 0 is a stepsize parameter. As only the resolvent operators (id + σA)−1 and (id + σB)−1 are involved, the Douglas–Rachford iteration is also considered an implicit scheme. In the context of optimisation problems, the operators A and B are commonly chosen based on first-order optimality conditions, which are subgradient inclusions [45, 89]. Here, we choose the saddle-point formulation (73) and the associated optimality conditions:

For instance, choosing A and B as the first and second operator in the above splitting, respectively, leads to the iteration outlined in algorithm 5: Indeed, in terms of x = (u, w), y = (v, λ) and $\mathcal{K}=\left(\begin{matrix}\hfill {D}_{h}^{1}\hfill & \hfill {D}_{h}^{2}\hfill \\ \hfill {K}_{h}\hfill & \hfill 0\hfill \end{matrix}\right)$, the resolvent for the linear operator A corresponds to solving the linear system

which is reflected by the linear subproblem and dual update in algorithm 5. The resolvent for B further corresponds to the application of proximal mappings, also see proposition 6.11, where the involved proximal operators are the same as for the primal-dual iteration in algorithm 1. The iteration can be shown to converge for each σ > 0, see, e.g., [45].

Algorithm 5. Douglas–Rachford scheme for the numerical solution of (73).

1: function Tikhonov-DR(Kh , fh , α)
2:  $\left(u,w\right){\leftarrow}\left(0,0\right),\left(v,\lambda \right){\leftarrow}\left(0,0\right),\left(\overline{v},\overline{\lambda }\right){\leftarrow}\left(0,0\right)$
3:  Choose σ > 0
4:  repeat
5:     Linear subproblem
6:     $\left(u,w\right){\leftarrow}{\left(\begin{matrix}\hfill \mathrm{i}\mathrm{d}+{\sigma }^{2}\left({K}_{h}^{{\ast}}{K}_{h}+{\left({D}_{h}^{1}\right)}^{{\ast}}{D}_{h}^{1}\right)\hfill & \hfill {\sigma }^{2}{\left({D}_{h}^{1}\right)}^{{\ast}}{D}_{h}^{2}\hfill \\ {\sigma }^{2}{\left({D}_{h}^{2}\right)}^{{\ast}}{D}_{h}^{1}\hfill & \hfill \mathrm{i}\mathrm{d}+{\sigma }^{2}{\left({D}_{h}^{2}\right)}^{{\ast}}{D}_{h}^{2}\hfill \end{matrix}\right)}^{-1}\cdot \left(\begin{matrix}\hfill u-\sigma \left({K}_{h}^{{\ast}}\overline{\lambda }+{\left({D}_{h}^{1}\right)}^{{\ast}}\overline{v}\right)\hfill \\ w-\sigma {\left({D}_{h}^{2}\right)}^{{\ast}}\overline{v}\hfill \end{matrix}\right)$
7:     Dual update
8:     $v{\leftarrow}\overline{v}+\sigma \left({D}_{h}^{1}u+{D}_{h}^{2}w\right)$
9:     $\lambda {\leftarrow}\overline{\lambda }+\sigma {K}_{h}u$
10:   Proximal update
11:   $\overline{v}{\leftarrow}\overline{v}+{\mathrm{p}\mathrm{r}\mathrm{o}\mathrm{j}}_{\left\{{\Vert}\cdot {{\Vert}}_{\infty ,{\alpha }^{-1}}{\leqslant}1\right\}}\left(2v-\overline{v}\right)-v$
12:   $\overline{\lambda }{\leftarrow}\overline{\lambda }+{\mathrm{p}\mathrm{r}\mathrm{o}\mathrm{x}}_{\sigma {S}_{{f}_{h}}^{{\ast}}}\left(2\lambda -\overline{\lambda }\right)-\lambda $
13:   until stopping criterion fulfiled
14:   return u
15: end function

As for ADMM, the linear subproblem in algorithm 5 can be avoided by preconditioning. Basically, for the above Douglas–Rachford iteration, the same types of preconditioners can be applied as for ADMM, ranging from the Richardson-type preconditioner that was discussed in detail before to symmetric Gauss–Seidel and SSOR-type preconditioners [40]. In particular, the potential of the latter for TGV-regularised imaging problems was shown in [41].

While all three discussed classes of algorithms, i.e., the primal-dual method, ADMM, and the Douglas–Rachford iteration can in principle be used to solve the discrete Tikhonov minimisation problem we are interested in, experience shows that the primal-dual method is usually easy to implement as it only involves forward evaluations of the involved linear operators and simple proximal operators, and thus suitable for prototyping. It needs, however, norm estimates for the forward operator and a possible rescaling. ADMM is, in turn, a very popular algorithm whose advantage lies, for instance, in its unconditional convergence (the parameter τ > 0 can be chosen arbitrarily). Also, in comparison to the primal-dual method, ADMM is observed to admit, in relevant cases, a more stable convergence behaviour, meaning less oscillations and faster objective functional reduction in the first iteration steps. However, ADMM requires the solution of a linear subproblem in each iteration step which might be expensive or call for preconditioning. The same applies to the Douglas–Rachford iteration which is also unconditionally convergent, comparably stable and usually involves the solution of a linear subproblem in each step. In contrast to ADMM it bases, however, on the same saddle-point formulation as the primal-dual methods such that translating a prototype primal-dual implementation into a more efficient Douglas–Rachford implementation with possible preconditioning is more immediate.

7. Applications in image processing and computer vision

7.1. Image denoising and deblurring

Image denoising is a simple yet heavily addressed problem in image processing (see for instance [127] for a review) as it is practically relevant by itself and, in addition, allows to investigate the effect of different smoothing and regularisation approaches independent of particular measurements setups or forward models. The standard formulation of variational denoising assumes Gaussian noise and, consequently, employs an L2-type data fidelity. Allowing for more general noise models, the denoising problem reads as

where we assume Sf : Lp (Ω) → [0, ], with p ∈ [1, ], to be proper, convex, lower semi-continuous and coercive, and ${\mathcal{R}}_{\alpha }$ to be an appropriate regularisation functional. This setting covers, for instance, Gaussian noise (with ${S}_{f}\left(u\right)=\frac{1}{2}{\Vert}u-f{{\Vert}}_{2}^{2}$), impulse noise (with Sf (u) = ||uf||1) and Poisson noise (with Sf (u) = KL(u, f)). With first- or higher-order TV regularisation, additive or infimal-convolution-based combinations thereof, or TGV regularisation, the denoising problem is well-posed for any of the above choices of Sf . For ${S}_{f}\left(u\right)=\frac{1}{q}{\Vert}u-f{{\Vert}}_{q}^{q}$ and q > 1, also regularisation with ${\mathcal{R}}_{\alpha }\left(u\right)=\alpha {\Vert}{\Delta}u{{\Vert}}_{\mathcal{M}}$ is well-posed and figure 11 summarises, once again, the result of these different approaches for q = 2 and Gaussian noise on a piecewise affine test image. It further emphasises again the appropriateness of ${\mathrm{T}\mathrm{G}\mathrm{V}}_{\alpha }^{2}$ as a regulariser for piecewise smooth images.

Figure 11.

Figure 11. Comparison of different first- and second-order image models for variational image denoising with L2-discrepancy. Left column: the original image (top) and noisy input image (bottom). Columns 2–4: results for variational denoising with different regularisation terms. The parameters were manually optimised for best PSNR (see figures 14 and 6 for the PSNR values).

Standard image High-resolution image

In order to visualise the difference between different orders of TGV regularisation, figure 12 considers a piecewise smooth image corrupted by Gaussian noise and compares TGV regularisation with orders k ∈ {2, 3}. It can be seen there that third-order TGV yields a better approximation of smooth structures, resulting in an improved PSNR, while the second-order TGV regularised image has small defects resulting from a piecewise linear approximation of the data.

Figure 12.

Figure 12. Comparison of second- and third-order TGV for denoising for a piecewise smooth noisy image (PSNR: 26.0 dB). The red lines in the original image indicate the areas used in the line and surface plots. A close look on these plots reveals piecewise-linearity defects of ${\mathrm{T}\mathrm{G}\mathrm{V}}_{\alpha }^{2}$, while the ${\mathrm{T}\mathrm{G}\mathrm{V}}_{\alpha }^{3}$ reconstruction yields a better approximation of smooth structures and an improved PSNR (${\mathrm{T}\mathrm{G}\mathrm{V}}_{\alpha }^{2}$: 40.7 dB, ${\mathrm{T}\mathrm{G}\mathrm{V}}_{\alpha }^{3}$: 42.3 dB). Note that in the line plots, the value 0.05 was subtracted from the TGV-denoising results in order to prevent the respective plots from significantly overlapping with the plots of the original data.

Standard image High-resolution image

Another problem class is image deblurring which can be considered as a standard test problem for the ill-posed inversion of linear operators in imaging. Pick a blurring kernel kL0) with bounded domains Ω0, Ω' ⊂ Rd such that Ω' − Ω0 ⊂ Ω. Then, K : L1(Ω) → L2(Ω') given by

is well-defined, linear and continuous. Consequently, by theorems 2.11, 2.14 and proposition 5.17,

for 1 < pd/(d − 1) and ${\mathcal{R}}_{\alpha }\in \left\{\alpha \mathrm{T}\mathrm{V},{\mathrm{T}\mathrm{G}\mathrm{V}}_{\alpha }^{2}\right\}$ admits a solution that stably depends on the data fL2(Ω'), which we assume to be a noise-contaminated image blurred by the convolution operator K. A numerical solution can again be obtained with the framework described in section 6 and a comparison of the two choices of ${\mathcal{R}}_{\alpha }$ for a test image can be found in figure 13. We can observe that both TV and ${\mathrm{T}\mathrm{G}\mathrm{V}}_{\alpha }^{2}$ are able to remove noise and blur from the image, however, the TV reconstruction suffers from staircasing artefacts which are not present with ${\mathrm{T}\mathrm{G}\mathrm{V}}_{\alpha }^{2}$.

Figure 13.

Figure 13. Deconvolution example. The original image uorig [92] has been blurred and contaminated by noise resulting in f. The images uTV and ${u}_{{\mathrm{T}\mathrm{G}\mathrm{V}}_{\alpha }^{2}}$ are the regularised solutions recovered from f. Reproduced from [92]. CC BY 2.0.

Standard image High-resolution image

7.2. Compressed sensing

The next problem we would like to discuss is compressive sampling with total variation and total generalised variation [27]. More precisely, we aim at reconstructing a single-channel image from 'single-pixel camera' data [78], an inverse problem with finite-dimensional data space. Here, an image is not observed directly but only the accumulated grey values over finitely many random pixel patterns are sequentially measured by one sensor, the 'single pixel'. This can be modelled as follows. For a bounded Lipschitz image domain Ω ⊂ R2, let the measurable sets E1, ..., EM ⊂ Ω be the collection of random patterns where each Em is associated with the mth measurement. The image u is then determined by solving the inverse problem

and fRM is the measurement vector, i.e., each fm is the output of the sensor for the pattern Em . As the set of u solving this inverse problem is an affine space with finite codimension, the compressive imaging approach assumes that the image u is sparse in a certain representation which is usually translated into the discrete total variation TV(u) being small. A way to reconstruct u from f is then to solve

Equation (81)

In this context, also higher-order regularisers may be used as sparsity constraint. For instance, in [27], total generalised variation of order 2 has numerically been tested:

Equation (82)

Figure 14 shows example reconstructions for real data according to discretised versions of (81) and (82). As supported by the theory of compressed sensing [52, 53], the image can essentially be recovered from a few single-pixel measurements. Here, TGV-minimisation helps to reconstruct smooth regions of the image such that in comparison to TV-minimisation, more features can still be recognised, in particular, when reconstructing from very few samples. Once again, staircasing artefacts are clearly visible for the TV-based reconstructions, a fact that recently was made rigorous in [26, 29].

Figure 14.

Figure 14. Example for TV/TGV2 compressive imaging reconstruction for real single-pixel camera data [157]. Top: TV-based reconstruction of a 64 × 64 image from 18.75%, 9.375%, 6.25% and 4.6875% of the data (from left to right). Bottom: TGV2-based reconstruction obtained from the same data. Figure taken from [27]. Reprinted by permission from Springer Nature.

Standard image High-resolution image

7.3. Optical flow and stereo estimation

Another important fundamental problem in image processing and computer vision is the determination of the optical flow [110] of an image sequence. Here, we consider this task for two consecutive frames f0 and f1 in a sequence of images. This is often modelled by minimising a possibly joint discrepancy ${S}_{{f}_{0},{f}_{1}}\left(u\left(0\right),u\left(1\right)\right)$ for u : [0, 1] × Ω → R subject to the optical flow constraint $\frac{\partial u}{\partial t}+\nabla u\cdot v=0$, see, for instance, [24]. Here, v : [0, 1] × Ω → Rd is the optical flow field that shall be determined. In order to deal with ill-posedness, ambiguities as well as occlusion, the vector field v needs to be regularised by a penalty term. This leads to the PDE-constrained problem

where ${\mathcal{R}}_{\alpha }$ is a suitable convex regulariser for vector field sequences. Usually, ${S}_{{f}_{0},{f}_{1}}$ is chosen such that the initial condition u(0) is fixed to f0, for instance, ${S}_{{f}_{0},{f}_{1}}\left({u}_{0},{u}_{1}\right)={\mathcal{I}}_{\left\{{f}_{0}\right\}}\left({u}_{0}\right)+\frac{1}{2}{{\Vert}{u}_{1}-{f}_{1}{\Vert}}_{2}^{2}$, see [24, 64, 103, 118].

In many approaches, this problem is reformulated to a correspondence problem. This means, on the one hand, replacing the optical flow constraint by the displacement introduced by a vector field v0 : Ω → R2, i.e., u(0) = u0 and u(1) = u0 ◦ (id + v0). The image u0 : Ω → R is either prespecified or subject to optimisation. For instance, choosing again ${S}_{{f}_{0},{f}_{1}}\left({u}_{0},{u}_{1}\right)={\mathcal{I}}_{\left\{{f}_{0}\right\}}\left({u}_{0}\right)+\frac{1}{2}{{\Vert}{u}_{1}-{f}_{1}{\Vert}}_{2}^{2}$ leads to the classical correspondence problem

see, for instance, [110], which uses the square of the H1-seminorm as a regulariser. On the other hand, other approaches have been considered for the discrepancy (and regularisation), see [47, 196]. In this context, a popular concept is the census transform [195] that describes the local relative behaviour of an image and is invariant to brightness changes. For an image f : Ω → R, measurable patch Ω' ⊂ R2 and threshold ɛ > 0, it is defined as

Here, one usually sets u0 = f0 and u1 = f1 such that the discrepancy only depends on the vector field v0, such as, for instance,

leading to the optical-flow problem

see, for instance, [137, 188]. A closely related problem is stereo estimation which can also be modelled as a correspondence problem. In this context, f0 and f1 constitute a stereo image pair, for instance, f0 being the left image and f1 being the right image. The stereo information is then usually reflected by the disparity which describes the displacement of the right image with respect to the left image. This corresponds to setting the vertical component of the displacement field v0 to zero, for instance, ${\left({v}_{0}\right)}_{2}=0$. Census-transform based discrepancies are also used for this task [152], leading to the stereo-estimation model

Equation (83)

with a suitable convex regulariser ${\mathcal{R}}_{\alpha }$ for scalar disparity images.

Both optical flow and stereo estimation are non-convex due to the non-convex data terms and require dedicated solution techniques. One possible approach is to smooth the discrepancy functional such that it becomes (twice) continuously differentiable, and approximate it, for each x ∈ Ω, by either first or second-order Taylor expansion. For the latter case, if one also projects the pointwise Hessian to the positive semi-definite cone, one arrives at the convex problem

where v0 is the base vector field for the Taylor expansion and ${\mathrm{p}\mathrm{r}\mathrm{o}\mathrm{j}}_{{S}^{+}}:{S}^{2{\times}2}\to {S}_{+}^{2{\times}2}$ denotes the orthogonal projection to the cone of positive semi-definite matrices ${S}_{+}^{2{\times}2}$. Besides classical regularisers such as the H1-seminorm, the total variation has been chosen [193], i.e., ${\mathcal{R}}_{\alpha }=\alpha \mathrm{T}\mathrm{V}$, which allows the identification of jumps in the displacement field associated with object boundaries. The displacement field is, however, piecewise smooth such that ${\mathrm{T}\mathrm{G}\mathrm{V}}_{\alpha }^{2}$ turns out to be advantageous. Further improvements can be achieved by non-local total generalised variation NLTGV2, see [154], leading to sharper and more accurate motion boundaries, see figure 15. For stereo estimation, a similar approach using first-order Taylor expansion and image-driven total generalised variation ${\mathrm{I}\mathrm{T}\mathrm{G}\mathrm{V}}_{\alpha }^{2}$ also yields very accurate disparity images [152].

Figure 15.

Figure 15. Example for higher-order approaches for optical flow determination. (a) An optical flow field obtained on a sample dataset from the Middlebury benchmark [11] using a ${\mathrm{T}\mathrm{G}\mathrm{V}}_{\alpha }^{2}$ regulariser. (b) An enlarged detail of (a). (c) The optical flow field obtained by a NLTGV2 regulariser. (d) An enlarged detail of (c). Images taken from [154]. Reprinted by permission from Springer Nature.

Standard image High-resolution image

A different concept for solving the non-convex optical flow/stereo estimation problem is functional lifting [4, 55]. For the stereo estimation problem, this means to recover the characteristic function of the subgraph of the disparity image, i.e., ${\chi }_{\left\{t{\leqslant}{w}_{0}\right\}}$. Assume that the discrepancy for the disparity w0 can be written in integral form, i.e., ${S}_{{f}_{0},{f}_{1}}\left({w}_{0}\right)={\int }_{{\Omega}}g\left(x,{w}_{0}\left(x\right)\right)\enspace \mathrm{d}x$ with a suitable g : Ω × RR that is possibly non-convex with respect to the second argument. If w0 is of bounded variation, then ${\chi }_{\left\{t{\leqslant}{w}_{0}\right\}}$ is also of bounded variation and the weak derivative with respect to x and t, respectively, are Radon measures. Denoting by vx and vt the respective components of a vector, i.e., v = (vx , vt ) ∈ R2 × R, these derivatives satisfy the identity $\frac{\partial }{\partial t}{\chi }_{\left\{t{\leqslant}{w}_{0}\right\}}={\left(\frac{\nabla {\chi }_{\left\{t{\leqslant}{w}_{0}\right\}}}{\left\vert \nabla {\chi }_{\left\{t{\leqslant}{w}_{0}\right\}}\right\vert }\right)}_{t}\left\vert \nabla {\chi }_{\left\{t{\leqslant}{w}_{0}\right\}}\right\vert $ as well as ${\nabla }_{x}{\chi }_{\left\{t{\leqslant}{w}_{0}\right\}}={\left(\frac{\nabla {\chi }_{\left\{t{\leqslant}{w}_{0}\right\}}}{\left\vert \nabla {\chi }_{\left\{t{\leqslant}{w}_{0}\right\}}\right\vert }\right)}_{x}\left\vert \nabla {\chi }_{\left\{t{\leqslant}{w}_{0}\right\}}\right\vert $. The discrepancy term can then be written in the form

which is convex with respect to ${\chi }_{\left\{t{\leqslant}{w}_{0}\right\}}$. In many cases, regularisation functionals can also be written in terms of ${\chi }_{\left\{t{\leqslant}{w}_{0}\right\}}$, for instance, by the coarea formula,

which is again convex with respect to ${\chi }_{\left\{t{\leqslant}{w}_{0}\right\}}$. As the set of all ${\chi }_{\left\{t{\leqslant}{w}_{0}\right\}}$ is still non-convex, this constraint is usually relaxed to a convex set, for instance, to the conditions

Equation (84)

where the limits have to be understood in a suitable sense. Then, the stereo problem (83) with total-variation regularisation can be relaxed to the convex problem

Then, optimal solutions u* for the above problem yield minimisers of the original problem when thresholded, i.e., for $s\in \left.\right]0,1\left[\right.$, the function ${\chi }_{\left\{s{\leqslant}{u}^{{\ast}}\right\}}$ is the characteristic function of the subgraph of a w0 that is optimal for (83) for the assumed discrepancy and total-variation regularisation [145].

Unfortunately, a straightforward adaptation of this strategy to higher-order total-variation-type regularisation functionals is not possible. For TGV2, one can nevertheless benefit from the convexification approach. Considering the ${\mathrm{T}\mathrm{G}\mathrm{V}}_{2}^{\alpha }$-regularised problem

Equation (85)

one sees that the problem is convex in w and minimisation with respect to w0 can still be convexified by functional lifting. For fixed w, the latter leads to

which is again convex and whose solutions can again be thresholded to yield a ${w}_{0}^{{\ast}}$ that is optimal with respect to w0 for a fixed w. Alternating minimisation then provides a robust solution strategy for (85) based on convex optimisation [155], see figure 16 for an example. In this context, algorithms realising functional lifting strategies for TV and TGV regularisation have recently further been refined, for instance, in order to lower the computational complexity associated with the additional space dimension introduced by the lifting, see, e.g. [135, 181].

Figure 16.

Figure 16. Total-generalised-variation-regularised stereo estimation based on functional lifting and convex optimisation for an image pair of the KITTI dataset [91]. (a) The reference image. (b) The disparity image obtained with TGV2-regularisation [155]. Reprinted from [153] by permission from Springer Nature Customer Service Centre GmbH: Springer © 2013.

Standard image High-resolution image

7.4. Image and video decompression

Pixelwise representations of image or image sequence data require, on the one hand, a large amount of digital storage but contain, on the other hand, enough redundancy to enable compression. Indeed, most digitally stored images and image sequences, e.g., on cameras, mobile phones or the world-wide web are compressed. Commonly-used lossy compression standards such as JPEG, JPEG2000 for images and MPEG for image sequences, however, suffer from visual artefacts in decompressed data, especially for high compression rates.

Those artefacts result from errors in the compressed data due to quantisation, which is not accounted for in the decompression procedure. These errors, however, can be well described using the data that is available in the compressed file and in particular, precise bounds on the difference of the available data and the unknown, ground truth data can be obtained. This observation motivates a generic approach for an improved decompression of such compressed image or video data, which consists of minimising a regularisation functional subject to these error bounds, see for instance [5, 31, 200] for TV-based works in this context. Following this generic approach, we present here a TGV-based reconstruction method (see [33, 34]) that allows for a variational reconstruction of still images from compressed data that is directly applicable to the major image compression standards such as JPEG, JPEG2000 or the image compression layer of the DjVu document compression format [108]. A further extension of this model to the decompression of MPEG encoded video data will be addressed afterwards.

The underlying principle of a broad class of image and video compression standards, and in particular of JPEG and JPEG 2000 compression, is as follows: first, a linear transformation is used to transform the image data to a different representation where information that is more and less important for visual image quality is well separated. Then, a weighted quantisation of this data (according to its expected importance for visual image quality) is carried out and the quantised data (together with information that allows to obtain the quantisation accuracy) is stored. Thus, defining K to be the linear transformation used in the compression process and D to be a set of admissible, transformed image data that can be obtained using the information available in the compressed file, decompression amounts to finding an image u such that KuD. Using the TGV functional to regularise this compression procedure and considering colour images u : Ω → R3, we arrive at the following minimisation problem:

Equation (86)

where K : L2(Ω, R3) → 2 is an analysis operator related to a Riesz basis of L2(Ω, R3), and a Frobenius-norm-type coupling of the colour channels is used in TGV, see subsection 5.3. The coefficient dataset D2 reflects interval restrictions on the coefficients, i.e., is defined as D = {v2|vn Jn for all nN} for {Jn } a family of closed intervals. In case D is bounded, well-posedness of this approach can be obtained via a direct extension of proposition 5.17 to R3-valued functions, which in particular requires a multi-channel version of the Poincaré inequality for TGV as in proposition 5.15. The latter can straightforwardly be obtained by equivalence of norms in finite dimensions, see for instance [27, 33]. Beyond that, existence of a solution to (86) can be guaranteed also in case of a non-coercive discrepancy when arbitrarily many of the intervals Jn are unbounded, provided that only finitely many of them are half-bounded, i.e., are the form ${J}_{n}= \left.\right]- \infty ,{c}_{n} \left.\right]$ or ${J}_{n}=\left[\right.{c}_{n},\infty \left[\right. $ for cn R, see [33]. In compression, half-bounded intervals would correspond to knowing only the sign but not the precision of the respective coefficient, a situation which does not occur in JPEG, JPEG2000 and DjVu. Thus, in all relevant applications, all intervals are either bounded or all of R, and hence, solutions exist. Further, under the assumption that all but finitely many intervals have a width that is uniformly bounded from below, again an assumption which holds true in all anticipated applications, optimality conditions for (86) can be obtained.

In the application to JPEG decompression, colour images are processed in the YCbCr colour space and the basis transformation operator K corresponds to a colour subsampling followed by a block- and channel-wise discrete cosine transformation, which together can be expressed as a Riesz-basis transform. The interval sequence {Jn } can be obtained from the quantisation matrix that is available in the encoded file, and each interval Jn is bounded.
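As an illustration of how the intervals Jn arise, the following minimal Python sketch reconstructs coefficient bounds for one 8 × 8 block under the standard JPEG convention that a coefficient c is stored as q = round(c/Q) with quantisation step Q; the function name and the toy data are hypothetical and not taken from any JPEG library.

```python
import numpy as np

def jpeg_coefficient_intervals(quantised_coeffs, quant_matrix):
    """Interval bounds [lower, upper] for the unquantised DCT coefficients.

    If a coefficient c was stored as q = round(c / Q), then c must lie in
    [(q - 0.5) * Q, (q + 0.5) * Q]; these are the bounded intervals J_n
    used in the JPEG decompression model.
    """
    q = np.asarray(quantised_coeffs, dtype=float)
    Q = np.asarray(quant_matrix, dtype=float)
    return (q - 0.5) * Q, (q + 0.5) * Q

# toy usage: a flat quantisation matrix and a block with a single DC coefficient
Q = np.full((8, 8), 16.0)
q = np.zeros((8, 8))
q[0, 0] = 13.0
lower, upper = jpeg_coefficient_intervals(q, Q)
```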

In the application to JPEG2000 decompression, again the YCbCr colour space is used and K realises a colour-component-wise biorthogonal wavelet transform using Le Gall 5/3 or CDF 9/7 wavelets as defined in [65, tables 6.1 and 6.2]. Obtaining bounds on the precision of the wavelet coefficients is more involved than with JPEG (see [33, section 4.3]), but can be done by studying the bit truncation scheme of JPEG2000 in detail. As opposed to JPEG, however, the intervals Jn might either be bounded or unbounded.

A third application of the model (86) is the variational decompression of the image layers of a DjVu-compressed document. DjVu [100] is a storage format for digital documents. It encodes document pages via a separation into fore- and background layers as well as a binary switching mask, where the former are encoded using a lossy, transform-based compression and the latter using a dictionary-based compression. While the binary switching mask typically encodes fine details such as written text, the fore- and background layers encode image data, which again suffers from compression artefacts that can be reduced via variational decompression. Here, the extraction of the relevant coefficient data together with error bounds has to account for the particular features of the DjVu compression standard (we refer to [108] and its supplementary material for a detailed description and software that extracts the relevant data from DjVu-compressed files), but the overall model for the image layers is again similar to the one for JPEG and JPEG2000 decompression. In particular, the encoding of the fore- and background layers can be modelled with the operator K, in this case corresponding to a colour-component-wise wavelet transformation using the Dubuc–Deslauriers–Lemire (DDL) (4, 4) wavelets [75], and with data intervals Jn that are again either bounded or all of R.

In all of the above applications, a numerical solution of the corresponding instance of the minimisation problem (86) can be obtained using the primal-dual framework as described in section 6 (see [34] for details). We refer to figure 17 for exemplary results using second-order TGV regularisation. Regarding the implementation, relevant differences arise depending on whether the projection onto the constraint set {u | Ku ∈ D} can be carried out explicitly or not, the latter requiring a dualisation of this constraint and an additional dual variable. Only in the application to JPEG decompression is this projection explicit, due to the orthonormality of the cosine transform and the particular structure of the colour subsampling operator. This has the particular advantage that, at any iteration of the algorithm, the iterate is feasible; one can, for instance, apply early-stopping techniques to obtain substantially improved decompressed images in a computationally cheap way.
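The explicit projection in the JPEG case can be sketched as follows. This is a minimal illustration assuming that K is an orthonormal transform (as the block-wise DCT is), supplied here as a pair of hypothetical forward/adjoint callables; the colour subsampling part of the operator is ignored.

```python
import numpy as np

def project_onto_constraint_set(u, K, Kt, lower, upper):
    """Projection of u onto {u : K u in D} for box constraints D and an
    orthonormal transform K (K^T K = K K^T = I).

    K and Kt are user-supplied forward/adjoint transform callables; for an
    orthonormal K the projection amounts to clipping the coefficients K u
    to their intervals and transforming the correction back.
    """
    c = K(u)
    c_clipped = np.clip(c, lower, upper)
    return u + Kt(c_clipped - c)
```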


Figure 17. Example of variational image decompression. Standard (left column) and TGV-based (right column) decompression for a JPEG image (top row) compressed to 0.15 bits-per-pixel (bpp), a JPEG2000 image (middle row) compressed to 0.3 bpp, and a DjVu-compressed document page (bottom row) with close-ups. Results from [34] (rows 1 and 2) and [108] (bottom row). Reproduced with permission from [90]. Copyright © 2018 Society for Industrial and Applied Mathematics.


Variational MPEG decompression. The MPEG video compression standard builds on JPEG compression for storing frame-wise image data, but incorporates additional motion prediction and correction steps which can significantly reduce the storage size of video data. In MPEG-2 compression, which is a tractable blueprint for the MPEG compression family, video data is processed as subsequent groups of pictures (typically 12–15 frames) which can be handled separately. In each group of pictures, different frame types (I, P and B frames) are defined and, depending on the frame type, image data is stored by using motion prediction and correction followed by a JPEG-type compression of the corrected data. Similar to JPEG compression, colour images are processed in the YCbCr colour space and additional subsampling of colour components is allowed.

While these are the main features of a typical MPEG video encoder, as usual for most compression standards, the MPEG standard defines the decompression procedure rather than compression. Hence, since compression might differ for different encoders, we build a variational model for MPEG decompression that works with a decoding operator (see [35] for more details on MPEG and the model): using the information (in particular motion correction vectors and quantisation tables) that is stored in the MPEG compressed files, we can define a linear operator K that maps encoded, motion corrected cosine-transform coefficient data to (colour subsampled) video data. Furthermore, bounds on the coefficient data can be obtained. Using a second operator S to model colour subsampling and choosing a right-inverse Ŝ, MPEG decompression amounts to finding a video u such that

where v ∈ D with D being the admissible set of cosine-coefficient data, and s ∈ ker(S) compensates for the colour upsampling of Ŝ. Incorporating the infimal convolution of second-order spatio-temporal TGV functionals as regularisation for video data (see subsection 5.3 and [35, 109]), decompression then amounts to solving

Again, the minimisation problem can be solved using duality-based convex optimisation methods as described in section 6 and we refer to figure 18 for a comparison of standard MPEG-2 decompression and the result obtained with this model.
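Based on the description above, plausible forms of the decompression constraint and of the resulting minimisation problem (the precise model is given in [35]) are

$u = \hat{S}(Kv) + s, \quad v \in D, \quad s \in \ker(S), \qquad \min_{u,v,s}\ \mathrm{ICTGV}_\alpha(u) \quad \text{subject to these constraints,}$

so that the regulariser selects, among all videos consistent with the encoded data, one with small infimal-convolution TGV value.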


Figure 18. Example of variational MPEG decompression. Standard (top row) and ICTGV-based (bottom row) decompression of the Juggler image sequence from [11]. On the left, the second frame (P-frame) is shown in detail while on the right, all 8 frames are depicted. Reprinted from [35] by permission from Springer Nature Customer Service Centre GmbH: Springer © 2015.


8. Applications in medical imaging and image reconstruction

8.1. Denoising of dual-energy computed-tomography (CT) data

Since its development in the 1970s, computed x-ray tomography (CT) has become a standard tool in medical imaging. As CT is based on x-rays, the health risk associated with ionising radiation is certainly a drawback of this imaging technique. Further, the acquired images do not, in general, allow one to differentiate between objects of the same density. Regarding the former point, a low radiation dose is an important goal which is, of course, in conflict with the demand for a high signal-to-noise ratio (SNR). Regarding the differentiation of objects of the same density [88, 115], a recently developed approach is based on an additional dataset from a second x-ray source (typically placed at a 90° offset) which possesses a different spectrum (or energy) compared to the standard x-ray emitter in CT; such a scanner is called a dual-energy CT device, see figure 19(a).

Objects of different material that have the same response for one x-ray source may have a different response for the second source, making a differentiation possible. A relevant application of this principle is, for instance, the quantification of contrast-agent concentration. Adjusting a dual-energy CT device such that normal tissue gives the same response for both x-ray sources while an administered contrast agent does not allows one to infer the agent's concentration from the difference of the two acquired images, see figure 19(b). This may be useful, for instance, for recognising perfusion deficits and thus aid the diagnosis of, e.g., pulmonary embolism in the lung [131]. However, due to the low dose of the dual-energy CT scan as well as a limited sensitivity with respect to the contrast agent, the difference image can be noisy and denoising is required in order to obtain a meaningful interpretation, see figure 19(c).


Figure 19. Example of L1–TGV2 denoising for dual energy computed tomography. (a) A schematic of a dual-energy CT device. (b) A pair of (reconstructed) dual-energy CT images. (c) A noisy difference image with marked perfusion deficit region. (d) Difference image of the TGV-denoised dataset (3D denoising, only one slice is shown).


In the following, a variational denoising approach is derived that takes the structure of the problem into account. First, let A0 and B0 be the noisy CT reconstructions associated with the respective x-ray sources. Then, as the difference image contains the relevant information, we would like to impose regularity on the difference image A − B as well as on a 'base' image B instead of penalising each image separately. As we may assume that the contrast-agent concentration as well as the density are piecewise smooth, both admit a low total generalised variation, and hence we choose this functional as a penalty, for instance of second order. Furthermore, as the results should be usable for quantification, we have to account for this and therefore choose an L1-fidelity term, as this is known to possess desirable contrast-preservation properties in conjunction with TV and TGV [38, 62]. In total, this leads to the variational problem

where A0, B0 ∈ L1(Ω) are given and α = (α0, α1) as well as α' = (α0', α1') are positive regularisation parameters. With the application in mind, the domain Ω is typically a bounded three-dimensional domain with Lipschitz boundary. Then, existence of minimisers can be obtained using the tools from section 5, see proposition 5.17, which nevertheless requires some straightforward adaptations. Due to the lack of strict convexity, however, the solutions might be non-unique. Further, numerical algorithms can be developed along the lines of section 6, for instance, a primal-dual algorithm as outlined in subsection 6.2. In case of non-uniqueness, the minimisation procedure 'chooses' one solution in the sense that it converges to one element of the solution set, so that the variational model and the optimisation algorithm cannot be cleanly separated and other results might be obtained using different optimisation algorithms.
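A plausible form of this variational problem, based on the description above (the exact formulation is given in the cited works), is

$\min_{A,B}\ \|A - A_0\|_{L^1(\Omega)} + \|B - B_0\|_{L^1(\Omega)} + \mathrm{TGV}_\alpha^2(A - B) + \mathrm{TGV}_{\alpha'}^2(B),$

with the two TGV terms carrying the regularisation of the difference image and of the base image, respectively.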

Figure 19(d) shows denoising results for the primal-dual algorithm, where a clear improvement of image quality for the difference image in comparison to figure 19(c) can be observed. In particular, the total generalised variation model is suitable to recover the smooth distribution of the contrast agent within the lung, including the perfusion deficit region, as well as the discontinuities induced by bones, vessels, etc. Further, one can see that the dedicated modelling of the problem as a denoising problem for a difference image based on two datasets turns out to be beneficial. A denoising procedure that only depends on the noisy difference image would not allow for such an improvement of image quality.

8.2. Parallel reconstruction in magnetic resonance imaging

Magnetic resonance imaging (MRI) is a tomographic imaging technique that is heavily used in medical imaging and beyond. It builds on an interplay of magnetic fields and radio-frequency pulses, which allows for localised excitation and, via induction of current in receiver coils, for a subsequent measurement of the proton density inside the object of interest [46]. In the standard setting, MRI delivers qualitative images visualising the density of hydrogen protons, e.g., inside the human body. Its usefulness is in particular due to an excellent soft tissue contrast (as opposed to computed tomography) and a high spatial resolution of MR images. The trade-off, in particular for the latter, is the long measurement time, which comes with obvious drawbacks such as patient discomfort, limitations on patient throughput and imaging artefacts resulting from temporally inconsistent data due to patient motion.

Subsampled data acquisition and parallel imaging [97, 149, 179] (combined with appropriate reconstruction methods) are nowadays standard techniques to accelerate MRI measurements. As the data in an MR experiment is acquired sequentially, a reduced number of measurements directly implies a reduced measurement time; however, in order to maintain the same image resolution, the resulting lack of data needs to be compensated for by other means. Parallel imaging achieves this to some extent by using not a single but multiple measurement coils and combining the corresponding measured signals for image reconstruction. On top of that, advanced mathematical reconstruction methods such as compressed sensing techniques [21, 132] or, more generally, variational reconstruction have been shown to allow for a further, significant reduction of measurement time with a negligible loss of image quality.

In this context, transform-based regularisation techniques [132, 133] and derivative-based techniques [21, 121] are among the most popular approaches. More recently, also learning-based methods building on the structure of variational approaches have become very popular [101, 150]. Here, we focus on variational regularisation approaches with first- and higher-order derivatives. To this aim, we first deal with the forward model of parallel, static MR imaging.

In a standard setting, the MR measurement process can be modelled as measuring Fourier coefficients of the unknown image. In order to include measurements from multiple coils, spatially varying sensitivity profiles of these coils also need to be included in the forward model via a pointwise multiplication in image space. Subsampled data acquisition then corresponds to measuring the Fourier coefficients only on a certain measurement domain in Fourier space, which is defined by a subsampling pattern. Let ${c}_{1},\dots ,{c}_{k}\in {\mathcal{C}}_{0}\left({\mathbf{R}}^{d},\mathbf{C}\right)$ be functions modelling some fixed coil sensitivity profiles for k receiver coils, let σ be a positive, finite Radon measure on Rd that defines the sampling pattern, and let Ω ⊂ Rd be a bounded Lipschitz domain that represents the image domain. Then, following the lines of [30], we define, for p ∈ [1, ], the MR measurement operator $K:{L}^{p}\left({\Omega},\mathbf{C}\right)\to {L}_{\sigma }^{2}{\left({\mathbf{R}}^{d},\mathbf{C}\right)}^{k}$ as

Equation (87)

where we extend u by zero to Rd . Note that for each uLp (Ω, C), Ku as a function on Rd is bounded and continuous which follows from

Thus, since σ is finite, K indeed linearly and continuously maps into ${L}_{\sigma }^{2}\left({\mathbf{R}}^{d},\mathbf{C}\right)$.
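A plausible form of the operator in (87) and of the omitted estimate, assuming the standard (unnormalised) Fourier-transform convention, reads

$(Ku)_j(\xi) = \int_{\mathbf{R}^d} c_j(x)\, u(x)\, \mathrm{e}^{-\mathrm{i}\xi\cdot x}\, \mathrm{d}x, \quad j = 1,\dots,k,$

together with $|(Ku)_j(\xi)| \leq \|c_j\|_\infty \int_\Omega |u|\, \mathrm{d}x \leq \|c_j\|_\infty\, |\Omega|^{1-1/p}\, \|u\|_p$ for all $\xi \in \mathbf{R}^d$, which yields the claimed boundedness and continuity.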

While here we assume the coil sensitivities to be known (such that the forward model is linear), obtaining them prior to image reconstruction is non-trivial and we refer to [21, 166, 185, 190] for some existing methods. In the experiments discussed below, we followed the approach of [21] and employed, for each individual coil, a variational reconstruction with a quadratic regularisation of the derivative (H1-regularisation) followed by a convolution with a smoothing kernel. For each coil, the sensitivity profile was then obtained by division by the sum-of-squares image (which is, despite its name, the pointwise square root of the sum of the squared moduli of the individual coil images).
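The sum-of-squares combination and the subsequent normalisation can be sketched in a minimal NumPy/SciPy example as follows; the smoothing width sigma and the stabilisation constant eps are illustrative choices and not taken from the cited works.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def estimate_sensitivities(coil_images, sigma=5.0, eps=1e-6):
    """Rough coil-sensitivity estimate from individual (complex) coil images.

    coil_images has shape (k, ny, nx). Each coil image is smoothed and then
    divided by the sum-of-squares image, i.e. the pointwise square root of
    the sum of the squared moduli of the coil images.
    """
    coil_images = np.asarray(coil_images, dtype=complex)
    sos = np.sqrt(np.sum(np.abs(coil_images) ** 2, axis=0))
    smoothed = np.stack([
        gaussian_filter(c.real, sigma) + 1j * gaussian_filter(c.imag, sigma)
        for c in coil_images
    ])
    return smoothed / (sos + eps)
```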

A regularised reconstruction from MR measurement data $f\in {L}_{\sigma }^{2}{\left({\mathbf{R}}^{d},\mathbf{C}\right)}^{k}$ can be obtained by solving

Equation (88)

where we test with both ${\mathcal{R}}_{\alpha }=\alpha \mathrm{T}\mathrm{V}$ and ${\mathcal{R}}_{\alpha }={\mathrm{T}\mathrm{G}\mathrm{V}}_{\alpha }^{2}$, in which case we can choose 1 < p ⩽ d/(d − 1). Note that well-posedness for (88) follows from theorems 2.11 and 2.14 in the case of TV and from proposition 5.17 in the case of ${\mathrm{T}\mathrm{G}\mathrm{V}}_{\alpha }^{2}$ (where straightforward adaptations are necessary to include complex-valued functions). Numerically, the optimisation problem can be solved using the algorithmic framework described in section 6, where again, some modifications are necessary to deal with complex-valued images.
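Based on the preceding description, a plausible form of the Tikhonov problem (88) is

$\min_{u \in L^p(\Omega,\mathbf{C})}\ \frac{1}{2}\|Ku - f\|_{L^2_\sigma(\mathbf{R}^d,\mathbf{C})^k}^2 + \mathcal{R}_\alpha(u),$

with K the parallel MR forward operator from (87) and $\mathcal{R}_\alpha$ one of the two regularisers above.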

Figure 20 compares the results between these two choices of regularisation functionals and a conventional reconstruction based on direct Fourier inversion using non-uniform fast Fourier transform (NUFFT) [87] for different subsampling factors and a dataset for which a fully sampled ground truth is available. Undersampled 2D radial spin-echo measurements of the human brain were performed with a clinical 3T scanner using a receive-only 12 channel head coil. Sequence parameters were: TR = 2500 ms, TE = 50 ms, matrix size 256 × 256, slice thickness 2 mm, in-plane resolution 0.78 mm × 0.78 mm. The sampling direction of every second spoke was reversed to reduce artefacts from off-resonances [22], and numerical experiments were performed using 96, 48 and 24 projections. As $\frac{\pi }{2}N$ projections (402 for N = 256 in our case) have to be acquired to obtain a fully sampled dataset in line with the Nyquist criterion [18], this corresponds to undersampling factors of approximately 4, 8 and 16. The raw data was exported from the scanner, and image reconstruction was performed offline.


Figure 20. Parallel undersampling MRI of the human brain (256 × 256 pixels) from 96, 48 and 24 radial projections (top, middle, bottom row). Left column: conventional NUFFT reconstruction. Middle column: reconstruction with TV regularisation. Right column: reconstruction with TGV2 regularisation. All reconstructed images are shown with a closeup of the lower right brain region.


It can be seen that in particular at higher subsampling factors, variational, derivative-based reconstruction reduces artefacts stemming from limited Fourier measurements. Both TV and TGV perform well, while a closer look reveals that staircasing artefacts present with TV can be avoided using second-order TGV regularisation.

8.3. Diffusion tensor imaging

Magnetic resonance imaging offers, apart from obtaining morphological images as outlined in subsection 8.2, many other possibilities to acquire information about the imaged objects. Among these possibilities, diffusion tensor imaging (DTI) is one of the more recent developments. It aims at measuring the diffusion directions of water protons at each spatial point. The physical background is given by the Bloch–Torrey equation which describes the spatio-temporal evolution of the magnetisation vector taking diffusion processes into account [183]. Based on this, diffusion-weighted imaging can be performed, which uses dedicated MR sequences depending on a direction vector q ∈ R3 in order to obtain displacement information associated with that direction.

This leads to the following model. Assume that ρ0 : R3R is the proton density to recover and ρt : R3 × R3R is the function such that for each x, x' ∈ R3, the value ρt (x, x') represents the probability of a proton moving from x to x' during the time t > 0. By applying a diffusion-sensitive sequence (such as, e.g., a pulsed-gradient spin echo [180]) associated with the vector qR3, one is able to measure in k-space as follows:

where k ∈ R3, see [51, 67]. Note that in practice, also the coil sensitivity profile would influence the measurement as outlined in subsection 8.2; however, for the sake of simplicity, we neglect this aspect in the following. Now, sampling q across R3 would then, in principle, allow one to recover the six-dimensional function u : (x, x') ↦ ρ0(x)ρt (x, x') by inverse Fourier transform, since $S\left(k,q\right)=\left(\mathcal{F}u\right)\left(k-q,q\right)$ for each k, q ∈ R3. The 6D-space spanned by the coordinates k and q is called kq-space. Assuming that for a fixed q ∈ R3, the k-space is fully sampled then allows one to recover fq : R3 → C by inverse Fourier transform, where

Obtaining and analysing fq for a coverage of the q-space is called q -space imaging which also is the basis of orientation-based analysis such as q -ball imaging [184]. However, as these techniques require too much measurement time in practice, one usually makes assumptions about the structure of ρt in order to avoid the measurement of fq for too many q.
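For orientation, plausible forms of the measured signal and of fq, consistent with the relation $S\left(k,q\right)=\left(\mathcal{F}u\right)\left(k-q,q\right)$ stated above under the unnormalised Fourier convention, are

$S(k,q) = \int_{\mathbf{R}^3}\int_{\mathbf{R}^3} \rho_0(x)\,\rho_t(x,x')\, \mathrm{e}^{-\mathrm{i}k\cdot x}\, \mathrm{e}^{-\mathrm{i}q\cdot(x'-x)}\, \mathrm{d}x'\, \mathrm{d}x, \qquad f_q(x) = \rho_0(x)\int_{\mathbf{R}^3} \rho_t(x,x')\, \mathrm{e}^{-\mathrm{i}q\cdot(x'-x)}\, \mathrm{d}x',$

so that, for fixed q, the signal S(·, q) is the Fourier transform of fq.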

Along this line, the probably simplest model is to assume that for each x, ρt (x, ⋅) follows a Gaussian distribution centred around x with symmetric positive definite covariance matrix 2tD(x) ∈ S3×3, i.e.,

For fixed xR3, this can be interpreted as the fundamental solution of the diffusion equation

shifted by x and evaluated at time t. The model for ρt thus indeed reflects linear diffusion through a homogeneous medium. This makes sense as diffusion during the measurement process is usually orders of magnitude smaller than the spatial scale one is interested in, but the homogeneity assumption might be violated when microstructures are present. Nevertheless, with this assumption, in the above case of full k-space sampling, one gets

Equation (89)

Clearly, for q = 0, we have f0 = ρ0, and assuming ρ0 > 0 almost everywhere leads to the following pointwise equation that is linear in D:

Hence, one can recover D by measuring f0 and ${f}_{{q}_{1}},\dots ,{f}_{{q}_{m}}$ for q1, ..., qm ∈ R3 suitably chosen, i.e., such that, in particular, D is uniquely determined by D ⋅ (qi ⊗ qi) for i = 1, ..., m. For this, one requires that the symmetric tensors q1 ⊗ q1, ..., qm ⊗ qm span the space Sym2(R3), meaning that m must be at least 6. Note that according to (89), fq must be real and non-negative, such that in practice, it suffices to reconstruct the absolute value of fq, for instance, by computing the sum-of-squares image.
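Plausible forms of (89) and of the subsequent pointwise equation, inferred from the Gaussian model for ρt, read

$f_q(x) = \rho_0(x)\, \exp\!\left(-t\, q\cdot(D(x)q)\right) \qquad \text{and} \qquad -\frac{1}{t}\,\log\!\left(\frac{f_q(x)}{f_0(x)}\right) = q\cdot(D(x)q) = D(x)\cdot(q\otimes q),$

respectively, the latter being linear in D.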

The inverse problem for D can then be described as follows. Restricting the considerations to a bounded domain Ω ⊂ R3 and letting p ∈ [1, ] such that g1, ..., gm Lp (Ω) where ${g}_{i}=-\frac{1}{t}\;\mathrm{log}\left({f}_{{q}_{i}}/{f}_{0}\right)$, we aim at solving

Equation (90)

for D ∈ Lp (Ω, Sym2(R3)). It is easy to see that this problem is well-posed, but regularisation is still necessary in practice as the measurements and the reconstruction are usually very noisy. To this end, one can, in the case p = 2, minimise a Tikhonov functional with a quadratic discrepancy term and positive semi-definiteness constraints:

Equation (91)

Here, {D ⩾ 0} denotes the set of symmetric tensor fields that are positive semi-definite almost everywhere in Ω. Further, the regulariser ${\mathcal{R}}_{\alpha }$ is preferably tailored to the structure of symmetric tensor fields. Since the D to be recovered can be assumed to admit discontinuities, for instance, at tissue borders, the total deformation TD for Sym2(R3)-valued functions as described in subsection 3.2 constitutes a meaningful regulariser. In this context, higher-order regularisation via ${\mathrm{T}\mathrm{G}\mathrm{V}}_{\alpha }^{2}$ for Sym2(R3)-valued functions according to definition 5.1 can be beneficial as, e.g., principal diffusion directions might vary smoothly within the same tissue type [186, 187]. In both cases, problem (91) is well-posed and admits a unique solution which can, once discretised, be found numerically by the algorithms outlined in section 6.
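For reference, plausible forms of (90) and (91), based on the description above, are

$D\cdot(q_i\otimes q_i) = g_i \quad \text{a.e. in } \Omega, \quad i = 1,\dots,m,$

and

$\min_{D \in L^2(\Omega,\mathrm{Sym}^2(\mathbf{R}^3)),\ D \geqslant 0}\ \frac{1}{2}\sum_{i=1}^m \|D\cdot(q_i\otimes q_i) - g_i\|_{L^2(\Omega)}^2 + \mathcal{R}_\alpha(D),$

respectively.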

Once the diffusion tensor field D is obtained, one can use it to visualise some of its properties. For instance, in the context of medical imaging, the eigenvectors and eigenvalues of D play a role in interpreting DTI data. Based on the fractional anisotropy [12], which is defined as

where ${\lambda }_{1},{\lambda }_{2},{\lambda }_{3}:{\Omega}\to \left[\right. 0,\infty \left[\right.$ are the eigenvalues of D as a function in Ω, one is able to identify isotropic regions (FAD ≈ 0) as well as regions where diffusion only takes place in one direction (FAD ≈ 1). The latter case indicates the presence of fibres whose orientation then corresponds to a principal eigenvector of D. Figure 21 shows an example of DTI reconstruction from noisy data using TD and TGV2 regularisation for symmetric tensor fields, using principal-direction/fractional-anisotropy-based visualisation. It turns out that also here, higher-order regularisation is beneficial for image reconstruction [187]. In particular, the faithful recovery of piecewise smooth fibre orientation fields may improve advanced visualisation techniques such as DTI-based tractography.
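The fractional anisotropy can be computed from the eigenvalues of D as in the following minimal NumPy sketch, which uses the standard formula of [12]; the small constant guarding against division by zero is an illustrative choice.

```python
import numpy as np

def fractional_anisotropy(D):
    """Fractional anisotropy of a field of symmetric 3x3 tensors.

    D has shape (..., 3, 3). FA = sqrt(3/2) * ||lam - mean(lam)|| / ||lam||,
    where lam are the eigenvalues; FA is close to 0 for isotropic diffusion
    and close to 1 if diffusion takes place in essentially one direction.
    """
    lam = np.linalg.eigvalsh(D)                      # eigenvalues, shape (..., 3)
    mean = lam.mean(axis=-1, keepdims=True)
    num = np.sqrt(((lam - mean) ** 2).sum(axis=-1))
    den = np.sqrt((lam ** 2).sum(axis=-1))
    return np.sqrt(1.5) * num / np.maximum(den, 1e-12)
```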


Figure 21. Example of TD- and TGV2-regularised diffusion tensor imaging reconstruction. (a) Ground truth, (d) direct reconstruction from noisy data, (b) direct inversion of (90) followed by TD-denoising, (c) Tikhonov regularisation according to (91) with TD-regulariser, (e) direct inversion of (90) followed by TGV2-denoising, (f) Tikhonov regularisation according to (91) with TGV2-regulariser. All images visualise one slice of the respective 3D tensor fields.


8.4. Quantitative susceptibility mapping

Magnetic resonance imaging also has capabilities for the quantification of certain material properties. One of these properties is the magnetic susceptibility which quantifies the ability of a material to magnetise in a magnetic field such as the static field that is used in MRI. Recovering the susceptibility distribution of an object is called quantitative susceptibility mapping (QSM) [72, 177].

Assuming that the static field is aligned with the z-axis of a three-dimensional coordinate system, this susceptibility can be related to the z-component of the static field inhomogeneity δB0 : R3R that is caused by the material, which in turn induces a shift in resonance frequency and, consequently, a phase shift in the complex image data. For instance, if ${\varphi }_{t}:{\mathbf{R}}^{3}\to \left[\right. -\pi ,\pi \left[\right.$ denotes the phase of an MR image acquired with a gradient echo (GRE) sequence with echo time t > 0, the relation between δB0 and φt can be stated as:

where ${\varphi }_{0}:{\mathbf{R}}^{3}\to \left[\right. -\pi ,\pi \left[\right.$ is the time-independent phase offset induced by a single measurement coil and γ is the gyromagnetic ratio. Using multiple coils, the phase offset φ0 can be recovered [159] such that we may assume, in the following, that φ0 = 0. Pursuing a Lorentzian sphere approach and assuming that in the near field, the magnetic dipole moments that cause the magnetisation are randomly distributed, one is able to relate δB0 with the magnetic susceptibility χ associated with the static field orientation approximately as follows [172]:

where B0 is the static field strength and d : R3\{0} → R is the dipole kernel according to

Assuming further that the susceptibility is isotropic, i.e., does not depend on the orientation of the static field, it may be recovered from the phase data φt . However, the phase image φt is only well-defined where the magnitude of the MR image is non-zero (or above a certain threshold). Denoting by Ω ⊂ R3 a Lipschitz domain that describes where φt is available, recovering χ then amounts to solving

for χ : R3R. This problem poses several challenges. First, the values on the left-hand side are only available up to integer multiples of 2π, such that phase unwrapping becomes necessary. There is a plethora of methods available for doing this for discrete data [159], however, in regions of fast phase change, these methods might not correctly resolve the ambiguities introduced by phase wrapping. Consequently, the unwrapped phase image ${\varphi }_{t}^{\mathrm{u}\mathrm{n}\mathrm{w}\mathrm{r}\mathrm{a}\mathrm{p}}$ might be inaccurate.

With unwrapped phase data being available, the next challenge is to obtain χ on the whole space from a noisy version of χ * d on Ω, which is an underdetermined problem. The usual approach for this challenge is to split χ into its contributions on Ω and R3\Ω and only aim at reconstructing χ on Ω. Now, as the dipole kernel d is harmonic on R3\{0}, the function $\chi {\vert }_{{\mathbf{R}}^{3}{\backslash}{\Omega}}{\ast}d$ is harmonic in Ω. Thus, one can write

Equation (92)

and solve this equation instead. For QSM, one often estimates ψ first and subtracts this estimate from the data. This step is called background field removal in this context and there are many different approaches for that [173]. Depending on the accuracy of the background field estimate, this step may introduce further errors into the data. Nevertheless, the procedure results in a foreground field estimate ${\varphi }_{t}^{\mathrm{f}\mathrm{g}}$ for which only the deconvolution problem

has to be solved. As this problem is ill-posed, it needs to be regularised. A Tikhonov regularisation approach can then be phrased as follows:

for 1 < p < ∞, dt = 2πγtd and ${\mathcal{R}}_{\alpha }$ a regularisation functional on Lp (Ω). As the convolution with dt results in a singular integral, the operation χ ↦ χ * dt is only continuous Lp (Ω) → Lp (Ω) by the Calderón–Zygmund inequality [50], i.e., does not increase regularity. In this context, first-order regularisers (H1 and TV) have been used [20], but also TGV2 has been employed [63]. Note that in these approaches, one usually considers p = 2, which might cause problems regarding well-posedness for TV and TGV2 since, in 3D, coercivity only holds in L3/2(Ω). This problem can, for instance, be avoided by setting dt to zero in a small ball around the origin, a strategy that also seems consistent with the modelling of the forward problem [172]. A numerical solution for χ then finally gives a susceptibility map of the region of interest Ω. However, since the overall procedure involves three sequential steps, each possibly introducing an error that propagates, an integrative variational model that essentially only depends on the original wrapped phase data φt is desirable.
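A plausible form of this Tikhonov approach, based on the above description, is

$\min_{\chi \in L^p(\Omega)}\ \frac{1}{2}\|\chi * d_t - \varphi_t^{\mathrm{fg}}\|_{L^2(\Omega)}^2 + \mathcal{R}_\alpha(\chi),$

with the convolution restricted to Ω.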

Such a model can indeed be derived. First, observe that in case of sufficient regularity, the Laplacian of the unwrapped phase can easily and directly be obtained from φt :

Equation (93)

such that ${\varphi }_{t}^{\mathrm{u}\mathrm{n}\mathrm{w}\mathrm{r}\mathrm{a}\mathrm{p}}$ is known up to an additive harmonic contribution. Indeed, this is the concept behind Laplacian phase unwrapping [168]. Further, introducing the wave-type operator

and noticing that d = □Γ, where Γ : R3\{0} → R is the fundamental solution of the Laplace equation, i.e., ${\Gamma}\left(x,y,z\right)=\frac{1}{4\pi }{\left({x}^{2}+{y}^{2}+{z}^{2}\right)}^{-1/2}$, it follows from (92) that

Equation (94)

In particular, the harmonic contribution from the background field vanishes and the data obtained in (93) can directly be used on the right-hand side. Thus, only a wave-type partial differential equation has to be solved, for which background field correction is no longer needed. The equation, however, lacks boundary conditions, such that one cannot expect to recover χ in all circumstances. Under a priori assumptions on χ, the lack of boundary conditions can be mitigated by the introduction of a regularisation functional. Indeed, assuming that χ is piecewise constant and of bounded variation, the minimisation of TV subject to (94) recovers χ up to an additive constant [44].

Since the data φt might be noisy, the variational model should also account for errors on the right-hand side of (94) and introduce a suitable discrepancy term. Assuming Gaussian noise for φt , the right-hand side ${\Delta}{\varphi }_{t}^{\mathrm{u}\mathrm{n}\mathrm{w}\mathrm{r}\mathrm{a}\mathrm{p}}$ is perturbed by noise in H−2(Ω), which suggests an H−2-discrepancy term for (94). The latter can be realised by requiring ${\Delta}\psi =2\pi \gamma t\square \chi -{\Delta}{\varphi }_{t}^{\mathrm{u}\mathrm{n}\mathrm{w}\mathrm{r}\mathrm{a}\mathrm{p}}$ for a ψL2(Ω) and measuring the L2-norm of ψ. In total, this leads to

Equation (95)

where 1 ⩽ p < ∞, the constraint has to be understood in the distributional sense, and ${\mathcal{R}}_{\alpha }$ is a regularisation functional on Lp (Ω) realising a priori assumptions on χ that compensate for the lack of boundary conditions. In [126], the choice ${\mathcal{R}}_{\alpha }={\mathrm{T}\mathrm{G}\mathrm{V}}_{\alpha }^{2}$ was proposed and studied. Choosing $p=\frac{3}{2}$, the functional in (95) is coercive up to finite dimensions and the linear PDE-constraint is closed, so one can easily see that an optimal solution always exists and yields finite values once there is a pair (χ, ψ) ∈ BV(Ω) × L2(Ω) that satisfies the constraints.
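Based on the constraint just described, a plausible form of (95) is

$\min_{\chi \in L^p(\Omega),\ \psi \in L^2(\Omega)}\ \frac{1}{2}\|\psi\|_{L^2(\Omega)}^2 + \mathcal{R}_\alpha(\chi) \quad \text{subject to} \quad \Delta\psi = 2\pi\gamma t\,\square\chi - \Delta\varphi_t^{\mathrm{unwrap}} \ \text{in } \Omega.$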

A numerical algorithm for the discrete solution of (95) with TGV2-regularisation can easily be derived by employing the tools of section 6 and, e.g., finite-difference discretisations of the operator □. In [126], a primal-dual algorithm has been implemented and tested for synthetic as well as real-life data. It turns out that the integrative approach (95) is very robust to noise and can in particular be employed for fast 3D MRI-acquisition schemes that may yield a low signal-to-noise ratio, such as 3D echo-planar imaging (EPI) [147]. It has been tested on raw phase data, see figure 22, where the benefits of higher-order regularisation also become apparent. Due to the short scan time that is possible with this approach as well as its robustness, it might additionally contribute to advancing QSM further towards clinical applications.


Figure 22. Example for integrative TGV-regularised susceptibility reconstruction from wrapped phase data. (a) Magnitude image (for brain mask extraction). (b) Input phase image φt (single gradient echo, echo time: 27 ms, field strength: 3 T). (c) Result of the integrative approach (95) (scale from −0.15 to 0.25 ppm). All images visualise one slice of the respective 3D image.


8.5. Dynamic MRI reconstruction

As mentioned in subsection 8.2, data acquisition in MR imaging is relatively slow. This can be compensated for by subsampling and variational reconstruction techniques such that in controlled environments, as for instance with brain or knee imaging, a good reconstruction quality can be obtained. The situation is more difficult when imaging parts of the body that are affected, for instance, by breathing motion, or when one aims to image certain dynamics such as with dynamic contrast-enhanced MRI or heart imaging. Regarding unwanted motion, there exists a large amount of literature on motion correction techniques (see [197] for a review), which can be separated into prospective and retrospective motion correction and which often rely on additional measurements to estimate and correct for unwanted motion. In contrast to that, dynamic MRI aims to capture certain dynamic processes such as heartbeats or the flow of blood or contrast agent. Here, the approach is often to acquire highly subsampled data, possibly combined with gating techniques, such that motion consistency can be assumed for each single frame of a time series of measurements. The severe lack of data for each frame can then only be mitigated by exploiting temporal correspondences between different measurement times. One way to achieve this is via Tikhonov regularisation of the dynamic inverse problem, which, for instance, amounts to

Equation (96)

where p ∈ [1, ∞], T > 0, Ω ⊂ Rd is the image domain, and for almost every $t\in \left.\right]0,T\left[\right.$, σt is a positive, finite Radon measure on Rd that represents the possibly time-dependent Fourier sampling pattern at time t, such that ${K}_{t}:{L}^{p}\left({\Omega}\right)\to {L}_{{\sigma }_{t}}^{2}{\left({\mathbf{R}}^{d},\mathbf{C}\right)}^{k}$ according to (87) models the MR forward operator, and ${f}_{t}\in {L}_{{\sigma }_{t}}^{2}{\left({\mathbf{R}}^{d},\mathbf{C}\right)}^{k}$ represents the associated measurement data. Further, ut denotes the evaluation of u at time t, which, for almost every t, is a function in Lp (Ω). As usual, ${\mathcal{R}}_{\alpha }$ corresponds to the regularisation functional that can be used to enforce additional regularity constraints. Note that in order to obtain a well-defined formulation for the time-dependent integral, the σt have to vary in a measurable way with t such that the associated Kt and the data ft are also measurable in a suitable sense. We refer to [30] for details on the necessary notions and spaces, and an analysis of the above problem in the context of optimal-transport-based regularisation.
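A plausible form of the dynamic Tikhonov problem (96), consistent with the description above, is

$\min_{u}\ \int_0^T \frac{1}{2}\|K_t u_t - f_t\|_{L^2_{\sigma_t}(\mathbf{R}^d,\mathbf{C})^k}^2\, \mathrm{d}t + \mathcal{R}_\alpha(u).$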

In the context of clinical MR applications, temporal Fourier transforms [117], temporal derivatives [2] or combinations thereof [86] have, for instance, been proposed for temporal regularisation. More recently, methods that build on motion-dependent additive decomposition of the dynamic image data into different components have been successful. The work [141] achieves this in a discrete setting via low-rank and sparse decomposition which, for the low-rank component, penalises the singular values of the matrix containing the vectorised frames in each column. In contrast to that, by employing the ICTGV functional presented in subsection 5.3, the work [167] achieves an additive decomposition and adaptive regularisation of the dynamic data via penalising differently weighted spatio-temporal derivatives. There, problem (96) is solved for the choice

where the ${\mathrm{T}\mathrm{G}\mathrm{V}}_{{\beta }_{i}}^{2}$ are second-order spatio-temporal TGV functionals that employ different weightings of the components of the spatio-temporal derivatives in such a way that for ${\mathrm{T}\mathrm{G}\mathrm{V}}_{{\beta }_{1}}^{2}$, changes in time are penalised more strongly than changes in space, while ${\mathrm{T}\mathrm{G}\mathrm{V}}_{{\beta }_{2}}^{2}$ acts the other way around. The numerical solution of (96) can again be obtained within the algorithmic framework presented in section 6 and we refer to [171] for a GPU-accelerated open-source implementation and demo scripts.
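The infimal-convolution choice described above can plausibly be written as

$\mathcal{R}_\alpha(u) = \min_{u_1 + u_2 = u}\ \mathrm{TGV}_{\beta_1}^2(u_1) + \mathrm{TGV}_{\beta_2}^2(u_2),$

so that each additive component of the dynamic data is regularised by the spatio-temporal TGV functional that fits it best.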

Figure 23 shows the result of ICTGV-regularised reconstruction of a multi-coil cardiac cine dataset (subsampled with factor ≈ 11) and compares it to the straightforward sum-of-squares (SOS) reconstruction. Since the SOS reconstruction does not account for temporal correspondences, it is not able to obtain a useful result for a high subsampling factor, while the ICTGV-based reconstruction resolves fine details as well as motion dynamics rather well. Figure 24 shows a comparison to the low-rank and sparse (L + S) method of [141] for a second cine dataset with a different view. Here, the parameters for the L + S method were optimised for each experiment separately using the (in practice unknown) ground truth, while for ICTGV, the parameters were trained a priori on a different dataset and fixed afterwards. It can be seen in figure 24 that both methods perform rather well up to the high subsampling factors, where the ICTGV-based method is able to recover fine details (highlighted by arrows) that are lost with L + S reconstruction.


Figure 23. Comparison of straightforward sum-of-squares (left) and ICTGV-regularised (right) reconstruction for a dynamic MR dataset with subsampling factor ≈ 11. Each image shows one frame of the reconstructed image sequence along with the temporal evolution of one horizontal and vertical cross section indicated by the red and blue line, respectively.


Figure 24. Comparison of L + S- and ICTGV-regularised dynamic MR reconstruction. The first column shows, from top to bottom, a frame of the ground truth image sequence along with the temporal evolution of a vertical and horizontal cross section (indicated by red dotted lines) as well as a close up. Columns 2 and 3 depict the reconstruction results for L + S regularisation, while columns 4 and 5 depict the corresponding results for ICTGV regularisation (subsampling factors r = 12 and r = 16). The red arrows indicate details that are lost by L + S regularisation but maintained with ICTGV regularisation. Reproduced from [167] John Wiley & Sons. © 2016 International Society for Magnetic Resonance in Medicine.


8.6. Joint MR-PET reconstruction

We have seen in subsections 8.2 and 8.5 that image reconstruction from parallel, subsampled MRI data is non-trivial and can greatly be improved with variational regularisation. Beyond MRI and CT, a further medical imaging modality of high clinical relevance is positron emission tomography (PET). As opposed to standard MR imaging, PET imaging is quantitative and builds on reconstructing the spatial distribution of a radioactive tracer that is injected into the patient prior to the measurement. The forward model in PET imaging is the x-ray transform (often combined with resolution modelling) and, since measurements correspond to photon counts, the noise in PET imaging is typically assumed to be Poisson distributed. Reconstructing images from PET measurement data is a non-trivial inverse problem, where difficulties arise, for instance, from high Poisson noise due to limited data acquisition time, dosage restrictions for the radioactive tracer, as well as from limited measurement resolution due to finite detector size and photon acollinearity. As a result, variational reconstruction methods and in particular TV regularisation are employed also in PET imaging to improve reconstruction (see, for instance, [116, 165]).

In a clinical workflow, often both MR and PET images are acquired, which provides two complementary sources of information for diagnosis. This can also be exploited for reconstruction: in particular, MR-prior-based PET reconstruction methods, which incorporate structural information from the MR image into the PET reconstruction, are now well established in theory [105] and in practice [81, 169, 189]. While those methods regard an a priori reconstructed MR image as a fixed, anatomical prior for PET, joint, synergistic reconstruction is also possible and has recently become more popular due to the availability of joint MR-PET scanners [82, 122]. An advantage of the latter is that neither of the two images is fixed a priori and, in principle, a mutual benefit for both modalities due to joint reconstruction is possible. To this aim, the regularisation term needs to incorporate an appropriate coupling of the two modalities, and here we discuss the coupled TGV-based approach of [122] that achieves this.

At first, we consider the forward model for PET imaging, which consists of a convolution followed by an attenuated x-ray transform and additive corrections. With Ω ⊂ Rd the image domain such that Ω ⊂ BR (0) for some R > 0, the x-ray transform can be defined as a linear operator $P:{L}^{p}\left({\Omega}\right)\to {L}_{\mu }^{1}\left({\Sigma}\right)$, where p ∈ [1, ∞] and ${\Sigma}\subset \left\{\left(\vartheta ,x\right)\enspace \vert \enspace \vartheta \in {\mathcal{S}}^{d-1},\enspace x\in {\left\{\vartheta \right\}}^{\perp },\enspace {\Vert}x{\Vert}{< }R\right\}$ is a non-empty and open subset of the tangent bundle to ${\mathcal{S}}^{d-1}$, via

Note that here, u is extended by zero outside Ω and the measure μ on Σ is induced by the functional

see [134, section 3.4] for details. We further denote by kL1(Br (0)) a convolution kernel with width r > 0 that models physical limitations in PET imaging, for instance, due to finite detector size and photon acollinearity, see [151]. The PET forward model is then defined as ${K}_{\mathrm{P}\mathrm{E}\mathrm{T}}:{L}^{p}\left({\Omega}\right)\to {L}_{\mu }^{1}\left({\Sigma}\right)$

where u * k denotes the convolution of u and k (using again zero extension), $a\in {L}_{\mu }^{\infty }\left({\Sigma}\right)$ with a > 0 a.e. includes a correction for attenuation and detector sensitivities and $c\in {L}_{\mu }^{1}\left({\Sigma}\right)$ with c ⩾ 0 a.e. accounts for additive errors due to random and scattered events. Assuming the noise in PET to be Poisson distributed, we use the Kullback–Leibler divergence as defined in (2) for data fidelity.

For the MR forward model, we use again the parallel MR operator KMR according to (87) in subsection 8.2 that includes coil sensitivity profiles and a measurement trajectory defined via a finite, positive Radon measure σ on Rd .

For regularisation, we use an extension of second-order TGV to multi-channel data as discussed in subsection 5.3, which, as in parallel MR reconstruction, is adapted to complex-valued data. That is, we define ${\mathrm{T}\mathrm{G}\mathrm{V}}_{\alpha }^{2}$ for $u=\left({u}_{1},{u}_{2}\right)\in {L}_{\mathrm{l}\mathrm{o}\mathrm{c}}^{1}{\left({\Omega},\mathbf{C}\right)}^{2}$ similarly to (53), where we use the spectral norm as pointwise dual norm on ${\mathrm{S}\mathrm{y}\mathrm{m}}^{1}{\left({\mathbf{C}}^{d}\right)}^{2}$ and the Frobenius norm on ${\mathrm{S}\mathrm{y}\mathrm{m}}^{2}{\left({\mathbf{C}}^{d}\right)}^{2}$. In the primal version of TGV analogous to (35), this results in particular in a pointwise nuclear-norm penalisation of the first-order derivative information ∇u − w and is motivated by the goal of enforcing pointwise rank one of ∇u − w in a discretised setting and hence an alignment of level sets.

With these building blocks, a variational model for coupled MR-PET reconstruction can be written as

where ${\left({f}_{1}\right)}_{1},\dots ,{\left({f}_{1}\right)}_{k}\in {L}_{\sigma }^{2}\left({\mathbf{R}}^{d},\mathbf{C}\right)$ and ${f}_{2}\in {L}_{\mu }^{1}\left({\Sigma}\right)$, f2 ⩾ 0 almost everywhere, are the given measurement data for MR and PET, respectively, and λ1, λ2 > 0 are the weights for the different data terms. Well-posedness for this model follows again by a straightforward adaptation of proposition 5.17 to the multi-channel setting. Regarding the regularisation parameters λ1, λ2, this is a particular case of coupled multi-discrepancy regularisation and we refer to [107] for results on convergence and parameter choice for vanishing noise. A numerical solution can again be obtained with the techniques described in section 6, where the discrete forward operator Kh is vectorised as Kh = diag(KMR,h , KPET,h ) with discretised operators KMR,h and KPET,h , and the discrepancy ${S}_{{f}_{h}}$ is the component-wise sum of the two discrepancies above in a discrete version.
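Assembling the pieces above, a plausible form of the PET forward operator and of the coupled variational model (the exact formulation is given in [122]) reads

$K_{\mathrm{PET}}u = a\, P(u * k) + c \qquad \text{and} \qquad \min_{u = (u_1,u_2)}\ \frac{\lambda_1}{2}\sum_{j=1}^k \|(K_{\mathrm{MR}}u_1)_j - (f_1)_j\|_{L^2_\sigma}^2 + \lambda_2\, \mathrm{KL}(K_{\mathrm{PET}}u_2, f_2) + \mathrm{TGV}_\alpha^2(u_1, u_2),$

with KL the Kullback–Leibler discrepancy from (2).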

Numerical results for 3D in-vivo data obtained with this method, together with a comparison to a standard method, can be found in figure 25 (see also [122] for a more detailed evaluation). As can be seen there, the coupling of the two modalities yields improved reconstruction results, in particular for the PET channels, making sharp features and details more visible.


Figure 25. Slices of fused, reconstructed 3D in vivo MR and PET images from an MPRAGE contrast with two-fold subsampling and a 10 min fluorodeoxyglucose (FDG) PET head scan, respectively (left to right: transversal view, coronal view, sagittal view). Top row: standard methods (CG sense [148] for MR, expectation maximisation [176] for PET). Bottom row: nuclear-norm-TGV-based variational reconstruction.


8.7. Radon inversion for multi-channel electron microscopy

Similar to joint MR-PET reconstruction, coupled higher-order regularisation can also be used in multi-channel electron microscopy imaging for improving reconstruction quality. As a particular technique in electron microscopy, scanning transmission electron microscopy (STEM) allows for three-dimensional imaging of nanomaterials down to atomic resolution and is heavily used in materials science and nanotechnology, e.g. for quality control and troubleshooting in the production of microchips. Beyond providing pure density images, spectroscopy methods in STEM imaging also allow one to image the 3D elemental and chemical make-up of a sample.

Standard techniques for density and spectroscopy imaging in STEM are high-angle annular dark-field (HAADF) imaging and energy-dispersive x-ray spectroscopy (EDXS), respectively. For both imaging methods, measurement data can be acquired simultaneously while raster-scanning the material sample with a focussed electron beam. HAADF imaging records the number of electrons scattered to a specific annular range, while EDXS allows one to record characteristic x-rays of specific elements, which are emitted when electrons change their shell position. For each position of the electron beam, both HAADF and EDXS measurements correspond to measuring (approximately) the density of a weighted sum of all elements and of a single element, respectively, integrated along the line of the electron beam that intersects the sample. Scanning over the entire sample orthogonal to an imaging plane, the acquired signals hence correspond to a slice-wise Radon transform of an overall density image (HAADF) and of different elemental maps (EDXS).

Due to physical restrictions in the imaging system, the number of available projections (i.e., measurement angles) as well as the signal-to-noise ratio, in particular for EDXS, is limited and volumetric images obtained with standard image reconstruction methods, such as the simultaneous iterative reconstruction technique (SIRT) [93], suffer from artefacts and noise. As a result, regularised reconstruction is increasingly used also for electron tomography, with total-variation-based methods being a popular example [96]. While TV regularisation works well for piecewise-constant density distributions with sharp interfaces, the presence of gradual changes between different sample regions, e.g., due to diffusion at interfaces, motivates the usage of higher-order regularisation approaches for electron tomography [111], also in a single-channel setting [3]. In a multi-channel setting as discussed here, an additional coupling of different measurement channels is very beneficial in particular for the reconstruction of elemental maps and has been carried out with first-order TV regularisation in [201, 202] and second-order TGV regularisation in [111]. In the following, we discuss the TGV-based approach of [111] in more detail and provide experimental results.


Figure 26. Example of multi-channel electron tomography. The images show density maps (top row) and elemental maps (bottom row) of one slice of a 3D multi-channel electron tomography reconstruction using different reconstruction strategies [111]. Left column: SIRT [93] method. Middle column: uncoupled TGV-based regularisation. Right column: coupled TGV-based regularisation. Reproduced from [111]. CC BY 3.0.


With Σ = Σ1 × Σ2, ${{\Sigma}}_{1}\subset {\mathcal{S}}^{1}{\times}\left.\right]-R,R\left[\right.$ non-empty, open, ${{\Sigma}}_{2}\subset \left.\right]-R,R\left[\right.$ non-empty, open, for some R > 0 and $\mu =\left({\mathcal{H}}^{1}\enspace \llcorner \enspace {\mathcal{S}}^{1}{\times}{\mathcal{L}}^{1}\right){\times}{\mathcal{L}}^{1}$, we define, for ${\Omega}={B}_{R}\left(0\right){\times}\left.\right]-R,R\left[\right.$ where BR (0) ⊂ R2 and p ∈ [1, ], the forward operator for electron tomography as ${K}_{\mathrm{T}\mathrm{E}\mathrm{M}}:{L}^{p}\left({\Omega}\right)\to {L}_{\mu }^{1}\left({\Sigma}\right)$ via

which corresponds to a slice-wise 2D Radon transform. By continuity of the Radon transform from L1(BR (0)) to ${L}^{1}\left({\mathcal{S}}^{1}{\times}\left.\right]-R,R\left[\right.\right)$ (see [134, section 3.4]) there is a C > 0 such that, for every uLp (BR (0)) and almost every $z\in \left.\right]-R,R\left[\right.$,

Integrating over Σ2, it follows that KTEM is bounded from Lp (Ω) to ${L}_{\mu }^{1}\left({\Sigma}\right)$. Now assume f1, ..., fn to be given multi-channel measurement data and the forward model for the ith measurement channel to be described by ${\left({K}_{\mathrm{T}\mathrm{E}\mathrm{M}}\right)}_{i}\in \mathcal{L}\left({L}^{p}\left({\Omega}\right),{L}_{\mu }^{1}\left({\Sigma}\right)\right)$ for i = 1, ..., n. In the example considered below, f = (fHAADF, fYb, fAl, fSi), with fHAADF the HAADF data, and (fYb, fAl, fSi) the EDXS data for ytterbium, aluminium and silicon, respectively. With ${\mathrm{T}\mathrm{G}\mathrm{V}}_{\alpha }^{2}$ the multi-channel extension of second-order TGV as discussed in subsection 5.3, using a Frobenius-norm coupling of the different channels, we consider
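A plausible form of the slice-wise operator and of the omitted slice-wise estimate, based on the description above, is

$(K_{\mathrm{TEM}}u)(\vartheta, s, z) = \int_{\mathbf{R}} u(s\vartheta + t\vartheta^{\perp}, z)\, \mathrm{d}t, \qquad \|(K_{\mathrm{TEM}}u)(\cdot,\cdot,z)\|_{L^1(\Sigma_1)} \leq C\, \|u(\cdot,z)\|_{L^p(B_R(0))},$

where u is extended by zero outside Ω and $\vartheta^{\perp}$ denotes the rotation of ϑ by 90°.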

for the reconstruction of multi-channel image data, for which well-posedness again results from a multi-channel extension of proposition 5.17. A numerical solution can be obtained using the framework as described in section 6 where again the discrete forward operator Kh and the discrepancy term ${S}_{{f}_{h}}$ are vectorised accordingly, similar as in subsection 8.6.

Experimental results for this setting and a comparison to other methods can be found in figure 26, where in particular, separate TGV regularisation of each channel is compared to the Frobenius-norm-based coupling as mentioned above. It can be seen in figure 26 that using TGV regularisation significantly improves upon the standard SIRT method. Also, a coupling of the different channels is very beneficial in particular for the elemental maps, making material inclusions visible that can hardly be seen with an uncoupled reconstruction. We refer to [111] for a more detailed evaluation (and comparison to TV-based regularisation).

9. Conclusions

The higher-order total variation strategies and application examples discussed in this review show once again that while regularisation makes it possible to solve ill-posed inverse problems in the first place, the actual choice of the regularisation strategy has a tremendous impact on the qualitative properties of the regularised solutions and can be decisive for whether the inverse problem is considered solved in practice. In the considered context of Tikhonov regularisation, convex regularisation functionals offer great flexibility in terms of functional-analytic properties and a priori assumptions on the solutions. With the total variation being an established regulariser with desirable properties such as the ability to recover discontinuities, higher-order total variation regularisers offer additional possibilities, mainly the efficient modelling of piecewise smooth regions in which the derivative of some order may jump. We have seen in this paper that a regularisation theory for these functionals can be established and that the overall theory is now sufficiently advanced such that the favourable properties of both first- and higher-order TV can be obtained with suitable functionals, for instance by infimal convolution. The underlying concepts are in particular suitable for various generalisations. The total generalised variation, for instance, is based on TV-type penalties for a multiple-order differentiation cascade and thus makes it possible to realise the a priori assumption of piecewise smoothness with jump discontinuities. Further, as higher-order derivatives are intrinsically connected to symmetric tensor fields, a generalisation to dedicated regularisation approaches for the latter is immediate. All these approaches and generalisations are indeed beneficial for applications and the solution of concrete inverse problems. This is in particular the case for inverse problems in medical imaging.

Of course, there are still several directions of future research, open questions and topics that have not been covered by this review. For instance, one of the major differences between first-order TV and higher-order approaches is the availability of a co-area formula that can be used to describe the total variation of a scalar function in terms of its sublevel sets. Generalisations to vector-valued functions or higher-order derivatives either do not exist or are not practical from the viewpoint of regularisation theory for inverse problems. As the co-area formula allows one, for instance, to obtain geometrical properties for TV regularisation [58, 112], it would be interesting to bridge the gap to higher-order TV approaches such that similar statements can be made. Some recent progress in this direction might be the connection between the solutions of certain linear inverse problems and the extremal points of the sublevel sets of the regulariser [26, 29], since the extremal points of the TV-balls are essentially characteristic functions. However, for higher-order TV and the generalisations discussed in this paper, a characterisation of the extremal points is not known to date. Further, in the context of TV regularisation, only natural orders of differentiation have been considered in detail so far, with regularisation theory for fractional-order TV just emerging [70, 194, 199]. Indeed, there are many open questions for fractional-order TV regularisation, ranging from the properties of the fractional derivative operators and their underlying spaces to the optimal selection of the fractional differentiation parameter as well as the construction of efficient numerical algorithms. Finally, with all the possibilities of combining distributional differentiation and Radon-norm penalisation, which are the essential building blocks of the regularisers discussed in this paper, the question arises whether their structure, parameters and differential operators can also be learned by data-driven optimisation. Some results in this direction can already be found in the literature [49, 71], and we expect that more will follow in the future.

Acknowledgments

The authors gratefully acknowledge support of the Austrian Science Fund (FWF) within the project P 29192. The Institute of Mathematics and Scientific Computing at the University of Graz is a member of NAWI Graz (www.nawigraz.at) and BioTechMed Graz (www.biotechmedgraz.at).

Appendix A. Additional proofs

Lemma A.1. With Ω' ⊂ Rd measurable, in accordance with equation (2), let the functional KL on L1(Ω')2 be given as

$\mathrm{KL}(v,f) = \begin{cases}\displaystyle\int_{\Omega'} v - f - f\,\log\Bigl(\dfrac{v}{f}\Bigr)\,\mathrm{d}x & \text{if } v \geq 0,\ f \geq 0 \text{ a.e.},\\ \infty & \text{else},\end{cases}$

where we set the integrand to v where f = 0 and to ∞ where v = 0 and f > 0. Then, KL is well-defined, non-negative, convex and lower semi-continuous. In case f ⩾ 0 a.e., it holds that KL(v, f) = 0 if and only if v = f. Further, for all v, f ∈ L1(Ω'),

Equation (A.1)

and in particular,

Equation (A.2)

for all f, v ∈ L1(Ω').

Proof. At first note that, in case f, v ⩾ 0, KL is given by integrating g : [0, ∞[2 → [0, ∞] with $g(x,y)=x-y-y\,\log\bigl(\frac{x}{y}\bigr)$ for $x,y\in [0,\infty[$, where we use the conventions $0\,\log\bigl(\frac{v}{0}\bigr)=0$ for v ⩾ 0 and $-f\,\log\bigl(\frac{0}{f}\bigr)=\infty$ for f > 0. It is easy to see that g is non-negative, convex and lower semi-continuous, hence KL is well-defined, non-negative, convex, lower semi-continuous and, in case f ⩾ 0 a.e., KL(v, f) = 0 if and only if v = f. Also, a simple computation (see [23]) shows that for all $x,y\in [0,\infty[$,

from which the estimate (A.1) follows with the Cauchy–Schwarz inequality applied to the square root of the above estimate. Now, for the first estimate in (A.2), we take f, v ∈ L1(Ω') and note that in case ||v||1 ⩽ ||f||1, the estimate holds trivially. In the other case, v ≠ 0 and we observe that (A.1) implies

from which the claimed estimate follows from rearranging, dividing by ||v||1 and noting that ||f||1/||v||1 ⩽ 1. The second estimate in (A.2) follows analogously. □
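To make the conventions of lemma A.1 concrete, the following minimal Python sketch (our own illustration, not part of the original text; the uniform-grid quadrature, the array names and the treatment of negative entries as +∞ are assumptions made here) evaluates a discretised version of KL(v, f) and illustrates that it is non-negative and vanishes exactly for v = f.

import numpy as np

def kl_discrepancy(v, f, dx=1.0):
    # Discretised KL(v, f) = sum of (v - f - f*log(v/f)) * dx on a uniform grid.
    # Conventions as in lemma A.1: integrand = v where f = 0, and +inf where
    # v = 0 and f > 0; negative entries are assigned the value +inf (assumption).
    v = np.asarray(v, dtype=float)
    f = np.asarray(f, dtype=float)
    if np.any(v < 0) or np.any(f < 0) or np.any((v == 0) & (f > 0)):
        return np.inf
    out = np.empty_like(v)
    zero_f = (f == 0)
    out[zero_f] = v[zero_f]        # convention: integrand = v where f = 0
    pos = ~zero_f                  # here f > 0 and, by the check above, v > 0
    out[pos] = v[pos] - f[pos] - f[pos] * np.log(v[pos] / f[pos])
    return float(np.sum(out) * dx)

x = np.linspace(0.0, 1.0, 201)
dx = x[1] - x[0]
f = 1.0 + 0.5 * np.sin(2.0 * np.pi * x)
v = 1.0 + 0.5 * np.sin(2.0 * np.pi * x + 0.3)
print(kl_discrepancy(f, f, dx))    # 0.0, since KL(f, f) = 0
print(kl_discrepancy(v, f, dx))    # strictly positive, since v differs from f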

Lemma A.2. For {fn } and f in L1(Ω'), let KL(f, fn ) → 0. Then, ||f − fn ||1 → 0 and for each sequence {vn } in L1(Ω') with vn → v as n → ∞ for v ∈ L1(Ω'), it holds that

$\mathrm{KL}(v, f) \leq \liminf_{n\to\infty} \mathrm{KL}(v^{n}, f^{n}).$

If, in addition, fn ⩽ Cf a.e. in Ω' for all n and some C > 0, then for all v ∈ L1(Ω'), we have

$\limsup_{n\to\infty} \mathrm{KL}(v, f^{n}) \leq \mathrm{KL}(v, f).$
Proof. Assume that KL(f, fn ) → 0. It follows from the second estimate in (A.2) that {KL(f, fn )} bounded implies {||fn ||1} bounded which, using (A.1), yields that fn → f in L1(Ω'). The lim inf estimate then follows from lower semi-continuity as in lemma A.1. Now assume that additionally, fn ⩽ Cf a.e. in Ω' for all n and some C > 0. By L1-convergence we can take a subsequence $\{f^{n_k}\}$ such that $f^{n_k}\to f$ pointwise a.e., and $\lim_{k\to \infty }\mathrm{KL}(v,f^{n_k})=\limsup_{n\to \infty }\mathrm{KL}(v,f^{n})$. As $\mathrm{KL}(f,f^{n_k})\to 0$ as k → ∞, we have $\int_{\Omega'} f^{n_k}\,\log(f/f^{n_k})\,\mathrm{d}x\to 0$. Also, since f log(v/f) ∈ L1(Ω') and $f^{n_k}/f$, where we set $f^{n_k}/f=0$ where f = 0, is bounded a.e. uniformly with respect to k, we have $\int_{\Omega'}(f^{n_k}/f)\,f\,\log(v/f)\,\mathrm{d}x\to \int_{\Omega'} f\,\log(v/f)\,\mathrm{d}x$ by virtue of Lebesgue's theorem. Together, we get

which is what we wanted to show. □

Lemma A.3. Let k ⩾ 1, l ⩾ 0 and u : Ω → Syml (Rd ) be (k + l)-times continuously differentiable such that ${\mathcal{E}}^{k}u=0$ in Ω. Then, ∇k+l u = 0 in Ω.

Proof. The statement is a slight generalisation of [28, proposition 3.1] and its proof is analogous. We present it for the sake of completeness. Choose a1, ..., a2l+k ∈ Rd . We show that (∇k+l u)(x)(a1, ..., a2l+k ) = 0 for each x ∈ Ω. For this purpose, let L ⊂ {1, ..., 2l + k} with $\vert L\vert = l$ and denote, dropping the dependence on x, by

$u_{L} = u(a_{\pi(1)}, \ldots, a_{\pi(l)})$
for some bijective π : {1, ..., l} → L, giving a (k + l)-times differentiable uL : Ω → R. Observe that by symmetry, uL does not depend on the choice of π but indeed only on L. Likewise, denote by

$\dfrac{\partial^{k+l} u_{L}}{\partial a_{\complement L}} = \dfrac{\partial^{k+l} u_{L}}{\partial a_{\sigma(1)} \cdots \partial a_{\sigma(k+l)}}$
for some bijective σ : {1, ..., k + l} → ∁L. By symmetry of the derivative, $\frac{\partial^{k+l}u_{L}}{\partial a_{\complement L}}:{\Omega}\to \mathbf{R}$ only depends on L. We also introduce an analogous notation for the symmetrised derivative ${\mathcal{E}}^{k}$:

$(\mathcal{E}^{k}u)_{\complement L} = (\mathcal{E}^{k}u)(a_{\sigma(1)}, \ldots, a_{\sigma(k+l)})$

and, for some π : {1, ..., l} → L bijective,

$\dfrac{\partial^{l}(\mathcal{E}^{k}u)_{\complement L}}{\partial a_{L}} = \dfrac{\partial^{l}(\mathcal{E}^{k}u)_{\complement L}}{\partial a_{\pi(1)} \cdots \partial a_{\pi(l)}}.$
Now, as for π : {1, ..., l} → L bijective, the definitions as well as symmetry yield

we see that each $\frac{\partial^{l}(\mathcal{E}^{k}u)_{\complement L}}{\partial a_{L}}$ can be written as a linear combination of the $\frac{\partial^{k+l}u_{M}}{\partial a_{\complement M}}$. Up to the factor $\binom{k+l}{l}^{-1}$, the linear mapping that takes the formal vector $\bigl(\frac{\partial^{k+l}u_{M}}{\partial a_{\complement M}}\bigr)_{M}$ indexed by all M ⊂ {1, ..., 2l + k}, $\vert M\vert = l$, to the formal vector $\bigl(\frac{\partial^{l}(\mathcal{E}^{k}u)_{\complement L}}{\partial a_{L}}\bigr)_{L}$ indexed by all L ⊂ {1, ..., 2l + k}, $\vert L\vert = l$, corresponds to multiplication by the adjacency matrix of the Kneser graph K2l+k,l , see, for instance, [19] for a definition. The latter is regular, which can, for instance, be seen by looking at its eigenvalues, which are known to be

$(-1)^{i}\binom{k+l-i}{l-i}, \qquad i = 0, \ldots, l,$
see again [19]; in particular, all eigenvalues are non-zero. Thus, we can find real numbers ${\left({c}_{L}\right)}_{L}$ indexed by all L ⊂ {1, ..., 2l + k}, $\vert L\vert = l$, independent of u and a1, ..., a2l+k , such that for M = {k + l + 1, ..., 2l + k}, the identity

$\dfrac{\partial^{k+l}u_{M}}{\partial a_{\complement M}} = \sum_{L \subset \{1,\ldots,2l+k\},\ \vert L\vert = l} c_{L}\, \dfrac{\partial^{l}(\mathcal{E}^{k}u)_{\complement L}}{\partial a_{L}}$
holds. If ${\mathcal{E}}^{k}u=0$, then the right-hand side is 0 while the left-hand side corresponds to ∇k+l u(a1, ..., a2l+k ). This completes the proof. □
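The combinatorial mechanism of the proof can be made explicit in the simplest non-trivial case k = l = 1; the following worked instance is our own illustration and not part of the original argument. For a twice continuously differentiable vector field u : Ω → Rd and directions a, b, c ∈ Rd , the symmetry of second derivatives gives the classical identity

$\dfrac{\partial^{2}(u\cdot c)}{\partial a\,\partial b} = \dfrac{\partial\,(\mathcal{E}u)(b,c)}{\partial a} + \dfrac{\partial\,(\mathcal{E}u)(a,c)}{\partial b} - \dfrac{\partial\,(\mathcal{E}u)(a,b)}{\partial c},$

so that $\mathcal{E}u = 0$ indeed forces ∇2 u = 0. Here 2l + k = 3, the Kneser graph K3,1 is the complete graph on three vertices with eigenvalues 2, −1, −1, and the coefficients cL ∈ {1, 1, −1} in the identity above are, up to the factor $\binom{2}{1} = 2$, the entries of a row of the inverse of this adjacency matrix.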

Lemma A.4. Let k ⩾ 1 and Ω ⊂ Rd be a bounded Lipschitz domain. For each u ∈ BVk (Ω) and δ > 0, there exists a uδ ∈ BVk (Ω) ∩ C∞(Ω) such that for δ → 0,

$\Vert u^{\delta} - u\Vert_{1} \to 0 \quad\text{and}\quad \mathrm{TV}^{m}(u^{\delta}) \to \mathrm{TV}^{m}(u) \quad\text{for } m = 1, \ldots, k,$
i.e., {uδ } converges strictly in BVk (Ω) to u as δ → 0.

Proof. The proof builds on the result [32, lemma 5.4] and techniques from [7, 85]. Choose a sequence of open sets {Ωn } such that Ω = ⋃n∈N Ωn , ${\overline{{\Omega}}}_{n}\subset \subset {\Omega}$ for all n ∈ N and any point of Ω belongs to at most four sets Ωn (cf [7, theorem 3.9] for a construction of such sets). Further, let {φn } be a partition of unity relative to {Ωn }, i.e., ${\varphi }^{n}\in {\mathcal{C}}_{\mathrm{c}}^{\infty }\left({{\Omega}}_{n}\right)$ with φn ⩾ 0 for all n ∈ N and ${\sum }_{n=1}^{\infty }{\varphi }^{n}=1$ pointwise in Ω. Finally, let $\rho \in {\mathcal{C}}_{\mathrm{c}}^{\infty }\left({\mathbf{R}}^{d}\right)$ be a standard mollifier, i.e., ρ is radially symmetric, non-negative and satisfies ${\int }_{{\mathbf{R}}^{d}}\rho \enspace \mathrm{d}x=1$. Denote by ρɛ the function given by ρɛ (x) = ɛ−d ρ(x/ɛ) for ɛ > 0.

As ρ is a mollifier and φn has compact support in Ωn , we can find, for any n ∈ N, an ɛn > 0 such that $\mathrm{supp}\bigl((v{\varphi }^{n}){\ast}{\rho }_{{\varepsilon }_{n}}\bigr)\subset {{\Omega}}_{n}$ for any v ∈ BD(Ω, Syml (Rd )), l ∈ N. Further, as shown in [32, lemma 5.4], for any v ∈ BD(Ω, Syml (Rd )) fixed, for any δ > 0 we can pick a sequence $\left\{{\varepsilon }_{n}^{\delta }\right\}$ with each ${\varepsilon }_{n}^{\delta }$ being small enough such that with ${v}^{\delta }={\sum }_{n=1}^{\infty }\left(v{\varphi }^{n}\right){\ast}{\rho }_{{\varepsilon }_{n}^{\delta }}$, we have

In particular, for u ∈ BVk (Ω) fixed and vl = ∇l u ∈ BDk−l (Ω, Syml (Rd )) for l = 0, ..., k − 1, we can pick a sequence $\left\{{\varepsilon }_{n}^{\delta }\right\}$ with each component small enough such that

Equation (A.3)

since $\nabla \otimes {v}_{l}=\mathcal{E}{v}_{l}$. Further, we note that, as an additional consequence of the Sobolev–Korn inequality of theorem 3.18, vl ∈ Hk−1−l,1(Ω, Syml (Rd )) and hence, by the product rule $\mathcal{E}\left({v}_{l-1}{\varphi }^{n}\right)=\left(\mathcal{E}{v}_{l-1}\right){\varphi }^{n}+\vert \vert \vert \left({v}_{l-1}\otimes \nabla {\varphi }^{n}\right)$, we get

In addition,

Since each | | |(vl−1 ⊗ ∇φn ) ∈ Hk−l,1(Ω, Syml (Rd )), by an adaptation of standard mollification results [85, theorem 5.2.2] we can further reduce any ${\varepsilon }_{n}^{\delta }$ to be small enough such that for each m = 1, ..., k,

and consequently,

Now, setting ${u}^{\delta }={v}_{0}^{\delta }$, we estimate for m = 1, ..., k, using the second estimate in (A.3) and that $\mathcal{E}{v}_{m-1}={\nabla }^{m}u$ as well as ${\mathcal{E}}^{m}{u}^{\delta }={\nabla }^{m}{u}^{\delta }$,

This shows in particular that uδ ∈ BVk (Ω) and by construction, ${u}^{\delta }\in {\mathcal{C}}^{\infty }\left({\Omega}\right)$. Taking the limit δ → 0 and using the lower semi-continuity of TVm , we finally obtain

$\lim_{\delta\to 0} \mathrm{TV}^{m}(u^{\delta}) = \mathrm{TV}^{m}(u)$
for m = 1, ..., k which, together with the first estimate in (A.3), implies the assertion. □
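The statement of lemma A.4 can also be illustrated numerically. The following one-dimensional Python sketch is our own simplification (a single global mollifier applied to a step function, rather than the partition-of-unity construction of the proof; grid, mollifier and parameters are illustrative choices). It shows the typical behaviour of strict convergence: the L1 distance to the step function vanishes as the mollification width shrinks, while the discrete total variation of the smoothed function stays equal to TV(u) = 1, even though the smoothed functions do not converge to u in BV-norm, since the jump is never matched exactly.

import numpy as np

def mollify(u, eps, dx):
    # Convolve u with a compactly supported, normalised bump of width eps.
    r = int(np.ceil(eps / dx))
    t = np.linspace(-1.0, 1.0, 2 * r + 1)
    rho = np.exp(-1.0 / (1.0 - t ** 2 + 1e-15)) * (np.abs(t) < 1)
    rho /= rho.sum()
    return np.convolve(np.pad(u, r, mode='edge'), rho, mode='valid')

dx = 1e-3
x = np.arange(-1.0, 1.0, dx)
u = (x > 0).astype(float)                  # step function, TV(u) = 1

for eps in [0.2, 0.05, 0.01]:
    u_eps = mollify(u, eps, dx)
    l1_err = np.sum(np.abs(u_eps - u)) * dx
    tv = np.sum(np.abs(np.diff(u_eps)))    # discrete total variation
    print(f"eps={eps:5.2f}:  ||u_eps - u||_1 = {l1_err:.4f},  TV(u_eps) = {tv:.4f}")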
