Blind image fusion for hyperspectral imaging with the directional total variation

Leon Bungert; David A Coomes; Matthias J Ehrhardt; Jennifer Rasch; Rafael Reisenhofer; Carola-Bibiane Schönlieb

doi:10.1088/1361-6420/aaaf63

1. Introduction

Hyperspectral imaging is an earth observation technique that measures the energy of light reflected off the earth's surface within many narrow spectral bands, from which reflectance curves can be calculated. The shape of these reflectance curves provides information about the chemical and physical properties of materials such as minerals in rocks, vegetation, synthetic materials and water, which allows these materials to be classified and mapped remotely (see [1, 2], for instance).

Usually, hyperspectral images of the earth are being recorded by imaging devices that are mounted on planes or satellites. In either case, the speed of the carrier and narrowness of the spectral bands result in limited energy reaching the sensor for each spectral band. Therefore, most hyperspectral cameras have an intrinsic trade-off between spatial and spectral resolution [3, 4]. A standard approach to overcoming this trade-off is to record a second image with low spectral but high spatial resolution (e.g. a photograph or a laser scan) [5] and to fuse these data together in order to create an image that has the high spectral resolution of the hyperspectral image and the high spatial resolution of the photograph [3–13]. These techniques have become known as pansharpening or image fusion. The problem is illustrated by three example data sets in figure 1.

**Figure 1.** Three example data for image fusion in remote sensing. They each consist of a hyperspectral image (small image, only one channel shown) and an image of higher spatial resolution (large image). The goal is to create an image that has both high spatial and high spectral resolution.
Download figure:
Standard image High-resolution image

Most image fusion techniques are based on certain assumptions regarding the data and the corresponding imaging devices. First, it is often assumed that the image of high spatial resolution is a linear combination of the spectral channels with known weights [4]. Second, the loss of resolution is usually modeled as a linear operator which consists of a subsampled convolution with known kernel (point spread function). While both assumptions may be justified in some applications, it may be difficult to measure or estimate the weights and the convolution kernel in a practical situation.

Instead of having accurate descriptions or measurements about the system response of our cameras [4, 6, 12], we make use of the fact that both images are acquired from the same scenery. While the actual image intensities will depend on the wavelengths that are being recorded and are therefore dissimilar, the geometric structure of those images is likely to be very similar as they are both taken from the same terrain [6, 7, 9, 12]. This approach has been studied extensively in the context of medical imaging where traditionally an anatomical imaging modality with high spatial resolution is used to aid the reconstruction of a functional imaging modality, see e.g. [14–20].

Note that the hyperspectral data image and the high-resolution photograph usually undergo a georectification procedure which aligns them using a digital elevation map of the study area. However, this adjustment is often imprecise, particularly in topographically complex regions, so further steps are required to co-register the images precisely [21]. Moreover, having a good description of the point spread function of the system is very unnatural. Thus, we take a blind approach and estimate the point spread function within our reconstruction and thereby learn this information from the recorded data. This approach is intrinsically related to blind deconvolution [22–25]. Figure 2 shows an example of blind versus non-blind image fusion applied to the real remote sensing data with unknown blurring kernel. It can be seen that the shape of the kernel that is estimated during the reconstruction clearly differs from the fixed kernel. Equally important is the translation of the estimated kernel, which can compensate for shifts between the given image pair. To the best of our knowledge this is the first contribution where structural side information / anatomical priors are combined with registration which removes the unnatural assumption of perfectly registered images that is usually made, see [14–20], for instance.

**Figure 2.** Comparison of *non-blind* versus *blind* image fusion. From left to right: hyperspectral image (one channel, four times enlarged), result of non-blind and blind image fusion together with the kernels that are used or estimated, respectively. The plotted kernels are of size $41\times 41$ and have been enlarged for better visibility. The dotted green lines highlight the center of mass of the obtained kernel and indicate the amount of translational displacement between the fused images.
Download figure:
Standard image High-resolution image

**Figure 2.** Comparison of *non-blind* versus *blind* image fusion. From left to right: hyperspectral image (one channel, four times enlarged), result of non-blind and blind image fusion together with the kernels that are used or estimated, respectively. The plotted kernels are of size $41\times 41$ and have been enlarged for better visibility. The dotted green lines highlight the center of mass of the obtained kernel and indicate the amount of translational displacement between the fused images.
Download figure:
Standard image High-resolution image

2. Mathematical model and notation

In this work we consider the problem of simultaneous image fusion and blind image deconvolution which can be cast as the optimization problem

$\begin{align} \newcommand{\e}{{\rm e}} \newcommand{\mA}{{\bf A}} \newcommand{\aceKernel}{K} \newcommand{\aceLarge}{U} \newcommand{\Rk}{\RegFun_k} \newcommand{\RegFun}{{\mathcal R}} \newcommand{\Ru}{\RegFun_u} \newcommand{\R}{{\mathbb R}} \newcommand{\Argmin}[2]{{\rm arg} \mathop{\min}\limits_{#1} \;#2} \displaystyle \label{EQ:BID} (u^\ast, k^\ast) \in \Argmin{(u, k)\in \aceLarge\times\aceKernel}{\left\{\frac 12 \Vert \mA_k u - f\Vert ^2 + \Ru(u) + \Rk(k)\right\}} \nonumber \end{align} \tag{ 1 }$

where the forward operator $\newcommand{\mA}{{\bf A}} \mA_k$ is a sampled blur with kernel k. Thus, it relates an image $\newcommand{\R}{{\mathbb R}} \newcommand{\aceLarge}{U} u \in \aceLarge := \R^m$ of size $m = (m_1, m_2)$ to blurred and subsampled data $\newcommand{\R}{{\mathbb R}} f \in \R^n, n = (n_1, n_2)$ . Note that we used multi-index notation which means $\newcommand{\R}{{\mathbb R}} \R^m = \R^{m_1 \times m_2}$ , for instance. Here, we consider the case of super-resolution n < m (true if and only if $n_1 < m_1$ and $n_2 < m_2$ ). The data discrepancy is measured in terms of the Euclidean / Frobenius norm $\Vert x\Vert ^2 := \sum\nolimits_{i} x_i^2$ and the optimal solution is regularized through the penalty functionals $\newcommand{\R}{{\mathbb R}} \newcommand{\RegFun}{{\mathcal R}} \newcommand{\Ru}{\RegFun_u} \Ru$ and $\newcommand{\R}{{\mathbb R}} \newcommand{\RegFun}{{\mathcal R}} \newcommand{\Rk}{\RegFun_k} \Rk$ .

Since—up to now—the ingredients of our model are only vaguely defined, the remainder of this section is devoted to an detailed explanation of the operators and regularization functionals involved.

2.1. The forward operator

The forward operator $\newcommand{\mA}{{\bf A}} \newcommand{\mB}{{\bf B}} \newcommand{\mC}{{\bf C}} \newcommand{\mS}{{\bf S}} \newcommand{\De}{:=} \mA_k \De \mS \circ \mB \circ \mC_{k}$ is the composition of three operators: $\newcommand{\mC}{{\bf C}} \mC_k$ represents a convolution with a kernel k, $\newcommand{\mB}{{\bf B}} \mB$ performs boundary treatment and $\newcommand{\mS}{{\bf S}} \mS$ is a sampling operator.

The convolution operator $\newcommand{\aceLarge}{U} \newcommand{\mC}{{\bf C}} \mC_k: \aceLarge \to \aceLarge$ with kernel $\newcommand{\R}{{\mathbb R}} \newcommand{\aceKernel}{K} k \in \aceKernel := \R^r, r := (r_1, r_2)$ is defined by

$\begin{align} \newcommand{\e}{{\rm e}} \newcommand{\De}{:=} \newcommand{\mJ}{{\bf J}} \newcommand{\mC}{{\bf C}} \displaystyle \mC_k u \De \mJ k \ast u \nonumber \end{align} \tag{ 2 }$

where * is the cyclic convolution of images in $\newcommand{\R}{{\mathbb R}} \newcommand{\aceLarge}{U} \aceLarge = \R^m$ . Since the kernel k is generally assumed to have smaller support than the image ${u}$ , i.e. r < m, we introduced the embedding operator $\newcommand{\aceLarge}{U} \newcommand{\aceKernel}{K} \newcommand{\mJ}{{\bf J}} \mJ : \aceKernel \rightarrow \aceLarge$ which embeds the 'small' convolution kernel $\newcommand{\aceKernel}{K} k \in \aceKernel$ into the image space $\newcommand{\aceLarge}{U} \aceLarge$ by padding it with zeros. By assuming periodic boundary conditions on u and $\newcommand{\mJ}{{\bf J}} \mJ k$ , the convolution can be efficiently computed via the discrete Fourier transform $\newcommand{\mF}{{\bf F}} \newcommand{\Fourier}{\mF} \Fourier$ , i.e. $\newcommand{\mF}{{\bf F}} \newcommand{\Fourier}{\mF} \newcommand{\mC}{{\bf C}} \newcommand{\mJ}{{\bf J}} \mC_k u = \Fourier^{-1}\left(\Fourier \mJ k \odot \Fourier u\right)$ where $\odot$ denotes the Hadamard product (pointwise multiplication). With a slight abuse of notation, we can exploit the symmetry of the convolution to infer

$\begin{align} \newcommand{\e}{{\rm e}} \newcommand{\mJ}{{\bf J}} \newcommand{\mC}{{\bf C}} \displaystyle \mC_k u = \mJ k \ast u = u \ast \mJ k = \mC_u \mJ k \, , \nonumber \end{align} \tag{ 3 }$

and thus

$\begin{align} \newcommand{\e}{{\rm e}} \newcommand{\mJ}{{\bf J}} \newcommand{\mA}{{\bf A}} \displaystyle \label{EQ:symmetry} \mA_k u = \mA_u \mJ k \, , \nonumber \end{align} \tag{ 4 }$

which will prove helpful later-on.

While periodic boundary conditions are computationally useful, any kind of boundary treatment is artificial and may result in artifacts. For instance, boundary artifacts are expected in the boundary region with margins $\newcommand{\De}{:=} l \De (r-1)/2$ where r is the diameter of the convolution kernel. To avoid these artefacts, we follow the boundary-layer approach (see [26] and references therein; also described by Almeida and Figueiredo in [27]) and introduce a boundary clipping (margin deletion) operator $\newcommand{\R}{{\mathbb R}} \newcommand{\aceLarge}{U} \newcommand{\mB}{{\bf B}} \mB:\aceLarge \to \R^{m-2l}$ ,

$\begin{align} \newcommand{\e}{{\rm e}} \newcommand{\De}{:=} \newcommand{\mB}{{\bf B}} \displaystyle (\mB u)_i \De u_{i + l} \nonumber \end{align} \tag{ 5 }$

that maps a large image u to a meaningful image $\newcommand{\R}{{\mathbb R}} \newcommand{\mB}{{\bf B}} \mB u \in \R^{m-2l}$ which is independent of the type of boundary conditions used for the precedent convolution. Thus, after computation of an optimal solution $u^\ast$ of (1) we only use the meaningful part $\newcommand{\mB}{{\bf B}} \mB u^\ast$ .

Finally, the third component of our forward operator is the sampling $\newcommand{\R}{{\mathbb R}} \newcommand{\mS}{{\bf S}} \mS:\R^{m-2l}\to\R^n$ which mimics an integration sensor of size $s \times s$ , i.e.

$\begin{align} \newcommand{\e}{{\rm e}} \newcommand{\De}{:=} \newcommand{\mS}{{\bf S}} \displaystyle (\mS u)_i \De \frac{1}{s^2}\sum_{j \in \Delta_i} u_j \nonumber \end{align} \tag{ 6 }$

where the detector indices are given by the set $\newcommand{\De}{:=} \Delta_i := \{\,j = (\,j_1, j_2) \, \vert \, i \leqslant j < i + s \}$ , which contains s² elements. Consequently, for a kernel of odd dimensions the relation m − 2l = sn connects the sizes of kernel, image and data.

Let us now turn to the regularization and prior knowledge on the image u and the kernel k.

2.2. Regularization functionals

2.2.1. Image regularization.

A popular prior to regularize images is the total variation [28] as it allows the recovery of smooth images whilst preserving edges. It is defined as the global 1-norm of the discretized gradient

$\begin{align} \newcommand{\e}{{\rm e}} \newcommand{\De}{:=} \newcommand{\grad}{\nabla} \newcommand{\tv}{{\rm TV}} \displaystyle \label{EQ:TV} \tv(u) \De \sum_i\Vert \grad u_i\Vert \, . \nonumber \end{align} \tag{ 7 }$

Due to the periodic boundary conditions needed for the forward operator, the discretized gradient (with forward differences) $\newcommand{\grad}{\nabla} \newcommand{\aceLarge}{U} \grad u \in \aceLarge^2$ of an image $\newcommand{\aceLarge}{U} u \in \aceLarge$ is

$\begin{align} \newcommand{\e}{{\rm e}} \newcommand{\De}{:=} \newcommand{\grad}{\nabla} \displaystyle (\grad u_i)_j \De u_{{\rm mod}(i+e_j, m)}-u_i \, ,\quad j\in\{1,2\}, \nonumber \end{align} \tag{ 8 }$

where e₁ = (1,0) and e₂ = (0,1) are the standard unit vectors in $\newcommand{\R}{{\mathbb R}} \R^2$ .

In the task at hand, the high-resolution photograph can be viewed as a source of additional a priori knowledge about the sought solution. As both the hyperspectral data and the high-resolution photograph depict the same scenery, it is reasonable to assume that they show the similar geometrical structures albeit having very different intensities. However, the TV functional (7) is not able to incorporate this kind of additional prior information. Instead, we will apply the so-called directional TV (dTV), which can fuse the structural information of a high-resolution image with the intensities of a low-resolution channel of a hyperspectral image. Figure 3 provides a visual comparison of reconstructions obtained from applying the TV and the dTV functional, respectively.

**Figure 3.** Comparison of total variation ( $\newcommand{\tv}{{\rm TV}} \tv$ ) versus directional total variation ( $\newcommand{\dtv}{{\rm dTV}} \dtv$ ) regularization for image fusion. From left to right: hyperspectral image (one channel, four times enlarged), results of $\newcommand{\tv}{{\rm TV}} \tv$ and $\newcommand{\dtv}{{\rm dTV}} \dtv$ together with the estimated kernels.
Download figure:
Standard image High-resolution image

$ \newcommand{\tv}{{\rm TV}} \tv$ — **Figure 3.** Comparison of total variation ( $\newcommand{\tv}{{\rm TV}} \tv$ ) versus directional total variation ( $\newcommand{\dtv}{{\rm dTV}} \dtv$ ) regularization for image fusion. From left to right: hyperspectral image (one channel, four times enlarged), results of $\newcommand{\tv}{{\rm TV}} \tv$ and $\newcommand{\dtv}{{\rm dTV}} \dtv$ together with the estimated kernels.
Download figure:
Standard image High-resolution image

Definition (Directional total variation [18]).

definition Let $\newcommand{\aceLarge}{U} \xi \in \aceLarge^2, \Vert \xi_i\Vert \leqslant \gamma < 1$ be a vector-field that determines directions and denote by $\newcommand{\aceLarge}{U} \newcommand{\mI}{{\bf I}} \newcommand{\mP}{{\bf P}} \mP \in \aceLarge^{2 \times 2}, \mP_i := \mI-\xi_i \otimes \xi_i$ an associated matrix-field where $\newcommand{\mI}{{\bf I}} \mI$ is the $2\times 2$ identity matrix and $\otimes$ marks the outer product of vectors. Then the directional total variation $\newcommand{\R}{{\mathbb R}} \newcommand{\dtv}{{\rm dTV}} \newcommand{\aceLarge}{U} \dtv : \aceLarge \to \R$ is defined as as

$\begin{align} \newcommand{\e}{{\rm e}} \newcommand{\De}{:=} \newcommand{\mP}{{\bf P}} \newcommand{\grad}{\nabla} \newcommand{\dtv}{{\rm dTV}} \displaystyle \label{EQ:dTV} \dtv(u) \De \sum_i \Vert \mP_i \grad u_i \Vert \, . \nonumber \end{align} \tag{ 9 }$

Note that due to the linearity of $\newcommand{\mP}{{\bf P}} \mP_i$ the directional total variation is convex. Let us shortly comment on the interpretation of the directional total variation. The vector-field ξ allows the definition of directions that should be less penalized and therefore get promoted. It is easy to see that the quantity $\newcommand{\grad}{\nabla} \newcommand{\mP}{{\bf P}} \mP_i \grad u_i$ can be expanded as

$\begin{align} \newcommand{\e}{{\rm e}} \newcommand{\mP}{{\bf P}} \newcommand{\grad}{\nabla} \displaystyle \mP_i \grad u_i = \grad u_i - \langle \xi_i, \grad u_i \rangle \xi_i \nonumber \end{align} \tag{ 10 }$

where $\langle \cdot, \cdot \rangle$ denotes the inner product on $\newcommand{\R}{{\mathbb R}} \R^2$ . This expression reduces to $\newcommand{\grad}{\nabla} (1-\Vert \xi_i\Vert ^2)\grad u_i$ in regions where $\newcommand{\grad}{\nabla} \grad u_i$ is collinear to $\xi_i$ , and to $\newcommand{\grad}{\nabla} \grad u_i$ where $\newcommand{\grad}{\nabla} \grad u_i$ is orthogonal to $\xi_i$ . Thus, gradients that are aligned / collinear $\xi_i$ are favored as long as $\Vert \xi_i\Vert > 0$ . In the extreme case $\Vert \xi_i\Vert = 1$ , aligned gradients are not penalized any more. It is important to note, that this prior does not enforce gradients in the direction of $\xi_i$ , as a gradient in the direction of $\xi_i$ is never 'cheaper' than a zero gradient.

The uniform upper bound $\gamma < 1$ ensures that the directional total variation is equivalent to the total variation in a semi-norm sense, i.e.

$\begin{align} \newcommand{\e}{{\rm e}} \newcommand{\dtv}{{\rm dTV}} \newcommand{\tv}{{\rm TV}} \displaystyle (1-\gamma^2) \tv(u) \leqslant \dtv(u) \leqslant \tv(u) \, . \nonumber \end{align} \tag{ 11 }$

Similar versions of the directional total variation have been used before [17, 19, 29] and it is related to other anisotropic priors [14, 30, 31] and the notion of parallel level sets [32–34]. Note that the directional total variation generalizes the usual total variation (7) for $\xi = 0$ and other versions of the directional total variation [35, 36] where the direction ξ is constant and not depending in the pixel location i. With the isotropic choice $\newcommand{\mI}{{\bf I}} \newcommand{\mP}{{\bf P}} \mP_i = w_i \mI, \, 0 \leqslant w_i \leqslant 1$ it reduces to weighted total variation [18, 29, 37–39].

Having this definition of the directional total variation at hand, the question is how to define the vector-field $\newcommand{\aceLarge}{U} \xi \in \aceLarge^2$ . As we want to favor images having a similar structure as the high-resolution photograph, we let $\newcommand{\aceLarge}{U} v\in\aceLarge$ denote a gray-scale image which is generated from the color high-resolution image and define the vector-field

$\begin{align} \newcommand{\e}{{\rm e}} \newcommand{\De}{:=} \newcommand{\grad}{\nabla} \displaystyle \label{EQ:vectorfield} \xi_i \De \gamma\frac{\grad v_i}{\Vert {\grad v}_i\Vert _\varepsilon} \, , \nonumber \end{align} \tag{ 12 }$

where $\newcommand{\De}{:=} \Vert x\Vert _\varepsilon \De \sqrt{\Vert x\Vert ^2+\varepsilon^2}$ and the scaling $\gamma \in [0, 1)$ ensures that $\Vert \xi_i\Vert \leqslant \gamma$ . The parameter $\varepsilon > 0$ guarantees that the vector-field ξ is well-defined, even if $\newcommand{\grad}{\nabla} \grad v_i = 0$ . Moreover, it allows to control which gradient magnitudes $\newcommand{\grad}{\nabla} \Vert \grad v_i\Vert$ should be considered meaningful and which are merely due to noise. Thus, both parameters ε and γ in general will depend on the given side information v but can be chosen without considering a particular data set f. Note that in practice, one chooses γ very close or even equal to 1 (see [18] for the latter) such that the side information image has sufficiently much influence. Extensive studies of this parameter are performed in [36] where the authors obtain their best results for small values of $\newcommand{\De}{:=} a\De1-\gamma^2$ . An example photograph, its gray scale version and the corresponding vector-field are shown in figure 4.

**Figure 4.** Side information for directional total variation. The side information is given as an RGB image (left) and converted into gray scales (center) from which a vector-field may be computed (right), e.g. with (12) (shown for $\gamma = 1, \varepsilon = 0.3$ ). The close-ups show that the vector-field does not only carry detailed information about edge *locations* but also about edge *directions*.
Download figure:
Standard image High-resolution image

**Figure 4.** Side information for directional total variation. The side information is given as an RGB image (left) and converted into gray scales (center) from which a vector-field may be computed (right), e.g. with (12) (shown for $\gamma = 1, \varepsilon = 0.3$ ). The close-ups show that the vector-field does not only carry detailed information about edge *locations* but also about edge *directions*.
Download figure:
Standard image High-resolution image

In addition to the structural a priori knowledge given by the photograph and encoded in the directional total variation, it is reasonable to assume that the solution we are looking for is non-negative. Thus, the regularization functional for the image becomes

$\begin{align} \newcommand{\e}{{\rm e}} \newcommand{\De}{:=} \newcommand{\RegParam}{\lambda} \newcommand{\Ru}{\RegFun_u} \newcommand{\RegFun}{{\mathcal R}} \newcommand{\dtv}{{\rm dTV}} \newcommand{\R}{{\mathbb R}} \displaystyle \label{EQ:Reg_u} \Ru \De \RegParam_u \dtv + \imath_{[0,\infty)^m} \nonumber \end{align} \tag{ 13 }$

where the characteristic function of the non-negative quadrant is defined as

$\begin{align} \newcommand{\e}{{\rm e}} \newcommand{\De}{:=} \displaystyle \label{EQ:char_pos_quadr} \imath_{[0,\infty)^m}(u) \De \left\{\begin{array}{@{}ll@{}} 0 & {\rm if}~ u \in [0,\infty)^m \nonumber \\ \infty & {\rm else} \end{array}\right. \, . \nonumber \end{align} \tag{ 14 }$

The regularization parameter for the image $\newcommand{\R}{{\mathbb R}} \newcommand{\RegParam}{\lambda} \RegParam_u>0$ controls the trade-off between regularity and data fidelity.

2.3. Kernel regularization

Similarly as for the image u, it is reasonable to assume that the kernel is non-negative and regular. Motion blurs that arise from ideal diffraction-free lenses have typically a compactly supported kernel with sharp cut-off boundaries [26, p 234]. Thus, the desired regularity of the kernel may be achieved by utilizing the total variation. Of course, other smoothness priors such as the H¹-semi-norm or total generalized variation [40] are possible, too, and will be subject of future work. In addition, blind deconvolution has the intrinsic non-uniqueness in the problem formulation that does not allow to estimate the correct global scaling factors for both the kernel and the image. To circumvent this issue, a common choice (e.g. [24–26, 41]) is to normalize the kernel with respect to the 1-norm. Thus, together with the non-negativity, we assume that the kernel is in the unit simplex

$\begin{align} \newcommand{\e}{{\rm e}} \newcommand{\De}{:=} \newcommand{\Simplex}{{\mathbb S}} \newcommand{\aceKernel}{K} \displaystyle \Simplex \De \left\{k \in \aceKernel \, : \, k_i \geqslant 0, \, \sum_i k_i = 1\right\} \, . \nonumber \end{align} \tag{ 15 }$

Letting $\newcommand{\Simplex}{{\mathbb S}} \imath_{\Simplex}$ denote the characteristic function of the unit simplex, defined analogously to (14), the regularization functional for the kernel becomes

$\begin{align} \newcommand{\e}{{\rm e}} \newcommand{\De}{:=} \newcommand{\Simplex}{{\mathbb S}} \newcommand{\RegParam}{\lambda} \newcommand{\Rk}{\RegFun_k} \newcommand{\RegFun}{{\mathcal R}} \newcommand{\tv}{{\rm TV}} \newcommand{\R}{{\mathbb R}} \displaystyle \label{EQ:Reg_k} \Rk \De \RegParam_k \tv + \imath_{\Simplex} \nonumber \end{align} \tag{ 16 }$

where we again can trade-off regularity and data fidelity by the regularization parameter $\newcommand{\R}{{\mathbb R}} \newcommand{\RegParam}{\lambda} \RegParam_k>0$ which in general needs to be chosen differently than the regularization parameter for the image $\newcommand{\R}{{\mathbb R}} \newcommand{\RegParam}{\lambda} \RegParam_u$ .

3. Algorithms

Given the abstract mathematical model (1) which models the solution that we are after, this section discusses algorithms that will attempt to numerically compute this solution. For a concise presentation, we cast problem (1) in the form

$\begin{align} \newcommand{\e}{{\rm e}} \newcommand{\Obj}{\Psi} \newcommand{\Argmin}[2]{{\rm arg} \mathop{\min}\limits_{#1} \;#2} \displaystyle \label{EQ:BID_short} (u^\ast, k^\ast) \in \Argmin{u, k}{\Obj(u,k)} \nonumber \end{align} \tag{ 17 }$

by defining $\newcommand{\R}{{\mathbb R}} \newcommand{\Obj}{\Psi} \newcommand{\DataFid}{\mathcal{D}} \newcommand{\Ru}{\RegFun_u} \newcommand{\RegFun}{{\mathcal R}} \newcommand{\Rk}{\RegFun_k} \newcommand{\De}{:=} \Obj(u, k)\De \DataFid(u, k) + \Ru(u)+\Rk(k)$ and abbreviating the data fidelity term in (1) by $\newcommand{\DataFid}{\mathcal{D}} \newcommand{\mA}{{\bf A}} \newcommand{\De}{:=} \DataFid(u, k)\De\frac{1}{2}\Vert \mA_ku-f\Vert ^2$ . While the function $\newcommand{\Obj}{\Psi} \Obj$ is not convex in the joint variable $(u, k)$ , it is bi-convex, i.e. it is convex in each of the variables u and k. The latter holds true as the regularization functions $\newcommand{\R}{{\mathbb R}} \newcommand{\RegFun}{{\mathcal R}} \newcommand{\Ru}{\RegFun_u} \Ru$ and $\newcommand{\R}{{\mathbb R}} \newcommand{\RegFun}{{\mathcal R}} \newcommand{\Rk}{\RegFun_k} \Rk$ are convex and the data fidelity $\newcommand{\DataFid}{\mathcal{D}} \DataFid$ is bi-convex.

An abstract algorithm for finding solutions of (17) is the proximal alternating minimization (PAM) algorithm [42]. Given an image-kernel pair $(u, k)$ and step sizes $\tau_u, \tau_k > 0$ , it computes the next iterate $(u^+, k^+)$ by the update scheme

$\begin{align} \newcommand{\e}{{\rm e}} \newcommand{\Obj}{\Psi} \newcommand{\Argmin}[2]{{\rm arg} \mathop{\min}\limits_{#1} \;#2} \displaystyle u^+ = \Argmin{x}{\left\lbrace \frac12 \Vert x - u\Vert ^2 + \tau_u \Obj(x, k) \right\rbrace} \, , \nonumber \end{align} \tag{ 18a }$

$\begin{align} \newcommand{\e}{{\rm e}} \newcommand{\Obj}{\Psi} \newcommand{\Argmin}[2]{{\rm arg} \mathop{\min}\limits_{#1} \;#2} \displaystyle k^+ = \Argmin{x}{\left\lbrace \frac12 \Vert x - k\Vert ^2 + \tau_k \Obj(u^+, x) \right\rbrace} \, . \nonumber \end{align} \tag{ 18b }$

Under reasonably mild conditions on the regularity of $\newcommand{\Obj}{\Psi} \Obj$ (see [42] for more details), PAM monotonously decreases the objective and its iterates converge to a critical point of $\newcommand{\Obj}{\Psi} \Obj$ . As the overall problem is non-convex, we cannot hope to find global solutions with this strategy.

3.1. Proximal alternating linearized minimization (PALM)

In addition, we may exploit the fact that the data term $\newcommand{\DataFid}{\mathcal{D}} \DataFid$ is continuously differentiable with respect to u and k and that it is not very difficult to compute the proximal operators with respect to the regularization functionals $\newcommand{\R}{{\mathbb R}} \newcommand{\RegFun}{{\mathcal R}} \newcommand{\Ru}{\RegFun_u} \Ru$ and $\newcommand{\R}{{\mathbb R}} \newcommand{\RegFun}{{\mathcal R}} \newcommand{\Rk}{\RegFun_k} \Rk$ . The proximal operator of a functional ${\mathcal R}$ is defined as the solution operator to a denoising problem with regularization term ${\mathcal R}$

$\begin{align} \newcommand{\e}{{\rm e}} \newcommand{\Prox}[1]{{\rm prox}_{#1}} \newcommand{\De}{:=} \newcommand{\RegFun}{{\mathcal R}} \newcommand{\R}{{\mathbb R}} \newcommand{\Argmin}[2]{{\rm arg} \mathop{\min}\limits_{#1} \;#2} \displaystyle \label{EQ:prox} \Prox{\RegFun}(y)\De \Argmin{x}{\left\lbrace \frac12\Vert x-y\Vert ^2 + \RegFun(x)\right\rbrace} \, . \nonumber \end{align} \tag{ 19 }$

These extra information can be utilized with the proximal alternating linearized minimization (PALM) [43]. Linearizing the differentiable part $\newcommand{\DataFid}{\mathcal{D}} \DataFid$ of the objective function $\newcommand{\Obj}{\Psi} \Obj$ in the update of (18) yields the new update

$\begin{align} \newcommand{\e}{{\rm e}} \newcommand{\Prox}[1]{{\rm prox}_{#1}} \newcommand{\stepsize}{\tau} \newcommand{\Ru}{\RegFun_u} \newcommand{\RegFun}{{\mathcal R}} \newcommand{\DataFid}{\mathcal{D}} \newcommand{\grad}{\nabla} \newcommand{\R}{{\mathbb R}} \displaystyle \label{EQ:PALM1} u^+ = \Prox{\stepsize_u \Ru}\Bigl(u - \stepsize_u \grad_u \DataFid(u, k) \Bigr) , \nonumber \end{align} \tag{ 20a }$

$\begin{align} \newcommand{\e}{{\rm e}} \newcommand{\Prox}[1]{{\rm prox}_{#1}} \newcommand{\stepsize}{\tau} \newcommand{\Rk}{\RegFun_k} \newcommand{\RegFun}{{\mathcal R}} \newcommand{\DataFid}{\mathcal{D}} \newcommand{\grad}{\nabla} \newcommand{\R}{{\mathbb R}} \displaystyle \label{EQ:PALM2} k^+ = \Prox{\stepsize_k \Rk}\Bigl(k - \stepsize_k \grad_k \DataFid(u^+, k) \Bigr) \, . \nonumber \end{align} \tag{ 20b }$

It can be interpreted both as a linearized version of PAM and as an alternating form of forward-backward splitting [44, 45]. Note that PALM also monotonously decreases the objective and has the same convergence guarantees as long as the step size parameters are well-chosen which we will discuss later in this section.

3.2. Inertial PALM (iPALM)

Local methods for non-convex optimization may lead to critical points that are not global minimizers. The authors of [46] proposed an inertial variant of forward-backward splitting which next to the gradient direction gives inertia to steer the iterates in the previously chosen direction. They empirically showed on a simple two dimensional problem that inertia may help to avoid spurious critical points. An inertial version of PALM [41], called iPALM, follows the updates

$\begin{align} \newcommand{\e}{{\rm e}} \newcommand{\InertParamA}{\alpha} \displaystyle u_\InertParamA = u + \InertParamA(u - u^-) \, , \nonumber \end{align} \tag{ 21a }$

$\begin{align} \newcommand{\e}{{\rm e}} \newcommand{\Prox}[1]{{\rm prox}_{#1}} \newcommand{\InertParamA}{\alpha} \newcommand{\stepsize}{\tau} \newcommand{\Ru}{\RegFun_u} \newcommand{\RegFun}{{\mathcal R}} \newcommand{\DataFid}{\mathcal{D}} \newcommand{\grad}{\nabla} \newcommand{\R}{{\mathbb R}} \displaystyle u^+ = \Prox{\stepsize_u \Ru} \Bigl(u_\InertParamA - \tau_u \grad_u \DataFid(u_\InertParamA, k)\Bigr) \, , \nonumber \end{align} \tag{ 21b }$

$\begin{align} \newcommand{\e}{{\rm e}} \newcommand{\InertParamA}{\alpha} \displaystyle k_\InertParamA = k + \InertParamA(k - k^-) \, , \nonumber \end{align} \tag{ 21c }$

$\begin{align} \newcommand{\e}{{\rm e}} \newcommand{\Prox}[1]{{\rm prox}_{#1}} \newcommand{\InertParamA}{\alpha} \newcommand{\stepsize}{\tau} \newcommand{\Rk}{\RegFun_k} \newcommand{\RegFun}{{\mathcal R}} \newcommand{\DataFid}{\mathcal{D}} \newcommand{\grad}{\nabla} \newcommand{\R}{{\mathbb R}} \displaystyle k^+ = \Prox{\stepsize_k \Rk} \Bigl(k_\InertParamA - \tau_k \grad_k \DataFid(u^+, k_\InertParamA)\Bigr) \, , \nonumber \end{align} \tag{ 21d }$

with inertia parameter $\newcommand{\InertParamA}{\alpha} \InertParamA \in [0, 1)$ . Next to the current iterate $(u, k)$ it also depends on the previous iterate $(u^-, k^-)$ . We remark that PALM can be recovered by vanishing inertial parameter $\newcommand{\InertParamA}{\alpha} \InertParamA = 0$ . Also note that one may define iPALM not with one inertial parameter α but with four different ones [41]. While iPALM is not guaranteed to decrease the objective $\newcommand{\Obj}{\Psi} \Obj$ , it monotonously decreases a surrogate functional that relates to $\newcommand{\Obj}{\Psi} \Obj$ [41] and its iterates are guaranteed to converge to a critical point of $\newcommand{\Obj}{\Psi} \Obj$ for well-chosen step size and inertia parameter.

3.3. Step sizes and backtracking

The convergence of PALM and iPALM depend on the step size parameters $\tau_u, \tau_k$ . Following [41], we choose the step size parameters as

$\begin{align} \newcommand{\e}{{\rm e}} \newcommand{\De}{:=} \newcommand{\InertParamA}{\alpha} \newcommand{\stepsize}{\tau} \displaystyle \label{EQ:stepsizes} \stepsize_x \De \frac{1-\InertParamA}{1 + 2\InertParamA}\frac{2}{\theta L_x} , \qquad x \in \{u, k\} \, , \nonumber \end{align} \tag{ 22 }$

where $\theta>1$ is a global constant and L_x is the local Lipschitz constant of the derivative of the data fit ${\mathcal D}$ with respect to x. Using the relation (4), its derivatives with respect to the image and kernel are

$\begin{align} \newcommand{\e}{{\rm e}} \newcommand{\mA}{{\bf A}} \newcommand{\DataFid}{\mathcal{D}} \displaystyle \nabla_u \DataFid(u,k) = \mA_k^\ast(\mA_ku-f) \, , \nonumber \end{align} \tag{ 23a }$

$\begin{align} \newcommand{\e}{{\rm e}} \newcommand{\mJ}{{\bf J}} \newcommand{\mA}{{\bf A}} \newcommand{\DataFid}{\mathcal{D}} \displaystyle \nabla_k \DataFid(u,k) = \mJ^\ast\mA_u^\ast(\mA_u\mJ k-f) \, \nonumber \end{align} \tag{ 23b }$

which shows that L_u will depend on k and L_k on u. While it is possible to conservatively estimate these constants, standard estimates turn out to be highly pessimistic and one can do significantly better by determining L_u and L_k with a backtracking scheme [46, 47]. The actual local Lipschitz constants L_u and L_k satisfy the descent inequalities (also called descent lemma) [41, 43, 46, 47]

$\begin{align} \newcommand{\e}{{\rm e}} \newcommand{\DataFid}{\mathcal{D}} \newcommand{\grad}{\nabla} \displaystyle \DataFid(u^+, k) \leqslant \DataFid(u_\alpha, k) + \left\langle\grad_u \DataFid(u_\alpha, k), u^+ - u_\alpha\right\rangle + \frac{L_u}{2}\Vert u^+-u_\alpha\Vert ^2 \, , \nonumber \end{align} \tag{ 24a }$

$\begin{align} \newcommand{\e}{{\rm e}} \newcommand{\DataFid}{\mathcal{D}} \newcommand{\grad}{\nabla} \displaystyle \DataFid(u^+, k^+) \leqslant \DataFid(u^+, k_\alpha) + \left\langle\grad_k \DataFid(u^+, k_\alpha), k^+ - k_\alpha\right\rangle + \frac{L_k}{2}\Vert k^+-k_\alpha\Vert ^2 \, . \nonumber \end{align} \tag{ 24b }$

If during the iterations these inequalities are not satisfied, we increase the Lipschitz estimates by a constant factor $\newcommand{\e}{{\rm e}} \eta > 1$ and repeat the iteration.

In addition to the descent inequality, an inexact evaluation of the proximal operators (e.g. due to finite iterations) may also hinder convergence. Thus, we will evaluate the proximal operators to a precision such that the proximal descent inequalities

$\begin{align} \newcommand{\e}{{\rm e}} \newcommand{\RegFun}{{\mathcal R}} \newcommand{\InertParamA}{\alpha} \newcommand{\stepsize}{\tau} \newcommand{\DataFid}{\mathcal{D}} \newcommand{\grad}{\nabla} \newcommand{\R}{{\mathbb R}} \displaystyle \RegFun_u(u^+) \leqslant \RegFun_u(u) + \left\langle \grad_u\DataFid (u_\InertParamA, k), u - u^+\right\rangle + \frac{1}{2 \stepsize_u}\left(\Vert u - u_\InertParamA\Vert ^2 - \Vert u^+-u_\InertParamA\Vert ^2\right) \, , \nonumber \end{align} \tag{ 25a }$

$\begin{align} \newcommand{\e}{{\rm e}} \newcommand{\RegFun}{{\mathcal R}} \newcommand{\InertParamA}{\alpha} \newcommand{\stepsize}{\tau} \newcommand{\DataFid}{\mathcal{D}} \newcommand{\grad}{\nabla} \newcommand{\R}{{\mathbb R}} \displaystyle \RegFun_k(k^+) \leqslant \RegFun_u(k) + \left\langle \grad_k\DataFid (u^+,k_\InertParamA), k - k^+\right\rangle + \frac{1}{2 \stepsize_k}\left(\Vert k - k_\InertParamA\Vert ^2 - \Vert k^+-k_\InertParamA\Vert ^2\right) \nonumber \end{align} \tag{ 25b }$

hold. Equations (24), (25) and the parameters $L_u, L_k$ guarantee the monotonic descent of PALM and iPALM with respect to the objective or its surrogate [41, 43].

The pseudo code for PALM and iPALM with this backtracking can be found in algorithm 1.

Algorithm 1. PALM / iPALM with backtracking. PALM is recovered by setting the inertial parameter $\newcommand{\InertParamA}{\alpha} \InertParamA = 0$ . The parameters $\newcommand{\e}{{\rm e}} \underline{L}, \overline{L}, \eta, \theta$ can all be chosen different for the image u and kernel k but for simplicity we chose not to.

Input: initial iterates $u, k$

$u, k$ , inertial parameter $\newcommand{\InertParamA}{\alpha} \InertParamA \in [0, 1)$ $\newcommand{\InertParamA}{\alpha} \InertParamA \in [0, 1)$ , initial step size parameters $L_u, L_k$

$L_u, L_k$ , constant for step size $\theta > 1$ $\theta > 1$ , bounds for step size parameters $0 < \underline{L} \leqslant \overline{L} < \infty$ $0 < \underline{L} \leqslant \overline{L} < \infty$ , backtracking constant $\newcommand{\e}{{\rm e}} \eta> 1$ $\newcommand{\e}{{\rm e}} \eta> 1$

1: function iPALM

2: $u^- \gets u, \;k^-\gets k$ $u^- \gets u, \;k^-\gets k$

3: for $\newcommand{\iter}{t} \iter=0, 1, \ldots$ $\newcommand{\iter}{t} \iter=0, 1, \ldots$ do

4: % update image u

5: $\newcommand{\InertParamA}{\alpha} u_\InertParamA \gets u + \InertParamA(u - u^-)$ $\newcommand{\InertParamA}{\alpha} u_\InertParamA \gets u + \InertParamA(u - u^-)$

6: $(u^+, L_u) \gets$ $(u^+, L_u) \gets$ backtracking( $\newcommand{\grad}{\nabla} \newcommand{\DataFid}{\mathcal{D}} \newcommand{\InertParamA}{\alpha} u_\InertParamA, \grad_u \DataFid(u_\InertParamA, k), L_u$ $\newcommand{\grad}{\nabla} \newcommand{\DataFid}{\mathcal{D}} \newcommand{\InertParamA}{\alpha} u_\InertParamA, \grad_u \DataFid(u_\InertParamA, k), L_u$ )

7: $u^-\gets u, \;u\gets u^+$ $u^-\gets u, \;u\gets u^+$

8: % update kernel k

9: $\newcommand{\InertParamA}{\alpha} k_\InertParamA \gets k + \InertParamA(k - k^-)$ $\newcommand{\InertParamA}{\alpha} k_\InertParamA \gets k + \InertParamA(k - k^-)$

10: $(k^+, L_k) \gets$ $(k^+, L_k) \gets$ backtracking( $\newcommand{\grad}{\nabla} \newcommand{\DataFid}{\mathcal{D}} \newcommand{\InertParamA}{\alpha} k_\InertParamA, \grad_k \DataFid(u^+, k_\InertParamA), L_k$ $\newcommand{\grad}{\nabla} \newcommand{\DataFid}{\mathcal{D}} \newcommand{\InertParamA}{\alpha} k_\InertParamA, \grad_k \DataFid(u^+, k_\InertParamA), L_k$ )

11: $k^-\gets k, \;k\gets k^+$ $k^-\gets k, \;k\gets k^+$

12: return $(u^+, k^+)$

$(u^+, k^+)$

13: function backtracking( $x_\alpha, g, L$ $x_\alpha, g, L$ )

14: while True do

15: % select step size

16: $\newcommand{\stepsize}{\tau} \newcommand{\InertParamA}{\alpha} \stepsize \gets \frac{1-\InertParamA}{1 + 2\InertParamA}\frac{2}{\theta L}$ $\newcommand{\stepsize}{\tau} \newcommand{\InertParamA}{\alpha} \stepsize \gets \frac{1-\InertParamA}{1 + 2\InertParamA}\frac{2}{\theta L}$

17: % forward-backward step where the proximal operator

18: % is approximated by a fixed number of warm-started iterations

19: $\newcommand{\R}{{\mathbb R}} \newcommand{\stepsize}{\tau} \newcommand{\RegFun}{{\mathcal R}} \newcommand{\Prox}[1]{{\rm prox}_{#1}} x^+ \gets \Prox{\stepsize \RegFun}(x_\alpha - \stepsize g)$ $\newcommand{\R}{{\mathbb R}} \newcommand{\stepsize}{\tau} \newcommand{\RegFun}{{\mathcal R}} \newcommand{\Prox}[1]{{\rm prox}_{#1}} x^+ \gets \Prox{\stepsize \RegFun}(x_\alpha - \stepsize g)$

20: if Descent inequality (24) does not hold then

21: % increase Lipschitz constant and try again

22: $\newcommand{\e}{{\rm e}} L \gets \min(\eta L, \overline{L})$ $\newcommand{\e}{{\rm e}} L \gets \min(\eta L, \overline{L})$

23: continue

24: if Proximal descent inequality (25) does hold then

25: % continue to next iteration with larger step size

26: $\newcommand{\e}{{\rm e}} L \gets \max(L/\eta, \underline{L})$ $\newcommand{\e}{{\rm e}} L \gets \max(L/\eta, \underline{L})$

27: break

28: return (x⁺ ,L)

4. Numerical setting

4.1. Data

In this work we use data from two separate data sets. The first data set [48] is from environmental remote sensing to study vegetation in the Mediterranean woodlands in Alta Tajo natural park, northeast of Madrid, Spain [49]. It was acquired in October–November 2014 using an AISA Eagle hyperspectral sensor and a Leica RCD105 39 megapixel digital camera mounted on a Dornier 228 aircraft. The hyperspectral imagery consisting of 126 spectral bands almost equidistantly covering the wavelengths 400 nm–970 nm with a spatial resolution of 1 m × 1 m. The aerial photograph has a resolution of 0.25 m × 0.25 m but only captures channels corresponding to the visible red, green and blue. We refer to images from this data set as $\newcommand{\DataTreesA}{\texttt{environment1}} \DataTreesA$ , $\newcommand{\DataTreesB}{\texttt{environment2}} \DataTreesB$ .

The second data set [50, 51] shows an urban area of Vancouver, Canada. It was acquired in March–May 2015 by the DEIMOS-2 satellite operating on a mean altitude of 620 km. The satellite carried a very highly resolving push-broom camera with one panchromatic channel of 1 m × 1 m resolution and the four multispectral channels red, green, blue and near-infrared with a resolution of 4 m × 4 m. Images from this data set are marked $\newcommand{\DataUrbanA}{\texttt{urban1}} \DataUrbanA$ , $\newcommand{\DataUrbanB}{\texttt{urban2}} \DataUrbanB$ .

Next to the already described real data sets we simulated an environmental data set where we know the ground truth image and kernel. We test two different kernels, one which is disk-shaped and one which is an off-centered Gaussian and refer to the corresponding data sets as $\newcommand{\DataGTdisk}{\texttt{groundtruth\_disk}} \DataGTdisk$ and $\newcommand{\DataGTgaussian}{\texttt{groundtruth\_Gaussian}} \DataGTgaussian$ . The red channel of an RGB image is convolved with these two kernels, subsampled by a factor of s = 4 and Gaussian noise of variance 0.001 is added. The data is shown at the top left of figures 12 and 13. For the data set $\newcommand{\DataGTgaussian}{\texttt{groundtruth\_Gaussian}} \DataGTgaussian$ we use the RGB image itself as side information and for $\newcommand{\DataGTdisk}{\texttt{groundtruth\_disk}} \DataGTdisk$ we use a shifted version of it. This situation occurs frequently in applications since usually data and side information are acquired by different imaging devices which can lead to a relative translation between the observed images. Note that we used replicated instead of periodic boundary conditions for the convolution in order to avoid inverse crime. As the ground truth is known in these cases, we evaluate the result in terms of the structural similarity index (SSIM) [52] and the Haar wavelet-based perceptual similarity index (HPSI) [53]. As compared to measures like the peak-signal-to-noise ratio (PSNR) or the mean squared error (MSE), the SSIM and the HPSI aim at reproducing how image similarity is perceived by human viewers. The SSIM is one of the most widely used image similarity measures and obtained by comparing local statistical parameters such as local means and local variances. The HPSI is based on a simple discrete Haar wavelet transform and incorporates basic assumptions about the human visual system. The HPSI was recently shown to yield state-of-the-art correlations with human opinion scores on large benchmarking databases of differently distorted images.

In all numerical experiments the data f are scaled to $[0, 1]$ .

4.2. Visualization

Throughout the results, the data and reconstructed images are shown with MATLAB's 'parula' colormap where yellow represents values $\geqslant 1$ and dark blue corresponds to values $\leqslant 0$ , respectively. The kernels are visualized with gray scales where black corresponds to the smallest value and white to the largest. In addition, we draw a red hair cross on the center, i.e. the pixel with indices l + 1, where l denotes the radius of the kernel. The kernel's centroid $\sum\nolimits_i i k_i$ is visualized by a green hair cross in order to visualize the off-set between data and side information.

4.3. Algorithms

The numerical results will be computed with the algorithms PALM, iPALM and PAM. For PAM, the inner problems require solving a deconvolution problem with convex regularization which is implemented with ADMM [54–56] following a similar strategy as in [18]. All three—PALM, iPALM and PAM—require the computation of proximal operators for the total variation and directional total variation under a convex constraint. These are implemented with a finite number of warm started fast gradient projection/FISTA [47, 57] iterations. Within these, projections onto the constraint sets are computed by $\newcommand{\Prox}[1]{{\rm prox}_{#1}} \Prox{\imath_{[0, \infty){\hspace{0pt}}^m}}(u) = \max(0, u)$ (to be understood componentwise) and the fast algorithm of [58] for the projection onto the unit simplex. A MATLAB implementation of the algorithms and the data itself can be found at [59].

4.4. Parameters

The numerical experiments require model parameters and algorithm parameters. For the mathematical model, the parameters of the directional total variation are chosen as $\gamma = 0.9995$ and $\varepsilon = 0.003$ for all experiments, see (12) and figure A1 in the appendix. The low resolution images are of size $100\times 100$ with an subsampling rate of s = 4. Since we fix the kernel size to be $41\times 41$ , we will consequently obtain reconstructions of size $440\times 440$ where we only visualize the meaningful part of size $400\times 400$ .

Except for the algorithm comparison, we use PALM with 2000 iterations and a step size constant $\theta = 1.1$ , see (22). The backtracking parameters in algorithm 1 are chosen as $\newcommand{\e}{{\rm e}} \eta=2$ , $\underline{L} = 1$ and $\overline{L} = 10^{30}$ .

4.5. Initialization

Both algorithm 1 and our implementation of PAM require initial guesses for image and kernel. While the kernel is initialized with a compactly supported Gaussian kernel, we choose the initial image to be $\newcommand{\mH}{{\bf H}} \mH f$ , where $\newcommand{\R}{{\mathbb R}} \newcommand{\aceLarge}{U} \newcommand{\mH}{{\bf H}} \mH:\R^n\to\aceLarge$ is a right inverse of the operator $\newcommand{\mB}{{\bf B}} \newcommand{\mS}{{\bf S}} \mS\circ\mB$ , i.e. $\newcommand{\mH}{{\bf H}} \mH$ is an upsampling operator (see the top-left images in figures 5 and 8).

**Figure 5.** Iterates of PALM for data set $\newcommand{\DataGTdisk}{\texttt{groundtruth\_disk}} \DataGTdisk$ with $\lambda_u = 0.1$ , $\lambda_k = 10$ with objective function values $\newcommand{\Obj}{\Psi} \Obj$ and similarity to ground truth.
Download figure:
Standard image High-resolution image

$ \newcommand{\DataGTdisk}{\texttt{groundtruth\_disk}} \DataGTdisk$ — **Figure 5.** Iterates of PALM for data set $\newcommand{\DataGTdisk}{\texttt{groundtruth\_disk}} \DataGTdisk$ with $\lambda_u = 0.1$ , $\lambda_k = 10$ with objective function values $\newcommand{\Obj}{\Psi} \Obj$ and similarity to ground truth.
Download figure:
Standard image High-resolution image

5. Numerical results

5.1. Comparison of algorithms

Before we focus on the actual image fusion reconstructions, we compare the algorithms PALM, iPALM and PAM on two different data sets. The results are shown in figures 5, 6 and 7 for the data set $\newcommand{\DataGTdisk}{\texttt{groundtruth\_disk}} \DataGTdisk$ and in figures 8, 9 and 10 for $\newcommand{\DataTreesA}{\texttt{environment1}} \DataTreesA$ . Please refer to the captions of each figure for some detailed observations. We would like to highlight two important observations. First, both examples show that all three algorithms converge to practically the same critical point and that a visually pleasing reconstruction is already obtained after about 500 iterations. Thus, even though problem (1) is non-convex, the algorithm itself does not seem to have much influence on the final reconstructions. Second, the examples show that inertia may slow the convergence down as for $\newcommand{\DataGTdisk}{\texttt{groundtruth\_disk}} \DataGTdisk$ or may make it faster as for $\newcommand{\DataTreesA}{\texttt{environment1}} \DataTreesA$ . Other results—which are not shown here for brevity—indicate that this is correlated with the regularization parameter $\lambda_u$ . For large values, e.g. $\lambda_u \geqslant 1$ , we found that inertia consistently sped up convergence while for smaller values, e.g. $\lambda_u \leqslant 0.1$ we found that it was always slowing down the convergence. Thus, for simplicity and due to the lack of supporting examples that would justify other decisions, all further reconstructions are performed by PALM.

**Figure 6.** Reconstructions by three different algorithms—PAM and iPALM with inertial parameters $\alpha = 0.2$ and $\alpha = 0.5$ —after thousands of iterations for the same setting as figure 5. The visual impression of the images and kernels and the objective values indicate that all algorithms converge to a similar critical point.
Download figure:
Standard image High-resolution image

**Figure 6.** Reconstructions by three different algorithms—PAM and iPALM with inertial parameters $\alpha = 0.2$ and $\alpha = 0.5$ —after thousands of iterations for the same setting as figure 5. The visual impression of the images and kernels and the objective values indicate that all algorithms converge to a similar critical point.
Download figure:
Standard image High-resolution image

**Figure 7.** Algorithm statistics over the course of the iterations for the same setting as figure 5. The relative objective function values (top left) show that all algorithms have about the same speed on this example. The objective function values (top center) and its summands (bottom) indicate that also all algorithms converge to the same critical point. Overall, the plots show that more inertia leads to slower convergence.
Download figure:
Standard image High-resolution image

**Figure 8.** Iterates of PALM for the data set $\newcommand{\DataTreesA}{\texttt{environment1}} \DataTreesA$ with $\lambda_u = 1$ and $\lambda_k = 10$ .
Download figure:
Standard image High-resolution image

$ \newcommand{\DataTreesA}{\texttt{environment1}} \DataTreesA$ — **Figure 8.** Iterates of PALM for the data set $\newcommand{\DataTreesA}{\texttt{environment1}} \DataTreesA$ with $\lambda_u = 1$ and $\lambda_k = 10$ .
Download figure:
Standard image High-resolution image

**Figure 9.** Reconstructions by PAM and iPALM with inertial parameters $\alpha = 0.2$ and $\alpha=0.5$ for the setting of figure 8. Visually, the images and kernels and quantitatively, the objective function values indicate that all algorithms converge to the same critical point.
Download figure:
Standard image High-resolution image

**Figure 10.** Algorithm statistics for PALM and iPALM with a variety of inertial parameters for the same setting as figure 8. The objective function values (top center) and its summands (bottom) indicate that all algorithms converge to the same critical point. The relative objective function values (top left) show that for this example a higher inertial parameter leads to faster convergence.
Download figure:
Standard image High-resolution image

5.2. Simulated data

This section is dedicated to simulated data where the ground truth image is known. Figure 11 shows the similarity indices SSIM and HPSI of the reconstructed images when varying the image and kernel regularization parameters $\newcommand{\R}{{\mathbb R}} \newcommand{\RegParam}{\lambda} \RegParam_u$ and $\newcommand{\R}{{\mathbb R}} \newcommand{\RegParam}{\lambda} \RegParam_k$ . The results show that the proposed model is stable with respect to the choice of parameters as long as the image regularization $\newcommand{\R}{{\mathbb R}} \newcommand{\RegParam}{\lambda} \RegParam_u$ is not too small. It is remarkable that the success of the model depends only mildly on the choice of $\newcommand{\R}{{\mathbb R}} \newcommand{\RegParam}{\lambda} \RegParam_k$ and a vanishing kernel regularization $\newcommand{\R}{{\mathbb R}} \newcommand{\RegParam}{\lambda} \RegParam_k=0$ has only modest impact. However, it can also be seen that best reconstructions are obtained for non-zero $\newcommand{\R}{{\mathbb R}} \newcommand{\RegParam}{\lambda} \RegParam_k$ which indicates that a mild kernel regularization should be performed.

**Figure 11.** Similarity measures SSIM and HPSI for different values of the regularization parameters $\lambda_u$ and $\lambda_k$ . Top: data set $\newcommand{\DataGTdisk}{\texttt{groundtruth\_disk}} \DataGTdisk$ with shifted side information. Bottom: data set $\newcommand{\DataGTgaussian}{\texttt{groundtruth\_Gaussian}} \DataGTgaussian$ . Yellow corresponds to high and blue to low similarity. All four plots indicate that the reconstructions are not very sensitive with respect to the kernel regularization.
Download figure:
Standard image High-resolution image

In figure 12 we show reconstruction results for the $\newcommand{\DataGTdisk}{\texttt{groundtruth\_disk}} \DataGTdisk$ test case where we fix the image regularizing parameter $\newcommand{\R}{{\mathbb R}} \newcommand{\RegParam}{\lambda} \RegParam_u=0.1$ and vary the kernel regularization $\newcommand{\R}{{\mathbb R}} \newcommand{\RegParam}{\lambda} \RegParam_k$ . Note that we complicated the situation by using a side information which is shifted by several pixels and, thus, not aligned to the data. The proposed approach is well-fit to deal with this problem since blind deconvolution can implicitly perform rigid registration of data and side information by producing a blurring kernel that compensates for the translation by having an off-centered centroid. The results show that our method is able to compensate for unaligned side information images and can achieve very good results, nevertheless. The influence of the amount of kernel regularization is noticeably small although over-regularization clearly leads to different contrasts in the reconstructed image. Note that the barycentric translations of the kernels in figure 12 correspond exactly—up to pixel accuracy—to the number of pixels that the shifted side information deviates from the un-shifted one.

Figure 13 shows in a similar way the reconstruction results for the test case $\newcommand{\DataGTgaussian}{\texttt{groundtruth\_Gaussian}} \DataGTgaussian$ , using the un-shifted side information. In contrast to the previous example, here we fix the parameter $\newcommand{\R}{{\mathbb R}} \newcommand{\RegParam}{\lambda} \RegParam_k=10$ and vary $\newcommand{\R}{{\mathbb R}} \newcommand{\RegParam}{\lambda} \RegParam_u$ instead. Here, the effects of different regularization strengths are considerably larger; particularly, in the case $\newcommand{\R}{{\mathbb R}} \newcommand{\RegParam}{\lambda} \RegParam_u=0.01$ the shift between data and side information—which was induced by convolving with the off-centered kernel—is not fully detected. Consequently, the resulting similarity values, being not translationally invariant, are relatively poor. In the opposite scenario of over-regularizing the image, small structures in the image are not captured well.

$ \newcommand{\DataGTgaussian}{\texttt{groundtruth\_Gaussian}} \DataGTgaussian$ — **Figure 13.** Reconstructions for $\newcommand{\DataGTgaussian}{\texttt{groundtruth\_Gaussian}} \DataGTgaussian$ . Top: low resolution data and side information. Bottom: reconstruction for varying image regularization and kernel regularization being chosen as $\newcommand{\R}{{\mathbb R}} \newcommand{\RegParam}{\lambda} \RegParam_k = 10$ . SSIM and HPSI are given in percentages and higher values indicating better performance.
Download figure:
Standard image High-resolution image

5.3. Hyperspectral data

In this section we apply our method to real data and—similarly to the previous section—investigate the effects of varying image and kernel regularization. To this end, we apply our method to one channel of the hyperspectral environmental and of the multispectral urban data set. Figures 14 and 15 show data, side information, and the corresponding reconstruction results for different values of the regularization parameters. Since no ground truth solutions are known in this case, we rely on our visual impression to choose a set of parameters. Figure 14, which corresponds to near-infrared light, suggests that a good compromise among data fidelity and regularization is achieved for $\newcommand{\R}{{\mathbb R}} \newcommand{\RegParam}{\lambda} \RegParam_u=\RegParam_k=1$ . However, even if no $\newcommand{\tv}{{\rm TV}} \tv$ regularity of the kernel is required, that is in the case $\newcommand{\R}{{\mathbb R}} \newcommand{\RegParam}{\lambda} \RegParam_k=0$ , the resulting images do hardly differ. Bearing in mind the results of figure 12, we will regularize the kernel slightly in the sequel. The situation is similar in figure 15, which also corresponds to near-infrared light. However, a smaller image regularization parameter, e.g. $\newcommand{\R}{{\mathbb R}} \newcommand{\RegParam}{\lambda} \RegParam_u=0.01$ , seems necessary in order to avoid over-smoothing.

**Figure 14.** Varying both regularization parameters for $\newcommand{\DataTreesA}{\texttt{environment1}} \DataTreesA$ which corresponds to a near-infrared channel with wavelength of around 900 nm. Too small image regularization $\lambda_u$ leads to point artefacts and too large $\lambda_u$ to a loss of contrast and wrong estimation of the kernel. While no kernel regularization does not seem to have negative effects on the reconstruction, too large kernel regularization changes the image contrast.
Download figure:
Standard image High-resolution image

**Figure 15.** Results for $\newcommand{\DataUrbanA}{\texttt{urban1}} \DataUrbanA$ (near-infrared) for varying image and kernel regularization. Over- and under-regularization have similar effects like in figure 14. However, due to a lower noise level of the data, the parameters can be chosen smaller.
Download figure:
Standard image High-resolution image

$ \newcommand{\DataUrbanA}{\texttt{urban1}} \DataUrbanA$ — **Figure 15.** Results for $\newcommand{\DataUrbanA}{\texttt{urban1}} \DataUrbanA$ (near-infrared) for varying image and kernel regularization. Over- and under-regularization have similar effects like in figure 14. However, due to a lower noise level of the data, the parameters can be chosen smaller.
Download figure:
Standard image High-resolution image

Finally, we apply the proposed strategy to several hyperspectral channels of the data set $\newcommand{\DataTreesB}{\texttt{environment2}} \DataTreesB$ , using the same parameters for the reconstruction of each channel. Here, we scaled the reconstructed images back to their physical range to allow inter-channel comparisons. Figure 16 shows the reconstruction of six hyperspectral channels ranging from blue to near-infrared light. Note that the resulting kernels only change slightly with respect to the channel. We would like to point out that we were able to use the same set of parameters, namely $\newcommand{\R}{{\mathbb R}} \newcommand{\RegParam}{\lambda} \RegParam_u=\RegParam_k=1$ , for all channels. In figure 17 we show the reconstruction of all channels of the multispectral urban data set $\newcommand{\DataUrbanB}{\texttt{urban2}} \DataUrbanB$ using $\newcommand{\R}{{\mathbb R}} \newcommand{\RegParam}{\lambda} \RegParam_u=0.01$ and $\newcommand{\R}{{\mathbb R}} \newcommand{\RegParam}{\lambda} \RegParam_k=1$ . Additionally, we visualized the results by re-combining the four channels into an RGB and a false color image which has the channels near-infrared, red and green. We would like to highlight that the kernels are consistently estimated up to a pixel accuracy and that no color smearing is visible in the RGB and false color images despite the fact that all channels have been processed independently.

**Figure 16.** Reconstructions of several channels corresponding to different wavelengths given in nanometer (nm) of $\newcommand{\DataTreesB}{\texttt{environment2}} \DataTreesB$ . The regularization parameters for this experiment were chosen constant across the channels as $\newcommand{\R}{{\mathbb R}} \newcommand{\RegParam}{\lambda} \RegParam_u = 1$ and $\newcommand{\R}{{\mathbb R}} \newcommand{\RegParam}{\lambda} \RegParam_k = 1$ . For visualization, reconstructed images are scaled back to their physical range. Note that despite the reconstruction being performed channel-by-channel, all kernels have about the same shape and the same shift.
Download figure:
Standard image High-resolution image

$ \newcommand{\DataTreesB}{\texttt{environment2}} \DataTreesB$ — **Figure 16.** Reconstructions of several channels corresponding to different wavelengths given in nanometer (nm) of $\newcommand{\DataTreesB}{\texttt{environment2}} \DataTreesB$ . The regularization parameters for this experiment were chosen constant across the channels as $\newcommand{\R}{{\mathbb R}} \newcommand{\RegParam}{\lambda} \RegParam_u = 1$ and $\newcommand{\R}{{\mathbb R}} \newcommand{\RegParam}{\lambda} \RegParam_k = 1$ . For visualization, reconstructed images are scaled back to their physical range. Note that despite the reconstruction being performed channel-by-channel, all kernels have about the same shape and the same shift.
Download figure:
Standard image High-resolution image

**Figure 17.** Reconstruction of the multispectral data set $\newcommand{\DataUrbanB}{\texttt{urban2}} \DataUrbanB$ . The regularization parameters were chosen the same for all channels as $\newcommand{\R}{{\mathbb R}} \newcommand{\RegParam}{\lambda} \RegParam_u=0.01$ and $\newcommand{\R}{{\mathbb R}} \newcommand{\RegParam}{\lambda} \RegParam_k=1$ . Kernels are being estimated consistently over the range of the channels up to an accuracy of one pixel. The resulting RGB and false color images have well-defined boundaries and show no color smearing. (NIR = near-infrared)
Download figure:
Standard image High-resolution image

$ \newcommand{\DataUrbanB}{\texttt{urban2}} \DataUrbanB$ — **Figure 17.** Reconstruction of the multispectral data set $\newcommand{\DataUrbanB}{\texttt{urban2}} \DataUrbanB$ . The regularization parameters were chosen the same for all channels as $\newcommand{\R}{{\mathbb R}} \newcommand{\RegParam}{\lambda} \RegParam_u=0.01$ and $\newcommand{\R}{{\mathbb R}} \newcommand{\RegParam}{\lambda} \RegParam_k=1$ . Kernels are being estimated consistently over the range of the channels up to an accuracy of one pixel. The resulting RGB and false color images have well-defined boundaries and show no color smearing. (NIR = near-infrared)
Download figure:
Standard image High-resolution image

6. Conclusions and outlook

We have presented a novel model for simultaneous image fusion and blind deblurring of hyperspectral images based on the directional total variation. We showed that this approach yields sharp images for both environmental and urban data sets that are either acquired by a plane or by a satellite. In addition, the numerical comparison of different algorithms showed that—despite the non-convexity of the problem—the computed solutions are stable with respect to the algorithms. The implicitly included registration by estimating the kernel yields further robustness to imperfections in the image acquisitions. Thus, the proposed approach is appealing for automated processing of large data sets.

Future work will be devoted to multiple directions. First, the proposed approach is channel-by-channel, which makes it computationally appealing. However, a coupling of the spectral channels via their kernel and/or via spectral fingerprint preservation may yield further improvements. Second, higher order regularization such as the total generalized variation [40] has been proven useful for natural images. This naturally leads to the directional total generalized variation which combines higher order smoothness with structural a priori knowledge as presented in this work, thereby generalizing the definition of [36]. Similarly, other kernel regularization functionals such as the total generalized variation or the H¹-semi-norm will be considered in future investigations to better model the smoothness of the kernel. Finally, it is currently not clear how the computational cost of the proposed method depends on the size of the considered data. In particular, it is difficult to predict to which degree larger images also require a higher number of iterations during optimization for the method to yield satisfactory results. However, due to the fact that our approach mostly depends on operations that are of a local nature, it should be highly suitable for parallelization. The development and analysis of implementations that utilize the benefits of distributed computing architectures is also a topic for future research.

Acknowledgments

We would like to express our gratitude to Juheon Lee and Qi Wei for their help with the environmental and urban data sets, respectively. Furthermore, the authors would like to thank Deimos Imaging for acquiring and providing the data used in this study, and the IEEE GRSS Image Analysis and Data Fusion Technical Committee.

MJE and C-BS acknowledge support from Leverhulme Trust project 'Breaking the non-convexity barrier', EPSRC grant 'EP/M00483X/1', EPSRC centre 'EP/N014588/1', the Cantab Capital Institute for the Mathematics of Information, and from CHiPS (Horizon 2020 RISE project grant). Moreover, C-BS is thankful for support from the Alan Turing Institute.

DAC acknowledges the support of NERC (grant number NE/K016377/1). We are grateful to NERC's Airborne Research Facility and Data Analysis Node for conducting the airborne survey and pre-processing the environmental data collected from Alto Tajo, Spain (survey CAM11/03).

Appendix

We test the influence of parameter γ in (12) on a simulated and a real data set. Figure A1 shows the corresponding reconstruction results for $\gamma\in\{0.995, 0.9995, 1\}$ where the middle value is used in all other numerical experiments in this work. Note that the results for $\gamma=1$ suffer from stronger point artifacts whereas those for $\gamma=0.995$ do not capture local structures well enough.

Blind image fusion for hyperspectral imaging with the directional total variation

Article metrics

Submit

Author e-mails

Author affiliations

ORCID iDs

Dates

Peer review information

Abstract

1. Introduction

2. Mathematical model and notation

2.1. The forward operator

2.2. Regularization functionals

2.2.1. Image regularization.

2.3. Kernel regularization

3. Algorithms

3.1. Proximal alternating linearized minimization (PALM)

3.2. Inertial PALM (iPALM)

3.3. Step sizes and backtracking

4. Numerical setting

4.1. Data

4.2. Visualization

4.3. Algorithms

4.4. Parameters

4.5. Initialization

5. Numerical results

5.1. Comparison of algorithms

5.2. Simulated data

5.3. Hyperspectral data

6. Conclusions and outlook

Acknowledgments

Appendix

Blind image fusion for hyperspectral imaging with the directional total variation

Article metrics

Submit

Share this article

Author e-mails

Author affiliations

ORCID iDs

Dates

Peer review information

Abstract

1. Introduction

2. Mathematical model and notation

2.1. The forward operator

2.2. Regularization functionals

2.2.1. Image regularization.

2.3. Kernel regularization

3. Algorithms

3.1. Proximal alternating linearized minimization (PALM)

3.2. Inertial PALM (iPALM)

3.3. Step sizes and backtracking

4. Numerical setting

4.1. Data

4.2. Visualization

4.3. Algorithms

4.4. Parameters

4.5. Initialization

5. Numerical results

5.1. Comparison of algorithms

5.2. Simulated data

5.3. Hyperspectral data

6. Conclusions and outlook

Acknowledgments

Appendix