Joint reconstruction of PET-MRI by exploiting structural similarity

Matthias J Ehrhardt; Kris Thielemans; Luis Pizarro; David Atkinson; Sébastien Ourselin; Brian F Hutton; Simon R Arridge

doi:10.1088/0266-5611/31/1/015001

1. Introduction

In recent years the landscape of medical imaging has changed. For many decades single modality scanners to image either anatomy or function have been available and widely used in clinical practice. Anatomy can be imaged for instance by magnetic resonance imaging (MRI) and computer assisted tomography (CT). Often functionality is measured by radioactive labelled markers using positron emission tomography (PET) or single photon emission computed tomography. Based on these single modality scanners, multi-modality scanners were developed to combine the strength of both regimes and image both structure and function. While PET-CT scanners acquire the data for the two modalities sequentially some recently developed PET-MRI scanners are able to perform both scans simultaneously [1–3].

This development not only opens up new opportunities for clinicians but also poses a new kind of inverse problem. As 'function follows structure for the most part' [4] the images to be reconstructed are likely to share many edges. Some example images are shown in figure 1 where the reconstructions from PET, MRI and CT of a single patient are shown (using standard settings with the scanner software). The images were affinely registered to a common frame as they were acquired on a sequential PET-CT and on a separate MRI scanner. Despite the fact that all three images show completely different biological information, all three images show similar structures due to the same underlying anatomy. Therefore, we propose to jointly reconstruct both modalities with the a priori knowledge that the images are more likely to show similar structures. Some authors refer to this task as joint inversion [5, 6].

**Figure 1.** Reconstructions of data acquired by PET, MRI and CT of the same patient. Although the images contain completely different biological information common shapes can be observed due to the same anatomy. Images courtesy of UCL hospital.
Download figure:
Standard image High-resolution image

We will focus on joint reconstruction of simultaneously acquired PET-MRI data. All methods discussed here are, however, not limited to this application. They take similar structures into account and can therefore be applied to any other multi-modality scanner or even another field of application where similarity in structure is to be expected.

1.1. Contributions

This work is, to the best of our knowledge, the first contribution to joint reconstruction in multi-modality medical imaging. We justify the joint objective function used in [5–10] via probabilistic modelling, which also allows arbitrary noise models to be used in a joint reconstruction framework. Moreover, we analyse a general class of joint priors [6–13] and show how they couple the reconstructions in a gradient based optimization scheme. Furthermore, our numerical results show that joint PET-MRI reconstruction can be beneficial to both of the modalities. Most of the results obtained via joint reconstruction are superior, both in visual quality as well as in ℓ²-error, compared to some widely accepted methods for separate reconstructions.

1.2. Related Work

Joint reconstruction was, as far as we know, first proposed by Haber and Oldenbourg in 1997 [5] where the application of combined PET and MRI reconstruction was already mentioned but the focus was on geophysical applications. Subsequently, different priors coupling two modalities have been proposed [6–9, 14]. Some of these are related to the methods we use in this paper. We will discuss these relations in section 3.

There is a close connection between joint reconstruction and inverse problems in colour image processing as the colour channels can be seen as different modalities which share information about edges [10, 12, 13, 15–20]. All methods used in this work have been proposed first for colour image processing such as denoising and demosaicking [10, 12, 17].

Also closely related is the work where two different MRI data sets of the same subject are enhanced [21] and reconstructed [22].

Similarly, the reconstruction with a priori information from another modality has also been studied. This is sometimes referred to as model fusion [6]. In the example of PET and MRI one might want to reconstruct PET using the additional structural information of MRI. On the one hand, one may include this information via a structural prior which gives information about edges [11, 23–28]. On the other hand, theoretical information priors such as joint entropy and mutual information have been used to inform about likely tissue distributions [29–32]. None of these priors have been studied for joint reconstruction.

2. Problem setting and notation

This section is dedicated to introduce the problem of joint reconstruction of PET-MRI more formally and, in addition, it serves to state our notations.

2.1. Positron emission tomography

In PET a radioactive labelled and positron emitting tracer is injected into the patient. As the positrons annihilate with electrons of the tissue two photons in (almost) opposite directions are sent out which are recorded by detectors around the object. One seeks to reconstruct the tracer distribution from recorded events from pairs of detectors [33, 34]. In a general continuous framework this corresponds to the the x-ray transform which coincides with the Radon transform in 2d [35, 36]. This is of course a simplified model as effects such as scatter are ignored.

Let us denote the activity of the tracer by $u:\Omega \subset {{\mathbb{R}}^{N}}\to [0,\infty )$ , where $N=2$ or 3. We want to reconstruct $u$ on a discretized domain which we denote for the sake of simplicity again by $u\in {{\mathbb{R}}^{M}}$ . The discrete PET forward operator $A:{{\mathbb{R}}^{M}}\to {{\mathbb{R}}^{K}}$ describes how likely it is that a pair of photons is detected by one of the detector pairs, i.e., ${{A}_{k,m}}$ is the probability for an event at voxel $m$ to be detected by the k-th detector pair. More accurate modelling would include the estimation of scatter, randoms and attenuation [34]. The acquired data in PET are commonly assumed to be an instance of a Poisson distributed vector with expectation $Au$ [33, 34, 37].

2.2. Magnetic resonance imaging

MRI exploits the spin of hydrogen nuclei in the human body (or any other material). A magnetic field makes these randomly orientated spins align so that the position of the spin can be encoded by tiny variations in their resonant frequency. The collected signal corresponds to the modulated spin density integrated over the whole domain. The inverse problem in MRI is to reconstruct an image from these Fourier measurements [38, 39].

MRI is a very versatile imaging modality. Depending on the imaging sequence used for acquisition, the MRI image can reflect the proton density of the tissue or some other tissue related parameter [38, 39]. Independent of the imaging sequence we will denote the MRI image by $v:\Omega \to \mathbb{C}$ . We denote the discrete MRI forward operator by $B:{{\mathbb{C}}^{M}}\to {{\mathbb{C}}^{L}}\simeq {{\mathbb{R}}^{2L}}$ which consists of the Fourier transform followed by a measurement or sampling operator being usually a projection on the measured frequencies.

The noise in MRI is commonly modelled as additive Gaussian with standard deviation depending on, among others, the temperature, the receiver bandwidth and the field strength so that we acquire data $g\;\sim \mathcal{N}(Bv,\sigma I),\sigma \gt 0$ [40–42].

It became popular to speed up the MRI data acquisition by omitting 'unnecessary' measurements and 'replacing' them using a priori knowledge. This concept known as compressed sensing [43–46] has been already applied to MRI with great success [22, 47, 48]. Successfully exploited a priori knowledge in MRI includes, for instance, sparsity in the gradient or wavelet domain [47], sparsity in a self-learned dictionary [48] or similarity from different MRI contrasts [22]. In our contribution, the additional a priori knowledge is the similarity of MRI to simultaneously acquired PET.

For simplicity we assume the phase in MRI to be zero and therefore restrict $v$ to be real-valued.

2.3. Probabilistic modelling for joint reconstruction

As we have now introduced both modalities PET and MRI we can discuss how they can be linked. We formulate our beliefs using probabilistic modelling and graphical models [49, 50]. The PET and MRI images $u,v$ are random variables. Our intuitive belief about these unknown images $u,v$ are described in the graphical model shown in figure 2. On the one hand, both $u$ and $v$ depend on a common object and are therefore not independent of each other. The data are modelled as random variables with expectation linearly depending on these images. Hence, also the data of PET f and MRI g are not independent of another. On the other hand, if we have knowledge about $u$ , $v$ does not provide extra information about f and vice versa. This means, that f and g are conditionally independent given $u,v$ . Formally, this leads to the separation of the multi-modality likelihood

$\begin{eqnarray}&&p(f,g\;|u,v)=p(f\;|u,v)p(g\;|u,v)=p(f\;|u)p(g\;|v).\end{eqnarray} \tag{ 1 }$

The posterior probability of the PET-MRI image $(u,v)$ for observed PET-MRI data $(f,g\;)$ can then be simplified as

$\begin{eqnarray}&&p(u,v|f,g\;)\propto p(f\;|u)p(g\;|v)p(u,v)\end{eqnarray} \tag{ 2 }$

using Bayes' rule. When we further assume white, complex Gaussian noise in MRI, Poisson noise in PET and a prior of the form $p(u,v)={\rm exp} (-\alpha \mathcal{R}(u,v))$ then maximizing (2) is equivalent to minimizing

$\begin{eqnarray}&&\mathcal{J}(u,v)=\mathop{\sum }\limits_{i}{{(Au)}_{i}}-{{f}_{i}}\;{\rm log} {{(Au)}_{i}}+\frac{1}{2{{\sigma }^{2}}}{{\left| Bv-g \right|}^{2}}+\alpha \mathcal{R}(u,v),\end{eqnarray} \tag{ 3 }$

where $\left| x \right|\;:=\;{{\left( {{\sum }_{i}}|{{x}_{i}}{{|}^{2}} \right)}^{1/2}}$ denotes the ℓ²-norm.

Remark 2.1. The joint objective functional (3) but with Gaussian noise in both modalities coincides with the functional used in [5–10] up to a scaling of the data fidelity terms.

**Figure 2.** A graphical model encodes our beliefs for joint reconstruction in PET-MRI. See the text for further details.
Download figure:
Standard image High-resolution image

3. Methods for joint reconstruction

This section describes the models for our prior $\mathcal{R}$ which couples the reconstruction of PET and MRI.

3.1. Parallel level sets

3.1.1. Measuring parallelism

A general concept of coupling structure in a multi-channel image was proposed [10] with special cases including [6, 7, 12]. Structure in an image can be identified with its level sets

$\begin{eqnarray}&&\left\{ x\in \Omega \;|\;u(x)=s \right\}.\end{eqnarray} \tag{ 4 }$

In differentiable images the gradient is perpendicular to the level sets, so similar structures can be measured by similarity of the gradients' orientation. Therefore, parallelism of the gradients at each spatial location is a measure of structural similarity of two images.

3.1.2. Measuring parallelism of vectors

It is well known that for any vectors $x,y\in {{\mathbb{R}}^{N}}$ the following relation is fulfilled

$\begin{eqnarray}&&|\left\langle x,y \right\rangle |=\left| {\rm cos} (\theta ) \right|\left| x \right|\left| y \right|,\end{eqnarray} \tag{ 5 }$

with θ denoting the angle between them in the plane they span and $\left\langle x,y \right\rangle \;:=\;{{\sum }_{i}}{{x}_{i}}{{y}_{i}}$ the Euclidean scalar product. Consequently, one can determine how parallel $x$ and $y$ are by the 'distance measure'

$\begin{eqnarray}&&d(x,y)\;:=\;\left| x \right|\left| y \right|-|\left\langle x,y \right\rangle |.\end{eqnarray} \tag{ 6 }$

It is obvious that $d$ is symmetric and $d(x,y)\geqslant 0$ with equality if and only if $x$ is parallel to y. These properties guarantee that $d$ is suitable to measure parallelism. Moreover, it can be generalized to

$\begin{eqnarray}&&{{d}_{\varphi ,\psi }}(x,y)\;:=\;\varphi \left( \psi (\left| x \right|\left| y \right|)-\psi (|\left\langle x,y \right\rangle |) \right)\end{eqnarray} \tag{ 7 }$

with arbitrary strictly increasing functions $\varphi ,\psi \;:[0,\infty )\to [0,\infty )$ . Obviously, ${{d}_{\varphi ,\psi }}$ is still symmetric, non-negative and only parallel vectors have minimal 'distance'.

3.1.3. Measuring parallelism of images

This concept of parallel vectors can be easily applied to (differentiable) images. Instead of arbitrary vectors we measure how parallel their gradients are at any point in their domain $x\in \Omega$ , i.e.,

$\begin{eqnarray}&&{{d}_{\varphi ,\psi }}(\nabla u(x),\nabla v(x))=\varphi \left( \psi (\left| \nabla u(x) \right|\left| \nabla v(x) \right|)-\psi (|\left\langle \nabla u(x),\nabla v(x) \right\rangle |) \right).\end{eqnarray} \tag{ 8 }$

Hence, we end up with a global measure of how 'similar' two images $u$ and $v$ are

$\begin{eqnarray}&&{\rm PLS}(u,v)={{\int }_{\Omega }}{{d}_{\varphi ,\psi }}(\nabla u(x),\nabla v(x))\;\;{\rm d}x.\end{eqnarray} \tag{ 9 }$

In order to make the notation more easily accessible we will suppress the explicit spatial dependence from now on.

In homogeneous regions of $v$ the gradient $\nabla v$ vanishes and this functional does not provide any information for $u$ . Therefore, one instead can take the modified version

$\begin{eqnarray}&&{{d}_{\varphi ,\psi ,\beta }}(\nabla u,\nabla v)\;:=\;\varphi \left( \psi ({{\left| \nabla u \right|}_{\beta }}{{\left| \nabla v \right|}_{\beta }})-\psi ({{\left| \left\langle \nabla u,\nabla v \right\rangle \right|}_{{{\beta }^{2}}}}) \right)\end{eqnarray} \tag{ 10 }$

proposed in [10] with the 'smoothed norm' ${{\left| x \right|}_{\beta }}\;:=\;{{({{\left| x \right|}^{2}}+{{\beta }^{2}})}^{1/2}}$ for some $\beta \gt 0$ and the global measure reads

$\begin{eqnarray}&&{\rm PL}{{{\rm S}}_{\beta }}(u,v)={{\int }_{\Omega }}{{d}_{\varphi ,\psi ,\beta }}(\nabla u,\nabla v).\end{eqnarray} \tag{ 11 }$

Remark 3.1. The choice how to smooth the norm is rather arbitrary. Other common choices are the Huber or the log-cosh approximation.

Remark 3.2. In general, one might want to use different smoothing parameters β and γ for ${{\left| x \right|}_{\beta }}$ and ${{\left| y \right|}_{\gamma }}$ but to keep the notation simple we used the same for both modalities.

Example 3.3. In [10] we applied (11) with $\varphi ,\psi =id$ to denoising and demosaicking of colour images. We will refer to this prior as linear parallel level sets. Another choice which was used in [10] is defined by $\varphi (s)={{(1+s)}^{1/2}}$ , $\psi (s)={{s}^{2}}$ . Due to the quadratic nature of ψ we call it quadratic parallel level sets. This functional was first introduced in [12, 13] where it was motivated by high energy physics. It was observed in [10] that linear parallel level sets leads to sharper images compared to the quadratic version. We will give a theoretical justification for this observation in section 4.

Example 3.4. Another choice used previously in geophysics is (9) with $\varphi =id$ and $\psi (s)={{s}^{2}}$ [6]. This functional coincides with the squared magnitude of the cross product of the gradients proposed by [7–9]. Similar functions have been applied to registration [51, 52] and face recognition [53]. We will only briefly discuss this choice further here because it possesses unfavourable noise suppression properties.

Remark 3.5. The functional proposed in [11] can be rewritten as

$\begin{eqnarray}&&{{\int }_{\Omega }}{{\left| \nabla u \right|}^{2}}-{{\left\langle \nabla u,w \right\rangle }^{2}}.\end{eqnarray} \tag{ 12 }$

It is not exactly a special case of the parallel level sets functional but bears lots of similarities. This functional favours alignment of the gradient of $u$ with the normalized and a priori defined vector-field w. It is beyond the scope of this paper to investigate such asymmetric priors for joint reconstruction.

3.2. Joint total variation

There are many extensions of total variation regularization to multi-modality images. Of importance here is the coupling of the modalities so that one modality can provide information to the other. In the case of two modalities $u,v:\Omega \subset {{\mathbb{R}}^{N}}\to \mathbb{R}$ the joint total variation [6] is defined as

$\begin{eqnarray}&&{\rm JTV}(u,v)={{\int }_{\Omega }}{{\left( {{\left| \nabla u \right|}^{2}}+{{\left| \nabla v \right|}^{2}} \right)}^{1/2}}.\end{eqnarray} \tag{ 13 }$

This functional was first considered as an extension of total variation for colour images [17]. Another generalization of total variation to colour images was proposed in [54]. While the former locally couples the support of the gradients, the latter only provides a global effect. Joint total variation is closely related to block sparsity in the magnitude of the gradient [55–58]. It is known that usual total variation favours sparsity in the gradient domain which is enforced by sparsity in the magnitude of the gradient. Joint total variation favours sparsity in the magnitude of the 'joint gradient' $(\nabla u,\nabla v)$ . Therefore, gradients are more likely to occur at the same position. Another explanation why joint total variation can be useful for joint reconstruction is given in the next section.

We will use here a smoothed version of joint total variation so that we can apply tools from smooth optimization, i.e.,

$\begin{eqnarray}&&{\rm JT}{{{\rm V}}_{\beta }}(u,v)={{\int }_{\Omega }}{{\left( {{\left| \nabla u \right|}^{2}}+{{\left| \nabla v \right|}^{2}}+{{\beta }^{2}} \right)}^{1/2}},\end{eqnarray} \tag{ 14 }$

for some $\beta \gt 0$ . In the notation of the 'smoothed norm' introduced above we can write smooth joint total variation as

$\begin{eqnarray}&&{\rm JT}{{{\rm V}}_{\beta }}(w)={{\int }_{\Omega }}{{\left| \nabla w \right|}_{\beta }},\end{eqnarray} \tag{ 15 }$

where the vector-valued image $w:\Omega \to {{\mathbb{R}}^{2}}$ is defined as $w(x)=(u(x),v(x))$ and the gradient is the Jacobian.

4. Analysis of the generated diffusion process

We have introduced the concepts of parallel level sets and joint total variation to couple the reconstruction of multi-modality images. Priors based on gradient information often lead to a derivative of diffusion type [10], i.e., if a prior is of the form $\mathcal{R}(u)=\int \Psi (\nabla u)$ with $\nabla \Psi (x)=K(x)x$ then the Gâteaux derivative can be identified with

$\begin{eqnarray}&&D\mathcal{R}[u]=-{\rm div}(K(\nabla u)\nabla u).\end{eqnarray} \tag{ 16 }$

The generated diffusion process depends crucially on the diffusivity $K(\nabla u)$ . Its analysis can give more insight how a prior influences the final reconstruction when an algorithm based on derivative information is used.

We will see in the following that in both cases the Gâteaux derivative leads to a nonlinear, inhomogeneous diffusion, i.e., the diffusivity depends on the image and the location. Moreover, in case of parallel level sets the diffusion will be anisotropic, i.e., it depends on the direction [59, 60].

4.1. Parallel level sets

In order to analyse the diffusion generated by parallel level sets we first need to compute the Gâteaux derivative. Most of the derivations have already been shown in [10].

Let us denote the space of continuously differentiable functions with zero Neumann boundary condition by $C_{*}^{1}(\Omega )$ , i.e., the derivative with respect to the normal of the boundary being zero.

Lemma 4.1. ([10]). Let φ, ψ be continuously differentiable functions. The Gâteaux derivative of ${\rm PL}{{{\rm S}}_{\beta }}\;:C_{*}^{1}(\Omega )\times C_{*}^{1}(\Omega )\to \mathbb{R}$ defined by (11) at $(u,v)$ is given by

$\begin{eqnarray}D{\rm PL}{{{\rm S}}_{\beta }}(u,v)=-{\rm div}\left[ \left( \begin{array}{ccccccccccccccc} \kappa (\nabla u,\nabla v) & \tau (\nabla u,\nabla v) \\ \tau (\nabla u,\nabla v) & \kappa (\nabla v,\nabla u) \\ \end{array} \right)\left( \begin{array}{ccccccccccccccc} \nabla u \\ \nabla v \\ \end{array} \right) \right]\end{eqnarray} \tag{ 17 }$

with diffusivities

$\begin{eqnarray}&&\kappa (x,y)\;:=\;\rho (x,y){{\left| y \right|}_{\beta }}\left| x \right|_{\beta }^{-1}{{\psi }^{\prime }}({{\left| x \right|}_{\beta }}{{\left| y \right|}_{\beta }})\geqslant 0,\end{eqnarray} \tag{ 18 }$

$\begin{eqnarray}&&\tau (x,y)\;:=\;-\rho (x,y)\left\langle x,y \right\rangle \left| \left\langle x,y \right\rangle \right|_{{{\beta }^{2}}}^{-1}{{\psi }^{\prime }}({{\left| \left\langle x,y \right\rangle \right|}_{{{\beta }^{2}}}})\in \mathbb{R},\end{eqnarray} \tag{ 19 }$

$\begin{eqnarray}&&\rho (x,y)\;:=\;{{\varphi }^{\prime }}\left( \psi ({{\left| x \right|}_{\beta }}{{\left| y \right|}_{\beta }})-\psi ({{\left| \left\langle x,y \right\rangle \right|}_{{{\beta }^{2}}}}) \right)\geqslant 0.\end{eqnarray} \tag{ 20 }$

Remark 4.2. The derivatives and the matrix operations have to be applied component-wise so that the notation in (17) makes sense.

The flow generated by (17) is difficult to analyse because of the cross-diffusivities τ. Therefore, we write an equivalent formulation which does not rely on cross-diffusivities but on matrix-valued diffusivities.

Proposition 4.3. With the same assumptions as in lemma 4.1 the Gâteaux derivative can be equivalently rewritten as

$\begin{eqnarray}&&D{\rm PL}{{{\rm S}}_{\beta }}(u,v)=-{\rm div}\left( \begin{array}{ccccccccccccccc} K(\nabla u,\nabla v)\nabla u \\ K(\nabla v,\nabla u)\nabla v \\ \end{array} \right),\end{eqnarray} \tag{ 21 }$

where the diffusivities are given by

$\begin{eqnarray}&&K(x,y)\;:=\;\kappa (x,y)I+\mu (x,y)y{{y}^{T}}\in {{\mathbb{R}}^{N\times N}},\end{eqnarray} \tag{ 22 }$

$\begin{eqnarray}&&\mu (x,y)\;:=\;-\rho (x,y)\left| \left\langle x,y \right\rangle \right|_{{{\beta }^{2}}}^{-1}{{\psi }^{\prime }}({{\left| \left\langle x,y \right\rangle \right|}_{{{\beta }^{2}}}})\leqslant 0.\end{eqnarray} \tag{ 23 }$

and κ and ρ by (18) and (20), respectively.

Proof. The assertion for the first equation in (21) follows immediately from the fact that $\left\langle \nabla u,\nabla v \right\rangle \nabla v=(\nabla v\nabla {{v}^{T}})\nabla u$ . Interchanging the roles of $u$ and $v$ completes the proof. □

Remark 4.4. The lemma 4.1 was proven only for ${\rm PL}{{{\rm S}}_{\beta }}$ . Under similar conditions on φ and ψ it also holds for ${\rm PLS}$ with $|\cdot {{|}_{\beta }}$ being replaced by $|\cdot |$ . Hence, this proposition can be applied to ${\rm PLS}$ as well.

For the analysis of the diffusivity we need to remind the reader about the orthogonal complement and the span.

Definition 4.5. For any $y\in {{\mathbb{R}}^{N}}$ we can decompose the space as ${{\mathbb{R}}^{N}}={{y}^{\parallel }}\oplus {{y}^{\bot }}$ where span and the orthogonal complement of y are defined as

$\begin{eqnarray}&&{{y}^{\parallel }}\;:=\;\left\{ sy\in {{\mathbb{R}}^{N}}:s\in \mathbb{R} \right\}\end{eqnarray} \tag{ 24 }$

and

$\begin{eqnarray}&&{{y}^{\bot }}\;:=\;\left\{ x\in {{\mathbb{R}}^{N}}:\left\langle x,y \right\rangle =0 \right\}.\end{eqnarray} \tag{ 25 }$

Lemma 4.6. For all matrices $K=\kappa I+\mu y{{y}^{T}}\in {{\mathbb{R}}^{N\times N}}$ with $\kappa ,\mu \in \mathbb{R}$ the following holds.

(i)
K is symmetric.
(ii)
All $x\in {{y}^{\parallel }}$ are eigenvectors to the eigenvalue ${{\lambda }^{\parallel }}\;:=\;\kappa +\mu {{\left| y \right|}^{2}}$ .
(iii)
All $x\in {{y}^{\bot }}$ are eigenvectors to the eigenvalue ${{\lambda }^{\bot }}\;:=\;\kappa$ .

Proof. Ad (i) There is

$\begin{eqnarray}&&{{K}^{T}}={{(\kappa I+\mu y{{y}^{T}})}^{T}}=\kappa {{I}^{T}}+\mu {{(y{{y}^{T}})}^{T}}=\kappa I+\mu y{{y}^{T}}=K.\end{eqnarray} \tag{ 26 }$

Ad (ii) Let $x\in {{y}^{\parallel }}$ . So there is $x=sy$ for some $s\in \mathbb{R}$ and hence

$\begin{eqnarray}&&Kx=sKy=s(\kappa I+\mu y{{y}^{T}})y=s(\kappa y+\mu {{\left| y \right|}^{2}}y)\end{eqnarray} \tag{ 27 }$

$\begin{eqnarray}&&\quad \ \;=\,(\kappa +\mu {{\left| y \right|}^{2}})x={{\lambda }^{\parallel }}x.\end{eqnarray} \tag{ 28 }$

Ad (iii) Let $x\in {{y}^{\bot }}$ . Then there is

$\begin{eqnarray}&&Kx=(\kappa I+\mu y{{y}^{T}})x=\kappa x+\mu y\left\langle y,x \right\rangle =\kappa x={{\lambda }^{\bot }}x.\end{eqnarray} \tag{ 29 }$

□

To analyse the diffusion we need to introduce local coordinates for the image domain.

Definition 4.7. (local coordinates). Let $f:\Omega \subset {{\mathbb{R}}^{N}}\to \mathbb{R}$ be a differentiable function. We denote by $R[f]$ a local coordinate mapping $R[f]:\Omega \to {{\mathbb{R}}^{N\times N}}$ such that for any point $x\in \Omega$ the columns of $R[f](x)$ form a basis of $\nabla f{{(x)}^{\parallel }}\oplus \nabla f{{(x)}^{\bot }}$ .

Theorem 4.8. Let $R[u]$ and $R[v]$ be local coordinates for $u$ and $v$ . Then the diffusion (21) can be rewritten as

$\begin{eqnarray}&&D{\rm PL}{{{\rm S}}_{\beta }}[u,v]=-{\rm div}\left( \begin{array}{ccccccccccccccc} R[v]\;\Lambda [u,v]\;R{\mkern 1mu} {{[v]}^{T}}\;\nabla u \\ R[u]\;\Lambda [v,u]\;R{\mkern 1mu} {{[u]}^{T}}\;\nabla v \\ \end{array} \right),\end{eqnarray} \tag{ 30 }$

where the diffusivities are defined as

$\begin{eqnarray}&&\Lambda [u,v]\;:=\;{\rm diag}({{\lambda }^{\parallel }}(\nabla u,\nabla v),{{\lambda }^{\bot }}(\nabla u,\nabla v),\ldots ,{{\lambda }^{\bot }}(\nabla u,\nabla v))\end{eqnarray} \tag{ 31 }$

with

$\begin{eqnarray}&&{{\lambda }^{\parallel }}(x,y)\;:=\;\kappa (x,y)+\mu (x,y){{\left| y \right|}^{2}}\in \mathbb{R}\qquad {{\lambda }^{\bot }}(x,y)\;:=\;\kappa (x,y)\geqslant 0\end{eqnarray} \tag{ 32 }$

for κ and μ as defined in (18) and (23).

Proof. This result follows immediately from proposition 4.3 combined with lemma 4.6.

□

Remark 4.9. Theorem 4.8 tells us that the diffusion for the image $u$ generated by the parallel level sets functional has principal directions given by the other image $v$ and vice versa. Moreover, the directions but not the amount of the diffusion are independent of the functions φ and ψ.

Remark 4.10. In terms of the function $\tilde{\psi }(s)\;:=\;{{\psi }^{\prime }}(s)/s$ the diffusivity in the direction of the image gradient can be rewritten as

$\begin{eqnarray}&&{{\lambda }^{\parallel }}(x,y)=\rho (x,y)\left( \left| y \right|_{\beta }^{2}\tilde{\psi }({{\left| x \right|}_{\beta }}{{\left| y \right|}_{\beta }})-{{\left| y \right|}^{2}}\tilde{\psi }({{\left| \left\langle x,y \right\rangle \right|}_{{{\beta }^{2}}}}) \right)\end{eqnarray} \tag{ 33 }$

so the sign of the diffusivity depends on the monotonicity of $\tilde{\psi }$ . This explains our choice to name the different methods of parallel level sets by the choice of ψ.

Example 4.11. (Linear Parallel Level Sets). The principal diffusivities for the linear parallel level sets functional, i.e., (11) with $\varphi ,\psi =id$ , are given by

$\begin{eqnarray}&&{{\lambda }^{\parallel }}={{\left| y \right|}_{\beta }}\left| x \right|_{\beta }^{-1}-{{\left| y \right|}^{2}}\left| \left\langle x,y \right\rangle \right|_{{{\beta }^{2}}}^{-1}\end{eqnarray} \tag{ 34 }$

$\begin{eqnarray}&&=-{{\left| y \right|}_{\beta }}\left| x \right|_{\beta }^{-1}\left| \left\langle x,y \right\rangle \right|_{{{\beta }^{2}}}^{-1}\left( \frac{{{\left| x \right|}_{\beta }}\left| y \right|}{\left| x \right|\ {{\left| y \right|}_{\beta }}}\left| x \right|\left| y \right|-\frac{{{\left| \left\langle x,y \right\rangle \right|}_{{{\beta }^{2}}}}}{|\left\langle x,y \right\rangle |}|\left\langle x,y \right\rangle | \right)\end{eqnarray} \tag{ 35 }$

$\begin{eqnarray}&&{{\lambda }^{\bot }}={{\left| y \right|}_{\beta }}\left| x \right|_{\beta }^{-1}\gt 0.\end{eqnarray} \tag{ 36 }$

It is of interest to analyse the four cases

$\begin{eqnarray*}&&\begin{array}{l} {\rm (i)}\;{\rm vanishing}\;{\rm gradient}\;{\rm information,}\;{\rm i}{\rm .e}{\rm .,}\;\left| x \right|\to 0, \\ {\rm (ii)}\;{\rm vanishing}\;{\rm side}\;{\rm information,}\;{\rm i}{\rm .e}{\rm .,}\;\left| y \right|\to 0, \\ {\rm (iii)}\;{\rm large}\;{\rm gradients,}\;{\rm i}{\rm .e}{\rm .,}\;\beta \to 0,\;{\rm and} \\ {\rm (iv)}\;{\rm large}\;{\rm and}\;{\rm aligned}\;{\rm gradients,}\;{\rm i}{\rm .e}{\rm .,}\beta \to 0,|\left\langle x,y \right\rangle |=\left| x \right|\left| y \right|, \\ {} \\ \end{array}\end{eqnarray*}$

$\begin{eqnarray*}\begin{array}{ccc} ({\rm i}) & {{\lambda }^{\parallel }}\to {{\beta }^{-1}}({{\left| y \right|}_{\beta }}-{{\beta }^{-1}}{{\left| y \right|}^{2}}) & {{\lambda }^{\bot }}\to {{\beta }^{-1}}{{\left| y \right|}_{\beta }} \\ ({\rm ii}) & {{\lambda }^{\parallel }}\to \beta \left| x \right|_{\beta }^{-1}\gt 0 & {{\lambda }^{\bot }}\to \beta \left| x \right|_{\beta }^{-1} \\ ({\rm iii}) & {{\lambda }^{\parallel }}\to -\left| y \right|{{\left| x \right|}^{-1}}|\left\langle x,y \right\rangle {{|}^{-1}}\left( \left| x \right|\left| y \right|-|\left\langle x,y \right\rangle | \right)\leqslant 0 & {{\lambda }^{\bot }}\to \left| y \right|{{\left| x \right|}^{-1}} \\ ({\rm iv}) & {{\lambda }^{\parallel }}\to 0 & {{\lambda }^{\bot }}\to \left| y \right|{{\left| x \right|}^{-1}}. \\ \end{array}\end{eqnarray*}$

Example 4.12. (Quadratic Parallel Level Sets). The principal diffusivities for the quadratic parallel level sets functional, i.e., (11) with $\varphi (s)=\sqrt{s+1}$ and $\psi (s)={{s}^{2}}$ are given by

$\begin{eqnarray}&&{{\lambda }^{\parallel }}={{\beta }^{2}}{{\left( 1+\left| x \right|_{\beta }^{2}\left| y \right|_{\beta }^{2}-\left| \left\langle x,y \right\rangle \right|_{{{\beta }^{2}}}^{2} \right)}^{-1/2}}\gt 0\end{eqnarray} \tag{ 37 }$

$\begin{eqnarray}&&{{\lambda }^{\bot }}=\left| y \right|_{\beta }^{2}{{\left( 1+\left| x \right|_{\beta }^{2}\left| y \right|_{\beta }^{2}-\left| \left\langle x,y \right\rangle \right|_{{{\beta }^{2}}}^{2} \right)}^{-1/2}}\gt 0.\end{eqnarray} \tag{ 38 }$

For the special cases considered in example 4.11 we get

$\begin{eqnarray*}\begin{array}{ccc} {\rm (i)} & {{\lambda }^{\parallel }}\to {{\beta }^{2}}{{(1+{{\beta }^{2}}{{\left| y \right|}^{2}})}^{-1/2}} & {{\lambda }^{\bot }}\to \left| y \right|_{\beta }^{2}{{(1+{{\beta }^{2}}{{\left| y \right|}^{2}})}^{-1/2}} \\ {\rm (ii)} & {{\lambda }^{\parallel }}\to {{\beta }^{2}}{{(1+{{\beta }^{2}}{{\left| x \right|}^{2}})}^{-1/2}} & {{\lambda }^{\bot }}\to {{\beta }^{2}}{{(1+{{\beta }^{2}}{{\left| x \right|}^{2}})}^{-1/2}} \\ {\rm (iii)} & {{\lambda }^{\parallel }}\to 0 & {{\lambda }^{\bot }}\to {{\left| y \right|}^{2}}{{\left( 1+{{\left| x \right|}^{2}}{{\left| y \right|}^{2}}-{{\left\langle x,y \right\rangle }^{2}} \right)}^{-1/2}} \\ {\rm (iv)} & {{\lambda }^{\parallel }}\to 0 & {{\lambda }^{\bot }}\to {{\left| y \right|}^{2}}. \\ \end{array}\end{eqnarray*}$

Remark 4.13. Example 4.11 shows that linear parallel level sets yields in some cases negative principal diffusivities. This is commonly referred to as backward diffusion. Backward diffusion can lead to instabilities but also might result in sharper images. The principal diffusivities of quadratic parallel level sets are all positive which makes it more stable. This analysis explains the observation in [10] where linear parallel sets resulted in sharper images compared to quadratic parallel level sets.

Remark 4.14. These examples also show that the diffusion is truely anisotropic as the diffusion along and perpendicular to the gradient are different except for the case when the side information vanishes.

Example 4.15. The principal diffusivities for example 3.4 are given by

$\begin{eqnarray}&&{{\lambda }^{\parallel }}=0\end{eqnarray} \tag{ 39 }$

$\begin{eqnarray}&&{{\lambda }^{\bot }}={{\left| y \right|}^{2}}.\end{eqnarray} \tag{ 40 }$

The flow generated is very simple as there is only diffusion only along the edges. It stops when side information is flat and is therefore not able to remove noise in this case.

4.2. Joint total variation

In this section we discuss the diffusion which is generated by joint total variation. It follows immediately from the well known Gâteaux derivative of total variation (see [61]) that the Gâteaux derivative of joint total variation can be identified with

$\begin{eqnarray}&&D{\rm JT}{{{\rm V}}_{\beta }}[u,v]=-{\rm div}\left( \begin{array}{ccccccccccccccc} \eta (u,v)\nabla u \\ \eta (u,v)\nabla v \\ \end{array} \right).\end{eqnarray} \tag{ 41 }$

where the diffusivity for both modalities is

$\begin{eqnarray}&&\eta (u,v)\;:=\;{{\left( {{\left| \nabla u \right|}^{2}}+{{\left| \nabla v \right|}^{2}}+{{\beta }^{2}} \right)}^{-1/2}}.\end{eqnarray} \tag{ 42 }$

The diffusivity for both modalities are the same. It reduces to zero when either $\nabla u$ or $\nabla v$ is large, hence information about an edge can be transferred from one modality to the other.

The diffusion of joint total variation is isotropic as the diffusivities are scalar-valued. Similarly to usual total variation [62] one can consider joint total variation with anisotropic diffusion but this is out of the scope of this paper.

5. Numerical experiments

5.1. Technical details

5.1.1. Implementation

In our experiments we aim to compute minimizers of (3) with and without an explicit prior. If no explicit prior is used we terminate the iterations before convergence to regularize the reconstruction. The priors we used include total variation, joint total variation (see section 3.2) and linear and quadratic parallel level sets (see section 3.1). As we want to compare the different priors for joint reconstruction we chose to minimize the functional with the same quasi-Newton method (L-BFGS-B [63], software available at [64] with mex wrapper for use in MATLAB [65]) in every case. The MATLAB implementation is available as supplementary data from stacks.iop.org/ip/31/015001/mmedia can be obtained from the authors' homepage [66].

Quasi-Newton methods rely on gradient information only and approximate the Hessian. The gradient of total variation, joint total variation (41), quadratic and linear parallel level (21) sets all are of diffusion type. Hence, we need to discretize gradients and divergences. Gradients are discretized with forward differences and the divergence with backward differences so that the discretized gradient corresponds to the gradient of the discretized functional.

5.1.2. Data sets

We consider the setting of joint PET and MRI reconstruction. The PET operator is modelled to be the discrete x-ray transform taken from [67]. To model the lack of resolution the operator consists also of a convolution in the sinogram domain in the radial direction with a Gaussian of full width half maximum of five pixels (for sinograms of size 128 × 300). The ground truth models for both modalities together with the blurry sinogram for PET are shown in figure 3. Six different cases of MRI undersampling were tested: the first case of sampling is a fully sampled MRI (full). For the second and third sampling the data were acquired along 20 and 15 radial lines in k-space (radial20, radial15). The fourth and fifth case are spirals with either uniform density (spiralUni) or a higher density in the high frequencies (spiralHigh). In the last test case we performed Cartesian sampling but only acquired every second line (lines2). The MRI data are presented next to the results. Gaussian noise with expected energy of 4% is added to the MRI. The PET data is an instance of a multi-variate Poisson distribution with an expected number of 1e+6 counts in total.

**Figure 3.** The figure shows the ground truth of PET and MRI as well as the PET data. The region of interest for the quantatitive analysis is marked in red.
Download figure:
Standard image High-resolution image

5.1.3. Parameter selection

As both the linear and the quadratic parallel level set priors are non-convex (see appendix A) we might end up with a local minimizer. Therefore, a good initial guess is important to end up at a desirable local minimizer.

The parameters α and β are chosen to minimize the relative ℓ²-error of the reconstruction $x$ and the ground truth ${{x}^{*}}$

$\begin{eqnarray}&&e(x)\;:=\;\mid x-{{x}^{*}}\mid /\mid {{x}^{*}}\mid ,\end{eqnarray} \tag{ 43 }$

where the norms are only taken over the region of interest, see figure 3. This region of interest was chosen to give no weight to the background and less to large uniform regions as well as the outer boundary.

5.2. Results

The joint reconstruction approaches described in section 3 are compared against separated reconstruction which is performed either by total variation or having no explicit prior. The latter means early stopping for PET or minimal energy (zero filled) reconstruction for MRI.

5.2.1. Visual assessment

Figure 4 shows the results for the case full with the reconstructed images shown on the left and the error images (compared to the ground truth) on the right where blue indicates over and red underestimation. The MRI reconstructions are visually perfect for all priors as it can be seen from both the images themself and the error images. The results for PET differ a lot between the different priors. When the iterations have been terminated the reconstruction looks the worst. Most of the introduced artefacts are removed when using total variation but the resolution in PET remains more or less the same. Coupling PET and MRI with joint total variation results in a superior PET image compared to total variation. Shared edges are better reconstructed using quadratic parallel level sets. In the case of linear parallel level sets almost no errors at shared edges can be observed. In areas of shared information the images look almost indistinguishable from the ground truth. The edges which are not shared between the two modalities introduce a fake edge artefact. From the error images in PET we see that the errors are decreasing from top to bottom. The errors for linear parallel level sets are almost only in regions of non-shared edge information; both coincide with the visual impression on the left.

The results for radial20 are shown in figure 5. With only 20 radial spokes a zero-filled reconstruction does not give a reasonable reconstruction for MRI any more but including total variation into the reconstruction resolves most of it. While joint total variation and quadratic parallel level sets are not able to improve on this linear parallel level sets is able to reconstruct with a lot less errors and only the finest lines can not be resolved. The PET reconstructions are similar to the fully sampled case but linear parallel level sets does show a few errors in regions of shared edge information. It can also be seen that the two blobs in PET not being present in MRI are blurrier than total variation and similar to early stopping. This is probably due to the parameter β being constant over and optimized for the whole phantom.

Reducing the number of spokes to 15 (see figure 6 ) the difference between separate and joint reconstruction becomes even larger with linear parallel level sets still being able to reconstruct reasonable images with only a few errors.

More results are shown in figures 7 and 8 where the MRI data are sampled along spiral trajectories.

**Figure 7.** Reconstructions for `spiralUni` on the left and their error images on the right. The methods' abbreviations are TV=total variation, PLS=parallel level sets.
Download figure:
Standard image High-resolution image

**Figure 8.** Reconstructions for `spiralHigh` on the left and their error images on the right. The methods' abbreviations are TV=total variation, PLS=parallel level sets.
Download figure:
Standard image High-resolution image

The reconstructions from Cartesian undersampling lines2 are shown in figure 9. While the separated MRI reconstruction shows a two-fold ghosting of the object, joint reconstruction, especially linear parallel level sets, can help to complete the missing information. The ghosting of the inner circle is not fully removed as there is no information in the PET data about this object.

5.2.2. Quantitative assessment

Quantitative results using the relative ℓ²-error over the whole image can be found in figure 10. The errors for for PET are shown on the left and for MRI on the right. For PET it is easy to see that for all cases of sampling separate reconstruction is always worse than joint reconstruction with joint total variation being weaker than parallel level sets. Among the two parallel level sets priors the linear one performs much better. Furthermore, it can be seen that the results are quite robust on the kind and amount of under-sampling.

For full sampling in MRI all methods give very good results. When MRI is reconstructed from under-sampled data using the zero-filled Fourier transform the results are always the worst. Total variation performs as well as joint total variation and quadratic parallel level sets in some cases. The errors of the linear parallel level sets reconstructions are smaller in all the five cases of under-sampling.

6. Conclusion

This work is the first contribution to joint reconstruction in multi-modality medical imaging. We have shown that the previously used framework for joint reconstruction coincides with our belief about multi-modality imaging systems like PET-MRI. Three different priors have been proposed for joint reconstruction of PET-MRI and the corresponding diffusion has been analyzed. This analysis gives more insight on what images are favoured by these priors.

We presented results where blurry Radon data with Poisson noise (PET) and under-determined Fourier data with Gaussian noise (MRI) have been combined in a joint reconstruction process. The numerical results show that our joint reconstruction framework can successfully couple information from these two imaging systems so that both modalities benefit.

Similar to multiple priors for one modality [68–70] one might be able to include better the given a priori information by using multiple priors. Moreover, the optimization methods have to be adapted to the non-convexity of the parallel level set prior. The effect of noise and non-local parameter tuning for joint reconstruction will also be subject of further research.

Acknowledgments

This research was funded by the EPSRC (EP/K005278/1) and supported by the National Institute for Health Research University College London Hospitals Biomedical Research Centre. MJE was supported by an IMPACT studentship funded jointly by Siemens and the UCL Faculty of Engineering Sciences.

Appendix A.: Non-convexity of parallel level sets

In this section we show that all considered versions of the parallel level sets framework are not convex. Under mild conditions a functional $\mathcal{R}(u,v)\;:=\;\int f(\nabla u,\nabla v)$ is convex if and only if $f$ is convex. We will now analyze $f$ for the special cases considered above.

Proposition A.1. Let X be a vector space and $f:X\times X\to \mathbb{R}$ a convex functional. Then the growth condition

$\begin{eqnarray}&&2f(x/2,y/2)\leqslant f(x,0)+f(0,y)\end{eqnarray} \tag{ A.1 }$

holds for all $x,y\in X$ .

Proof. Let $s=1/2$ , ${{z}_{0}}=(x,0)$ and ${{z}_{1}}=(0,y)$ . As $f$ is convex there is

$\begin{eqnarray}&&f(x/2,y/2)=f(s{{z}_{0}}+(1-s){{z}_{1}})\end{eqnarray} \tag{ A.2 }$

$\begin{eqnarray}&&\leqslant \;sf({{z}_{0}})+(1-s)f({{z}_{1}})=1/2f(x,0)+1/2f(0,y)\end{eqnarray} \tag{ A.3 }$

and the assertion follows directly. □

Example A.2. The quadratic parallel level sets functional (see example 3.3) which corresponds to

$\begin{eqnarray}&&f(x,y)={{\left( 1+\left| x \right|_{\beta }^{2}\left| y \right|_{\beta }^{2}-\left| \left\langle x,y \right\rangle \right|_{{{\beta }^{2}}}^{2} \right)}^{1/2}}\end{eqnarray} \tag{ A.4 }$

is not convex in the joint argument.

Proof. Let $x,y\in {{\mathbb{R}}^{N}}$ so that $\left\langle x,y \right\rangle =0$ and $\left| x \right|=\left| y \right|=1$ . Furthermore, set $\hat{x}\;:=\;sx$ and $\hat{y}\;:=\;sy$ for any $s\in \mathbb{R}$ . Then the necessary growth condition (A.1) for $f(\hat{x},\hat{y})$ reads

$\begin{eqnarray}&&2{{\left( 1+{{\beta }^{2}}{{s}^{2}}/2+{{s}^{4}}/16 \right)}^{1/2}}\leqslant 2{{\left( 1+{{\beta }^{2}}{{s}^{2}} \right)}^{1/2}}\end{eqnarray} \tag{ A.5 }$

which does not hold for $s\gt \sqrt{8}\beta$ . □

Example A.3. The linear parallel level sets functional (see example 3.3) defined via

$\begin{eqnarray}&&f(x,y)={{\left| x \right|}_{\beta }}{{\left| y \right|}_{\beta }}-{{\left| \left\langle x,y \right\rangle \right|}_{{{\beta }^{2}}}}\end{eqnarray} \tag{ A.6 }$

is not convex in the joint argument.

Proof. For $x,y\in {{\mathbb{R}}^{N}}$ defined as in the proof of example A.2 the necessary growth condition (A.1)

$\begin{eqnarray}&&2{{\left( {{\beta }^{4}}+{{\beta }^{2}}{{s}^{2}}/2+{{s}^{4}}/16 \right)}^{1/2}}-2{{\beta }^{2}}\leqslant 2\beta {{\left( {{s}^{2}}+{{\beta }^{2}} \right)}^{1/2}}-2{{\beta }^{2}}\end{eqnarray} \tag{ A.7 }$

which is again violated for $s\gt \sqrt{8}\beta$ . □

Joint reconstruction of PET-MRI by exploiting structural similarity

Article metrics

Submit

Author e-mails

Author affiliations

Dates

Peer review information

Abstract

1. Introduction

1.1. Contributions

1.2. Related Work

2. Problem setting and notation

2.1. Positron emission tomography

2.2. Magnetic resonance imaging

2.3. Probabilistic modelling for joint reconstruction

3. Methods for joint reconstruction

3.1. Parallel level sets

3.1.1. Measuring parallelism

3.1.2. Measuring parallelism of vectors

3.1.3. Measuring parallelism of images

3.2. Joint total variation

4. Analysis of the generated diffusion process

4.1. Parallel level sets

4.2. Joint total variation

5. Numerical experiments

5.1. Technical details

5.1.1. Implementation

5.1.2. Data sets

5.1.3. Parameter selection

5.2. Results

5.2.1. Visual assessment

5.2.2. Quantitative assessment

6. Conclusion

Acknowledgments

Appendix A.: Non-convexity of parallel level sets

Joint reconstruction of PET-MRI by exploiting structural similarity

Article metrics

Submit

Share this article

Author e-mails

Author affiliations

Dates

Peer review information

Abstract

1. Introduction

1.1. Contributions

1.2. Related Work

2. Problem setting and notation

2.1. Positron emission tomography

2.2. Magnetic resonance imaging

2.3. Probabilistic modelling for joint reconstruction

3. Methods for joint reconstruction

3.1. Parallel level sets

3.1.1. Measuring parallelism

3.1.2. Measuring parallelism of vectors

3.1.3. Measuring parallelism of images

3.2. Joint total variation

4. Analysis of the generated diffusion process

4.1. Parallel level sets

4.2. Joint total variation

5. Numerical experiments

5.1. Technical details

5.1.1. Implementation

5.1.2. Data sets

5.1.3. Parameter selection

5.2. Results

5.2.1. Visual assessment

5.2.2. Quantitative assessment

6. Conclusion

Acknowledgments

Appendix A.: Non-convexity of parallel level sets