
BlindNet: an untrained learning approach toward computational imaging with model uncertainty


Published 18 October 2021. © 2021 The Author(s). Published by IOP Publishing Ltd.
Special Issue on Interferometric Scattering Microscopy.
Citation: Xiangyu Zhang et al 2022 J. Phys. D: Appl. Phys. 55 034001. DOI: 10.1088/1361-6463/ac2ad4

Abstract

The solution of an inverse problem in computational imaging (CI) often requires knowledge of the physical model and/or the object. However, in many practical applications the physical model may not be accurately characterized, leading to model uncertainty that degrades the quality of the reconstructed image. Here, we propose a novel untrained learning approach towards CI with model uncertainty, and demonstrate it in phase retrieval, an important CI task that is widely encountered in biomedical imaging and industrial inspection.


Original content from this work may be used under the terms of the Creative Commons Attribution 4.0 license. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.

1. Introduction

Various classes of inverse problems in computational imaging (CI) can be formulated as the reconstruction of an object image $\hat{X}$ as close as possible to the ground truth X from the measurement Y [1]:

$\hat{X} = \underset{X}{\arg\min}\ \big\|\,f(X)-Y\big\|^{2}, \qquad (1)$

where the symbol $\|\cdot\|$ denotes the $\mathcal{L}_2$ norm, and f is the imaging model that relates the object X to the measured data Y. In the simplest case, where Y and X are linearly related through a matrix H, the objective function (1) reduces to [2, 3]:

$\hat{X} = \underset{X}{\arg\min}\ \big\|\mathbf{H}X-Y\big\|^{2}. \qquad (2)$

Equation (2) is widely encountered in diverse fields of research, ranging from computed tomography (CT) and MRI to computational ghost imaging (CGI). In CT, H is interpreted as the Radon transform [2], whereas in CGI it represents the set of illumination patterns [4]. The general problem described by equation (1) arises in many coherent phase imaging problems, in both the optical [5] and x-ray [6] domains. In this case, the physical model f is represented as the magnitude of a linear canonical transform, such as the coherent free-space propagation of X.

Historically, such inverse problems can be solved by using algorithms such as least squares and gradient descent if the matrix H is invertible. However, in most practical problems the matrix H is ill-conditioned, so that regularization methods are required to stabilize the solution. The basic idea is to add a regularization term $\alpha\phi(X)$, where α is a parameter, to the objective function (1) or (2), converting the ill-posed problem into a nearby well-posed one. One can adopt a regularizer such as Tikhonov regularization [2, 3], $\mathcal{L}_1$ regularization [7, 8], or total variation [9, 10], depending on the prior knowledge about the object. Various algorithms have been proposed to solve the regularized objective functions encountered in CI. Examples include tomography [11], phase retrieval [1], digital holography (DH) [12], and deconvolution in confocal [13] and light-field microscopy [14].

Usually, all these algorithms work on the grounds that H is explicitly known. For example, when Tikhonov regularization is applied, the solution to the inverse problem is given explicitly by [2]: $\hat{X} = (\mathbf{H}^{T}\mathbf{H}+\alpha\mathbf{I})^{-1}\mathbf{H}^{T}Y$, where $\mathbf{H}^T$ is the transpose of H, and I is the identity matrix that matches the dimension of $\mathbf{H}^T\mathbf{H}$. When $\mathcal{L}_1$ regularization is applied, one can solve the problem iteratively by using an algorithm like the iterative shrinkage-thresholding algorithm [7], in which the matrix H enters the iteration process, i.e. $\hat{X}_{k+1} = \mathcal{T}_{\alpha t}(\hat{X}_k - 2t\,\mathbf{H}^T(\mathbf{H}\hat{X}_k-Y))$, where t is a step size and $\mathcal{T}_{\alpha t}:\mathbb{R}^n \rightarrow\mathbb{R}^n$ is the shrinkage operator. As a consequence, the solution is strongly dependent on the accuracy of the mathematical model H with respect to the true physics.
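As a concrete illustration, a minimal numpy sketch of these two solvers might look as follows; H, Y, and the step and regularization parameters are illustrative placeholders, not quantities from this paper.

```python
import numpy as np

def tikhonov_solve(H, Y, alpha):
    """Closed-form Tikhonov solution X = (H^T H + alpha I)^{-1} H^T Y."""
    n = H.shape[1]
    return np.linalg.solve(H.T @ H + alpha * np.eye(n), H.T @ Y)

def ista(H, Y, alpha, t, n_iter=200):
    """Iterative shrinkage-thresholding for the L1-regularized problem."""
    X = np.zeros(H.shape[1])
    for _ in range(n_iter):
        X = X - 2 * t * H.T @ (H @ X - Y)                      # gradient step on ||HX - Y||^2
        X = np.sign(X) * np.maximum(np.abs(X) - alpha * t, 0)  # shrinkage operator T_{alpha t}
    return X
```

Both routines take the matrix H as a known input, which is precisely the assumption that breaks down under model uncertainty.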

However, model uncertainty unavoidably exists owing to several factors. First, a mathematical model H is only an approximation to the true physics in the first place. One well-known example in optics is the Fresnel approximation of scalar diffraction, which is frequently used in the modeling of optical imaging systems [15]. Needless to say, for a complex system, such as the scattering of coherent light by irregular particles, only approximate models are available [16]. Second, in the case where the model is described by a set of parameters, some of these parameters may not be accurately known or measured. One encounters such parameter uncertainty in many tasks, ranging from evaluating the distance parameter of free-space propagation when refocusing a digital hologram [17, 18] and evaluating the phase step in structured illumination microscopy [19], to measuring the projection angles in quantitative photoacoustic tomography [20]. Third, in the case that the imaging channel is imperfect, the theoretical model H can deviate from the true physics. Imaging through scattering media [21–24] is one typical case.

Conventionally, the way to address the issue of model uncertainty is to quantify it by using Bayesian approximation or ensemble learning techniques. For example, in the case that a system parameter is inaccurate, one can adopt a Monte Carlo [25] or a data-driven deep learning [26] approach to optimize its value. Alternatively, one can treat it as a random vector and characterize its probability distribution [27]. Recently, a new approach based on deep neural networks (DNNs) [28, 29] has been demonstrated to be very efficient at quantifying the uncertainty. As with most conventional DNN techniques for CI [30], a large set of labeled data is required to train the DNN for the uncertainty quantification task [28, 29].

Here, we propose an untrained DNN approach for CI with model uncertainty and demonstrate it on a phase imaging platform. It is well known that phase imaging is a powerful label-free technique for non-destructive imaging of transparent biological structures [31, 32]. In addition, it allows for rapid data acquisition even under low-intensity illumination [33], and therefore reduced phototoxicity in comparison to fluorescence imaging. Phase imaging can be implemented on various platforms, ranging from ultra-portable instruments to custom-engineered modules incorporated into standard microscopes [34]. In this study, we take the propagation-distance uncertainty in phase retrieval as an example to demonstrate the principle. Unlike the works presented in [28, 29], the proposed blind-reconstruction deep neural network (BlindNet) attempts to retrieve the phase information even when the distance between the object and the camera is unknown.

2. The model of phase imaging and the concept of BlindNet

We first describe the model of phase imaging. In this model, a coherent plane wave of wavelength λ illuminates a phase object $\phi(x,y,0)$, forming a complex amplitude $U_{0}(x, y, 0) = \exp [i \phi(x, y, 0)]$ immediately behind the phase object. After the field propagates over a distance z = d, the diffraction field can be written as [35]:

$U(x, y, d) = \mathcal{F}^{-1}\left\{\widehat{U}_{0}\left(f_{x}, f_{y}\right) D\left(f_{x}, f_{y}, d\right)\right\}, \qquad (3)$

where $\mathcal{F}^{-1}$ denotes the inverse Fourier transform, $\widehat{U}_{0}\left(f_{x}, f_{y}\right)$ is the Fourier transform of U0, fx and fy are the spatial frequencies in the x and y directions, and $D\left(f_{x}, f_{y}, d\right) = \exp \left(i 2\pi d \sqrt{1-\lambda^{2} f_{x}^{2}-\lambda^{2} f_{y}^{2}} / \lambda\right)$, defined for $f_{x}^{2}+f_{y}^{2}\lt\frac{1}{\lambda^{2}}$, is the transfer function. The intensity of the diffraction pattern captured by the camera is:

$I(x, y, d)=\left|U(x, y, d)\right|^{2}. \qquad (4)$

It can be expressed abstractly as:

$I(x, y, d)=H[\phi(x, y, 0)], \qquad (5)$

where $H[\cdot]$ represents the physical model that transforms the phase $\phi(x, y, 0)$ to the captured diffraction pattern $I(x, y, d)$.
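To make the forward model concrete, the following is a minimal numpy sketch of $H[\cdot]$ in equations (3)–(5), implementing the angular-spectrum (transfer function) propagation of [35]; the default grid parameters are assumptions chosen to match the simulations of section 3.

```python
import numpy as np

def forward_model(phi, d, wavelength=632.8e-9, pixel=8e-6):
    """Return I(x, y, d) = |F^{-1}{F{exp(i*phi)} * D(fx, fy, d)}|^2."""
    n = phi.shape[0]
    U0 = np.exp(1j * phi)                       # field just behind the phase object
    fx = np.fft.fftfreq(n, d=pixel)             # spatial frequencies
    FX, FY = np.meshgrid(fx, fx)
    arg = 1.0 - (wavelength * FX) ** 2 - (wavelength * FY) ** 2
    arg = np.clip(arg, 0.0, None)               # keep only propagating components
    D = np.exp(1j * 2 * np.pi * d * np.sqrt(arg) / wavelength)  # transfer function
    U = np.fft.ifft2(np.fft.fft2(U0) * D)       # equation (3)
    return np.abs(U) ** 2                       # equation (4)
```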

The aim of phase imaging is to retrieve the phase $\phi(x, y, 0)$ from $I(x, y, d)$. This problem is encountered not only in biological imaging [31, 32] but also in the evaluation of autofocusing distances in DH [17, 18, 26, 34], spanning a wide spectral range from terahertz imaging to x-ray diffraction imaging [6]. Clearly, there are two unknowns here, namely the phase object $\phi(x,y,0)$ and the diffraction distance d. All existing methods are developed on the assumption, or fact, that d is known or can be accurately measured. This is particularly true for diffraction imaging in the x-ray domain [6], where d takes an effective value of $\infty$ and equation (3) simplifies to the Fourier transform. In general, solving for these two unknowns simultaneously is quite time-consuming, as conventionally one employs a brute-force strategy, running an iterative phase retrieval algorithm for every possible d, as sketched below.
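For contrast, a sketch of that conventional strategy might read as follows; retrieve_phase is a hypothetical placeholder for any iterative solver (e.g. a Gerchberg–Saxton implementation), not a routine from this paper.

```python
import numpy as np

def brute_force_search(I_measured, candidate_ds, retrieve_phase, forward_model):
    """Run phase retrieval at every candidate distance; keep the best data fit."""
    best_phi, best_d, best_err = None, None, np.inf
    for d in candidate_ds:
        phi = retrieve_phase(I_measured, d)     # expensive inner iteration
        err = np.mean((forward_model(phi, d) - I_measured) ** 2)
        if err < best_err:
            best_phi, best_d, best_err = phi, d, err
    return best_phi, best_d
```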

The proposed BlindNet takes the unknown d together with the unknown phase $\phi(x,y,0)$ as parameters to be optimized. The corresponding objective function can be formulated as:

$(\hat{\theta}, \hat{d}\,) = \underset{\theta,\, d}{\arg\min}\ \big\|H\big[R_{\theta}(I), d\big]-I\big\|^{2}, \qquad (6)$

where $R_{\theta}$ is a neural network specified by the set of parameters θ that maps the diffraction pattern I back to the phase object φ, and the physical model $H[\cdot]$ defined in equation (5) propagates the estimated phase φ over an unknown diffraction distance d to the camera plane and calculates the resultant intensity. It is worth pointing out some key characteristics of the BlindNet model described by equation (6). First, only the recorded diffraction pattern I enters as the argument of the objective function (6). This means that BlindNet does not need to be trained on any labeled or unlabeled dataset, a significant feature in comparison to training-based methods such as those reported in [28, 29]. Second, similar to PhysenNet [1], BlindNet consists of two parts: a conventional neural network $R_{\theta}$ and a physical model H. The neural network $R_{\theta}$ maps the diffraction pattern inversely back to the phase, whereas the forward physical model H calculates the diffraction pattern from an estimated phase. It is the interplay between the physical model and the neural network that drives the search for a feasible solution. The significant difference between BlindNet and PhysenNet is that the diffraction distance d does not need to be known. Third, unlike learning-based phase imaging methods [28, 33, 34, 36], the BlindNet output satisfies the physical constraint imposed by the model H and is thus interpretable.

The basic idea of BlindNet is schematically illustrated in figure 1. The diffraction intensity pattern $I(x, y, d)$ captured by the camera is the only input of the neural network. The neural network outputs an estimate $\tilde{\phi}$ of the phase object. The free-space propagation of the estimated $\tilde{\phi}$ over a distance d' is calculated according to equation (5), resulting in an estimated intensity pattern $\tilde{I}(x, y, d')$. Then, the error between the estimated diffraction pattern $\tilde{I}(x, y, d')$ and the experimentally captured one, $I(x, y, d)$, is evaluated and used to adjust the parameters θ of the neural network and the distance d' according to the backpropagation algorithm [37]. The updated neural network is then used to estimate a new phase from $I(x,y,d)$, and the algorithm enters the next iteration. The iteration process stops once the neural network is optimized, when $\tilde{\phi}(x,y,0)\rightarrow\phi(x,y,0)$ and $d'\rightarrow d$. This indicates that the physical constraint $\tilde{I}(x, y, d')\rightarrow I(x,y,d)$ is explicitly satisfied. A minimal code sketch of this loop follows figure 1.


Figure 1. Schematic illustration of the basic concept of BlindNet.

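The loop below is a minimal sketch of this process, written with the TensorFlow 2 eager API for brevity (the paper itself used TensorFlow 1.13). The network here is a tiny stand-in for the UNet++-style $R_{\theta}$ described next, the propagation mirrors the numpy forward model above in differentiable ops, and all specific values are illustrative assumptions.

```python
import numpy as np
import tensorflow as tf

def tf_propagate(phi, d, wavelength=632.8e-9, pixel=8e-6):
    """Differentiable angular-spectrum propagation: H[phi; d'] of equation (5)."""
    n = int(phi.shape[-1])
    fx = np.fft.fftfreq(n, d=pixel)
    FX, FY = np.meshgrid(fx, fx)
    arg = np.clip(1.0 - (wavelength * FX) ** 2 - (wavelength * FY) ** 2, 0.0, None)
    arg = tf.constant(arg.astype(np.float32))
    phase = 2.0 * np.pi * d * tf.sqrt(arg) / wavelength
    D = tf.exp(tf.complex(tf.zeros_like(phase), phase))
    U0 = tf.exp(tf.complex(tf.zeros_like(phi), phi))
    U = tf.signal.ifft2d(tf.signal.fft2d(U0) * D)
    return tf.abs(U) ** 2

net = tf.keras.Sequential([tf.keras.layers.Conv2D(1, 3, padding='same')])  # stand-in for R_theta
I_meas = tf.random.uniform((256, 256))           # stand-in for the measured pattern
d_var = tf.Variable(np.float32(np.random.uniform(27e-3, 47e-3)))  # unknown distance d'
opt = tf.keras.optimizers.Adam(learning_rate=0.1)  # d' may need its own, smaller step in practice

for epoch in range(20000):
    with tf.GradientTape() as tape:
        phi_hat = net(I_meas[None, ..., None])[0, ..., 0]   # R_theta(I)
        I_hat = tf_propagate(phi_hat, d_var)                # H[phi_hat; d']
        loss = tf.reduce_mean((I_hat - I_meas) ** 2)        # MSE of equation (8)
    variables = net.trainable_variables + [d_var]
    grads = tape.gradient(loss, variables)                  # backpropagate to theta and d'
    opt.apply_gradients(zip(grads, variables))
```

Note how d' enters the loss only through the transfer function, so the same backpropagation pass that updates θ also refines the distance estimate.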

Now, let us go into the details of the structure of the neural network. The neural network we designed was inspired by UNet++ [38], a powerful architecture originally developed for medical image segmentation. Here, we adopt it for phase reconstruction. As schematically shown in figure 2, the pyramid-shaped architecture of UNet++ essentially consists of an encoder and a decoder connected through a series of nested, dense skip pathways, bridging the semantic gap between the feature maps of the encoder and the decoder prior to fusion. This is a significant improvement over U-Net, which has been widely employed for CI [1, 36]. Specifically, a convolution layer is denoted as a node $X^{i, j}$ in UNet++, where the superscript i indexes the down-sampling layer along the encoder and j indexes the convolution layer of the dense block along the skip pathway. The stack of feature maps denoted by $x^{i,j}$ is the output of node $X^{i,j}$, computed as [38]:

$x^{i, j}= \begin{cases}\mathcal{H}\left(x^{i-1, j}\right), & j=0 \\ \mathcal{H}\left(\left[\left[x^{i, k}\right]_{k=0}^{j-1},\ \mathcal{V}\left(x^{i+1, j-1}\right)\right]\right), & j>0\end{cases} \qquad (7)$

where $\mathcal{H}(\cdot)$ is the convolution operation followed by an activation function, $\mathcal{V}(\cdot)$ denotes an up-sampling layer and $[\cdot]$, the concatenation layer.
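The node computation of equation (7) can be sketched with Keras layers as follows; filter counts and kernel sizes are illustrative assumptions, not the exact settings of figure 2.

```python
import tensorflow as tf
from tensorflow.keras import layers

def H_op(x, filters):
    """The H(.) of equation (7): convolution followed by the activation."""
    x = layers.Conv2D(filters, 3, padding='same')(x)
    return layers.LeakyReLU()(x)

def node(same_level_inputs, below_output, filters):
    """Compute x^{i,j} for j > 0: concatenate all x^{i,k} (k < j) with the
    up-sampled x^{i+1,j-1} (the V(.) operation), then apply H(.)."""
    up = layers.Conv2DTranspose(filters, 2, strides=2, padding='same')(below_output)
    fused = layers.Concatenate()(same_level_inputs + [up])
    return H_op(fused, filters)
```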


Figure 2. A schematic illustration of the BlindNet architecture.


The input of the neural network is the data one wishes the network to process, which in this study is the diffraction pattern recorded by the camera, of size $256\times256\times1$. As it passes through the successive down-sampling convolution layers down to $X^{4,0}$, it produces feature maps of size $256\times256\times32$, $128\times128\times64$, $64\times64\times128$, $32\times32\times256$, and $16\times16\times512$, respectively, according to equation (7). As shown in figure 2, each of these feature maps then undergoes a dense up-sampling deconvolution block, which has a number of deconvolution layers depending on its level in the pyramid. Each deconvolution layer is preceded by a concatenation layer that fuses the output from the upstream layer of the same dense block with the corresponding up-sampled output of the lower dense block. The up-sampling is conducted all the way up to $X^{0,4}$, which outputs an image of size $256\times256\times32$. We use an additional convolution layer to resize it to $256\times256\times1$. This image then passes through the model determined by equation (5), which yields the simulated diffraction pattern of size $256\times256\times1$. The parameter d' in the model is a learnable variable, similar to the parameters of the neural network. The activation function used by the neural network is the Leaky ReLU [39]. The down-sampling operation is $2\times2$ max pooling, which halves the width and height of the feature map; the up-sampling operation doubles them by deconvolution. To prevent overfitting, batch-normalization layers are applied where necessary.
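The encoder path just described can be sketched as below; it shows how pooling halves the lateral size while the channel count grows from 32 to 512. Layer parameters are assumptions consistent with the text, not the exact configuration.

```python
import tensorflow as tf
from tensorflow.keras import layers

inp = tf.keras.Input(shape=(256, 256, 1))        # recorded diffraction pattern
x, maps = inp, []
for filters in (32, 64, 128, 256, 512):
    x = layers.Conv2D(filters, 3, padding='same')(x)
    x = layers.BatchNormalization()(x)           # guards against overfitting
    x = layers.LeakyReLU()(x)
    maps.append(x)                               # outputs of X^{0,0} ... X^{4,0}
    if filters != 512:
        x = layers.MaxPooling2D(2)(x)            # 2x2 pooling halves width and height
# maps[0] is 256x256x32, ..., maps[-1] is 16x16x512, matching the sizes above
```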

In the learning process, the loss function is defined as the mean square error (MSE) between the predicted diffraction pattern and the measured one:

$\textrm{Loss}=\frac{1}{M N} \sum_{m=1}^{M} \sum_{n=1}^{N}\left[\tilde{I}\left(m, n, d^{\prime}\right)-I(m, n, d)\right]^{2}, \qquad (8)$

where M and N are the width and height of the diffraction pattern, respectively. We used the Adam optimizer [40] to compute the gradients and update the network parameters. The learning rate η was set to 0.1. BlindNet was implemented in Python 3.6.9 on the TensorFlow platform (ver. 1.13) and accelerated with a graphics processing unit (GeForce GTX 1060).

3. Simulations and performance analysis

In this section, we perform numerical simulations to demonstrate the proposed BlindNet. In the simulations, we assume that a phase object is illuminated by a coherent plane wave with a wavelength of 632.8 nm, and that the diffraction pattern is captured by a virtual camera with $256\times256$ pixels, each 8 µm in pitch. The free-space propagation described by equation (3) is numerically computed by using the transfer function method [35], e.g. as in the sketch below.
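For instance, a diffraction pattern under these settings could be generated with the forward_model() sketch from section 2; the test object here is a synthetic placeholder, not the phase object of figure 3.

```python
import numpy as np

x = np.linspace(-1.0, 1.0, 256)
X, Y = np.meshgrid(x, x)
phi = np.pi * np.exp(-(X ** 2 + Y ** 2) / 0.2)   # placeholder Gaussian phase object
I = forward_model(phi, d=32e-3)                  # diffraction pattern at d = 32 mm
```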

3.1. Convergence

In the first simulation, we analyze the convergence. A phase object, as shown in figure 3(a), is placed at the object plane, and the camera is placed at a distance d = 32 mm away. Under the illumination and sampling conditions mentioned above, we obtain a diffraction pattern as shown in figure 3(b). This is the only input data for our randomly initialized BlindNet to process; the accurate diffraction distance d is unknown to it.


Figure 3. The reconstruction process of a phase object. (a) The original phase object. (b) The diffraction pattern. The retrieved phase at epoch (c) 900, (d) 10 300, and (e) 19 800. The behavior of (f) the loss function, (g) the estimated distance, and (h) the SSIM value as a function of the number of epochs.


If we wish BlindNet to find the parameter d as well, we can set up a range over which d' is to be searched. In this simulation, we arbitrarily set $d' \in [27\ \textrm{mm}, 47\ \textrm{mm}]$. In the implementation of BlindNet, d' is randomly initialized to a value between 27 and 47 mm. Specifically, in this study, d' was initialized to 46.27 mm, far from the ground truth.

The convergence of the algorithm is plotted in figure 3(f). As the iteration proceeds, the loss function (equation (8)) decreases dramatically. This means that the estimated pattern $\tilde{I}(m,n,d')$ associated with the retrieved phase $\tilde{\phi}(m,n,0)$ converges gradually to the measured diffraction pattern $I(m,n,d)$. One can see this process explicitly in figures 3(c)–(e), which show the estimated phase at epochs 900, $10\,300$, and $19\,800$, respectively. Indeed, the estimated phase still resembles the diffraction pattern at epoch 900, while it becomes very similar to the true phase at epoch $10\,300$. The associated MSE value between them drops to 0.001868. Along with this process, the estimated diffraction distance $d' = 31.99$ mm approaches the true value d as well (figure 3(g)).

One can evaluate the quality of the retrieved phase by using the structural similarity (SSIM) index between the retrieved phase $\tilde{\phi}(m,n,0)$ and the original one $\phi(m,n,0)$. The SSIM is defined as:

$\operatorname{SSIM}(\tilde{\phi}, \phi)=\frac{\left(2 \mu_{\tilde{\phi}}\, \mu_{\phi}+c_{1}\right)\left(2 \sigma_{\tilde{\phi} \phi}+c_{2}\right)}{\left(\mu_{\tilde{\phi}}^{2}+\mu_{\phi}^{2}+c_{1}\right)\left(\sigma_{\tilde{\phi}}^{2}+\sigma_{\phi}^{2}+c_{2}\right)}, \qquad (9)$

where $\mu_f$ and $\sigma_f^{2}$ are the mean and variance of the image $f\in\{\tilde{\phi},\phi\}$, respectively, $\sigma_{\tilde{\phi} \phi}$ is the covariance between $\tilde{\phi}$ and φ, and c1 and c2 are constants. The convergence of the SSIM as a function of the iteration epoch is plotted in figure 3(h). As the network is optimized, the SSIM value increases from 0.0205 to 0.8464, consistent with the convergent behavior of the loss function. A direct numpy transcription of equation (9) is given below.
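For reference, this is equation (9) in its global form; the values of c1 and c2 are the common defaults for data normalized to [0, 1], which is an assumption on our part.

```python
import numpy as np

def ssim_global(phi_hat, phi, c1=0.01 ** 2, c2=0.03 ** 2):
    """Global SSIM of equation (9) between a retrieved and an original phase map."""
    mu1, mu2 = phi_hat.mean(), phi.mean()
    var1, var2 = phi_hat.var(), phi.var()
    cov = ((phi_hat - mu1) * (phi - mu2)).mean()   # covariance between the two maps
    return ((2 * mu1 * mu2 + c1) * (2 * cov + c2)) / \
           ((mu1 ** 2 + mu2 ** 2 + c1) * (var1 + var2 + c2))
```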

3.2. Tolerance of the distance uncertainty

It is important to examine the tolerance of the distance uncertainty, which one can study with an arbitrary phase object. First, we calculated the diffraction patterns of the phase object over distances of 1, 10, 60, 100, and 150 mm, respectively, under the virtual illumination of a coherent plane wave [35]. Then we performed phase retrieval from these calculated diffraction patterns by using the proposed BlindNet with a randomly initialized distance parameter. Ideally, one might expect that it is possible to eliminate the uncertainty and recover the ground truth d no matter what the initial value of d' is, with the algorithm merely taking more time to converge. However, our results suggest that this is not the case: BlindNet has a very high probability of finding the ground-truth value of d only when the initial guess d' is not too far from it, and this range of initialization is highly distance-dependent. As plotted in figure 4, the proposed BlindNet does not work well when d is about 1 mm. In this case, one needs to know the diffraction distance d exactly to retrieve the phase accurately, meaning that BlindNet reduces to PhysenNet [1]. When d is larger than 10 mm, however, BlindNet tolerates an uncertainty in d of at least a few millimeters in both the + and − directions. Within this range (error bars in figure 4) of initialization, BlindNet has a probability greater than $90\%$ of finding the ground truth d; the closer d' is to d, the higher that probability. This has practical significance, as the propagation distance d is usually at least approximately, if not accurately, measurable, so it is reasonable to set up a small range within which d' should be initialized.


Figure 4. The tolerance of the distance uncertainty. Error bars denote the maximum tolerance in the initial value of d' with respect to d that BlindNet allows.


3.3. Comparative study and robustness to noise

In this subsection, we conducted a comparative study of the proposed BlindNet against some typical phase retrieval algorithms (i.e. the modified Gerchberg–Saxton (GS) algorithm [41] and the newly proposed PhysenNet [1]) in terms of robustness to noise. The main results are plotted in figure 5. In this comparative study, we took the standard 'cameraman' image as the phase object of interest (column 1, figure 5(a)). First, we calculated the Fresnel diffraction of this phase object over a distance of 46 mm. Then, we added Gaussian white noise with variance σ2 of 0, 0.0025, 0.0050, 0.0075, 0.0100, and 0.0125, respectively, to the normalized diffraction pattern (column 2, figure 5(a)); a sketch of this protocol follows below. The phase objects retrieved by using the GS algorithm [41], PhysenNet [1], and BlindNet are plotted in columns 3–5 of figure 5(a), respectively. It should be noted that for GS and PhysenNet the exact diffraction distance d = 46 mm must be known, whereas BlindNet does not require any information about it. The simulation results plotted in figure 5(a) show that all three algorithms perform quite well in the ideal case where the additive Gaussian noise is absent. The SSIM between each of the three reconstructed phases and the original one is about 1, as plotted in figure 5(b). However, in the presence of noise, BlindNet outperforms the GS algorithm, as the latter is very sensitive to noise [42]. BlindNet has a noise performance similar to that of PhysenNet, but with the advantage that it allows uncertainty in the diffraction distance. However, we also observe that the phase retrieved by BlindNet deteriorates when the noise level rises to 0.0125. This is because the noise becomes too strong to suppress and degrades the inference ability of BlindNet.
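A minimal sketch of the noise protocol referenced above, with I the clean diffraction pattern from the earlier sketches; the normalization convention is our assumption.

```python
import numpy as np

def add_noise(I, variance, seed=None):
    """Add zero-mean Gaussian white noise to the normalized diffraction pattern."""
    rng = np.random.default_rng(seed)
    I_norm = I / I.max()                         # normalize to [0, 1]
    return I_norm + rng.normal(0.0, np.sqrt(variance), I.shape)

# One noisy pattern per tested variance
noisy = [add_noise(I, v) for v in (0.0, 0.0025, 0.005, 0.0075, 0.01, 0.0125)]
```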


Figure 5. Comparison with other phase retrieval algorithms. (a) Gaussian white noise with variance 0, 0.0025, 0.0050, 0.0075, 0.0100, and 0.0125 was added to the diffraction pattern, and the Gerchberg–Saxton, PhysenNet, and BlindNet algorithms were used to reconstruct the phase. (b) SSIM between the results of the various algorithms and the original image under different noise levels.


4. Experimental demonstration

We carried out an experimental demonstration of BlindNet with the setup shown schematically in figure 6(a). The light emitted from a He–Ne laser (HNL100L, Thorlabs) with a wavelength of 633 nm was spatially filtered by a microscope objective (UPLFLN10X2PH, Olympus) and a pinhole (aperture size: 15 µm), and then collimated by a lens with a focal length of 300 mm. The collimated beam then illuminated a phase object, and the resulting diffraction pattern was acquired by an sCMOS camera (PCO.edge 4.2, pixel pitch: 6.5 µm).


Figure 6. Experimental setup and results. (a) Experimental setup. (b) Experimental diffraction pattern (d = 24.11 mm). (c)–(f) Images reconstructed by (c) digital holography, (d) Gerchberg–Saxton, (e) PhysenNet, and (f) BlindNet. (g) The intensity of the four reconstructed images along the same horizontal section.


Figure 6(b) shows the diffraction pattern acquired at a diffraction distance of d = 24.11 mm. We use it alone to reconstruct the phase. Different algorithms were used to examine the practical performance of BlindNet. As in [1], DH [43] was used as a testbed for comparison. This interference-based coherent imaging technique records the whole wavefront of the diffraction field and reconstructs the phase by a fairly simple and straightforward calculation [43]. The phase reconstructed by DH is shown in figure 6(c), and the phase retrieved by the GS algorithm in figure 6(d). Since noise is unavoidably present in the measurement, the GS algorithm cannot retrieve a perfect phase: the SSIM between the phase it retrieved and the ground truth (figure 6(c)) is 0.3646. The phases retrieved by PhysenNet and BlindNet are plotted in figures 6(e) and (f), with SSIM values of 0.6506 and 0.6526 with respect to figure 6(c), respectively. For DH, GS, and PhysenNet, the diffraction distance d = 24.11 mm must be exactly known, whereas for BlindNet the guess d' was randomly initialized between 20 and 30 mm. Along with the optimization of the neural network, d' converges to 24.09 mm, corresponding to an error of only $0.08\%$. This agrees well with the simulation results in section 3, and can be seen more clearly in the cross-sections of the retrieved phase maps plotted in figure 6(g).

We also experimentally verified the influence of different initial values of d' on the accuracy of finding d. Explicitly, we set the initial value of d' to 22, 25, 28, and 31 mm, respectively, with the ground-truth distance d = 24.11 mm as mentioned above. The phase maps retrieved by BlindNet are plotted in figures 7(c)–(f), with the corresponding estimated distances equal to 24.06, 24.09, 24.13, and 24.09 mm, respectively. The accuracy is thus within $0.2\%$. Indeed, we calculated the SSIM values between these reconstructed phase images and the ground-truth one (reconstructed using DH, as shown in figure 6(c)); the values are all about 0.65, with a small variance. This indicates that the performance of BlindNet is highly stable.


Figure 7. Reconstruction with different d'. The SSIM value is calculated with respect to the phase reconstructed by using DH.


5. Conclusion

In conclusion, we have proposed BlindNet, a novel untrained deep neural network framework for CI with model uncertainty. The proposed framework can be applied to a wide class of problems in CI (such as computed tomography reconstruction with uncertain view angles [44] and misalignment correction in Fourier ptychographic microscopy [45]), although we have demonstrated it here only in a scenario of phase imaging with uncertainty in the diffraction distance. BlindNet does not require any data for training; instead, it relies on the interplay between the physical model and the neural network to automatically optimize the network parameters as well as the uncertain physical parameters.

Acknowledgment

We thank the Institut für Technische Optik, Universität Stuttgart, for providing the phase object.

Data availability statement

The data that support the findings of this study are available upon reasonable request from the authors.

Funding

This work was supported by the National Natural Science Foundation of China (62061136005, 61991452), the Sino-German Center (GZ1391), and the Key Research Program of Frontier Sciences of the Chinese Academy of Sciences (QYZDB-SSW-JSC002).
