Abstract
Fourier ptychography (FP) involves the acquisition of several low-resolution intensity images of a sample under varying illumination angles. They are then combined into a high-resolution complex-valued image by solving a phase-retrieval problem. The objective in dynamic FP is to obtain a sequence of high-resolution images of a moving sample. There, the application of standard frame-by-frame reconstruction methods limits the temporal resolution due to the large number of measurements that must be acquired for each frame. In this work, we instead propose a neural-network-based reconstruction framework for dynamic FP. Specifically, each reconstructed image in the sequence is the output of a shared deep convolutional network fed with an input vector that lies on a one-dimensional manifold that encodes time. We then optimize the parameters of the network to fit the acquired measurements. The architecture of the network and the constraints on the input vectors impose a spatiotemporal regularization on the sequence of images. This enables our method to achieve high temporal resolution without compromising the spatial resolution. The proposed framework does not require training data. It also recovers the pupil function of the microscope. Through numerical experiments, we show that our framework paves the way for high-quality ultrafast FP.
Original content from this work may be used under the terms of the Creative Commons Attribution 4.0 license. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.
1. Introduction
In Fourier ptychography (FP) [1], hundreds of low-resolution intensity images are acquired by illuminating the object of interest with coherent light at varying incidence angles. This task is typically performed using an LED array and a microscope with a low-numerical-aperture (NA) objective lens, which makes FP a low-cost and label-free imaging modality. The collection of measurements is then algorithmically combined into a high-resolution complex-valued image of the sample over a large field of view. Thus, FP has a high space-bandwidth product.
Building upon the pioneering work of Zheng et al [1], the capabilities of FP have been extended in a variety of ways by improving the optical acquisition setup. For instance, in [2, 3], the sequence of illuminations is optimized via an importance metric and neural networks, respectively. Multiplexed FP is introduced in [4], where the sample is illuminated with multiple LEDs simultaneously, which reduces the number of measurements. Further, optimal combinations of LEDs are studied in [3, 5–7].
There have also been several improvements on the computational side for FP. At its core, the reconstruction process involves the solution of a phase-retrieval (PR) problem—the recovery of phase information from intensity measurements. In [1], this task is performed by using the iterative Gerchberg–Saxton (GS) algorithm [8]. As PR is a non-convex problem, the solution obtained by GS depends on the starting point. This problem of initialization is tackled in [9]. In [10–12], PR is formulated as a convex optimization problem with the help of a lifting scheme. However, this elegant approach comes at the cost of a large computational burden. As the acquired measurements are typically corrupted by noise, maximum-likelihood estimation offers an adequate framework for one to incorporate the noise statistics [13]. The resulting optimization problems are solved efficiently by gradient-based or higher-order methods [14, 15]. A thorough comparative study of different methods for PR can be found in [16]. In addition to solving the PR problem, algorithms that include the estimation of the pupil function of the microscope [17] and correction of the LED positions [18, 19] have also been proposed.
While FP has matured into a versatile modality with numerous applications [20], high-quality high-speed imaging remains a challenge. The temporal resolution in FP is inherently limited by the large number of measurements that need to be acquired in order to reconstruct the high-resolution image of the sample. To alleviate this problem, ad hoc acquisition setups [5, 6, 21] have been devised. They allow one to obtain a higher temporal resolution without a significant deterioration of the spatial resolution. Alternatively, there has been a lot of interest in the development of sophisticated computational methods to solve the PR problem with only a few measurements. In such ill-posed scenarios, regularization techniques can be used to incorporate some prior knowledge about the sample of interest. These are typically applied by formulating PR as an optimization problem where the cost functional consists of a data-fidelity term and a regularization term. The data-fidelity term ensures that the solution is consistent with the observed data while the regularization promotes solutions with the desired properties. For example, the popular total-variation (TV) regularization [22] favors piecewise-constant images and has been adapted for FP in several works [23–26]. Group-sparsity-based priors have been successfully deployed in FP as well [27]. An online plug-and-play approach for FP has also been proposed in [28], where sophisticated denoisers such as BM3D [29] are used for (implicit) regularization.
Over the past few years, deep-learning-based methods have yielded impressive results, outperforming the model-based regularized methods in a variety of imaging modalities, especially in ill-posed settings [30, 31]. In the context of FP, deep neural networks have been trained in a supervised manner as nonlinear mappings that take the low-resolution measurements and output the high-resolution image of interest [32–34]. Further, in [35, 36], pre-trained deep generative priors are used to solve the PR problem. For more details regarding FP, we refer the reader to recent comprehensive reviews [20, 37].
In dynamic FP, where the goal is to image a moving sample, the computational methods described above must be applied in a frame-by-frame manner to obtain the sequence of high-resolution images; they do not account for the temporal dependencies in the measurements. Yet, one can decrease the number of measurements required per frame (thus increasing the effective imaging speed) by exploiting the temporal correlations in the sequence of images to be recovered. Based on this idea, the concept of low-rank FP is introduced in [38], where a low-rank constraint is enforced on the matrix formed by stacking the (vectorized) images.
1.1. Contributions
In this work, we propose a novel computational framework for dynamic FP. Inspired by the method developed in [39] for dynamic magnetic resonance imaging, we use a deep neural network (deep prior) to impose a spatiotemporal regularization on the sequence of complex-valued images to be recovered. More specifically, we parameterize each image in the sequence as the output of a single convolutional network corresponding to some fixed latent input vector. These input vectors are chosen to lie on a one-dimensional manifold. The parameters of the network are then optimized such that the generated images collectively fit the acquired measurements under the action of the specified forward model. The architecture of the generative network imposes an implicit spatial prior on the images while the constraints on the input latent vectors allow the network to associate their proximity with temporal variations in the sequence. Our method does not require any training data. It also estimates the pupil function together with the complex-valued images, which means it can be readily applied for different settings. We assess the performance of our framework on simulated data with a single measured low-resolution image per reconstructed frame and show that it paves the way for high-quality ultrafast FP.
The paper is organized as follows. In section 2, we describe a continuous-domain physical model for FP along with its computationally efficient discretization. We present the proposed reconstruction framework in section 3 and the experimental results in section 4.
2. Physical model
In this section, we first formulate the physical model that relates the acquired measurements and the sample of interest in the continuous domain. Then, we present a discretized version of the forward model that can be implemented in a computationally efficient manner.
2.1. Continuous-domain formulation
The optical system in FP usually involves an array of L LEDs (see figure 1), where the lth LED illuminates the specimen with a tilted plane wave with wave vector () and wavelength λ > 0. In this work, we consider the case where only one LED is turned on for each measured image. However, our framework is also compatible with more sophisticated acquisition settings [4].
We model the sample of interest as a 2D complex object, which is a valid assumption for thin samples. Therefore, we can represent the moving sample as a complex-valued function , where includes the region of interest of the sample. Let be the uniformly-spaced timestamps, with spacing , at which we are interested in observing the sample. We assume that the sample moves very slowly in the intervals . Thus, during Tq , we can acquire multiple measurements , where and where includes the support of the measurement, of the object . Here, the tradeoff between the temporal resolution and the spatial resolution can be understood in terms of and W: a small value of (high temporal resolution) implies a small value of which yields a low spatial resolution.
Let , where , be the set of LEDs that are switched on during Tq ; the cardinality of this set is . Further, for , we introduce to denote the wth entry of . The measurement image is obtained when is illuminated by the th LED with the tilted plane wave . As mentioned in [4, 16], it is given by
Here, the operators and denote the Fourier transform and its inverse, respectively, is the 2D spatial frequency variable, and the quantity denotes the Fourier transform of . The pupil function models the pupil aperture and is compactly supported on a disk of radius , where NA is the numerical aperture of the system, thus cutting off high frequencies.
2.2. Camera sampling
In practice, the camera in the acquisition setup samples on a uniform grid with stepsize Δ and records a discrete image of size such that
where the image is the sampled version of given by
for and , and the operator models the corruption of by noise. Consider the quantity
Due to the compact support of the pupil function , the maximum angular frequency of is . Note that the Fourier transform of can be written as
where denotes the complex conjugate of which is given by , and denotes the convolution operation. Thus, the maximum angular frequency of is . Consequently, the Nyquist criterion dictates that the sampling step Δ of the camera should satisfy
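The resulting Nyquist bound can be made concrete with a short numerical check. The sketch below assumes the standard pupil cutoff of 2πNA/λ for the field, which the autocorrelation in the intensity doubles, so that the camera step must satisfy Δ ≤ λ/(4 NA); the numerical values in the example are illustrative, not taken from the experimental setup.

```python
import math

def nyquist_camera_step(wavelength: float, na: float) -> float:
    """Largest camera sampling step (in object space) allowed by Nyquist.

    The pupil cuts off angular frequencies above k_max = 2*pi*NA/lambda.
    The intensity |g|^2 is the Fourier transform of an autocorrelation,
    which doubles the bandwidth to 2*k_max, so the sampling step must
    satisfy delta <= pi / (2*k_max) = lambda / (4*NA).
    """
    return wavelength / (4.0 * na)

# Illustrative values (assumed): green illumination, NA = 0.1 objective.
delta_max = nyquist_camera_step(wavelength=0.514e-6, na=0.1)  # in metres
```

For these assumed values, the bound evaluates to λ/(4 NA) ≈ 1.29 µm, i.e. the effective camera pixel (physical pixel divided by the magnification) must be at most this size.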
2.3. Discretized forward model
In this work, we obtain a discrete version of by sampling it on a uniform grid with pixel-size , as
for and . The image size is given by , where is the upsampling factor. Now, consider the 2D discrete Fourier transform (DFT) of . The corresponding pixel size in the Fourier domain (or angular frequency resolution) is . Thus, we discretize the pupil function such that
for and . Note that the choice of Δ and ensures that the support of the pupil function lies within the sampling grid for . Moreover, in our discretization scheme, we assume that the wave vector can be written as , where .
We now introduce some additional notations to specify the discrete forward model. Let , , , and be the vectorized versions of , , , and , respectively. Then, let be matrices that represent the 2D DFT and its inverse of a image, respectively. Next, we define to be a diagonal matrix whose entries are the values in . Finally, is a boolean matrix that restricts an N2-dimensional vector to an M2-dimensional vector depending on the illumination wave vector .
Proposition 1. The discrete counterpart of (1) can be computed as
We discretize the integrals in (10) using Riemann sums. A step-size is used for the integral with respect to k and a step-size is used for the integral with respect to s. The samples for and , are then given by
where
The limits in the sums in (11) and (12) are dictated by the supports of and , respectively. By rearranging some terms and using the fact that , we rewrite (12) as
where is the (N, N)-point DFT of and the shifts in the DFT are applied in a circular manner. On plugging (13) into (11), we get that
Next, we group all the exponential terms involving k1 and k2 and use to obtain that
Let be the (M, M)-point inverse discrete Fourier transform (IDFT) of . Then, the discrete measurements can be expressed as
Note that the computation of involves taking the (N, N)-point DFT of , (circularly) shifting it according to the wave vector , restricting the shifted DFT to an image, performing pointwise multiplication with , and then taking the (M, M)-point IDFT. This allows us to write (16) in vectorized form as in the right-hand side of (9).
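The sequence of operations described above can be sketched in NumPy. This is a minimal illustration of the discrete forward model, with assumed conventions for the centring of the spectrum, the placement of the (M, M) window, and the DFT normalization; it is not the exact implementation used in the paper.

```python
import numpy as np

def fp_forward(x, pupil, shift, M):
    """Sketch of the discrete FP forward model for one LED:
    (N,N)-point DFT of the object -> circular shift by the illumination
    wave vector -> restriction to an (M,M) window -> pointwise
    multiplication with the pupil -> (M,M)-point IDFT -> intensity.

    x     : (N, N) complex object
    pupil : (M, M) complex pupil function (centred)
    shift : (s1, s2) integer shift induced by the LED wave vector
    M     : size of the low-resolution measurement
    """
    N = x.shape[0]
    X = np.fft.fftshift(np.fft.fft2(x))      # centred (N, N) spectrum
    X = np.roll(X, shift, axis=(0, 1))       # circular shift by wave vector
    c = N // 2
    window = X[c - M // 2:c + (M + 1) // 2,  # restrict to (M, M) window
               c - M // 2:c + (M + 1) // 2]
    g = np.fft.ifft2(np.fft.ifftshift(window * pupil))
    return np.abs(g) ** 2 * (M / N) ** 2     # intensity (assumed scaling)

# Toy usage with a random phase object and an ideal circular pupil.
N, M = 64, 16
x = np.exp(1j * np.random.rand(N, N))
u = np.arange(M) - M // 2
pupil = ((u[:, None] ** 2 + u[None, :] ** 2) <= (M // 3) ** 2).astype(complex)
y = fp_forward(x, pupil, shift=(2, -3), M=M)
```

The `(M / N) ** 2` factor compensates for the mismatch between the (N, N)-point DFT and the (M, M)-point IDFT under NumPy's default normalization; the appropriate constant depends on the chosen discretization.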
While the discrete forward model (9) has previously been used in works such as [16], to the best of our knowledge, a systematic derivation of (9) from the continuous model (1) has not been presented in the literature.
3. Reconstruction framework
The goal in dynamic FP is to reconstruct the images from the recorded measurements . We first present our neural-network-based framework for the case of a well-characterized pupil function. Then, we describe a way to incorporate the recovery of the pupil function into our reconstruction algorithm.
3.1. Deep spatiotemporal priors
The concept of using untrained convolutional neural networks (CNNs) as regularization for solving inverse problems was first introduced in [40] under the name 'deep image prior' (DIP). There, the image of interest is represented as the output of a CNN with adjustable parameters and some fixed input. It is then shown that the fitting of the network to the measurements yields high-quality image reconstructions for several applications such as denoising, superresolution, and inpainting. This is attributed to the observation that CNNs have a remarkable tendency to favor natural-looking images ('good' solutions) over noisy ones ('bad' solutions). In some scenarios, DIP is deployed with early stopping as deep networks have the capacity to fit noise. In other words, there is a point beyond which running the optimization process for more iterations degrades the quality of the reconstruction. Thus, the architecture of a CNN (and the optimization procedure) can be used as an implicit prior for natural images.
In our reconstruction framework, we propose to use an extended version of DIP to impose spatiotemporal regularization on the sequence of images. We parameterize each of the Q images as the output of a single CNN , with adjustable parameters , applied to some fixed input latent vector , . We choose these latent vectors such that they lie on a straight line, in accordance with
where the end-points are fixed beforehand (for example, by drawing two samples from some multivariate probability distribution). We then optimize the parameters of the network to fit the measurements according to
where is the data-fidelity term and the reconstructed sequence is . The rationale behind our choice of the latent vectors is to allow the CNN to associate the spatial proximity between them with the temporal proximity of the images. In this manner, the architecture of the network imposes spatial regularization while the use of a shared network for all images and the design of the latent space impose temporal regularization. A schematic illustration of our framework is given in figure 2.
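The construction of the latent vectors along a straight line can be written compactly. The sketch below draws the two end-points from a standard normal distribution and places the Q latent vectors at uniform spacings between them; the latent dimension is illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
Q, d = 100, 82  # number of frames; latent dimension (illustrative)

# Fixed end-points drawn from a standard multivariate normal distribution.
z_start = rng.standard_normal(d)
z_end = rng.standard_normal(d)

# Latent vectors on the straight line between the end-points:
# z_q = z_start + (q - 1) / (Q - 1) * (z_end - z_start), q = 1, ..., Q.
alphas = np.linspace(0.0, 1.0, Q)
Z = z_start[None, :] + alphas[:, None] * (z_end - z_start)[None, :]
```

Each row of `Z` is then fed to the shared network to generate the corresponding frame, so that neighbouring frames receive neighbouring latent vectors.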
3.2. Optimization strategy
The relation between the measurements and the underlying images is nonlinear, which makes the inverse problem very challenging. The fact that only one LED is switched on for each measurement further adds to the difficulty. Thus, in order to avoid bad local minima while solving the optimization problem in (18), we initialize the parameters of the network according to
where are low-quality reconstructions obtained via a standard frame-by-frame method. The magnitude and phase operations in (19) are applied component-wise. We can solve (19) using off-the-shelf minibatch stochastic gradient-descent algorithms. However, it is not desirable to run these algorithms till convergence as the network then overfits the artifacts present in the low-quality reconstructions. Thus, in our initialization routine, which is described in algorithm 1, we deploy early stopping by choosing suitable values for the tolerance εtol and the maximum number of iterations nmax (see section 4.2.3 for details).
Algorithm 1. Initialization of network parameters.
Input: Low-quality reconstructions , latent vectors , batch size BQ , tolerance εtol, maximum number of iterations nmax.
Randomly initialize θ
,
while do
  Randomly sample a batch of size BQ from
  Compute
  Update θ with gradient
  if then
    Exit the while loop
  end if
end while
Output: Network parameters θ
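The control flow of the initialization routine can be sketched as follows. This is a skeleton only: `fit_loss_and_grad` is a hypothetical stand-in for the minibatch fitting objective in (19) and its gradient, and plain gradient steps replace the actual optimizer.

```python
import numpy as np

def init_network(fit_loss_and_grad, theta0, Q, batch_size, eps_tol, n_max,
                 lr=1e-3, rng=None):
    """Skeleton of algorithm 1: fit the network to the low-quality
    reconstructions with minibatch gradient steps and early stopping.

    fit_loss_and_grad(theta, batch) -> (loss, grad) is an illustrative
    stand-in for the network-fitting objective; all names are assumptions.
    """
    rng = rng or np.random.default_rng(0)
    theta = theta0.copy()
    for _ in range(n_max):
        batch = rng.choice(Q, size=batch_size, replace=False)  # sample frames
        loss, grad = fit_loss_and_grad(theta, batch)
        theta -= lr * grad                                     # gradient step
        if loss < eps_tol:                                     # early stopping
            break
    return theta

# Toy usage: fit a quadratic surrogate loss (ignores the batch).
toy = lambda th, b: (float(np.sum(th ** 2)), 2.0 * th)
theta = init_network(toy, np.ones(3), Q=5, batch_size=2,
                     eps_tol=1e-6, n_max=1000, lr=0.1)
```

In practice the early-stopping threshold and iteration cap play the role described in the text: they prevent the network from overfitting the artifacts of the low-quality reconstructions.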
After the initialization, we can solve (18) using again some minibatch stochastic gradient-descent algorithm. In some cases (for example, when the measurements are corrupted by a non-negligible amount of noise), running the optimization process beyond a certain number of iterations leads to deterioration of the reconstruction quality as the network begins to overfit the measurements. Thus, we also adopt early stopping when necessary.
For both the initialization and reconstruction tasks, we use (minibatch) stochastic gradient-descent algorithms instead of deterministic ones. This introduces additional hyperparameters (batch sizes) that must be set appropriately. However, stochastic methods with small batch sizes require much less memory than the deterministic ones. In fact, if the number of frames Q is large, applying a deterministic gradient-descent method is infeasible. Further, such stochastic methods are also more likely to escape bad local minima and thus reach better solutions. Indeed, in our experiments, we observed that using reasonably small batch sizes () led to better reconstructions than using large batch sizes ().
3.3. Recovery of the pupil function
So far, we have assumed complete knowledge of the pupil function in our reconstruction framework. However, the pupil function is typically not well-characterized in FP. Thus, similar to the work in [17, 26], we estimate it along with the sequence of images.
Following [26], we use Zernike polynomials to represent the pupil function with only a few parameters (). These functions are orthogonal on the unit circle and are often used in optics for modeling aberrations. We express the pupil function in polar coordinates as
where Za is the ath Zernike polynomial according to Noll's sequential indices.
Finally, the optimization problem for the joint recovery of the pupil function and the sequence of images is
where is the data-fidelity term. We can solve (22) using a minibatch stochastic gradient-descent algorithm coupled with early stopping if required. Our complete reconstruction algorithm is summarized in algorithm 2.
Algorithm 2. Joint recovery of dynamic sample and pupil function.
Input: Measurements , LED indices , latent vectors , initial network parameters , initial Zernike coefficients , batch sizes , number of epochs nep.
,
,
for nep epochs do
  for nW iterations do
    Randomly sample a batch of size BW from
    for nQ iterations do
      Randomly sample a batch of size BQ from
      Compute the loss
      Update θ with gradient
      Update c with gradient
    end for
  end for
end for
Output: Reconstructed images , Zernike coefficients c
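The nested sampling structure of the joint recovery can be sketched as a skeleton. Here `step` is a hypothetical stand-in for one gradient update of both the network parameters θ and the Zernike coefficients c on the sampled minibatches; all names are illustrative.

```python
import numpy as np

def joint_recovery(step, theta0, c0, Q, W, batch_q, batch_w,
                   n_epochs, n_w, n_q, rng=None):
    """Skeleton of algorithm 2: jointly update theta and c from random
    minibatches of frames (size batch_q) and of LED measurements
    (size batch_w).

    step(theta, c, frames, leds) -> (theta, c) stands in for one
    gradient update of the loss in (22).
    """
    rng = rng or np.random.default_rng(0)
    theta, c = theta0.copy(), c0.copy()
    for _ in range(n_epochs):
        for _ in range(n_w):
            leds = rng.choice(W, size=batch_w, replace=False)
            for _ in range(n_q):
                frames = rng.choice(Q, size=batch_q, replace=False)
                theta, c = step(theta, c, frames, leds)
    return theta, c
```

The inner loop thus performs n_epochs × nW × nQ update steps in total, each touching only a small subset of frames and measurements, which keeps the memory footprint low.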
4. Numerical results
4.1. Simulated setup
We demonstrate the advantages of our reconstruction method on simulated data. We consider an FP setup consisting of L = 100 LEDs arranged in a uniform grid with a spacing of . The maximum illumination NA of the LED array, which is placed at distance from the sample, is 0.27. The LEDs emit light with wavelength . The NA of the objective is . We have chosen these values of and based on the experimental setup in [20]. The pupil function is defined according to (20) using the first nine Zernike polynomials with coefficients . We take the low-resolution measurements acquired by the camera to be of size with pixel-size and we set the oversampling ratio as . Consequently, the pixel size for the high-resolution image is and the step-size for discretizing the pupil function is . The LED array and the pupil function are shown in figure 3.
Our ground truth is a sequence of complex-valued images of size which we created from experimental phase images. We place ourselves in the extremely challenging ultrafast regime where only one measurement is acquired for each image in the sequence. For each measurement, a single LED of index lq is randomly activated and a low-resolution image is simulated according to (21). The recorded measurement image is then generated according to
where is a realization of a zero-mean Gaussian random vector with covariance matrix . Specifically, we consider two settings for our simulations. In the first case, is the zero matrix, which means that the recorded measurements are noiseless. In the second case, is a diagonal matrix with entries . There, (23) corresponds to a Gaussian approximation of the Poisson noise model with a photon budget of 1000.
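The Gaussian approximation of the Poisson noise model can be simulated as below. This is a sketch under the assumption that, for a photon budget B, the normalized count Poisson(B·g)/B has mean g and variance g/B, so zero-mean Gaussian noise with per-pixel variance g/B is added to the noiseless intensity.

```python
import numpy as np

def noisy_measurement(g, photon_budget=1000, rng=None):
    """Gaussian approximation of Poisson shot noise (assumed form):
    adds zero-mean Gaussian noise with per-pixel variance g / B to the
    noiseless intensity image g, where B is the photon budget.
    """
    rng = rng or np.random.default_rng(0)
    return g + rng.normal(scale=np.sqrt(g / photon_budget))
```

Dark-field measurements have small g, so their relative noise level g/B over g is much larger, which is consistent with the heavy corruption of the dark-field images visible in figure 4.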
We show some of the frames in the ground-truth sequence and the corresponding measurements for both the settings (noiseless and noisy) in figure 4. The full sequences are provided in the supplementary material.
4.2. Implementation of the deep spatiotemporal prior
In this subsection, we describe the implementation of our reconstruction method—the deep spatiotemporal prior (DSTP).
4.2.1. Network architecture.
It has been observed that the choice of the network architecture can greatly affect the performance of DIP [40]. Therefore, the common practice when deploying DIP (or other related schemes) is to select the architecture in an empirical trial-and-error manner for the specific task at hand. For our experiments, inspired by [39], we adopt a convolutional decoder-like architecture for , which, as we demonstrate in sections 4.5 and 4.6, yields high-quality reconstructions. It takes a low-dimensional input vector and outputs a complex-valued (vectorized) image . The architectural details are described in table 1. In particular, the complex-valued image is generated from a pair of magnitude and phase images. The initial part of the network creates feature maps of size . These are then fed into both the magnitude and phase branches of the network. The magnitude branch consists of a convolutional layer followed by the pointwise differentiable rectified linear unit (DReLU) activation function, which we define as
where γ > 0 is set a priori. We use DReLU (with γ = 0.1) instead of ReLU to avoid the 'dead-neuron' issue during the first few iterations of the optimization, while ensuring that the magnitude is positive. Meanwhile, the phase branch consists of a convolutional layer followed by the nonlinearity to constrain the phase to lie within the range .
Table 1. Architecture of the network . Size of input: . Conv: convolutional layer with kernels and reflective boundary conditions. BN: batch normalization layer. Upsampling: nearest neighbor interpolation. The amplitude and phase branches take the same input of size and output the magnitude and phase images of size , respectively. DReLU is described in (24). The combination layer generates a complex-valued image from the magnitude and phase images. This network consists of 1628 546 learnable parameters.
Layers | Output shape
---|---
Reshape | |
(Conv + BN + ReLU) | |
Upsampling + (Conv + BN + ReLU) | |
Upsampling + (Conv + BN + ReLU) | |
Upsampling + (Conv + BN + ReLU) | |
Upsampling + (Conv + BN + ReLU) | |
Upsampling + (Conv + BN + ReLU) | |
Magnitude: Conv + DReLU | |
Phase: Conv + | |
Combination: Magnitude | |
Reshape |
4.2.2. Latent vectors.
As mentioned in section 3.1, the latent vectors are chosen such that they lie on the straight line defined in (17). We fix the end-points of this line by drawing two samples from the standard multivariate normal distribution in 82 dimensions.
4.2.3. Initialization.
In all our experiments, we initialize the parameters of the network using reconstructions obtained from the GS algorithm (briefly described in section 4.3). We run algorithm 1 using the AMSGrad solver [42] with a learning rate of 10−3, batch size , tolerance and maximum number of iterations . We then freeze the tunable parameters of the batch-normalization layers. For experiments involving the estimation of the pupil function, we initialize the Zernike coefficients as .
We have observed that the initialization of the network parameters has an impact on the reconstruction quality. For example, randomly initializing the parameters does not lead to satisfactory results. However, initializing the network by simply fitting it to low-quality solutions of the GS algorithm (along with early stopping to avoid overfitting the artifacts) allows us to obtain excellent reconstructions (see sections 4.5 and 4.6).
4.2.4. Choice of the data-fidelity term.
The data-fidelity term in (18) and (22) measures the discrepancy between the observed and the simulated measurements. For optimization-based FP reconstruction, it has been experimentally shown in [16] that a cost function of the form
where , is robust and leads to better reconstructions than the popular mean-squared-error cost function . In particular, the cost function in (25) has a gradient similar to that of the Poisson-likelihood-based cost function, which suggests that it can also handle Poisson-like noise well [16]. In our framework, we use the slightly modified version of (25) given by
where is a vector with all entries equal to 1 and helps us avoid numerical instabilities in the computation of the gradient.
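The modified amplitude-based data-fidelity term can be written as a few lines of NumPy. This is a sketch of the form described above; the value of ε is an assumption for illustration.

```python
import numpy as np

def amplitude_loss(y_sim, y_meas, eps=1e-5):
    """Amplitude-based data-fidelity term (sketch of (26); eps assumed):
    squared distance between the square roots of the simulated and
    measured intensities, with eps added inside the square roots to
    avoid numerical instabilities in the gradient at zero intensity.
    """
    return float(np.sum((np.sqrt(y_sim + eps) - np.sqrt(y_meas + eps)) ** 2))
```

Without the ε shift, the gradient of the square root blows up at zero-valued pixels, which are common in dark-field measurements; the shift keeps the gradient bounded while barely changing the loss value.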
Note. The details regarding the optimization process for (18) and (22) are provided in sections 4.5 and 4.6.
4.3. Comparisons
We compare our proposed framework to the following methods.
4.3.1. GS algorithm.
The GS algorithm [8] is a classical method for phase retrieval. Assuming that the Zernike coefficient vector c is known, it aims at solving the feasibility problem
for , by alternately updating the image plane and the object plane. We refer the reader to [8] for more details. When the pupil function is not well-characterized, we do not incorporate its recovery within the GS algorithm. Instead, we solve (27) assuming an idealized pupil function with no phase aberrations that corresponds to .
4.3.2. Data-consistency estimator (DC).
Based on the work in [16], we consider a data-consistency (DC) estimator that minimizes the (slightly modified) 'amplitude-based' cost function (26). For the joint recovery of the images and pupil function, it is given by
where is defined in (26).
4.3.3. Spatially total-variation-regularized estimator (STV).
In our numerical simulations, we also consider a regularized estimator where the cost function in (28) is augmented with spatial anisotropic TV regularization for each frame. It is given by
where the operator computes finite differences in both the directions for the underlying image, and are hyperparameters that control the strength of the regularization.
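The anisotropic TV term for a single frame amounts to summing the absolute finite differences along both directions, as the short sketch below shows (it applies unchanged to complex-valued images, where the modulus of the complex differences is taken).

```python
import numpy as np

def anisotropic_tv(img):
    """Anisotropic total variation of a 2D (possibly complex) image:
    sum of the absolute finite differences along both directions."""
    dx = np.diff(img, axis=0)  # vertical differences
    dy = np.diff(img, axis=1)  # horizontal differences
    return float(np.sum(np.abs(dx)) + np.sum(np.abs(dy)))
```

A constant image has zero TV, while images with many sharp or oscillatory variations are penalized heavily, which is why TV promotes piecewise-constant solutions.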
4.3.4. Spatiotemporally total-variation-regularized estimator (STTV).
Finally, we also implement a spatiotemporally-regularized estimator where the cost function in (28) is augmented with both spatial and temporal TV regularization. It is given by
where is the finite-difference operator and are the regularization hyperparameters.
4.4. Evaluation metric
We quantify the performance of a method by computing the regressed signal-to-noise ratio (RSNR) for the entire reconstructed sequence of images. Let and denote vectorized versions of the ground truth and reconstruction, respectively. These are created by concatenating the vectorized representations of each frame in the sequence. The RSNR is computed as
We also report the SNR for the pupil function whenever it is jointly estimated with the sequence of images. This metric is computed as
where c and are the ground-truth and estimated Zernike coefficients, respectively.
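The regressed SNR can be computed as below. This sketch uses one common definition, in which the reconstruction is first fitted to the ground truth with the least-squares optimal complex scale; the exact regression model in the paper may differ (e.g. by also including an offset).

```python
import numpy as np

def rsnr(x_hat, x):
    """Regressed SNR in dB (one common definition, assumed here):
    SNR after scaling the reconstruction by the least-squares optimal
    (possibly complex) scalar a = <x_hat, x> / ||x_hat||^2.
    """
    a = np.vdot(x_hat, x) / np.vdot(x_hat, x_hat)  # optimal scale
    err = np.linalg.norm(a * x_hat - x)
    return 20.0 * np.log10(np.linalg.norm(x) / err)
```

By construction, the metric is invariant to a global rescaling of the reconstruction, which is relevant in phase retrieval, where such ambiguities are inherent.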
4.5. Reconstruction from noiseless measurements
We now present two experiments involving noiseless measurements. In both of them, we run the iterative algorithm for each method for a sufficient number of iterations (details are provided below), beyond which the reconstruction does not change significantly. In other words, we do not deploy early stopping for any method as the measurements are noiseless.
4.5.1. Perfectly characterized pupil function.
We first consider an idealized setting where the pupil function is perfectly characterized and is therefore not estimated during the reconstruction of the images of interest. In this scenario, the DC and STV estimators can be computed in a frame-by-frame manner (similar to the GS method) by decomposing the overall optimization problems into Q = 100 smaller ones. We solve these by running AMSGrad with a learning rate of 10−3 for 1000 iterations. In order to improve their performance, we initialize the GS, DC, and STV methods for the timestamp tq with the reconstructed images from the previous timestamp . The GS solution is used for initializing the STTV method. We solve (30) by using AMSGrad for epochs with a learning rate of 10−3 and a full batch size of 100. The optimal hyperparameters and for the STV and STTV methods, respectively, are chosen via a grid-search. For DSTP, the network parameters are initialized with the help of the GS solution. We then solve (18) by running the AMSGrad optimizer for epochs with a learning rate of and a batch size of .
We present the RSNR values for all the methods in table 2. Further, we display some slices of the (2D + time) reconstructions in figure 5. The entire reconstructed sequences can be found in the supplementary material. We observe that the proposed method significantly outperforms the GS, DC, STV and STTV methods. Even though only one measurement is acquired per frame, it yields a high-quality reconstruction, unlike the other methods which exhibit various artifacts (for example, the features marked by arrows in figure 5).
Table 2. Reconstruction from noiseless measurements with a perfectly characterized pupil function.
Method | GS | DC | STV | STTV | DSTP
---|---|---|---|---|---
RSNR (dB) | 17.24 | 9.66 | 17.85 | 18.58 | **28.61**
Note: The bold values indicate the method with the best performance.
4.5.2. Joint recovery of dynamic sample and pupil function.
Next, we consider a setting where the pupil function is not well-characterized and is therefore estimated jointly with the dynamic sample in our framework and in the DC, STV and STTV methods. (We do not adapt the GS algorithm for the recovery of the pupil function; we simply assume the idealized pupil function .) For the DC, STV and STTV methods, the sequence of images is initialized with the GS solution and the Zernike coefficients are initialized as . We solve (28) and (29) by running the AMSGrad optimizer for epochs with a learning rate of 10−3 and a batch size of 10. For solving (30), we run AMSGrad for epochs with a learning rate of 10−3 and a full batch size of 100. In the STV method, we select two global hyperparameters via grid search and share them among all frames. The hyperparameters for the STTV method are also tuned for best performance with the help of a grid search. In our method, we initialize the network parameters using the GS solution and we initialize the Zernike coefficients as . We solve (22) by running AMSGrad for epochs with a learning rate of and a batch size of .
We present the RSNR and SNR values for the reconstructed sequence and the pupil function, respectively, in table 3. We also show some slices of the (2D + time) reconstructions and the recovered pupil functions (phase) in figure 6, as well as the recovered Zernike coefficients in figure 7. The full reconstructed sequences are provided in the supplementary material. Here, the DC, STV and STTV methods fail to recover the Zernike coefficients (i.e. the pupil function) accurately and yield poor reconstructions of the dynamic sample. In contrast, our method provides a good estimate of the pupil function along with a high-quality reconstruction of the moving sample.
Table 3. Joint recovery of the dynamic sample and the pupil function from noiseless measurements.
| Method | GS | DC | STV | STTV | DSTP |
|---|---|---|---|---|---|
| Sequence RSNR (dB) | 14.70 | 14.82 | 15.88 | 17.10 | **28.04** |
| Pupil SNR (dB) | N.A. | 8.28 | 9.57 | 12.74 | **31.22** |
Note: The bold values indicate the method with the best performance.
4.6. Reconstruction from noisy measurements
Finally, we consider the joint recovery of the sequence of images and the pupil function from noisy measurements. In this case, we observe that the GS, DC and DSTP methods require early stopping, as running the corresponding iterative algorithm beyond a certain point leads to overfitting the noisy measurements. Thus, we run each method for a sufficiently large number of epochs () and we report the reconstruction that achieves the best RSNR during these epochs. For each method, we use the initialization, optimizer, learning rate and batch size described in section 4.5.2. The hyperparameters for the STV and STTV methods are also tuned in the same way as in section 4.5.2.
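In simulation, the best-RSNR iterate can be tracked directly because the ground truth is available. A minimal sketch of this oracle early-stopping loop is given below; `step_fn` (one epoch of the iterative algorithm, returning the current estimate) and `metric_fn` (e.g. RSNR against the ground truth) are hypothetical callables.

```python
def reconstruct_with_early_stopping(step_fn, metric_fn, n_epochs):
    """Run an iterative reconstruction and keep the iterate with the best
    metric, as done here for the GS, DC and DSTP methods on noisy data."""
    best_metric = float("-inf")
    best_estimate = None
    for epoch in range(n_epochs):
        estimate = step_fn(epoch)       # one optimization epoch
        metric = metric_fn(estimate)    # e.g. RSNR w.r.t. the ground truth
        if metric > best_metric:
            best_metric, best_estimate = metric, estimate
    return best_estimate, best_metric
```

On experimental data, where no ground truth exists, one would instead stop on a surrogate such as the measurement-consistency loss on held-out LEDs.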
We summarize the quantitative results for all the methods in table 4. We display some slices of the (2D + time) reconstructions and the estimated pupil functions (phase) in figure 8, and we present the recovered Zernike coefficients in figure 9. The entire reconstructed sequences are available in the supplementary material. In this setting, as shown in figure 4, the dark-field measurements are corrupted by significant amounts of noise, which makes the recovery problem quite challenging. Remarkably, our method still yields reconstructions of very good quality and outperforms the DC, STV and STTV methods by a large margin.
Table 4. Joint recovery of the dynamic sample and the pupil function from noisy measurements.
| Method | GS | DC | STV | STTV | DSTP |
|---|---|---|---|---|---|
| Sequence RSNR (dB) | 14.09 | 14.14 | 14.65 | 16.39 | **24.86** |
| Pupil SNR (dB) | N.A. | 9.36 | 10.66 | 14.98 | **28.36** |
Note: The bold values indicate the method with the best performance.
4.7. Computational cost
In all our experiments, we used an Intel Xeon Gold 6240R (2.6 GHz) CPU for the GS method and an NVIDIA V100 GPU for the DC, STV, STTV and DSTP methods. While DSTP achieves substantially better reconstruction quality than the other methods, its computational cost is also higher. For example, the run time for DSTP was around 5.5 h as opposed to min for the other approaches when jointly estimating the sequence and the pupil function from noiseless measurements.
5. Conclusion
We have presented a neural-network-based framework, which requires no training data, for the reconstruction of high-resolution complex-valued images of a moving sample in dynamic FP. In our method, we have parameterized the sequence of images to be reconstructed using a shared convolutional network with adjustable parameters. We have encoded the temporal behavior of the sample in the input vectors of the network by constraining them to lie on a one-dimensional manifold. In this manner, we have leveraged both the structural prior of a neural network and the temporal regularity between consecutive frames. Further, we have incorporated the recovery of the pupil function of the microscope within our framework. Finally, with the help of simulations, we have shown that the proposed approach drastically improves the quality of reconstruction over standard frame-by-frame methods.
Acknowledgments
This research was supported in part by the Swiss National Science Foundation under Grant 200020_184646/1 and in part by the European Research Council (ERC Project FunLearn) under Grant 101020573.
Data availability statement
The data that support the findings of this study are openly available at the following URL/DOI: https://github.com/pakshal23/DynamicFP.
Appendix: Zernike polynomials
In the polar coordinates $(\rho, \theta)$, the Zernike polynomials are given by
$$ Z_n^m(\rho,\theta) = \begin{cases} R_n^{|m|}(\rho)\cos(m\theta), & m \geq 0,\\ R_n^{|m|}(\rho)\sin(|m|\theta), & m < 0, \end{cases} $$
where $n \in \mathbb{N}$, $m \in \mathbb{Z}$, $|m| \leq n$, $(n - |m|)$ is even, and
$$ R_n^{|m|}(\rho) = \sum_{k=0}^{(n-|m|)/2} \frac{(-1)^k\,(n-k)!}{k!\left(\frac{n+|m|}{2}-k\right)!\left(\frac{n-|m|}{2}-k\right)!}\,\rho^{n-2k}. $$
For $j \in \mathbb{N}_{>0}$, Noll's sequential indexing defines a mapping $j \mapsto (n, m)$ such that $Z_j \equiv Z_n^m$.
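The radial part of the Zernike polynomials follows the standard closed-form sum and can be evaluated directly; the sketch below assumes that definition, and `zernike_radial` is a hypothetical helper name.

```python
from math import factorial

def zernike_radial(n, m, rho):
    """Radial part R_n^{|m|}(rho) of the Zernike polynomial
    (standard textbook definition)."""
    m = abs(m)
    if (n - m) % 2:
        return 0.0  # R_n^{|m|} vanishes when n - |m| is odd
    return sum(
        (-1) ** k * factorial(n - k)
        / (factorial(k)
           * factorial((n + m) // 2 - k)
           * factorial((n - m) // 2 - k))
        * rho ** (n - 2 * k)
        for k in range((n - m) // 2 + 1)
    )
```

For instance, this reproduces the familiar low-order terms such as $R_2^0(\rho) = 2\rho^2 - 1$ (defocus) and satisfies $R_n^{|m|}(1) = 1$ for all valid $(n, m)$.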
Footnotes
- 5
The pupil function is described directly in the Fourier domain.
- 6
We consider square even-sized images for the sake of simplicity.
- 7
The experimental phase images are from [41] and are available at http://celltrackingchallenge.net/2d-datasets/.
- 8
We have dropped the index w as W = 1.
Supplementary data (4.2 MB AVI)
Supplementary data (9.7 MB AVI)
Supplementary data (7.2 MB AVI)
Supplementary data (7.3 MB AVI)