Regular Papers

Autofocusing algorithm for a digital holographic imaging system using convolutional neural networks


Published 10 August 2018 © 2018 The Japan Society of Applied Physics
Citation: Kyungchan Son et al 2018 Jpn. J. Appl. Phys. 57 09SB02. DOI: 10.7567/JJAP.57.09SB02


Abstract

Digital holographic imaging systems are promising three-dimensional imaging systems that acquire holograms via the interference of a reference wave and an object wave. Using digital holography and numerical diffraction theory, an image can be reconstructed at any distance from the hologram. However, focusing the image requires an accurate determination of the distance of the object from the hologram, and various autofocusing algorithms have been studied. The conventional autofocusing algorithm finds the focused image by evaluating iteratively reconstructed images with focus metrics; owing to this iterative reconstruction process, the computational time is very long. In this paper, an autofocusing algorithm for a digital holographic imaging system using convolutional neural networks, analogous to pattern recognition systems, is proposed. Using the proposed method, the distance of the object from the hologram is obtained more rapidly than with the conventional method.


1. Introduction

Digital holographic imaging systems are promising next-generation imaging systems that can obtain three-dimensional information. Digital holographic imaging systems acquire a holographic interference pattern, which is generated by the superposition of reference and object waves, using a digital imaging device such as a charge-coupled device (CCD) or a complementary metal–oxide–semiconductor (CMOS).1–4) The holographic interference pattern is called a hologram. Using various techniques, the object wave can be obtained from the digital hologram as an array of complex numbers representing the amplitude and phase of the optical field.4–7) The propagation of the optical field is described by the scalar diffraction theory.1,8) Using the diffraction theory, the image can be numerically reconstructed at any distance from the hologram plane. Since digital holography allows for the reconstruction of a three-dimensional shape, it is widely used in three-dimensional imaging techniques such as three-dimensional microscopy.1–3,9–12)

To reconstruct an in-focus object image, determining the exact distance of the object from the hologram is a critical but difficult issue. In the early days of image capture, the observer determined the focal point subjectively. Subjective focusing by the observer is not only slow but also imprecise, so an automatic focusing algorithm is preferable. With the autofocus system used in conventional photography, the focal length is determined by the configuration of the optical system, for example, by adjusting the lens position. Focusing metrics such as sharpness are computed to find the focused image and adjust the lens position.13–17) Many studies have examined autofocusing in digital holography using several different focusing metric criteria.10,11,14,15,17–26) With the conventional autofocusing algorithm in digital holography, focusing metrics are used to numerically evaluate the reconstructed images and determine the focal point via iterative image reconstruction. Considerable computational time is required for both the iterative image reconstruction and the evaluation of the focusing metrics.17) As a result, the conventional autofocusing method takes a long time to find the focal point and cannot be applied to applications such as real-time sensing.

In this paper, an autofocusing algorithm for a phase-shift digital holographic imaging system using convolutional neural networks is proposed. The proposed autofocusing algorithm directly determines the distance of the object from the reconstructed object wave at the hologram plane using pattern recognition via a convolutional neural network. This process is conducted by simulation via numerical diffraction and computer-generated holography. The proposed system reduces the computational time and has the potential to be implemented in applications such as real-time particle tracking.12)

2. Background

In this section, the basic theories and assumptions required to understand this research are introduced. First, the scalar diffraction theory, which is the foundation of the digital holographic imaging system, and optical simulation are introduced. Second, digital holography concepts and reconstruction methods are presented. Finally, the convolutional neural network algorithm used in this research is introduced.

2.1. Scalar diffraction theory

Diffraction theory is the foundation of digital holographic imaging systems. The propagation and diffraction of the optical field are described by the scalar diffraction theory.8) Light is an electromagnetic wave that can be described by a wave equation. As shown in Eq. (1), the electromagnetic wave field EP at the observation point P is calculated from a point source through an aperture, where Σ, k, and r represent the aperture, the wavenumber of the wave field, and the distance between the aperture and the observation point, respectively.

$$E_P = \frac{1}{i\lambda}\iint_{\Sigma} E_{\Sigma}\,\frac{e^{ikr}}{r}\,d\sigma, \qquad (1)$$

where $E_{\Sigma}$ is the wave field in the aperture.

In an imaging system, the field of interest is essentially two-dimensional, so the aperture is treated as a two-dimensional plane. Figure 1 shows the geometry of the scalar diffraction of a two-dimensional aperture. Equation (2) can be derived from Eq. (1).

$$E(x,y;z) = \frac{1}{i\lambda}\iint_{\Sigma} E_{0}(x',y';0)\,\frac{e^{ikr}}{r}\,dx'\,dy', \qquad r = \sqrt{z^{2} + (x-x')^{2} + (y-y')^{2}} \qquad (2)$$

Fig. 1. Geometry of the scalar diffraction of a two-dimensional aperture.


The paraxial approximation, also called the Fresnel approximation, is used for the theoretical development. The paraxial approximation assumes that a light ray lies close to the optical axis. Using the paraxial approximation, r is approximated as

$$r \approx z\left[1 + \frac{(x - x')^{2} + (y - y')^{2}}{2z^{2}}\right] \qquad (3)$$

The Fresnel point spread function SF is given by

$$S_{F}(x,y;z) = \frac{e^{ikz}}{i\lambda z}\exp\!\left[\frac{ik}{2z}\left(x^{2} + y^{2}\right)\right] \qquad (4)$$

The diffraction field is expressed in terms of a two-dimensional Fourier transform of spatial frequencies. The diffraction field at point $(x,y;z)$ is given by Eq. (5). This is also referred to as the Fresnel transform.

$$E(x,y;z) = \frac{e^{ikz}}{i\lambda z}\,e^{\frac{ik}{2z}(x^{2}+y^{2})}\,\mathcal{F}\!\left\{E_{0}(x',y';0)\,e^{\frac{ik}{2z}(x'^{2}+y'^{2})}\right\}\Big|_{f_{x} = \frac{x}{\lambda z},\; f_{y} = \frac{y}{\lambda z}} \qquad (5)$$

In this paper, propagation is simulated by the Fresnel transform method. The Fresnel transform is numerically computed and applied to the optical simulation to generate hologram images.
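As an illustrative sketch (not the exact simulation code used here), the single-FFT Fresnel transform of Eq. (5) can be computed numerically as follows; the function name, normalization, and the sampling parameters (taken from Table I) are assumptions:

```python
import numpy as np

def fresnel_transform(field, wavelength, pitch, z):
    """Single-FFT Fresnel transform of Eq. (5): propagate a sampled
    complex field by distance z. `field` is an N x N complex array and
    `pitch` is the source-plane pixel pitch in meters."""
    n = field.shape[0]
    k = 2 * np.pi / wavelength
    # Source-plane coordinates centered on the optical axis.
    x = (np.arange(n) - n // 2) * pitch
    xx, yy = np.meshgrid(x, x)
    # Quadratic phase applied inside the Fourier transform in Eq. (5).
    chirp_in = np.exp(1j * k * (xx**2 + yy**2) / (2 * z))
    spectrum = np.fft.fftshift(np.fft.fft2(np.fft.ifftshift(field * chirp_in)))
    # Observation-plane sampling of the single-FFT method: lambda*z/(N*pitch).
    pitch_out = wavelength * z / (n * pitch)
    xo, yo = np.meshgrid((np.arange(n) - n // 2) * pitch_out,
                         (np.arange(n) - n // 2) * pitch_out)
    # Prefactor and quadratic phase outside the transform in Eq. (5).
    prefactor = np.exp(1j * k * z) / (1j * wavelength * z)
    chirp_out = np.exp(1j * k * (xo**2 + yo**2) / (2 * z))
    return prefactor * chirp_out * spectrum * pitch**2

# Example: propagate a circular aperture by 0.1 m at 532 nm.
n = 256
x = (np.arange(n) - n // 2) * 7.6e-6
xx, yy = np.meshgrid(x, x)
aperture = (xx**2 + yy**2 < (200e-6)**2).astype(complex)
field_z = fresnel_transform(aperture, 532e-9, 7.6e-6, 0.1)
```

Note that in this formulation the observation-plane pixel pitch grows as λz/(N·pitch), so reconstructions at different distances are sampled on different grids.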

2.2. Digital holography

The digital holographic imaging system acquires a holographic interference pattern generated by the superposition of the reference and object beams. Its advantages include the ability to post-process the acquired holograms and to deal directly with the light wave field.1,2,4,7,27)

The imaging device samples the light intensity at each pixel as a two-dimensional array of real numbers. The intensity of the hologram is given by Eq. (6).

$$I = |E_{R} + E_{O}|^{2} = |E_{R}|^{2} + |E_{O}|^{2} + E_{R}^{*}E_{O} + E_{R}E_{O}^{*} \qquad (6)$$

In Eq. (6), ER and EO represent the reference wave field and object field, respectively.

The object wave field contains the object information, and it must be reconstructed from the hologram. There are several methods of deriving the object wave field from the intensity of the hologram.5,6) One method is phase-shift holography, in which four images with phase differences of π/2 radians are sampled. Figure 2 shows the optical configuration of the phase-shift digital holographic imaging system. The spatial light modulator generates the object wave field, and the phase retarder introduces the phase differences of the object wave field. The object wave field is accurately derived using Eqs. (7) and (8), where εR, εO, and φ represent the amplitude of the reference wave field, the amplitude of the object wave field, and the phase of the object wave field, respectively.

$$\varepsilon_{O} = \frac{\sqrt{(I_{0} - I_{\pi})^{2} + (I_{3\pi/2} - I_{\pi/2})^{2}}}{4\varepsilon_{R}} \qquad (7)$$

$$\varphi = \tan^{-1}\!\left(\frac{I_{3\pi/2} - I_{\pi/2}}{I_{0} - I_{\pi}}\right) \qquad (8)$$

The object wave field is numerically extracted and contains both amplitude and phase information. Figure 3 shows computer-generated hologram images normalized to values between 0 and 255. The digital holographic imaging system applies the amplitude and phase information to three-dimensional applications such as measurement and display.
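The extraction of Eqs. (7) and (8) can be sketched as follows; the sign convention of the phase steps and the simulated reference wave are illustrative assumptions:

```python
import numpy as np

def extract_object_wave(i0, i_half, i_pi, i_3half, eps_r):
    """Recover the complex object wave from four holograms with phase
    steps 0, pi/2, pi, 3*pi/2, following Eqs. (7) and (8)."""
    cos_term = i0 - i_pi          # proportional to 4*eps_R*eps_O*cos(phi)
    sin_term = i_3half - i_half   # proportional to 4*eps_R*eps_O*sin(phi)
    amplitude = np.sqrt(cos_term**2 + sin_term**2) / (4 * eps_r)  # Eq. (7)
    phase = np.arctan2(sin_term, cos_term)                        # Eq. (8)
    return amplitude * np.exp(1j * phase)

# Self-test: simulate the four holograms of Eq. (6) for a known object
# wave and check that it is recovered.
rng = np.random.default_rng(0)
e_obj = rng.uniform(0.1, 1.0, (256, 256)) * \
    np.exp(1j * rng.uniform(-np.pi, np.pi, (256, 256)))
eps_r = 1.0
i0, i_half, i_pi, i_3half = [np.abs(eps_r * np.exp(-1j * d) + e_obj)**2
                             for d in (0, np.pi/2, np.pi, 3*np.pi/2)]
assert np.allclose(extract_object_wave(i0, i_half, i_pi, i_3half, eps_r), e_obj)
```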

Fig. 2. Optical configuration of the phase-shift digital holographic imaging system.

Fig. 3. Computer-generated digital hologram images: (a) reference beam, (b) object image, (c) propagated object wave field at hologram plane, (d) I0 hologram, (e) Iπ/2 hologram, (f) Iπ hologram, (g) I3π/2 hologram, and (h) extracted object wave field.


2.3. Convolutional neural networks

The convolutional neural network algorithm is a biologically inspired artificial intelligence algorithm. It is good at identifying high-level features and is widely used in pattern recognition problems, such as image classification, and in other vision-related tasks.28–32) Convolutional neural networks usually consist of four kinds of layers: the convolution, activation, pooling, and fully connected layers. The architecture of a convolutional neural network is a combination of these layers. The role of each layer is described in this section. However, because the pooling layer is not used in this study, a detailed explanation of it is not provided.

The convolutional layer employs a linear operation on two functions. This operation is defined in Eq. (9), where x represents a signal and w represents a weighting function.

$$s(t) = (x * w)(t) = \int x(a)\,w(t - a)\,da \qquad (9)$$

The convolution operation provides a weighted average of the input at any given moment. This operation is widely used in linear control theory and signal processing. In convolutional neural network terminology, x is referred to as the input, w as the kernel, and the output s as the feature map.

The discrete and two-dimensional convolution operations are defined by Eqs. (10) and (11), where I represents a two-dimensional image and K represents a two-dimensional kernel.

$$s(t) = (x * w)(t) = \sum_{a} x(a)\,w(t - a) \qquad (10)$$

$$S(i,j) = (I * K)(i,j) = \sum_{m}\sum_{n} I(m,n)\,K(i - m,\, j - n) \qquad (11)$$

The convolution operation is commutative; thus, Eq. (11) can equivalently be written as

$$S(i,j) = (K * I)(i,j) = \sum_{m}\sum_{n} I(i - m,\, j - n)\,K(m,n) \qquad (12)$$

Equation (12) is more straightforward to implement in machine learning applications because the valid ranges of m and n vary less.31)

The convolution operation uses adjacent data within the extent of the kernel. Thus, the convolution operation is shift-invariant, that is, independent of time or position. Therefore, it can extract feature maps from data whose neighboring samples are correlated, such as image, audio, and video data. Figure 4 shows how an image is changed by the convolution operation. The object wave field I is transformed into the feature maps S through the kernels. The hyperparameters of the convolution layers are the size of the kernel, the number of kernels, the stride, and the padding. The size of the kernel is related to the extracted features, and the number of kernels is related to the computational cost. The stride is the interval of the convolution operation, corresponding to downsampling of the output of the convolution function. Padding preserves the spatial size of the output and prevents the loss of information at the boundary.
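For concreteness, a direct (unoptimized) implementation of the two-dimensional convolution of Eq. (12) with the stride and padding hyperparameters might look as follows; this is a minimal sketch, not the layer implementation of any particular framework (most frameworks actually compute the unflipped cross-correlation, which only relabels the learned kernel):

```python
import numpy as np

def conv2d(image, kernel, stride=1, padding=0):
    """Two-dimensional convolution of Eq. (12) with stride and padding."""
    img = np.pad(image, padding)          # zero-pad the borders
    flipped = kernel[::-1, ::-1]          # flip the kernel for true convolution
    kh, kw = kernel.shape
    out_h = (img.shape[0] - kh) // stride + 1
    out_w = (img.shape[1] - kw) // stride + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):                # slide the kernel over the image
        for j in range(out_w):
            patch = img[i*stride:i*stride + kh, j*stride:j*stride + kw]
            out[i, j] = np.sum(patch * flipped)
    return out

# Stride 2 with padding 1 roughly halves the feature map, as in the
# downsampling convolution layers of Sect. 3.
feature_map = conv2d(np.random.rand(8, 8),
                     np.array([[0., 1., 0.], [1., -4., 1.], [0., 1., 0.]]),
                     stride=2, padding=1)
```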

Fig. 4. Images of (a) object wave field I and (b) feature map S obtained by the convolution operation.


The activation layer employs nonlinear functions such as sigmoid, hyperbolic tangent, and rectified linear unit functions. This layer is usually located after the convolution layer and provides nonlinearity to the feature map. Without nonlinearity, convolutional neural networks can solve only linear problems because the convolution operation is a linear operation. Nonlinearity from the activation layer allows convolutional neural networks to solve nonlinear problems. Furthermore, the activation layer improves the performance of convolutional neural networks.

The fully connected layer, also called the dense layer, has the same architecture as neural networks; it classifies the input into N classes based on the extracted feature maps. Convolutional neural networks update the weights of the kernels and nodes to extract the feature maps and classify input images properly.

In this paper, the convolutional neural networks are updated by the stochastic gradient descent optimization algorithm. The learning rate is a positive scalar hyperparameter that determines the size of the optimization step. The convergence of the network varies with the learning rate, so an appropriate value must be set to find the optimum weights. Equation (13) gives the stochastic gradient descent update, where θ is the weight matrix, α is the learning rate, and C is the cost function.

$$\theta \leftarrow \theta - \alpha\,\nabla_{\theta} C(\theta) \qquad (13)$$
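As a minimal illustration of the update rule of Eq. (13) on a toy quadratic cost (in the actual networks the gradient is obtained by backpropagation on a mini batch):

```python
import numpy as np

# Toy cost C(theta) = ||theta - theta_star||^2, gradient 2*(theta - theta_star).
theta_star = np.array([1.0, -2.0])   # known optimum of the toy cost
theta = np.zeros(2)                  # initial weights
alpha = 0.1                          # learning rate
for step in range(200):
    grad = 2 * (theta - theta_star)  # gradient of C with respect to theta
    theta = theta - alpha * grad     # Eq. (13): theta <- theta - alpha*grad
# theta has now converged to theta_star.
```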

3. Proposed methods

The objective of the autofocusing algorithm is to find the focused image. In previous studies, a digital holography focus metric, similar to that of conventional photography, was used to evaluate the reconstructed image and determine the focus.17) Unlike autofocus in conventional photography, the image must be iteratively reconstructed during focusing, which requires considerable mathematical computation. To reduce the computational cost of iterative reconstruction, the relationship between the hologram and the object distance must be determined directly. However, as shown in Sect. 2, the propagation of a light wave is described by complex operations such as integrals and Fourier transforms, so an analytical inverse function cannot be derived. Nevertheless, the distance between the object and the hologram is encoded in the hologram pattern. Convolutional neural networks can approximate the distance of the object from the hologram pattern by extracting its latent features. In this study, an autofocusing algorithm using convolutional neural networks for a phase-shift digital holographic imaging system is proposed.

In this paper, the autofocusing problem is treated as a supervised classification problem that can be solved using convolutional neural networks. The autofocusing algorithm classifies hologram images into N classes of the object distance. The architecture of the proposed convolutional neural network is shown in Fig. 5. The input image is the propagated object wave field image at the hologram plane. The object wave field can be extracted by the phase-shift digital holography method using the digital holographic imaging system shown in Fig. 2. The size of the input image is 256 × 256. The network consists of six convolution layers with rectified linear unit (ReLU) activation layers and two fully connected layers for classification. The first convolution layer has 6 × 6 kernels with stride 1 and extracts 32 feature maps. The second convolution layer has 6 × 6 kernels with stride 1 and extracts 64 feature maps. The third convolution layer has 6 × 6 kernels with stride 2, downsampling the feature maps, and extracts 64 feature maps. The fourth convolution layer has 6 × 6 kernels with stride 2 and extracts 128 feature maps. The last two convolution layers have 3 × 3 kernels with stride 2 and extract 256 feature maps. As a result, 256 feature maps, each 16 × 16 in size, are obtained. The fully connected layers, with one hidden layer of 1024 nodes and a softmax output, classify the distance of the object into 10 classes. The distance range of the digital holographic imaging system is set to 0.05–0.5 m at 0.05 m intervals. Figure 6 shows the computationally propagated wave field of a circular object. The cost function C of the learning algorithm is defined in Eq. (14), where D, S, H(X), and L represent the cross-entropy function, the softmax function, the hypothesis function of the multinomial classifier, and the output label, respectively.

$$C = D(S(H(X)), L) = -\sum_{i} L_{i}\,\log S_{i}(H(X)) \qquad (14)$$
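A sketch of the described architecture in Keras is given below; the simulation in this paper used TensorFlow, but the specific layer options shown here ("same" padding, which reproduces the 16 × 16 map size; a ReLU activation on the 1024-node hidden layer; a single-channel amplitude input) are assumptions:

```python
import tensorflow as tf

# Six convolution layers with ReLU, then two fully connected layers that
# classify the object distance into 10 classes, as described in Sect. 3.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(256, 256, 1)),
    tf.keras.layers.Conv2D(32, 6, strides=1, padding="same", activation="relu"),
    tf.keras.layers.Conv2D(64, 6, strides=1, padding="same", activation="relu"),
    tf.keras.layers.Conv2D(64, 6, strides=2, padding="same", activation="relu"),
    tf.keras.layers.Conv2D(128, 6, strides=2, padding="same", activation="relu"),
    tf.keras.layers.Conv2D(256, 3, strides=2, padding="same", activation="relu"),
    tf.keras.layers.Conv2D(256, 3, strides=2, padding="same", activation="relu"),
    tf.keras.layers.Flatten(),                       # 16 x 16 x 256 feature maps
    tf.keras.layers.Dense(1024, activation="relu"),  # hidden layer
    tf.keras.layers.Dense(10, activation="softmax")  # 10 distance classes
])
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=1e-4),  # Eq. (13)
              loss="categorical_crossentropy",                        # Eq. (14)
              metrics=["accuracy"])
```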

Fig. 5. Schematic diagram of the architecture of the convolutional neural networks for autofocusing.

Fig. 6. Propagated object wave field with respect to distance: (a) 0, (b) 0.05, (c) 0.10, (d) 0.20, (e) 0.45, and (f) 0.50 m.


The networks require a training set for learning and a test set for evaluating their generalization ability. All images were generated computationally using the Fresnel transform; with computer-generated holography, the dataset can be generated easily. The parameters used to generate the image dataset through the Fresnel transform are shown in Table I. The object images for the training set consisted of circles, rectangles, and English alphabet shapes of various sizes and positions. Speckle noise, which arises from the coherent light source used in the digital holographic imaging system, was added to the holograms to make the autofocusing algorithm robust33) (a sketch of the noise model follows Table I). Examples of the 40,180 training set images are shown in Fig. 7. The object images of the test set consisted of words as well as circles, rectangles, and English alphabet shapes of various sizes and positions. The size and position of each object in the test set were entirely different from those in the training set. The number of test set images was 16,450. Examples of the test set images are shown in Fig. 8.

Table I. Parameters for optical simulation.

Image size 256 × 256
Wavelength 532 nm
Pixel pitch 7.6 µm
Variance of speckle noise 0.01
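As mentioned above, the speckle noise model can be sketched as follows; the multiplicative Gaussian form is an assumption that mimics coherent speckle with the variance listed in Table I:

```python
import numpy as np

def add_speckle(intensity, variance=0.01, rng=None):
    """Multiplicative speckle noise, I_noisy = I + n*I, where n is
    zero-mean Gaussian noise with the given variance (Table I)."""
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.normal(0.0, np.sqrt(variance), intensity.shape)
    return intensity * (1.0 + noise)
```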
Fig. 7. Training set images: (a)–(c) object images and (d)–(f) propagated object wave field images.

Fig. 8. Test set images: (a)–(d) object images and (e)–(h) propagated object wave field images.


4. Simulation results

The simulation was conducted using TensorFlow. Network learning was conducted in 10,000 steps with a mini-batch size of 4, and the learning rate was set to 0.0001. After the learning steps, the performance of the network was evaluated with the test set images, using accuracy as the performance index. The accuracy measures how often the network correctly determines the output label of a test set image. For the final test, 500 randomly chosen test set images, i.e., 50 images in each class, were used. The accuracy was 95.4%, which is considered good performance. Although the images in the test set were entirely different from those in the training set, the autofocusing algorithm performed well on it, indicating that the proposed network learned a generalized relationship between the object distance and the hologram while producing fast and accurate results.

The network requires 14 min to train and 0.00575 s to evaluate an image on a system with an Intel i7-7700K CPU and two NVIDIA GTX 1080 GPUs. The autofocusing algorithm using the Tamura coefficient18) takes 0.0657 s on the same system. The proposed algorithm thus requires only 8.75% of the computation time, outperforming the conventional algorithm in terms of computational cost. Furthermore, as the number of distance classes increases, the computational cost of conventional algorithms also increases because of the iterative reconstruction. Since the proposed method does not require iterative reconstruction, its computational cost will be much lower than that of conventional algorithms when the distance is unknown, as in an actual experiment.

5. Conclusions

In this paper, an autofocusing algorithm for digital holographic imaging systems was proposed. The conventional autofocusing algorithm for digital holographic imaging systems evaluates reconstructed images using focusing metrics; however, this method requires iterative reconstructions and extensive computational time. The proposed algorithm uses convolutional neural networks, which can extract feature maps. The convolutional neural network finds the optimal kernels, extracts feature maps from the object wave field image reconstructed at the hologram plane by phase-shift digital holography, and classifies the data into 10 classes of the object distance. The system took 0.00575 s to find the distance of an object with 95.4% accuracy. This autofocusing algorithm for phase-shift digital holographic imaging systems is fast and accurate, and can potentially be applied to time-sensitive tasks such as real-time sensing.

Acknowledgment

This research was supported by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (NRF-2016R1D1A1B03930730).
