Calculation of a binary diffractive optical element to increase the depth of field of an imaging system for image classification by a neural network

Using a real-world data set as an example, it is shown that the accuracy of an image classifier based on a convolutional neural network does not deteriorate when only one color channel is used. A binary diffractive optical element is calculated that increases the depth of field of the imaging system several times. This is achieved by using different color channels for different defocus values. The MTF curves of the original and apodized imaging systems are compared for a given minimum acceptable image contrast.


Introduction
Image classification is one of the core tasks of computer vision. For example, an image classification algorithm may be developed to determine whether the number on a container in a warehouse matches the number of the container that a robot is supposed to move. Although such a task is trivial for humans, reliable image classification remains a problem in computer vision applications. The task becomes even harder if the analyzed image is distorted, in particular defocused.
Optical systems are known to be sensitive to defocus and chromatic aberration. Increasing the depth of field of the optical system reduces this sensitivity and the resulting blur of defocused images. However, simply increasing the depth of field (DOF) by reducing the pupil or numerical aperture of the system degrades resolution. One way to increase DOF without degrading resolution is wavefront "coding", which is in effect a phase apodization of the lens pupil [1][2][3]. As a rule, apodization brings not only positive effects (an increase in DOF and a decrease in the focal spot size) but also a significant change in the structure of the point spread function (PSF) and growth of side lobes that degrade the image. Apodization is widely used in focusing [4,5] and scanning [6-9] optical systems, in microscopes to increase contrast [10][11][12], and in various applications for resolving two closely spaced radiation sources [13][14]. In imaging systems, apodization usually requires an additional, typically digital, decoding operation [15][16][17][18][19][20][21]. Despite the development of various decoding methods [22][23][24][25], it is desirable, when optimizing the apodizing function, to maintain a compromise between increasing DOF and distorting the PSF.
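The resolution/DOF trade-off of simply stopping down can be made concrete with two textbook approximations for a clear circular pupil, which are not taken from this paper: the incoherent cutoff frequency 2NA/λ and the quarter-wave (Rayleigh) image-side focal depth ±λ/(2NA²). The sketch below is illustrative only; the function names and the 0.55 μm wavelength are assumptions.

```python
def incoherent_cutoff(na: float, wavelength: float) -> float:
    """Incoherent (MTF) cutoff frequency 2*NA/lambda, in cycles per metre."""
    return 2.0 * na / wavelength


def quarter_wave_depth(na: float, wavelength: float) -> float:
    """Half-width of the image-side focal depth from the quarter-wave criterion.

    The paraxial defocus coefficient is W20 ~ dz * NA**2 / 2; setting W20 = lambda / 4
    gives dz = lambda / (2 * NA**2).
    """
    return wavelength / (2.0 * na ** 2)


if __name__ == "__main__":
    lam = 0.55e-6  # assumed wavelength, m
    for na in (0.10, 0.05):
        print(f"NA = {na:.2f}: cutoff = {incoherent_cutoff(na, lam) / 1e3:.0f} cycles/mm, "
              f"focal depth = ±{quarter_wave_depth(na, lam) * 1e6:.1f} um")
```

Halving NA halves the cutoff frequency but quadruples the focal depth, which is why reducing the aperture buys DOF only at the cost of resolution, and why pupil apodization is attractive instead.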
The task of image classification imposes specific requirements on the imaging system. In particular, this paper shows that for a relatively reliable classification of an image it is sufficient to have a focused image in only one color channel. This observation is based on the study of only one data set. However, it follows logically from the specifics of the task, in which the structure of the object matters, not its color.

Radially symmetric binary phase apodization
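The amplitude S in the focal region of the imaging system can be written as follows:
$$S(R,Z)=2\int_0^1 P(\rho)\,J_0(R\rho)\exp\left(\frac{iZ\rho^2}{2}\right)\rho\,d\rho, \qquad (1)$$
where P(ρ) is the pupil function, J_0 is the zero-order Bessel function of the first kind, and r and z are the radial and axial coordinates:
$$R=kr\sin\alpha,\qquad Z=kz\sin^2\alpha,\qquad k=\frac{2\pi}{\lambda}.$$
Here sinα is the numerical aperture, and R and Z denote the normalized radial and axial coordinates. Along the optical axis, the amplitude of the light field is
$$S(0,Z)=2\int_0^1 P(\rho)\exp\left(\frac{iZ\rho^2}{2}\right)\rho\,d\rho. \qquad (2)$$
We assume that P(ρ) is a real function. This assumption is justified because we consider binary phase elements, in which the phase takes the values 0 and π, so that the amplitude is 1 or −1. In this case the following relation holds:
$$S(0,-Z)=S^{*}(0,Z),\qquad |S(0,-Z)|^2=|S(0,Z)|^2, \qquad (3)$$
i.e., the axial intensity is symmetric with respect to the focal plane.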
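The axial symmetry for a real pupil can be checked numerically. The sketch below assumes the paraxial axial amplitude S(0,Z) = 2∫₀¹ P(ρ) exp(iZρ²/2) ρ dρ, normalized so that a clear pupil gives S(0,0) = 1, and a pupil that is piecewise constant (+1 or −1) on concentric zones; the function name and the zone representation are hypothetical. With t = ρ² each zone integrates in closed form, so no numerical quadrature is needed.

```python
import cmath
from typing import Sequence


def axial_amplitude(z: float, radii: Sequence[float], signs: Sequence[int]) -> complex:
    """S(0, Z) = 2 * integral_0^1 P(rho) exp(i Z rho^2 / 2) rho d rho
    for a real pupil P = signs[k] on radii[k] <= rho < radii[k + 1].

    Substituting t = rho^2, each zone contributes signs[k] * integral exp(i Z t / 2) dt,
    which is evaluated exactly.
    """
    total = 0j
    for k, sign in enumerate(signs):
        t1, t2 = radii[k] ** 2, radii[k + 1] ** 2
        if abs(z) < 1e-12:
            zone = t2 - t1  # limit Z -> 0
        else:
            zone = (cmath.exp(0.5j * z * t2) - cmath.exp(0.5j * z * t1)) / (0.5j * z)
        total += sign * zone
    return total


if __name__ == "__main__":
    # Hypothetical two-zone pupil: +1 for rho < 0.6, -1 (phase pi) for 0.6 <= rho <= 1.
    radii, signs = [0.0, 0.6, 1.0], [1, -1]
    for z in (2.0, 5.0, 10.0):
        i_plus = abs(axial_amplitude(z, radii, signs)) ** 2
        i_minus = abs(axial_amplitude(-z, radii, signs)) ** 2
        print(f"Z = ±{z:4.1f}: I(+Z) = {i_plus:.4f}, I(-Z) = {i_minus:.4f}")
```

For a clear pupil this reproduces the familiar |S(0,Z)|² = [sin(Z/4)/(Z/4)]², and for any real ±1 pupil it returns S(0,−Z) as the complex conjugate of S(0,Z), so the two printed intensities coincide.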

Results of simulation
To classify the images we use a VGG-like [26] convolutional neural network (CNN). The CNN architecture contains seven convolutional layers and two fully connected layers (Figure 1). Figure 1a shows the initial layers of the CNN and Figure 1b the remaining layers. To train and test the CNN, the SVHN (Street View House Numbers) [27] dataset is selected. SVHN is a real-world image dataset for developing machine learning and object recognition algorithms with minimal requirements on data preprocessing and formatting. It contains about 70,000 digits for training and 25,000 digits for testing.
The CNN was trained successively under three conditions: (1) on color images (three color channels), (2) on images containing two color channels, and (3) on images containing one color channel. Unused color channels were set to zero. In all cases, the classification accuracy on the test data was about 93% (Table 1). It was therefore concluded that one color channel is sufficient to classify the images of the SVHN dataset. Given the variety of lighting conditions, color palettes, and viewing angles in SVHN, this conclusion can probably be generalized to other similar data sets in which the structure of the object, not its color, matters for classification.
The last CNN layer uses the softmax activation function, which converts the network outputs into probabilities of the ten digit classes. First, this made it possible to demonstrate a decrease in the CNN "confidence" in the correct class for defocused images. Second, the probabilities calculated for each color channel make it possible to choose the most reliable channel for classification. Figure 2 shows 9 test images and bar graphs of the classification probabilities. In the upper right corner is a defocused image of "3"; the probability bar graph to the right of it reflects the uncertainty of the CNN.
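The channel-masking protocol and the choice of the most reliable channel can be illustrated without the network itself. The stdlib sketch below uses hypothetical helper names; it assumes images stored as H×W×3 nested lists (channels last) and that the CNN has already produced one logit vector per single-channel input, which is an assumption about how the per-channel probabilities are obtained. "Most reliable" is taken here to mean the channel whose softmax output has the largest maximum probability.

```python
import math
from typing import List, Optional, Sequence, Tuple

Image = List[List[List[float]]]  # H x W x 3 nested lists, channels last (assumed layout)


def keep_channels(image: Image, keep: Sequence[int]) -> Image:
    """Set every color channel not listed in `keep` to zero."""
    return [[[v if c in keep else 0.0 for c, v in enumerate(px)] for px in row] for row in image]


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert raw class scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def most_reliable_channel(
    logits_per_channel: Sequence[Sequence[float]],
) -> Optional[Tuple[int, int, float]]:
    """Return (channel, predicted class, probability) for the most confident channel."""
    best: Optional[Tuple[int, int, float]] = None
    for ch, logits in enumerate(logits_per_channel):
        probs = softmax(logits)
        cls = max(range(len(probs)), key=probs.__getitem__)
        if best is None or probs[cls] > best[2]:
            best = (ch, cls, probs[cls])
    return best
```

In practice the logits would come from the trained CNN applied to each masked image; the selection rule here is a deliberately simple choice.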

Figure 2. Examples of test images and bar graphs of classification probabilities
Binary diffractive optical elements introduce significant chromatic aberration. As a rule, compensating this aberration requires additional effort in optimizing the optical element. In this work, chromatic aberration is instead used to optimize the PSF separately for each color channel in order to increase DOF. To this end, the aperture is phase-apodized with a radially symmetric (annular) binary phase element. Two constraints are specified when optimizing this element: the maximum frequency to be resolved, expressed as a fraction of the cutoff frequency (0.5 was used), and the minimum acceptable contrast at this frequency (20% was used). The optimized parameters are the height of the phase ring and its inner radius (the normalized outer radius is taken to be 1).
Figure 3 shows MTF plots for the original and apodized imaging systems, calculated for various defocus values. We assume that the imaging system (aperture 1.8 mm) is focused at a distance of 1.2 m from the lens. Figures 3a, b, c, and d show the MTF at distances of 9.0, 1.2, 0.65, and 0.44 m, respectively. A comparison of the MTF plots for the original and apodized imaging systems shows that the depth of field of the apodized system is increased by about an order of magnitude.
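The optimization criterion (MTF at 0.5 of the cutoff must be at least 20%) can be evaluated with a brute-force pupil autocorrelation. The sketch below is not the authors' code and rests on several stated assumptions: the paraxial thin-lens defocus coefficient W20 = (D²/8)(1/s − 1/s0), a defocus phase ψρ² with ψ = 2πW20/λ at the pupil edge, a ring occupying a ≤ ρ ≤ 1 whose height h gives a phase step 2π(n−1)h/λ with an assumed index n = 1.46, a 0.55 μm wavelength, and a hypothetical ring with a = 0.7. Because the step depends on λ, the same ring behaves differently in each color channel.

```python
import cmath
import math


def defocus_w20(s: float, s0: float, pupil_diameter: float) -> float:
    """Paraxial defocus coefficient W20 in metres (wavefront error at the pupil edge)
    for an object at distance s when the lens is focused at s0 (both in metres)."""
    return pupil_diameter ** 2 / 8.0 * (1.0 / s - 1.0 / s0)


def ring_phase(height: float, wavelength: float, n: float = 1.46) -> float:
    """Phase step (rad) of a relief ring of the given height; n = 1.46 is an assumed index."""
    return 2.0 * math.pi * (n - 1.0) * height / wavelength


def pupil(x: float, y: float, psi: float, a: float, phi: float) -> complex:
    """Unit-radius pupil with defocus phase psi*rho**2 and a binary step phi for a <= rho <= 1."""
    rho2 = x * x + y * y
    if rho2 > 1.0:
        return 0j
    step = phi if rho2 >= a * a else 0.0
    return cmath.exp(1j * (psi * rho2 + step))


def mtf_at(f: float, psi: float = 0.0, a: float = 1.0, phi: float = 0.0, n: int = 200) -> float:
    """MTF at normalized frequency f (fraction of the incoherent cutoff, 0 <= f <= 1),
    computed as the normalized pupil autocorrelation for a shift of 2f pupil radii,
    sampled on an n x n midpoint grid over [-1, 1]^2."""
    h = 2.0 / n
    overlap, norm = 0j, 0.0
    for i in range(n):
        x = -1.0 + (i + 0.5) * h
        for j in range(n):
            y = -1.0 + (j + 0.5) * h
            norm += abs(pupil(x, y, psi, a, phi)) ** 2
            overlap += pupil(x + f, y, psi, a, phi) * pupil(x - f, y, psi, a, phi).conjugate()
    return abs(overlap) / norm


if __name__ == "__main__":
    lam, d, s0 = 0.55e-6, 1.8e-3, 1.2  # assumed wavelength; D = 1.8 mm and s0 = 1.2 m from the text
    a, phi = 0.7, ring_phase(lam / (2 * 0.46), lam)  # hypothetical ring: pi step at lam for n = 1.46
    for s in (9.0, 1.2, 0.65, 0.44):
        w20 = defocus_w20(s, s0, d)
        psi = 2.0 * math.pi * w20 / lam  # defocus phase (rad) at the pupil edge
        clear = mtf_at(0.5, psi, n=120)
        ring = mtf_at(0.5, psi, a, phi, n=120)
        print(f"s = {s:4.2f} m, W20 = {w20 / lam:+.2f} waves: "
              f"MTF(0.5) clear = {clear:.3f}, ring = {ring:.3f}, ring meets 20%: {ring >= 0.2}")
```

For a clear, focused pupil the function approaches the diffraction-limited value (2/π)(arccos f − f√(1−f²)) ≈ 0.391 at f = 0.5. Repeating the loop with each channel's wavelength and wrapping it in a grid search over a and h would give a crude version of the optimization described above; the grid size n trades accuracy for speed.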

Conclusions
A binary optical element for phase apodization of the pupil function of an optical system is calculated. This element increases the depth of field of the optical system in the sense that, for each object position within the extended range along the optical axis, at least one color channel provides an image of sufficient contrast. The calculated element can be used for image classification in machine vision, for example, by robots in storage facilities. With an appropriate training data set, the results can be generalized, for example, to monitoring the technical condition of railway rolling stock or to the classification survey of ships and offshore structures.