Abstract
Dermoscopy is the main tool for early detection of skin cancer. Non-contact dermoscopes often suffer from a small depth of field, which leads to images of skin topographies with regions that are not in focus. We aim to provide an easy-to-implement approach based on focus stacking to obtain all-in-focus images from a non-contact dermoscope. Further, we aim to extract additional information about the skin topography from the image stacks. The focus stacking procedure itself is implemented in a non-contact dermoscope whose focus is electrically adjustable by means of a tunable liquid lens. We show that all-in-focus imaging is possible for non-contact dermoscopy and deliver a method to extract topographical information for dermatologists from the acquired image stacks. Our findings indicate that the approach can be valuable for non-contact dermoscopic examination and for the early detection of skin diseases such as cancer, as it makes it possible to derive hyperfocus images and information on the skin topography. With this, we developed software for acquiring the raw image data and processing it into a high-resolution hyperfocus dermoscopic image. In the next steps, we plan to apply the approach in the clinical environment for skin cancer diagnostics or imaging of inflammatory skin diseases.
Original content from this work may be used under the terms of the Creative Commons Attribution 4.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.
1. Introduction
Skin cancer is one of the most common types of cancer. In 2021, approximately 1.9 million cancer cases were diagnosed in the USA alone. Of these, about 5.6% were melanoma, and more than 7000 deaths were reported in conjunction with the latter [1]. In Europe, more than 140,000 new melanoma cases occur per year. Early detection is vital for melanoma treatment. According to statistical data, the 5-year relative survival rate of melanoma drops rapidly from 99% to below 20% as the disease progresses through its stages [2].
Dermoscopes are currently the most widely used instruments for lesion detection and are usually operated in contact with the skin. Compared with the naked eye, a dermoscope can reveal information from below the surface of the skin. Two types of dermoscope illumination systems are used in the clinical environment: non-polarized dermoscopes (NPD) and polarized dermoscopes (PD). The main difference between them is the depth of the visualized structures. While NPD are used to inspect the lesion at the skin surface, PD filter out the reflection from the skin surface and reveal subcutaneous structures [3].
When the patient's lesion is examined, a contact dermoscope is pressed against the skin. The contact increases the risk of lesion rupture and distorts the skin geometry. Furthermore, the pressure may alter blood perfusion and therefore change the color of the lesion as well. This is problematic because the geometry and color of the lesion are important criteria in melanoma diagnosis according to the well-known ABCDE rule (asymmetry, border, color, diameter, evolution) [4].
Non-contact dermoscopy has been proposed to counteract these problems. With the help of a liquid lens with tunable focus, non-contact dermoscopy provides a less invasive approach to lesion examination [5–7], and the skin is imaged in its natural state. A downside of non-contact dermoscopy is that skin topographies deeper than the depth of field of the imaging system are not fully in focus. The same problem can arise when an integrated autofocus system fails or when the patient moves. Figure 1 visualizes how some regions then end up out of focus.
In this work, we report on a method for obtaining dermoscopic images that are fully in focus across the entire skin topography by means of focus stacking. Furthermore, we show that topographical information can be extracted from the same image stack that is used for all-in-focus imaging.
1.1. State of the art
Approaches for skin disease diagnosis from dermoscopic images usually follow three steps: image segmentation, feature extraction, and classification [8]. Image segmentation assigns every pixel to a category and is often used to determine the lesion's boundaries. Traditional segmentation methods include thresholding [9], the watershed method [10], and graph cuts [11]. With the development of computer vision, many deep learning-based methods have been proposed [12–14]. Feature extraction for melanoma is an active field of research as well; the extracted features can be based on, e.g., the ABCDE rule [4], the ELM 7-point checklist [15], or the pigment network [16]. Skin lesion classification is nowadays dominated by methods based on convolutional neural networks (CNNs) [17–21]. All these approaches rely on sharp contrast of the features in the dermoscopic data, which underlines the need for all-in-focus imaging in non-contact dermoscopy.
Focus stacking is often employed in microscopic bioimaging. Objects in a scene can lie at different distances from the camera, so that those outside the depth of field appear blurred [22]. In dermoscopy, this limitation can prevent a reliable diagnosis.
1.2. Overview
This work is structured as follows. In section 2, we explain the working principle of the liquid lens and its implementation in the focus stacking imaging system. In section 3, we describe the proposed method step by step. In section 4, we present the details of the experiments, including the setup for the acquisition of focus stacks based on a liquid lens, together with the experimental results, which show the effectiveness and limitations of the proposed method. Finally, in section 5, we summarize the main findings and discuss possible improvements.
2. Principle of image stacking with liquid lens
2.1. Working principle of liquid lens
A liquid lens is an optical lens that refracts light at a liquid surface whose shape can be changed, so that its focal length can be modulated electrically. Depending on the actuation principle, liquid lenses can be divided into different groups. In this work, an electromagnetically actuated liquid lens was employed; figure 2 displays its working principle.
2.2. Principle of focus stacking with liquid lens
The depth of field (DOF) of an imaging system is limited, which means that for some skin topographies only parts of the lesion are in focus. To obtain an all-in-focus image of the lesion, the focused regions of every captured image are extracted and combined. This method is called focus stacking [25]. Figure 3 shows the focus stacking workflow as implemented in this work. By changing the current of the electromagnet in the liquid lens, differently focused images are captured in one sequence. Because the images of the stack are misaligned with respect to each other, they first need to be aligned so that they overlap pixel by pixel. The in-focus regions of each image are then identified with a focus measure, which is calculated for every pixel in the stack. Based on the resulting focus values, the images are fused into an all-in-focus image.
2.3. Image fusion metrics
Multi-focus image fusion metrics can be categorized into four groups [26]. To evaluate the fusion results objectively, we selected one metric from each group. The chosen metrics in this work are:
- (a) Normalized mutual information-based metric QMI, from the group of information theory-based metrics. It quantifies how much information the fused image shares with the input images [27].
- (b) Spatial frequency-based metric QSF, from the group of image feature-based metrics. It measures the first-order gradient error between the fused image and the input images in four directions [28].
- (c) Yang's metric QYang, from the group of image structural similarity-based metrics. It evaluates the fusion result with the structural similarity index measure (SSIM) [29].
- (d) Chen-Varshney metric QChen, from the group of human perception-inspired fusion metrics. It calculates a global quality measure based on edge information, local region saliency, and similarity [30].
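To illustrate the information theory-based group, the sketch below computes a normalized mutual information score for a fused image F and two input images A and B from gray-value histograms. It assumes the normalized form proposed by Hossny et al., QMI = 2 [MI(A,F)/(H(A)+H(F)) + MI(B,F)/(H(B)+H(F))], for 8-bit images; whether [27] uses exactly this normalization, and how it is extended to stacks with more than two inputs, is not stated here. The function names are hypothetical.

```python
import numpy as np


def entropy(p):
    """Shannon entropy in bits of a probability array; empty bins are ignored."""
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))


def mutual_information(x, y, bins=256):
    """Return MI(x, y), H(x) and H(y) estimated from the joint gray-value histogram."""
    joint, _, _ = np.histogram2d(
        x.ravel(), y.ravel(), bins=bins, range=[[0, 256], [0, 256]]
    )
    pxy = joint / joint.sum()
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)
    hx, hy = entropy(px), entropy(py)
    return hx + hy - entropy(pxy.ravel()), hx, hy


def q_mi(a, b, fused):
    """Normalized mutual information fusion metric (larger is better, maximum 2)."""
    mi_af, h_a, h_f = mutual_information(a, fused)
    mi_bf, h_b, _ = mutual_information(b, fused)
    return 2.0 * (mi_af / (h_a + h_f) + mi_bf / (h_b + h_f))


rng = np.random.default_rng(0)
a = rng.integers(0, 256, (64, 64))
b = rng.integers(0, 256, (64, 64))
print(q_mi(a, a, a))  # ≈ 2.0: the fused image is identical to both inputs
print(q_mi(a, b, a))  # lower: the fused image carries little information about b
```

Note that histogram-based MI estimates are biased upward for small images with many bins, so absolute values are only comparable between fusion results of the same size.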
3. Proposed method
3.1. Image acquisition
The following describes the basics of the image acquisition. As shown in figure 2, the optical power (in diopters) of the liquid lens is controlled by the current of the electromagnet, so that a given current corresponds to a given focal plane. Figure 4 sketches the optics for collecting a stack of images from different focal planes using a thin lens model.
From left to right, the three vertical planes in figure 4 are the focal plane, the liquid lens, and the image plane. They are related by the thin lens equation

$$\frac{1}{f} = \frac{1}{U} + \frac{1}{V}, \tag{1}$$

where $U$ is the distance to the object, $V$ is the image distance, and $f$ is the focal length. During the acquisition of the stack, the focal length of the liquid lens changes from $f_1$ to $f_N$ with a constant interval $\Delta f$. According to equation (1), the focal plane correspondingly shifts from $U_1$ to $U_N$ in steps of $\Delta U$; because $U$ depends nonlinearly on $f$, these steps are not constant even though $\Delta f$ is. As a result, images with different focal planes are captured sequentially.
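To make the geometry of figure 4 concrete, the sketch below solves equation (1) for the object distance and steps the focal length with a constant interval, as during stack acquisition. Like the model in figure 4, it treats the imaging optics as a single thin lens with a fixed image distance $V$; the function names and all numbers are hypothetical and are not the parameters of the setup.

```python
import numpy as np


def object_distance(f_mm, v_mm):
    """Solve the thin lens equation 1/f = 1/U + 1/V for the object distance U."""
    f = np.asarray(f_mm, dtype=float)
    v = np.asarray(v_mm, dtype=float)
    if np.any(v <= f):
        raise ValueError("a real object requires V > f")
    return f * v / (v - f)


def focal_plane_sweep(f_start_mm, f_end_mm, n_images, v_mm):
    """Step f with a constant interval and return focal lengths and focal planes."""
    f = np.linspace(f_start_mm, f_end_mm, n_images)  # constant interval Δf
    return f, object_distance(f, v_mm)


# Hypothetical values for illustration only
f, U = focal_plane_sweep(80.0, 90.0, 11, v_mm=100.0)
print(np.round(np.diff(f), 3))  # constant 1 mm steps in f
print(np.round(np.diff(U), 1))  # steps of the focal plane grow: ΔU is not constant
```

This is why a constant current step does not translate into equally spaced focal planes, and why the distance calibration in section 3.5 is needed for absolute depths.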
3.2. Image alignment
An unavoidable cause of misalignment between the frames is that objects at different distances are magnified to different degrees, because the focal length changes between frames. Another cause is slight movement of the patient (e.g., breathing) or of the camera. A method based on the enhanced correlation coefficient (ECC) [31] was applied to the captured images to eliminate the misalignment from both sources.
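Equation (2) uses the Euclidean norm to quantify the error between the reference image $\mathbf{i}_r$ and the warped input image $\mathbf{i}_w(\mathbf{p})$:

$$E_{\mathrm{ECC}}(\mathbf{p}) = \left\| \frac{\bar{\mathbf{i}}_r}{\|\bar{\mathbf{i}}_r\|} - \frac{\bar{\mathbf{i}}_w(\mathbf{p})}{\|\bar{\mathbf{i}}_w(\mathbf{p})\|} \right\|^2, \tag{2}$$

where $\bar{\mathbf{i}}_r$ and $\bar{\mathbf{i}}_w(\mathbf{p})$ are the zero-mean versions of the two images written as vectors, and $\mathbf{p} = (p_1, \dots, p_n)^\top$ are unknown warp parameters. The alignment problem is to estimate $\mathbf{p}$.
In the experiment, the most magnified image in the stack is used as the reference image, and the other images of the stack are registered to it one after another. The ECC method minimizes $E_{\mathrm{ECC}}(\mathbf{p})$: for each input image, an affine transformation matrix is estimated that minimizes the difference between the reference image and the warped input image.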
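The ECC criterion can be made concrete with a short sketch. For unit-norm, zero-mean image vectors $\hat{\mathbf{r}}$ and $\hat{\mathbf{w}}$, $\|\hat{\mathbf{r}} - \hat{\mathbf{w}}\|^2 = 2 - 2\rho$, where $\rho$ is their correlation coefficient, so minimizing the error in equation (2) is the same as maximizing $\rho$. For brevity, the sketch maximizes $\rho$ by exhaustive search over integer translations only; it is not the iterative affine ECC algorithm of [31] (an affine implementation is available, for example, as OpenCV's `cv2.findTransformECC` with `cv2.MOTION_AFFINE`). Function names and the search range are assumptions.

```python
import numpy as np


def ecc_rho(reference, warped):
    """Correlation coefficient of the zero-mean, unit-norm image vectors.

    The ECC error equals 2 - 2 * rho, so a larger rho means a smaller error.
    """
    r = reference.astype(float).ravel()
    w = warped.astype(float).ravel()
    r -= r.mean()
    w -= w.mean()
    return float(r @ w / (np.linalg.norm(r) * np.linalg.norm(w)))


def align_by_translation(reference, image, max_shift=5):
    """Find the integer (dy, dx) shift of `image` that maximizes rho.

    Simplification: translation only, exhaustive search, periodic borders
    via np.roll; the method in the text estimates a full affine warp.
    """
    best_rho, best_shift = -np.inf, (0, 0)
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            rho = ecc_rho(reference, np.roll(image, (dy, dx), axis=(0, 1)))
            if rho > best_rho:
                best_rho, best_shift = rho, (dy, dx)
    aligned = np.roll(image, best_shift, axis=(0, 1))
    return aligned, best_shift, best_rho


rng = np.random.default_rng(1)
reference = rng.random((64, 64))
moved = np.roll(reference, (2, -3), axis=(0, 1))  # simulated in-plane motion
aligned, shift, rho = align_by_translation(reference, moved)
print(shift, round(rho, 6))  # (-2, 3) 1.0
```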
3.3. Focus measure
The focused areas of each image are determined by a method based on the fast Fourier transform (FFT) [32, 33] that operates in the frequency domain. This approach assumes that the focused areas of an image have sharp edges and texture. The aligned images are transformed from the spatial domain to the frequency domain by FFT. Focused edges and texture have large gradients in the spatial domain, which means that they contribute high-frequency components in the frequency domain. A Gaussian high-pass filter is therefore applied to each aligned image in the frequency domain to suppress the low-frequency components. The amount of residual signal depends on the size of the applied Gaussian kernel; by choosing a suitable kernel size, the detection of the in-focus areas of each image can be optimized.
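A minimal sketch of such a frequency-domain focus measure follows. It assumes a Gaussian high-pass transfer function $1 - \exp(-d^2 / 2\sigma_{hp}^2)$, with $d$ the spatial frequency in cycles per pixel, and turns the filtered image into a per-pixel focus value by smoothing its squared magnitude with a Gaussian low-pass. This local-energy step, the parameter names `sigma_hp` and `sigma_smooth`, and their defaults are assumptions, not details given in the text. A larger `sigma_hp` corresponds to a smaller spatial kernel and keeps only the finest detail.

```python
import numpy as np


def _squared_frequency(shape):
    """Squared spatial frequency (cycles/pixel)^2 on the unshifted FFT grid."""
    fy = np.fft.fftfreq(shape[0])[:, None]
    fx = np.fft.fftfreq(shape[1])[None, :]
    return fx**2 + fy**2


def fft_focus_measure(image, sigma_hp=0.05, sigma_smooth=0.02):
    """Per-pixel focus map: Gaussian high-pass in the frequency domain,
    followed by the local (Gaussian-weighted) energy of the filtered image."""
    img = image.astype(float)
    d2 = _squared_frequency(img.shape)
    highpass = 1.0 - np.exp(-d2 / (2.0 * sigma_hp**2))
    detail = np.fft.ifft2(np.fft.fft2(img) * highpass).real  # edges and texture
    lowpass = np.exp(-d2 / (2.0 * sigma_smooth**2))
    energy = np.fft.ifft2(np.fft.fft2(detail**2) * lowpass).real
    return np.maximum(energy, 0.0)  # clip tiny negative round-off


# Synthetic test scene: left half in focus, right half defocused
rng = np.random.default_rng(2)
sharp = rng.random((128, 128))
d2 = _squared_frequency(sharp.shape)
blurred = np.fft.ifft2(np.fft.fft2(sharp) * np.exp(-d2 / (2 * 0.03**2))).real
half = np.hstack([sharp[:, :64], blurred[:, 64:]])
fm = fft_focus_measure(half)
print(fm[:, 16:48].mean() > fm[:, 80:112].mean())  # True
```

The FFT implies periodic borders; a practical implementation would pad or window the images to avoid artifacts at the edges.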
3.4. Image fusion
The aligned images are fused by a weight-based method [34]. Applying the focus measure to each input image yields a focus value for every pixel. A high focus value indicates that the pixel is in focus; a low focus value indicates that it is not. As described in equation (3), for each pixel $(u,v)$, the value of the fused image $F(u,v)$ is the weighted sum of the input images $I_k(u,v)$:

$$F(u,v) = \sum_{k=1}^{N} w_k(u,v)\, I_k(u,v), \tag{3}$$

where $w_k(u,v)$ is the weight assigned to pixel $(u,v)$ of image $k$ on the basis of its focus value, and $N$ is the number of images in the stack.
Figure 5 visualizes the fusion process: each pixel of the fused image is a weighted sum of the corresponding pixels in the input image stack.
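Equation (3) can be sketched in a few lines of NumPy. The text does not state how the weights are derived from the focus values, so the sketch assumes one simple choice: the focus values raised to a hypothetical exponent `power` and normalized so that the weights of each pixel sum to one. A large exponent approaches selecting the single sharpest image per pixel, while `power = 1` gives a softer blend.

```python
import numpy as np


def fuse_weighted(stack, focus, power=4.0, eps=1e-12):
    """Weighted-sum fusion F(u, v) = sum_k w_k(u, v) * I_k(u, v).

    stack, focus: arrays of shape (N, H, W) with the aligned images
    and their per-pixel focus values.
    Assumption: w_k is the focus value raised to `power`, normalized per pixel.
    """
    stack = np.asarray(stack, dtype=float)
    w = (np.clip(np.asarray(focus, dtype=float), 0.0, None) + eps) ** power
    w /= w.sum(axis=0, keepdims=True)  # weights of each pixel sum to one
    return (w * stack).sum(axis=0), w


# Two toy frames: frame 0 is in focus on the left, frame 1 on the right
stack = np.stack([np.zeros((4, 4)), np.ones((4, 4))])
focus = np.ones((2, 4, 4))
focus[0, :, :2] = 10.0
focus[1, :, 2:] = 10.0
fused, weights = fuse_weighted(stack, focus)
print(np.round(fused, 3))  # left two columns ≈ 0, right two columns ≈ 1
```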
3.5. Topography measurement
In this work, we assume that the in-focus areas of one image lie in the same plane. The focus measure determines the in-focus areas of each image, and the corresponding object distances of those areas are then calculated.
Equation (1) describes the relationship between the object distance $U$ and the focal length $f$, and the focal length can in turn be tuned by the current of the liquid lens. In the experiment, the relation between the object distance and the current of the liquid lens is therefore obtained in a calibration process, which yields the object distance as a function of the current $c$. For each pixel, the in-focus current is then determined with the focus measure, and the corresponding object distance follows from equation (4).
The pixel with the maximal focus value is used for the topography estimation of the lesion. For example, given $N$ input images, the focus measure yields the focus values $\mathrm{FM}_1(u,v), \dots, \mathrm{FM}_N(u,v)$ of pixel $(u,v)$. If $\mathrm{FM}_k(u,v)$ is the maximal focus value among these, the corresponding input image was captured at the current $c_k$. Based on the curve fitted in the distance calibration, the depth of this pixel can then be determined. Repeating this procedure for all pixels yields the topography map of the whole image.
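The per-pixel depth estimate just described amounts to an arg-max over the stack followed by the calibration function. In the sketch below, `current_to_distance` stands in for the fitted regression of equation (4); the function name, the toy focus values, the currents, and the linear calibration are hypothetical and only illustrate the mechanics.

```python
import numpy as np


def depth_from_focus(focus, currents_ma, current_to_distance):
    """Depth map from a focus stack.

    focus: (N, H, W) focus values FM_k(u, v)
    currents_ma: (N,) lens current c_k at which frame k was captured
    current_to_distance: calibration function U = g(c)
    """
    k_best = np.argmax(focus, axis=0)          # index of the sharpest frame per pixel
    c_best = np.asarray(currents_ma)[k_best]   # in-focus current per pixel
    return current_to_distance(c_best), c_best


# Toy example: 3 frames of 2 x 2 pixels with a hypothetical linear calibration
focus = np.array([
    [[9, 1], [1, 1]],
    [[1, 9], [1, 9]],
    [[1, 1], [9, 1]],
], dtype=float)
currents = np.array([10.0, 11.0, 12.0])
depth, c = depth_from_focus(focus, currents, lambda c: 200.0 - 2.0 * c)
print(c)      # in-focus current per pixel
print(depth)  # distance per pixel from the toy calibration
```

Because only the frame index is kept, the depth resolution is limited by the current step; interpolating the focus curve around its maximum would be a possible refinement.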
For absolute topography measurements, we employ a distance calibration method for focus stacking. In this approach, a flat object is placed at several working distances, and a set of images at different currents is captured at each distance. For each distance, the best-focused image is selected using a focus measure computed over the whole image, and the current at which that image was captured is recorded as the in-focus current for the corresponding distance. In total, 30 pairs of working distance and current were measured. Figure 7 shows the curve fitted to the distance calibration results. With the derived regression function, the distance $U$ to the object can be quantified from the current $c$ of the liquid lens as follows:

$$U = g(c), \tag{4}$$

where $g$ denotes the fitted regression function shown in figure 7.
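The calibration can be sketched as two steps: pick the in-focus current at each working distance with a whole-image focus measure, then fit a regression through the 30 (current, distance) pairs. The text does not specify the whole-image focus measure or the form of the regression, so the sketch assumes the mean gradient energy as focus measure and a low-order polynomial fit via `np.polyfit`; the function names and the synthetic data are hypothetical and do not reproduce figure 7.

```python
import numpy as np


def gradient_energy(image):
    """Whole-image focus measure: mean squared finite differences (periodic)."""
    img = image.astype(float)
    gx = np.roll(img, -1, axis=1) - img
    gy = np.roll(img, -1, axis=0) - img
    return float(np.mean(gx**2 + gy**2))


def in_focus_current(frames, currents_ma):
    """Current of the frame with the largest whole-image focus measure."""
    scores = [gradient_energy(frame) for frame in frames]
    return currents_ma[int(np.argmax(scores))]


def fit_calibration(currents_ma, distances_mm, degree=2):
    """Fit U = g(c) as a polynomial (assumed form) and return it as a callable."""
    return np.poly1d(np.polyfit(currents_ma, distances_mm, degree))


# Step 1 with a toy stack: only the middle frame is sharp
rng = np.random.default_rng(3)
sharp = rng.random((32, 32))
soft = (sharp + np.roll(sharp, 1, 0) + np.roll(sharp, -1, 0)
        + np.roll(sharp, 1, 1) + np.roll(sharp, -1, 1)) / 5.0
print(in_focus_current([soft, sharp, soft], [40.0, 41.0, 42.0]))  # 41.0

# Step 2 with 30 synthetic (current, distance) pairs
currents = np.linspace(-150.0, 150.0, 30)
distances = 150.0 - 0.2 * currents + 1e-4 * currents**2
g = fit_calibration(currents, distances)
print(g(10.0))  # distance predicted by the fitted curve at 10 mA
```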
3.6. Instrumentation and image stack acquisition
Figure 6 shows the setup of the liquid lens-based non-contact dermoscope used in this work. For the illumination of the skin, the setup consists of an ultra-bright white light-emitting diode (LED) (CBT-90 White LED, Luminus Inc., Sunnyvale, California, USA), a custom-designed collimator (1), and a polarizing beam splitter cube (PBS) (PBS511, Thorlabs, Newton, USA) (2). The imaging part contains a polarizer (4), a liquid lens (EL-16-40-TC, Optotune AG, Dietikon, Switzerland) (5), a fixed-focus lens (6), a magnifier (NMV-75M1, Navitar, Rochester, New York, USA) (7), and a charge-coupled device (CCD) camera (BFS-U3-32S4M-C, FLIR Integrated Imaging Solutions Inc., Richmond, British Columbia, Canada) (8). The light source has a color rendering index (CRI) of 76, which makes it comparable to daylight [35]. Together with the custom-made collimator, it provides evenly distributed illumination of the skin. The emitted light is partly transmitted through the PBS, which polarizes it linearly, and illuminates the skin. Only light that is scattered within the skin changes its polarization, while light reflected at the surface maintains its polarization [36]. Therefore, with the PBS and the polarizer in a cross-polarized configuration, the polarizer filters out the surface reflections. In addition, the liquid lens is employed to adjust the focal plane of the imaging system electrically, rapidly, and without moving parts. It has an aperture of 16 mm, and its optical power can be adjusted from −10 to +10 diopters by changing the shape of the fluid, with an overall reproducibility of ±0.05 diopters. The response and settling times are 5 ms and 25 ms, respectively [24]. The aperture of the fixed lens was adjusted between f/1.8 and f/16, while the exposure time of the camera was adjusted correspondingly between 3 ms and 80 ms to maintain adequate exposure.
Image stacks were captured with automatically changing focal lengths, which are controlled via the optical power of the liquid lens. The optical power range of −10 to +10 diopters corresponds to a control current from −290 mA to +290 mA. During acquisition, images were captured in current steps of 1 mA. The start current brings the farthest point of the object into focus, and the end current brings the nearest point into focus, so that every point of the object is in focus at least once during the capture. Together with the liquid lens, a lens with a fixed focal length is used for imaging; it has an equivalent focal length of 75 mm and an aperture adjustable from f/1.8 to f/16. Finally, the magnified image is captured by the camera, which is equipped with a 1/1.8-inch CCD sensor and has a resolution of 1928 × 1448 pixels. From center to edge, the resolution decreases from 120 lp/mm to 80 lp/mm. The captured images are transferred from the camera buffer to a computer via a 1000 Mbit/s GigE interface. The computer controls both the camera and the liquid lens, setting the exposure time of the camera and the optical power of the liquid lens.
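The acquisition sweep described above can be sketched as follows. The current range (−290 mA to +290 mA for −10 to +10 diopters) and the 1 mA step come from the text; the linear current-to-power mapping, the constant and function names, and the example start and end currents are assumptions for illustration. No camera or lens driver calls are shown.

```python
import numpy as np

CURRENT_LIMIT_MA = 290.0  # control current range: -290 mA ... +290 mA
POWER_LIMIT_DPT = 10.0    # optical power range: -10 dpt ... +10 dpt


def current_to_power(current_ma):
    """Assumed linear mapping from control current (mA) to optical power (dpt)."""
    return np.asarray(current_ma, dtype=float) * POWER_LIMIT_DPT / CURRENT_LIMIT_MA


def current_sweep(c_far_ma, c_near_ma, step_ma=1.0):
    """Currents from the far-focus value to the near-focus value, inclusive."""
    if max(abs(c_far_ma), abs(c_near_ma)) > CURRENT_LIMIT_MA:
        raise ValueError("current outside the lens control range")
    n = int(round(abs(c_near_ma - c_far_ma) / step_ma)) + 1
    return np.linspace(c_far_ma, c_near_ma, n)


# Hypothetical sweep: far point in focus at -20 mA, near point at +4 mA
currents = current_sweep(-20.0, 4.0)
print(len(currents), currents[0], currents[-1])      # 25 -20.0 4.0
print(np.diff(current_to_power(currents))[0])        # ≈ 0.0345 dpt per 1 mA step
```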
The image stack acquisition was implemented in Python (version 3.6.4) with the libraries PyCapture2 for the camera and Opto for the liquid lens. For rapid and convenient capture, a graphical user interface was designed based on Qt (5.15.4), while the depth-of-field enhancement and the topography measurements were implemented in MATLAB (MATLAB, 2021, version 9.11.0 (R2021b), Natick, Massachusetts: The MathWorks Inc.).
Figure 8 shows a 3D rendering of a melanoma phantom designed as the sample for the focus stacking approach.
The phantom design is based on the ABCD rule for melanoma diagnosis. It is therefore asymmetric, has irregular borders due to the printing process, and has a diameter and a height of 10 mm. The phantom consists of steps with different heights. The height is deliberately large to show the effect of the approach on extreme skin topographies.
4. Experimental results and discussion
4.1. Enhanced depth of field
Figure 9 shows the captured image stack of a skin lesion with changing focal planes. The stack shows that the focal plane changes from one image to the next. The focus moves through the image, showing that in this case possible body movement along the optical axis does not affect the approach. Figure 10 shows the fusion result for the same lesion: the first three panels are images from the stack, and the rightmost panel shows the result after fusion. The stack used contains more images than the three displayed; typical image stacks contain 15 to 25 images. Compared with each input image, the DOF of the fused image is extended, and details in both the background and the foreground are shown in one image. Figure 11 shows the results for the same lesion illuminated without cross-polarized light. Without cross-polarized illumination, more surface features of the human skin are visible, and the all-in-focus image therefore improves slightly. In figure 12, we evaluate the effectiveness of focus stacking for an ex vivo melanoma. The fused image shows all regions of the melanoma in focus, while the example images of the stack contain blurred areas. A similar result is presented for the custom-designed phantom in figure 13, which shows three images from the image stack of the phantom. Increasing the current moves the focal plane from the background to the foreground, and each image contains a part of the phantom that is in focus. In contrast to the images of the stack, the fused image on the very right displays all regions of the phantom in focus. Focus stacking performs better on the phantom than on the skin. For the lesion, observation with unpolarized light gives a better result than observation with cross-polarized light.
This might be because unpolarized illumination leads to more reflections from the skin surface, which are easily detected as in-focus features. The results for the enhanced depth of field depend on several factors: the acquisition settings (aperture, illumination, polarization), the sample, and post-processing parameters such as the size of the Gaussian kernel of the focus measure. For the acquisition of the stacks, we used parameters optimized for skin imaging with our setup, as the dermoscopic images are the most valuable data for the dermatologist. The aperture and the illumination in particular have a strong effect on the depth of field. The fusion result is mainly influenced by the quality of the image alignment and the choice of the most suitable focus measure. Unlike the phantom, a patient can hardly remain static during stack acquisition. Two kinds of movement are possible: movements within the focal plane and movements along the optical axis. Movements within the focal plane impose difficulties for the image alignment. The black bars in the images of figure 11 show the degree of misalignment that has been corrected. If the lesion is textured, as in the case of the mole observed under unpolarized illumination, automated image alignment can readily compensate for this type of motion. For dermatologic diagnosis, however, images of the lesion under cross polarization are typical. Under cross polarization, the images show the subcutaneous information of the lesion, which contains fewer features; unfortunately, this limits the alignment performance. Due to the small field of view (FOV), the lesion may even move out of view during the acquisition, as displayed in figure 14.
Movements in the depth direction hamper the effectiveness of the focus measure. In some settings, the depth of field of an image can be as small as 1–2 mm, so that a slight movement may cause the lesion to lose focus. Moreover, some regions of the object may never be in focus in any image of the stack. Besides our FFT-based fusion method, many spatial-domain and wavelet-based focus measures have been proposed for focus stacking [37]. Figure 15 shows the fusion results of absolute central moment (ACMO) [38], variance of Laplacian (LAPV) [39], sum of wavelet coefficients (WAVS) [40], and our FFT-based method.
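For comparison, two of the alternative focus measures can be written down compactly. The sketch evaluates them over a whole image, whereas in focus stacking they are computed in a local window around each pixel: LAPV as the variance of a discrete 4-neighbour Laplacian, and ACMO as the absolute central moment $\sum_k |k - \mu| P_k$ of the gray-value histogram, following their common definitions. The window handling and the exact stencil used for figure 15 are not stated in the text, and WAVS is omitted because it requires a wavelet library.

```python
import numpy as np


def lapv(image):
    """Variance of Laplacian (4-neighbour stencil, periodic borders)."""
    img = image.astype(float)
    lap = (np.roll(img, 1, 0) + np.roll(img, -1, 0)
           + np.roll(img, 1, 1) + np.roll(img, -1, 1) - 4.0 * img)
    return float(lap.var())


def acmo(image, levels=256):
    """Absolute central moment of the gray-value histogram of an 8-bit image."""
    hist, _ = np.histogram(image, bins=levels, range=(0, levels))
    p = hist / hist.sum()            # relative frequency P_k of gray level k
    k = np.arange(levels)
    mu = np.sum(k * p)               # mean gray level
    return float(np.sum(np.abs(k - mu) * p))


rng = np.random.default_rng(4)
sharp = rng.integers(0, 256, (64, 64)).astype(float)
blurred = (sharp + np.roll(sharp, 1, 0) + np.roll(sharp, -1, 0)
           + np.roll(sharp, 1, 1) + np.roll(sharp, -1, 1)) / 5.0
print(lapv(sharp) > lapv(blurred), acmo(sharp) > acmo(blurred))  # True True
```

LAPV responds only to high spatial frequencies, which makes it sensitive to noise, whereas ACMO depends only on the intensity distribution and ignores spatial structure; this is consistent with the different fusion behavior of the two measures in figure 15.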
By subjective criteria, the FFT-based method shows the best performance for all-in-focus imaging of the phantom. The ACMO and WAVS methods both generate blurred all-in-focus images, and the image fused by LAPV suffers most from noise. Complementing this subjective assessment, table 1 shows the results of the objective fusion metrics explained in section 2.3.
Table 1. Objective evaluation of the fusion results.
Method | QMI | QSF | QYang | QChen |
---|---|---|---|---|
Absolute central moment (ACMO) | **0.9353** | 0.0164 | 0.3085 | 21.5563 |
Variance of Laplacian (LAPV) | 0.1727 | **0.0386** | 0.0192 | 28.2701 |
Sum of wavelet coefficients (WAVS) | 0.9265 | 0.0172 | 0.3045 | 13.0465 |
FFT-based | 0.8687 | 0.0222 | **0.3287** | **8.1528** |
The best value for each metric is shown in bold. QMI, QSF and QYang are positive metrics, which means that a larger value represents better fusion quality. QChen is a negative metric, with a smaller value indicating better fusion quality. The FFT-based method achieves the best scores in QYang and QChen, which supports the effectiveness of our FFT-based multi-focus fusion method.
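The best entries per metric follow directly from the polarity of each metric. The short sketch below re-derives them from the values in table 1, using only those numbers and the polarities stated above; the variable and function names are illustrative.

```python
# Values from table 1: method -> (QMI, QSF, QYang, QChen)
RESULTS = {
    "ACMO": (0.9353, 0.0164, 0.3085, 21.5563),
    "LAPV": (0.1727, 0.0386, 0.0192, 28.2701),
    "WAVS": (0.9265, 0.0172, 0.3045, 13.0465),
    "FFT-based": (0.8687, 0.0222, 0.3287, 8.1528),
}
# QMI, QSF and QYang: larger is better; QChen: smaller is better
METRICS = (("QMI", max), ("QSF", max), ("QYang", max), ("QChen", min))


def best_per_metric(results):
    """Return the best method for each metric, respecting its polarity."""
    best = {}
    for col, (name, pick) in enumerate(METRICS):
        best[name] = pick(results, key=lambda method: results[method][col])
    return best


print(best_per_metric(RESULTS))
# {'QMI': 'ACMO', 'QSF': 'LAPV', 'QYang': 'FFT-based', 'QChen': 'FFT-based'}
```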
4.2. Topography measurement
Figure 16 shows the experimental results for the topography maps of the phantom. The topography map of the phantom not only has clear edges but also shows the protrusions well; still, the topographical information is affected by noise. In figure 17, we overlay the depth information of the phantom derived from the focus stack with the CAD model of the phantom. Figure 17 shows that the dimensions of the phantom in the X- and Y-directions exceed the FOV of the imaging system. Furthermore, the depth information derived from the focus stack agrees well with the designed depth of the phantom, and the measured height matches the height of the phantom at each of the steps. In comparison, the topography map of the lesion displayed in figure 18 is flatter, as the protrusions of the lesion are shallow.
It can be observed that the mole does not have a strong elevation. It has to be noted that movement during the image stack acquisition did not move any of the captured images out of focus. In addition, in-focus areas might not be detected in images affected by motion blur. Because no ground truth of the topography map is available for in vivo measurements, an objective evaluation is not possible.
In the following, we discuss the results for another type of phantom, consisting of two differently shaped raisins attached to human skin. As displayed in figure 19, the raisins resemble the optical appearance of a lesion.
Figure 20 shows the depth maps derived from the focus stacks of the two phantoms displayed in figure 19. The two different geometries of the raisins are clearly distinguishable in the depth maps, which shows that changes in the geometry of a lesion can be detected via focus stacking. Furthermore, we show that the topography of human skin can be obtained in vivo, using the example of a human acromastium whose depth map is displayed in figure 21. In this example, the topography of the acromastium and its contours are roughly visible. This in vivo measurement performs worse than the raisin phantoms, which suggests that color contrast might aid topography measurements by focus stacking. The depth maps of the ex vivo melanoma are displayed in figure 22; they indicate that the geometry of the melanoma is measurable by focus stacking.
5. Conclusion & outlook
In this work, we have proposed a focus stacking method for non-contact dermoscopy based on a liquid lens. An all-in-focus image and the corresponding depth map can be obtained at the same time. In addition, the FFT-based method was evaluated with both subjective and objective metrics. The estimated depth maps of the in vivo measurements could only be assessed by subjective criteria because of the lack of ground truth. Body movements during capture limit the performance of focus stacking in non-contact dermoscopy. For patients who cannot remain static during a non-contact dermoscopic imaging procedure (e.g., due to Parkinson's disease), focus stacking could help to obtain at least one image from the stack that shows sufficient focus. For total body dermoscopy scanners that employ focus stacking to ensure that a focused image is obtained in case of patient movement, the focus stacks will be a valuable by-product with additional information. The hyperfocus images are the main result of the optical system of the non-contact dermoscope. The resulting improvement in image quality can potentially aid the dermatologist's diagnosis. Furthermore, it can enhance the quality of the data on which computer-aided diagnosis systems are built, which can potentially lead to higher classification accuracy.
In further work, improved and more reliable registration algorithms need to be developed to overcome the difficulties in image alignment. In addition, the depth resolution of the proposed method needs to be determined, and its clinical application needs to be evaluated. Furthermore, the potential of machine learning for depth measurement by focus stacking and for its noise reduction needs to be explored. Considering the limitations of monocular depth estimation, a stereo system may provide depth information for dermatologic diagnosis as well.
Acknowledgments
This work has been supported by iToBoS (Intelligent Total Body Scanner for Early Detection of Melanoma), project funded by the European Union's Horizon 2020 research and innovation programme, under grant agreement No 965221. Also, financial support by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany's Excellence Strategy within the Cluster of Excellence PhoenixD (EXC 2122, Project ID 390833453) is acknowledged.
Data availability statement
The data that support the findings of this study are available upon reasonable request from the authors.
Materials
The performed experiments were approved by the Ethics Committee of the University Medical Center Rostock (A 2016-0115). The results in figures 9, 10, 11, 14, 18–21 were measured on the skin of the authors.
Disclosures
The authors declare no conflicts of interest.