Image enhancement using thermal-visible fusion for human detection

Interest in detecting human beings in video surveillance systems has increased in recent years. Multisensor image fusion deserves more research attention because of its capability to improve the visual interpretability of an image. This study proposes a fusion technique for human detection based on a multiscale transform of grayscale visible-light and infrared images. The samples for this study were taken from an online dataset. The images captured by the two sensors were decomposed into high- and low-frequency coefficients using the Stationary Wavelet Transform (SWT). An appropriate fusion rule was then used to merge the coefficients, and the final fused image was obtained using the inverse SWT. The qualitative and quantitative results show that the proposed method is superior to the two other methods in terms of enhancement of the target region and preservation of the detail information of the image.


Introduction
The fusion of thermal and visible images has become increasingly popular in image enhancement because it combines the advantages of both modalities to produce a higher-quality image. Good image quality is very important for systems that detect and monitor the presence of a human and raise alarms for suspicious activities [1]. Closed-Circuit Television (CCTV) or visible cameras are commonly used in monitoring systems. These cameras capture high-resolution images and provide details of the scene, but they only work well in good lighting conditions [2]. The characteristics of a visible camera are quite different from those of a thermal camera. An image captured by a thermal camera has low contrast, is very sensitive to temperature changes and lacks background details, but it can detect infrared radiation from different objects in a dark environment [3]. Therefore, fusing the images from visible and thermal cameras can overcome the weaknesses of both cameras and give better information about the scene.
Extensive work has been done on thermal-visible fusion for indoor and outdoor human detection. It is a challenging task because the two cameras have different modalities and different fields of view. The challenges include handling different data modalities, data imperfection and data alignment [4]. Multiscale transforms at pixel level have become increasingly popular in thermal-visible fusion. They convert the raw input images to a more convenient representation and have low complexity compared to fusion at feature or decision level.
The curvelet [5] and contourlet [6] transforms are not translation invariant because of the upsampling and downsampling in the transformation process. In addition, the edges in the fused images are not smooth, which may affect the detection of a human. Therefore, Cunha et al. [7] proposed the Non-Subsampled Contourlet Transform (NSCT), which overcomes this drawback of the contourlet transform. Several researchers then applied and enhanced this method for human detection [8,9]. The results are slightly better than those of traditional NSCT, but some of the images still have low contrast. Other approaches include saliency analysis [10], a combination of saliency analysis and the non-subsampled Shearlet transform [11], gradient transfer [12], Gaussian [13] and sparse representation [14]. Some of these methods produce good results but at low computational speed. This paper proposes another fusion approach using the Stationary Wavelet Transform (SWT). The wavelet transform is widely used in digital signal processing, and the concept of SWT is introduced in the context of image fusion in Section 2. Section 3 briefly presents the experiments. Section 4 provides results and discussion based on qualitative and quantitative analysis. Finally, Section 5 concludes the paper.

Fusion Techniques
It is very important to preserve as much information from the input images as possible to obtain a better-quality fused image. In SWT, the input images of $m \times n$ pixels, the thermal image $I_{IR}$ and the visible image $I_{VS}$, are each decomposed into approximation ($A$), vertical detail ($VD$), horizontal detail ($HD$) and diagonal detail ($DD$) components, as shown in figure 1. These components are the outputs of the lowpass and highpass filters at each decomposition level. Rather than downsampling the coefficients, SWT modifies the filters at each level by padding them with zeroes, so the coefficient arrays keep the same size as the input throughout the decomposition process. The approximation coefficients, $A_{IR}$ and $A_{VS}$, were then merged using the maximum absolute value fusion rule because they contain the main energy of the image. For merging the highpass coefficients, or detail components, an appropriate fusion rule was chosen to produce the best fused image. Finally, the fused image, $I_F$, was obtained using the inverse SWT.
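The decomposition, merging and reconstruction steps described above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the implementation used in this study: it assumes a single decomposition level, the Haar wavelet and periodic image borders, none of which are specified in the text, and it applies the maximum absolute value rule to the detail components as well, whereas the text only states that rule for the approximations. All function names are hypothetical.

```python
import numpy as np


def _analysis(x, axis):
    """One-level undecimated Haar analysis along one axis (periodic border)."""
    nxt = np.roll(x, -1, axis=axis)            # x[n + 1], wrapping at the border
    return (x + nxt) / 2.0, (x - nxt) / 2.0    # lowpass, highpass (no downsampling)


def _synthesis(low, high, axis):
    """Inverse of _analysis: average the two redundant reconstructions."""
    direct = low + high                          # x[n] from coefficients at n
    shifted = np.roll(low - high, 1, axis=axis)  # x[n] from coefficients at n - 1
    return 0.5 * (direct + shifted)


def swt2_haar(img):
    """Decompose an image into A, HD, VD, DD, each the same size as the input."""
    img = np.asarray(img, dtype=float)
    lo, hi = _analysis(img, axis=1)    # filter along each row
    A, HD = _analysis(lo, axis=0)      # low/low, and low (rows) / high (columns)
    VD, DD = _analysis(hi, axis=0)     # high (rows) / low (columns), and high/high
    return A, HD, VD, DD


def iswt2_haar(A, HD, VD, DD):
    """Reconstruct an image from the four one-level components."""
    lo = _synthesis(A, HD, axis=0)
    hi = _synthesis(VD, DD, axis=0)
    return _synthesis(lo, hi, axis=1)


def max_abs(c1, c2):
    """Maximum absolute value rule: keep the coefficient with larger magnitude."""
    return np.where(np.abs(c1) >= np.abs(c2), c1, c2)


def fuse(ir, vs):
    """Fuse two registered grayscale images of equal size.

    Assumption: max_abs is used for the detail components too; the study only
    states this rule for the approximation coefficients.
    """
    fused = [max_abs(a, b) for a, b in zip(swt2_haar(ir), swt2_haar(vs))]
    return iswt2_haar(*fused)
```

Calling `fuse(ir, vs)` on two registered arrays of the same shape returns a floating-point array of that shape. Because no downsampling takes place, shifting the input only shifts the coefficients, which is the translation invariance that motivates SWT.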

Dataset and experimental setting
The thermal-visible nighttime imagery was obtained from online datasets [15,16] and consists of three different scenarios. All the images have the same size, 256 × 256 pixels, and capture humans in an outdoor environment. The experiments were performed using MATLAB R2014a on a 2.30 GHz Intel® Core™ i3-2350M CPU with 8 GB of main memory.

Qualitative Analysis
In addition to objective assessment, all fused images were evaluated by visual interpretation. A group of 20 people answered two questions (on the overall quality of the image and on human physical appearance) based on the given input and fused images for the three scenarios. Each viewer rated the images using the absolute and relative measures shown in table 1. This evaluation is simple and easy to analyze; however, it depends largely on the observer's experience, personal preference and viewing conditions [17].

Quantitative Analysis
The fused images were measured quantitatively using the Image Quality Index [18], without a reference image, and the fusion factor (FF) [19]. The quality metrics chosen for objective assessment include the fusion factor, $FF = I_{AF} + I_{BF}$, where $I_{AF}$ and $I_{BF}$ are the similarity information between each input image and the fused image. A higher value of FF indicates that the fused image has better quality.

Results and discussion
The results for the fused images are divided into two parts: qualitative and quantitative evaluation. Two methods were chosen for comparison with the proposed method: the Weighted Averaging Method (AVG) and the Discrete Wavelet Transform (DWT). The first two images in each of figures 2, 3 and 4 are the grayscale visible-light and thermal images, which serve as the input images.
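For orientation, the weighted averaging baseline named above can be sketched as a pixel-wise weighted sum of the two inputs. The weight used in this study is not stated, so the equal weighting below (`w_ir=0.5`) is purely an assumption, as is the function name.

```python
import numpy as np


def weighted_average_fusion(ir, vs, w_ir=0.5):
    """Pixel-wise weighted average of two registered grayscale images.

    w_ir is the weight of the thermal image; the visible image gets 1 - w_ir.
    The default of 0.5 is an assumed value, not one taken from the study.
    """
    ir = np.asarray(ir, dtype=float)
    vs = np.asarray(vs, dtype=float)
    if ir.shape != vs.shape:
        raise ValueError("input images must have the same shape")
    if not 0.0 <= w_ir <= 1.0:
        raise ValueError("w_ir must lie in [0, 1]")
    return w_ir * ir + (1.0 - w_ir) * vs
```

Because every pixel is a blend of the two inputs, bright thermal targets are pulled toward the darker visible background, which is one plausible reason for averaging to reduce contrast.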
In the thermal images, the thermal radiation of objects in the scene is detected, while the grayscale visible-light images provide details of the background. The IE and SD values in table 2 also show that less information can be obtained from the images fused using the AVG and DWT methods. In addition, 75% of the 20 respondents agreed that the image fused by SWT provides the best image quality.
In figure 3, image B shows two people standing side by side in front of a house. A comparison of the three fused images (figure 3(c), (d) and (e)) clearly shows that the image fused by SWT has high contrast and that all the details are very clear. According to the results in table 3, the proposed method has the highest values for IE, SD and FF. In the questionnaires, 18 out of 20 respondents stated that the image fused by SWT is clearer and shows the humans better than the images from the other two methods.
In figure 4(c), (d) and (e), the fused images show a person walking with an umbrella. Although the fused image in figure 4(c) preserves the thermal image characteristics, it still lacks background details and has low contrast, as indicated by the IE and SD values in table 4. The image fused by DWT is clear and quite similar to the input images, with an FF value above 1.0; however, it still contains more noise than the image produced by the proposed method. 80% of the respondents agreed that figure 4(e) is the best image in terms of human physical appearance and background details.

Conclusion
This paper proposed pixel-level fusion based on a shift-invariant wavelet transform, SWT. The two source images were decomposed using SWT, and appropriate fusion rules were then applied to the low- and high-frequency components. Finally, the fused image was obtained using the inverse SWT. The proposed method was compared with two existing methods, AVG and DWT, and the experimental results indicate that SWT provides better image quality in terms of both objective and subjective evaluations.