Low illumination image enhancement based on attention mechanism and global illumination estimation

Aiming at the problems of insufficient or excessive local brightness enhancement, color distortion, and excessive noise in existing low-light image enhancement algorithms, a low-light image enhancement method combining an attention mechanism with global illumination estimation is proposed. First, the illumination distribution map of the low-light image is obtained through an illumination distribution estimation network coupled with an attention gate mechanism. Then, the weights of the illumination distribution map are learned in a feature attention module. Finally, image details are fused by a detail reconstruction module to produce the enhanced image. Experimental results show that, compared with several conventional methods, the proposed method effectively improves image brightness, contrast, and color in terms of subjective visual quality, while also improving objective evaluation indicators such as PSNR, SSIM, and MSE.


Introduction
Lighting conditions affect the image acquisition process, so the quality of the acquired image changes as the illumination intensity changes. Images captured in low-light environments commonly suffer from low signal-to-noise ratio, low contrast, and low resolution, which presents significant challenges for subsequent image processing tasks such as target detection and semantic segmentation.
Researchers have proposed a variety of low-light image enhancement techniques to address this issue. Traditional methods fall into two classes. (1) Methods based on histogram equalization (HE). By modifying the image's histogram distribution, this technique improves the contrast and brightness of the image. Although fast and simple to compute, this approach has certain drawbacks, including color distortion and obscured details. (2) Methods based on Retinex theory [1]. Retinex theory is based on the color-constancy property of the human visual system: the original image is regarded as the product of an illumination image and a reflectance image. Methods based on Retinex theory aim to enhance the visual quality of the image by decomposing the reflectance image and reducing the influence of the illumination image on the original image. On the basis of Retinex theory, Jobson et al. proposed the single-scale Retinex algorithm (SSR) [2], the multi-scale Retinex algorithm (MSR) [3], and the multi-scale Retinex algorithm with color restoration (MSRCR) [4]; Guo et al. [5] proposed the LIME low-light image enhancement method combined with illumination map estimation; Ren et al. [6] combined denoising and weak-light enhancement to achieve low-light image enhancement. Although traditional methods based on Retinex theory can enhance an image's brightness and contrast to some extent, challenges such as color distortion and low contrast still remain.
Inspired by work on image noise reduction [7], detail preservation [8], and illumination estimation [9], and motivated by the effectiveness of recent deep learning techniques [10], this paper proposes a low-light image enhancement method. The network is composed of three types of modules: an illumination distribution estimation module, a feature attention module, and a detail reconstruction module. The idea of the method is: (a) obtain the illumination distribution of the low-light image through the illumination estimation module; (b) learn the weights of the illumination distribution map in the feature attention module; (c) learn additional image details from the original image in the detail reconstruction module to obtain the final output. In this way, the proposed method can fully accomplish the low-light enhancement task and improve image quality from multiple aspects.

Related work
This section introduces the attention gate mechanism [10] (AG) and the general concept of attention mechanisms, and then focuses on the Convolutional Block Attention Module (CBAM) [11] used in the proposed model, laying the groundwork for the detailed description of the proposed algorithm.

Attention gate mechanism
The attention gate mechanism, shown in Figure 1, is an attention mechanism used here for low-light image enhancement. It takes two image feature maps as input and produces a fused output feature map: one input is the output of a down-sampling layer, while the other is the feature map produced by the corresponding up-sampling layer. These two feature maps can carry different brightness information. Brightness-insensitive features, such as noise, respond less after passing through the attention gate module, so the output feature carries more brightness information. This output is then fed into the subsequent up-sampling layer, which enhances the network's capacity to learn brightness characteristics.
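To make the mechanism concrete, the following minimal TensorFlow/Keras sketch shows an additive attention gate of the kind described above. It assumes the two feature maps already share the same spatial resolution and that the intermediate channel count is chosen by the user; the paper does not specify these details.

```python
import tensorflow as tf
from tensorflow.keras import layers

def attention_gate(x, g, inter_channels):
    """Additive attention gate in the spirit of Oktay et al.: the skip
    feature `x` (from the down-sampling path) is re-weighted by a mask
    computed jointly from `x` and the gating signal `g` (from the
    up-sampling path). Assumes both inputs share the same spatial size."""
    theta_x = layers.Conv2D(inter_channels, 1)(x)          # project skip feature
    phi_g = layers.Conv2D(inter_channels, 1)(g)            # project gating signal
    f = layers.Activation("relu")(layers.Add()([theta_x, phi_g]))
    psi = layers.Conv2D(1, 1, activation="sigmoid")(f)     # per-pixel attention coefficients
    return layers.Multiply()([x, psi])                     # suppress brightness-irrelevant responses
```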

Mixed domain attention mechanism CBAM
The convolutional block attention module (CBAM) works as follows. First, it takes the feature map to be processed as input and computes a channel attention map with the channel attention module. Second, the input feature map is multiplied by the channel attention map, and the result is passed through the spatial attention module to obtain a spatial attention map. Third, the intermediate feature map is multiplied by the spatial attention map to obtain the refined adaptive feature map.

Channel attention module CA
The channel attention module (CA) within the convolutional attention mechanism module is shown in Figure 3. The relationship between feature channels is used to construct the channel attention map; each channel of a feature map acts as a feature detector. To compute channel attention, the spatial dimension of the input feature map is squeezed. Average pooling is commonly used to aggregate spatial information, while max pooling captures the most salient features of an object and therefore yields finer channel-wise attention cues; the channel attention module uses both average pooling and max pooling. Its implementation is as follows: first, average pooling and max pooling aggregate the spatial information of the feature map into two spatial context descriptors, F_avg and F_max; second, both descriptors are fed into a shared network to produce the channel attention map M_c ∈ R^(C×1×1).
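The following minimal TensorFlow/Keras sketch illustrates this channel attention step. The reduction ratio of 16 in the shared MLP is the CBAM default and an assumption here, not a value given by the paper.

```python
import tensorflow as tf
from tensorflow.keras import layers

class ChannelAttention(layers.Layer):
    """Channel attention in the style of CBAM: average- and max-pooled
    channel descriptors pass through a shared two-layer MLP and are summed.
    The reduction ratio of 16 is the CBAM default and an assumption here."""
    def __init__(self, channels, ratio=16):
        super().__init__()
        self.mlp = tf.keras.Sequential([
            layers.Dense(max(channels // ratio, 1), activation="relu"),
            layers.Dense(channels),
        ])

    def call(self, f):
        avg = tf.reduce_mean(f, axis=[1, 2])              # F_avg: global average pooling
        mx = tf.reduce_max(f, axis=[1, 2])                # F_max: global max pooling
        m_c = tf.sigmoid(self.mlp(avg) + self.mlp(mx))    # channel attention map M_c
        return f * m_c[:, None, None, :]                  # re-weight each channel
```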

Spatial attention module PA.
The spatial attention module (PA) in the convolutional attention mechanism module is shown in Figure 4. Spatial attention focuses on where the informative parts of the feature map are located, complementing channel attention. To compute spatial attention, average pooling and max pooling are applied along the channel axis, and the two resulting maps are concatenated into an efficient feature descriptor; this effectively highlights informative regions. A convolution operation on this descriptor then generates the spatial attention map M_s ∈ R^(H×W). The spatial attention module is implemented as follows: first, average pooling and max pooling gather the channel information of the feature map to generate two single-channel maps; second, the two maps are concatenated; finally, a standard convolution layer is applied to produce the spatial attention map.
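A corresponding sketch of the spatial attention step is shown below. The 7×7 convolution kernel follows the CBAM paper and is an assumption, since the paper does not state the kernel size.

```python
import tensorflow as tf
from tensorflow.keras import layers

class SpatialAttention(layers.Layer):
    """Spatial attention in the style of CBAM: channel-wise average and max
    maps are concatenated and convolved into a single-channel mask M_s."""
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = layers.Conv2D(1, kernel_size, padding="same",
                                  activation="sigmoid")

    def call(self, f):
        avg = tf.reduce_mean(f, axis=-1, keepdims=True)   # average pooling along channels
        mx = tf.reduce_max(f, axis=-1, keepdims=True)     # max pooling along channels
        m_s = self.conv(tf.concat([avg, mx], axis=-1))    # spatial attention map M_s
        return f * m_s                                    # highlight informative regions
```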

Light distribution estimation module
The light distribution estimation module has three sub-steps: scale the input image to a fixed resolution, predict the illumination distribution with a U-Net based on the attention gate mechanism, and re-scale the prediction to the original resolution. First, the input is down-sampled to a 96×96 feature map by bilinear interpolation and passed through a convolution layer with ReLU activation. The feature map is then down-sampled through a series of cascaded down-sampling blocks. We introduce the attention gate (AG) [10] into the light distribution estimation module to improve the model's accuracy and its sensitivity to foreground pixels without requiring much additional computation. The attention gate stably reduces the feature responses in irrelevant background regions. In this paper, the attention gate is applied on the skip connections before the concatenation operation. During back-propagation, gradients originating from background regions are attenuated, which allows the model parameters in the preceding layers to be updated at each scale according to the spatial regions relevant to the task. A series of symmetrical up-sampling blocks then restores the 96×96 feature map, and an additional up-sampling block rescales it to the size of the original input.
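The following sketch outlines this module in TensorFlow/Keras, reusing the attention_gate helper sketched earlier. The network depth and channel counts are assumptions; the paper specifies only the 96×96 working resolution, the attention-gated skip connections before concatenation, and the final re-scaling.

```python
import tensorflow as tf
from tensorflow.keras import layers

def illumination_estimation_net(inputs, out_size, base_filters=32):
    """Hedged sketch of the light distribution estimation module: bilinear
    down-scaling to 96x96, a small encoder-decoder with attention-gated
    skip connections, and re-scaling to `out_size` (the input resolution)."""
    x = tf.image.resize(inputs, (96, 96), method="bilinear")
    x = layers.Conv2D(base_filters, 3, padding="same", activation="relu")(x)

    # Encoder: cascaded down-sampling blocks
    e1 = layers.Conv2D(base_filters, 3, strides=2, padding="same", activation="relu")(x)
    e2 = layers.Conv2D(base_filters * 2, 3, strides=2, padding="same", activation="relu")(e1)

    # Decoder: symmetrical up-sampling blocks with attention-gated skips
    d1 = layers.Conv2DTranspose(base_filters, 3, strides=2, padding="same", activation="relu")(e2)
    d1 = layers.Concatenate()([d1, attention_gate(e1, d1, base_filters)])
    d0 = layers.Conv2DTranspose(base_filters, 3, strides=2, padding="same", activation="relu")(d1)
    d0 = layers.Concatenate()([d0, attention_gate(x, d0, base_filters)])

    illum = layers.Conv2D(1, 3, padding="same", activation="sigmoid")(d0)  # illumination map
    return tf.image.resize(illum, out_size, method="bilinear")             # back to input size
```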

Feature attention module
The original input contains more details and can therefore provide more information for detail recovery; fusing shallow and deep feature maps allows them to complement each other and enhances the expressive power of the features. A concatenation rather than a skip-layer (additive) connection is used to combine the feature maps of the input image and the output of the illumination distribution estimation module, so that the original information and the illumination estimate are fully retained and passed to the next stage. The concatenation layer is followed by the convolutional attention module CBAM, a lightweight feed-forward module that is well suited to computer vision tasks.
The CBAM module emphasizes important features while suppressing redundant ones. Feature extraction by the convolution operations proceeds as follows: the module first employs the channel attention module to learn local feature information along the channel dimension; second, to improve the network's information flow, the spatial attention module is used to learn positional feature information in space.
The input feature map is multiplied by the channel attention map to obtain F', and the spatial attention map of F' is then computed. Finally, the two are multiplied to obtain the refined feature map F''. Under the guidance of the CBAM channel attention sub-module, the network refines redundant color features. The spatial attention sub-module integrates the channel information at each spatial position and exploits the non-local correlation in the image to focus on the illumination distribution in different regions, enabling targeted detail reconstruction and improving illumination recovery.
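A hedged sketch of the feature attention module is given below, reusing the ChannelAttention and SpatialAttention sketches from the previous section. The intermediate convolution and its filter count are assumptions introduced for illustration only.

```python
import tensorflow as tf
from tensorflow.keras import layers

def feature_attention_module(image, illumination, filters=32):
    """Sketch of the feature attention module: the original image and the
    estimated illumination map are concatenated (rather than added via a
    skip connection) and then refined by CBAM, i.e. channel attention
    followed by spatial attention."""
    x = layers.Concatenate()([image, illumination])        # keep both information sources
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)  # assumed fusion conv
    x = ChannelAttention(filters)(x)                       # F' = M_c(F) * F
    x = SpatialAttention()(x)                              # F'' = M_s(F') * F'
    return x
```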

Detail reconstruction module
Since there is no ground-truth value for the illumination component, a mapping between normal-light and low-light images must be learned so that the enhancement process remains flexible. The detail reconstruction module is shown in Figure 5. The network is lightweight and mainly consists of four convolution layers, with a channel attention module (CA) inserted between two of them. The relationship between feature channels is used to create a channel attention map, so that the distribution of illumination information can be observed accurately, and a mapping function is learned that can convert images of one illumination level into images of another illumination level. This guides the enhancement network to adaptively enhance regions of different brightness in the image, and thus to enhance the illumination component. At the end of the network, the concatenated feature map is reduced to three channels through a 1×1 convolution layer.
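A minimal sketch of this module is shown below, reusing the ChannelAttention sketch from the related-work section. The 3×3 kernel sizes (other than the stated final 1×1 convolution) and the filter count are assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers

def detail_reconstruction_module(features, filters=32):
    """Hedged sketch of the lightweight detail reconstruction module: four
    convolution layers with a channel attention (CA) block between two of
    them, closed by a 1x1 convolution that reduces the output to 3 channels."""
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(features)
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    x = ChannelAttention(filters)(x)       # observe the illumination distribution per channel
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    return layers.Conv2D(3, 1, padding="same")(x)   # 1x1 conv down to 3 output channels
```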

Loss function
Standard error measures such as MSE and MAE alone are insufficient for improving image quality from both subjective and objective viewpoints. Therefore, the structural information and the regional differences of the image are also considered, and the total loss is computed as in Formula (1):

L = ω_m·L_mae + ω_s·L_str + ω_r·L_reg    (1)

where L_mae, L_str and L_reg denote the mean absolute error, the structural loss and the regional loss respectively, and ω_m, ω_s and ω_r are the parameters used to adjust the weights of L_mae, L_str and L_reg. Training on different data sets may affect the values of ω_m, ω_s and ω_r, but their purpose is to balance the magnitude and convergence speed of the different parts of the loss function. In general, these hyper-parameters are first used to bring the loss terms to the same order of magnitude; in this paper ω_m = 5, ω_s = 1 and ω_r = 3. Each loss term is explained below.

Mean absolute error.
The mean absolute error (MAE) measures the difference between the enhanced image E of the low-light image I and the corresponding normally exposed ground-truth image G. The loss function is given in Formula (2):

L_mae = (1/N) Σ_{i=1}^{N} ‖E_i − G_i‖_1    (2)

where N is the number of training samples and ‖·‖_1 denotes the L1 norm.
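In code, this term reduces to an L1 difference averaged over the batch, as in the short sketch below.

```python
import tensorflow as tf

def mae_loss(enhanced, ground_truth):
    """Mean absolute error term of Formula (2); averaging over the batch
    plays the role of the sum over the N training samples."""
    return tf.reduce_mean(tf.abs(enhanced - ground_truth))
```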
3.4.2. Structural loss. Some researchers have proposed using the structural similarity (SSIM) metric, which is motivated by the human visual system, as a loss function. The pixel-level SSIM is defined in Equation (3):

SSIM(E, G) = [(2·μ_E·μ_G + C_1)(2·σ_EG + C_2)] / [(μ_E² + μ_G² + C_1)(σ_E² + σ_G² + C_2)]    (3)

where μ_E and μ_G are the average pixel values of image E and image G respectively, σ_E² and σ_G² are the corresponding variances, and σ_EG is the covariance of the two images; C_1 = (K_1·L)² and C_2 = (K_2·L)² are two constants used to keep the function stable and prevent the denominator from becoming zero, L is the dynamic range of the pixel values, and K_1 = 0.01, K_2 = 0.03. Therefore, the loss function based on SSIM is given in Equation (4):

L_str = 1 − (1/N) Σ_{i=1}^{N} SSIM(E_i, G_i)    (4)

where N is the number of training samples.
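A one-line sketch of this loss using TensorFlow's built-in SSIM is shown below; tf.image.ssim already uses the standard constants K_1 = 0.01 and K_2 = 0.03, and the images are assumed to be scaled to [0, 1].

```python
import tensorflow as tf

def structural_loss(enhanced, ground_truth, max_val=1.0):
    """SSIM-based structural loss of Equation (4), sketched with tf.image.ssim."""
    ssim = tf.image.ssim(enhanced, ground_truth, max_val=max_val)
    return tf.reduce_mean(1.0 - ssim)   # average over the batch of N samples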
3.4.3. Regional loss. The loss functions above treat the image as a whole, but for low-light enhancement more attention should be paid to the dark regions. Following the idea of MBLLEN [12], the region-aware loss is defined in Formula (5):

L_reg = w_L·(1/m_L)·Σ‖E_L − G_L‖_1 + w_H·(1/m_H)·Σ‖E_H − G_H‖_1    (5)

where E_L and G_L are the low-light regions of the enhanced image and the ground truth, E_H and G_H are the remaining regions of the two images, m_L and m_H are the numbers of pixels in the two regions, and w_L = 4, w_H = 1.
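The sketch below illustrates this region-aware weighting and the combined loss of Formula (1), reusing mae_loss and structural_loss from above. Treating the darkest 40% of ground-truth pixels as the low-light region follows MBLLEN and is an assumption; the paper does not state the exact partition rule.

```python
import tensorflow as tf

def region_loss(enhanced, ground_truth, dark_fraction=0.4, w_low=4.0, w_high=1.0):
    """Region-aware loss in the spirit of Formula (5): errors in the dark
    region are weighted with w_L = 4, the remaining pixels with w_H = 1."""
    lum = tf.reduce_mean(ground_truth, axis=-1)                        # (B, H, W) brightness proxy
    flat = tf.sort(tf.reshape(lum, [tf.shape(lum)[0], -1]), axis=-1)   # sorted per image
    idx = tf.cast(dark_fraction * tf.cast(tf.shape(flat)[-1], tf.float32), tf.int32)
    thresh = tf.gather(flat, idx, axis=-1)[:, None, None]              # per-image dark threshold
    dark = tf.cast(lum <= thresh, tf.float32)[..., None]               # low-light mask
    err = tf.abs(enhanced - ground_truth)
    loss_low = tf.reduce_sum(err * dark) / (3.0 * tf.reduce_sum(dark) + 1e-6)
    loss_high = tf.reduce_sum(err * (1.0 - dark)) / (3.0 * tf.reduce_sum(1.0 - dark) + 1e-6)
    return w_low * loss_low + w_high * loss_high

def total_loss(enhanced, ground_truth):
    """Weighted combination of Formula (1) with weights 5, 1 and 3 for the
    MAE, structural and regional terms, as stated in the text."""
    return (5.0 * mae_loss(enhanced, ground_truth)
            + 1.0 * structural_loss(enhanced, ground_truth)
            + 3.0 * region_loss(enhanced, ground_truth))
```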

Experimental environment, parameter setting and data set
The experiments in this paper were run on an Nvidia GTX 2080 GPU, with TensorFlow-GPU as the deep learning framework. Training used 200 epochs, a batch size of 16, and a patch size of 256. The models were optimized with the Adam optimizer with an initial learning rate of 1 × 10, and the learning rate was multiplied by 0.99 after each epoch.
The LOL dataset [1] was used in this experiment. It contains 485 pairs of training images and 15 pairs of test images, and the images in the dataset have a size of 600×400×3.
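The training configuration above can be sketched as follows. The initial learning rate of 1e-4 is an assumed placeholder, since the exponent is not legible in the source text; steps_per_epoch is derived from the 485 LOL training pairs.

```python
import tensorflow as tf

# Hedged sketch of the stated training configuration (TensorFlow, Adam,
# 200 epochs, batch size 16, patch size 256, learning rate multiplied by
# 0.99 after every epoch).
EPOCHS, BATCH_SIZE, PATCH_SIZE = 200, 16, 256
steps_per_epoch = 485 // BATCH_SIZE
lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=1e-4,     # assumption: exact value not legible in the source
    decay_steps=steps_per_epoch,    # one decay step per epoch
    decay_rate=0.99,
    staircase=True)
optimizer = tf.keras.optimizers.Adam(learning_rate=lr_schedule)
```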

Subjective and objective evaluation and comparison with mainstream methods
To confirm the efficacy of the proposed model, a comparative test is carried out against the traditional MSRCR [4] and LIME [5] methods and the deep-learning-based Zero-DCE [13], GLADNet [8], RetinexNet [14], MBLLEN [12] and EnlightenGAN [15] algorithms. For a fairer comparison, the results are generated with the open-source code and recommended parameter settings provided by the respective authors.

4.2.1. Subjective evaluation. Comparing the proposed method with the seven other methods, Figure 6 displays the experimental results. Figure 6 shows that the image enhanced by the MSRCR approach is badly exposed, with a substantial loss of detail and an unnatural appearance. The LIME algorithm has a better overall enhancement effect than MSRCR, but there is a gap between the local brightness of the enhanced image and that of the real image, and some detail information is lost. The GLADNet and Zero-DCE methods suffer from color distortion and insufficient illumination enhancement. The RetinexNet algorithm produces vivid colors, but it over-enhances the image and introduces more noise than the real image, resulting in distortion. The image produced by the MBLLEN approach is generally dark and shadowy. In terms of subjective quality, EnlightenGAN and the method proposed in this paper are comparable, and both significantly improve the low-illumination image; however, the image enhanced by our approach is superior in detail and brightness.
4.2.2. Objective evaluation. To compare the performance of the different approaches more rigorously, this work reports objective evaluation results for the peak signal-to-noise ratio (PSNR), structural similarity (SSIM), and mean square error (MSE). PSNR is the peak signal-to-noise ratio index; SSIM is a full-reference image quality index that measures structural similarity, covering brightness, contrast, and the structure of the image; MSE is the mean square error of the image. The results are shown in Table 1. In the table, ↑ indicates that a higher value of the evaluation index is better and ↓ indicates that a lower value is better; the best and second-best results are marked in bold and underlined, respectively. Table 1 shows that the proposed method achieves the highest PSNR and the highest SSIM on the three test images, indicating that the images enhanced by our algorithm are less affected by noise, less distorted, and of higher quality; the enhanced images are the most faithful to the reference, and their quality outperforms that of the other methods. For MSE, the proposed algorithm obtains the lowest values, which again shows that its enhanced images are of higher quality than those of the other algorithms and are the closest to the reference images. Overall, the proposed low-light enhancement algorithm achieves the best objective evaluation results.
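For reproducibility, the three metrics can be computed with standard TensorFlow image ops as in the sketch below; images are assumed to be batched tensors scaled to [0, 1].

```python
import tensorflow as tf

def evaluate_pair(enhanced, ground_truth, max_val=1.0):
    """Sketch of the objective metrics reported in Table 1 (PSNR, SSIM, MSE)."""
    psnr = tf.image.psnr(enhanced, ground_truth, max_val=max_val)
    ssim = tf.image.ssim(enhanced, ground_truth, max_val=max_val)
    mse = tf.reduce_mean(tf.square(enhanced - ground_truth), axis=[1, 2, 3])
    return {"PSNR": float(tf.reduce_mean(psnr)),
            "SSIM": float(tf.reduce_mean(ssim)),
            "MSE": float(tf.reduce_mean(mse))}
```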

Conclusion
To address the problem of insufficient or excessive local brightness enhancement in existing low-light image enhancement methods, this work splits the illumination enhancement problem into several modules and develops a novel low-illumination image enhancement approach. By introducing the attention gate mechanism into the illumination distribution estimation module, the network's ability to learn image brightness features is strengthened and targeted to the low-light enhancement task. The network consists of three parts: a light distribution estimation module, a feature attention module, and a detail reconstruction module. The light distribution of the low-light image is obtained by the light estimation module, and the weights of the illumination distribution map are learned in the feature attention module; finally, more image details are learned from the original image by the detail reconstruction module to achieve higher image quality. After enhancement by this method, colors are more natural, brightness and contrast are improved, and there are fewer artifacts and less noise. Compared with other widely used methods, the proposed method performs well across multiple evaluation indicators while preserving the quality of the enhanced images.

Figure 2. Structure diagram of the CBAM.

Figure 3. Channel attention module (CA).

Figure 4. Spatial attention module (PA).

Figure 5 depicts the paper's overall network structure. The image enhancement problem is decomposed into different sub-problems that are solved separately, and their outputs are fused across modules to produce the final result. A low-light color image serves as the network's input, and its output is a correspondingly sized enhanced clear image. The whole network architecture and data flow are shown in Figure 5: the image enhancement network is composed of three types of modules, namely the light distribution estimation module, the feature attention module, and the detail reconstruction module.

Figure 5. Low illumination image enhancement model based on attention mechanism and global illumination estimation.


Figure 6. Comparison of the enhancement effects of several enhancement techniques on low-light images.

Table 1. Comparison of image quality evaluation indicators.