Swin enhancer: A low-light smart meter image enhancement method with a multi-layered window attention network

Accurate and efficient identification of visual faults in smart meters is essential for the stable operation of the power acquisition system. However, electricity meter images collected in dim environments suffer from low brightness and a lack of detail, which impairs subsequent fault identification based on computer vision. From an unsupervised perspective, this paper proposes a low-light smart meter image enhancement method, Swin enhancer. Building on the Swin Transformer's window attention mechanism, we design a Multi-layer Swin Transformer Block (MSTB) that extracts regional brightness features through attention calculations within local windows and provides varying degrees of illumination compensation. At the same time, a shifted window mechanism is introduced to exchange features among regions and reduce the possibility of overexposure. Extensive experiments on several benchmark and electric meter datasets demonstrate that our method outperforms state-of-the-art methods on multiple image evaluation metrics.


Introduction
Low-light image enhancement (LLIE) makes information hidden in low lighting visible, thereby improving image quality. Early LLIE research relied on traditional methods with artificially designed, fixed structures to enhance image brightness. Among them, Lee et al. [1] proposed LDR, which uses the two-dimensional histogram of adjacent pixel differences for enhancement. Based on the Retinex principle, Guo et al. [2] decompose the input into illumination and reflectance components through priors or regularization. Recently, deep learning methods have achieved unprecedented brightness enhancement. Several methods use supervised learning, training on pairs of normal-light and low-light images. Retinex-Net [3] uses a smooth illumination module to enhance low-light photos. However, an overly strict supervision process is limited by strict data requirements and may also cause overfitting. Therefore, some researchers have trained on unpaired images using unsupervised learning. Jiang et al. [4] proposed EnlightenGAN, which uses an attention-guided U-Net to perform the enhancement task and a global-local discriminator for adversarial training. Mapping images into curves, the curve estimation network Zero-DCE [5] performs pixel-level adjustments on the curves within a dynamic range. Ma et al. [6] proposed the self-calibrated learning method SCI, which simplifies the network structure and uses a weight-sharing illumination learning mechanism to enhance the image. However, these methods cannot make good use of global information when extracting image features: if the illumination distribution is uneven, the enhanced image will be under- or overexposed.
Therefore, we propose a multi-layered window attention network named Swin enhancer for low-light image enhancement from an unsupervised perspective. We design a multi-layer illumination enhancement network for image generation using a shifted window attention mechanism. Through it, we can simultaneously perform attention calculation within each window and information fusion between windows, guiding the image enhancement process with the attention feature map. We conduct comparative experiments on multiple datasets. The Swin enhancer can adaptively perceive the differentiated illumination levels of images and avoid over-enhancement.

Model details
Based on shifted window attention [7] and generative adversarial theory [8], our Swin enhancer offers a multi-layer illumination enhancement network to balance brightness.
Multi-layer illuminance enhancement network. As shown in Figure 1(a), the multi-layer illumination enhancement network contains a symmetric encoder and decoder. The encoder comprises multiple Multi-layer Swin Transformer Blocks (MSTB), which assign different weights to the image's bright and low-light areas through window self-attention calculations. If the feature produced by the $j$-th layer of the $i$-th MSTB module is denoted $F_{i,j}$, the extracted features of the encoding part can be expressed as

$$F_{i,j} = H_{\mathrm{MSTB}}^{i,j}(F_{i,j-1}), \quad (1)$$

where $H_{\mathrm{MSTB}}^{i,j}$ represents the $j$-th Swin Transformer layer of the $i$-th MSTB module.
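As an illustrative sketch (not the authors' implementation), the window partition, cyclic shift, and per-window attention described above can be mocked up in a few lines of NumPy; the window size, channel count, and single-head projection matrices are toy assumptions:

```python
import numpy as np

def window_partition(x, ws):
    """Split an (H, W, C) feature map into non-overlapping (ws*ws, C) windows."""
    H, W, C = x.shape
    x = x.reshape(H // ws, ws, W // ws, ws, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, ws * ws, C)

def window_attention(win, Wq, Wk, Wv):
    """Single-head self-attention restricted to one window (no bias terms)."""
    q, k, v = win @ Wq, win @ Wk, win @ Wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)        # softmax stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)
    return attn @ v

rng = np.random.default_rng(0)
feat = rng.normal(size=(8, 8, 16))                      # toy H=W=8, C=16 feature map
Wq, Wk, Wv = (rng.normal(size=(16, 16)) for _ in range(3))
shifted = np.roll(feat, shift=(-2, -2), axis=(0, 1))    # cyclic shift, shift = ws // 2
wins = window_partition(shifted, ws=4)                  # 4 windows of 16 tokens each
out = np.stack([window_attention(w, Wq, Wk, Wv) for w in wins])
print(out.shape)  # (4, 16, 16)
```

The cyclic shift (`np.roll`) is what lets tokens near window borders attend across window boundaries in alternating layers, which is the interaction mechanism the text credits with reducing overexposure.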

Multi-layer swin transformer block (MSTB). Figure 1(b) shows the multiple Swin Transformer layers (STL) [7], patch segmentation, patch vectorization, and patch unembedding modules in MSTB. The local attention and shifted window mechanism of STL give it clear advantages in image enhancement. The patch segmentation and patch vectorization layers process images in two steps: first, image patches are generated, then each patch is mapped to a vector for attention calculation. The patch unembedding layer finally upsamples the vectors and maps them back to an image.
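The patch segmentation/vectorization and patch unembedding steps amount to a reversible reshape; a minimal NumPy sketch (patch size and image shape are arbitrary toy values, not the paper's settings):

```python
import numpy as np

def patchify(img, p):
    """Patch segmentation + vectorization: (H, W, C) -> (N, p*p*C) token vectors."""
    H, W, C = img.shape
    x = img.reshape(H // p, p, W // p, p, C).transpose(0, 2, 1, 3, 4)
    return x.reshape(-1, p * p * C)

def unpatchify(tokens, H, W, p, C):
    """Patch unembedding: map token vectors back to an (H, W, C) image."""
    x = tokens.reshape(H // p, W // p, p, p, C).transpose(0, 2, 1, 3, 4)
    return x.reshape(H, W, C)

img = np.arange(48.0).reshape(4, 4, 3)   # toy 4x4 RGB image
tokens = patchify(img, 2)                # 4 patches, each a 12-dim vector
recon = unpatchify(tokens, 4, 4, 2, 3)
print(np.array_equal(recon, img))        # True: the mapping is lossless
```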
Local-Global discriminator network. Following the Local-Global discriminator network [4], each branch uses the relativistic discriminator structure. The standard scoring functions are shown in Equations (2) and (3), where $D$ represents the discriminator's score, $I_r$ represents samples from the real distribution, and $I_f$ represents fake samples:

$$D(I_r, I_f) = \sigma\big(C(I_r) - \mathbb{E}_{I_f}[C(I_f)]\big), \quad (2)$$
$$D(I_f, I_r) = \sigma\big(C(I_f) - \mathbb{E}_{I_r}[C(I_r)]\big), \quad (3)$$

where $C(\cdot)$ denotes the raw discriminator output and $\sigma$ the Sigmoid function.
Figure 1. The network structure of Swin enhancer and multi-layer illuminance enhancement.
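The relativistic scoring idea, rating each sample against the average raw score of the opposite distribution, can be sketched as follows (a simplified single-score version; the toy batches of `C(.)` outputs are assumptions for illustration):

```python
import numpy as np

def sigmoid(z):
    """Plain logistic function."""
    return 1.0 / (1.0 + np.exp(-z))

def relativistic_score(c_real, c_fake):
    """Relativistic scores in the spirit of Eqs. (2)-(3): each sample is rated
    against the mean raw score C(.) of the opposite distribution."""
    d_real = sigmoid(c_real - c_fake.mean())  # D(I_r, I_f)
    d_fake = sigmoid(c_fake - c_real.mean())  # D(I_f, I_r)
    return d_real, d_fake

# Toy raw discriminator outputs for a batch of real and fake samples.
c_real = np.array([2.0, 1.5])
c_fake = np.array([-1.0, -0.5])
d_real, d_fake = relativistic_score(c_real, c_fake)
print(d_real > 0.5, d_fake < 0.5)  # real samples score high, fakes score low
```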

Loss function
We will introduce the Adversarial loss, Self Feature Preserving loss, and Identity invariant loss used during training in this section.
Self Feature Preserving loss. We use the Self Feature Preserving loss as a penalty term to maintain feature consistency before and after enhancement. The preservation loss $L_{SFP}$ is defined in Equation (4), where $\phi(\cdot)$ represents feature map extraction and $I_i$ and $I_o$ represent the input and enhanced images:

$$L_{SFP} = \frac{1}{WH}\sum_{x=1}^{W}\sum_{y=1}^{H}\big(\phi(I_i)_{x,y} - \phi(I_o)_{x,y}\big)^2. \quad (4)$$

Adversarial loss. The adversarial loss comes in two parts: a global loss and a local loss. Instead of the original Sigmoid objective, these two losses use the least squares GAN loss [8]. For the global discriminator, the global loss can be defined as

$$L_{D}^{Global} = \mathbb{E}_{I_r}\big[(D(I_r, I_f) - 1)^2\big] + \mathbb{E}_{I_f}\big[D(I_f, I_r)^2\big]. \quad (5)$$

Training details. The optimizer parameters $\alpha_1$ and $\alpha_2$ are set to 0.9 and 0.999. The initial learning rate is 5e-5 and decays as training proceeds.
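A framework-free sketch of the least-squares adversarial losses and the self feature preserving penalty; here $\phi(\cdot)$ is assumed to be any fixed feature extractor (e.g. a pretrained VGG layer), so the sketch simply takes precomputed feature maps rather than reproducing the paper's exact pipeline:

```python
import numpy as np

def lsgan_d_loss(d_real, d_fake):
    """Least-squares discriminator loss: push real scores toward 1, fake toward 0."""
    return np.mean((d_real - 1.0) ** 2) + np.mean(d_fake ** 2)

def lsgan_g_loss(d_fake):
    """Least-squares generator loss: push scores of enhanced images toward 1."""
    return np.mean((d_fake - 1.0) ** 2)

def sfp_loss(feat_in, feat_out):
    """Self feature preserving penalty: mean squared distance between the
    feature maps phi(I_i) and phi(I_o) of the input and enhanced images."""
    return np.mean((feat_in - feat_out) ** 2)

# Toy check: a perfect discriminator and identical feature maps give zero loss.
print(lsgan_d_loss(np.ones(4), np.zeros(4)),
      sfp_loss(np.ones((2, 3)), np.ones((2, 3))))  # 0.0 0.0
```

The least-squares form penalizes scores quadratically by their distance from the target label, which tends to give smoother gradients than the Sigmoid cross-entropy it replaces.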

Datasets and evaluation metrics
There are 485 training image pairs and 15 test pairs in the LOL dataset. At the same time, we use five reference-free natural dark image datasets: DICM, LIME, MEF, NPE, and VV. Image evaluation metrics mainly include PSNR, SSIM, LPIPS, and NIQE.
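Among these metrics, PSNR is straightforward to reproduce; a minimal NumPy implementation (the toy 32×32 images with Gaussian noise of σ = 5 are illustrative, just to show the scale of the score):

```python
import numpy as np

def psnr(ref, test, max_val=255.0):
    """Peak signal-to-noise ratio (dB) between a reference and a test image."""
    mse = np.mean((ref.astype(float) - test.astype(float)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(max_val ** 2 / mse)

rng = np.random.default_rng(0)
ref = rng.integers(0, 256, size=(32, 32)).astype(float)
noisy = np.clip(ref + rng.normal(0.0, 5.0, size=ref.shape), 0.0, 255.0)
print(round(psnr(ref, noisy), 1))  # around 34 dB for sigma = 5 noise
```

SSIM, LPIPS, and NIQE are perceptual or learned measures and are best taken from library implementations rather than reimplemented by hand.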

Comparative experiments on the LOL dataset
Table 1 shows the quantitative results. Our Swin enhancer significantly outperforms other existing unsupervised algorithms in both SSIM and LPIPS metrics, which means that the enhanced images have more natural and realistic visual effects.

Analysis of model generalization ability and electric meter image enhancement
In this section, the models trained on the LOL dataset are used to perform image enhancement tests on the five no-reference datasets. As shown in Table 2, except for DICM and MEF, the Swin enhancer achieves the best results on the other three no-reference datasets and outperforms all supervised and unsupervised methods on the average over the five datasets. Figure 2 shows part of the no-reference test images and the enhancement results. The images enhanced by the Swin enhancer have the best visual quality: when dealing with unevenly illuminated images, our method adaptively balances the enhanced brightness according to the differences between regions, so the overall brightness is more natural and shadows are reduced. Most of the other compared methods suffer from over-enhancement or under-enhancement. The first row of Figure 2 shows the comparison on the electric meter dataset; the details of the enhanced meter images are significantly improved. The first column of Table 2 reports the NIQE of the compared methods on the electric meter validation set, where our Swin enhancer achieves the best score. This demonstrates that the Swin enhancer completes the enhancement task well both qualitatively and quantitatively, especially on the electric meter dataset.

Conclusion
This paper proposes an unsupervised LLIE model, Swin enhancer, based on the window attention mechanism. We design a Multi-layer Swin Transformer Block (MSTB) that uses self-attention computation to guide image light compensation. On the experimental side, we compare the proposed model with multiple traditional and deep learning methods. The results show that our model outperforms other methods on multiple evaluation metrics such as SSIM and LPIPS. For low-illumination electricity meter images, darker areas are clearly enhanced while originally brighter areas are not overexposed, significantly aiding subsequent fault identification.

Figure 2. Comparison of the enhancement effects of each method on some non-reference datasets.

Table 2. NIQE of each method on the non-reference datasets.