An Underwater Bridge Crack Generation Algorithm Based on an Improved Generative Adversarial Network

To address the difficulty of obtaining underwater bridge crack images for bridge defect detection, this paper proposes an enhanced CycleGAN algorithm based on a generative adversarial network. Two key enhancements are introduced within the encoder-decoder architecture. First, to prevent the loss of information at different scales during training, residual connections with 1×1 convolutional kernels are added. Second, to prioritize useful feature information during model training, the CBAM attention mechanism is incorporated. Experimental results demonstrate that the improved model significantly enhances performance, with a 29% improvement in the FID score, a 9% improvement in PSNR, and a 7% improvement in SSIM.


Introduction
As important infrastructure closely tied to public life and safety, bridges urgently require condition monitoring, and the rapid development of bridge defect detection technology has great social value. Timely detection and repair of bridge structural problems can effectively prevent bridge collapse and other serious accidents, thereby ensuring public safety. By location, bridge cracks can be divided into above-water cracks and underwater cracks. For above-water cracks, large datasets can be obtained relatively easily with camera or drone photography. Underwater crack detection, however, faces unique challenges, including the complexity of underwater scenes and the difficulty of data acquisition, which makes assembling large-scale datasets extremely difficult.
With the rapid advancement of deep learning and convolutional neural networks, style transfer techniques have seen increasingly mature applications in image processing. Style transfer refers to the process of extracting the content features of one image and rendering them in the style and texture of another image [1]. In 2014, Goodfellow et al. first introduced Generative Adversarial Networks [2]. This framework pits two neural networks, a generator and a discriminator, against each other in an adversarial process to generate realistic data. In 2015, Gatys et al. accomplished neural-network-based style transfer [3]. In the same year, Radford et al. introduced Deep Convolutional Generative Adversarial Networks (DCGAN) [4], which incorporated a convolutional network structure and enhanced the performance of GANs on image generation tasks. In 2017, Luan et al. described a deep learning method that can transfer the style of one image to another while preserving its content [5]. That year, Isola et al. built on the concept of Conditional Generative Adversarial Networks (CGAN), constrained the conditional input to paired images, and combined it with convolutional neural networks to introduce the Pix2Pix network [6]. Also in 2017, Zhu et al. introduced CycleGAN [7], breaking through the paired-dataset limitation of Pix2Pix. In 2018, Li et al. introduced a closed-form solution that achieved high-quality photo style transfer [8].
In this paper, by applying style transfer techniques, we extend the traditional CycleGAN model structure, introducing residual connections [9] and residual blocks that integrate the CBAM [10] attention mechanism. This allows us to transform easily obtainable images of above-water bridge cracks into images of underwater bridge cracks, alleviating to some extent the shortage of underwater bridge crack data.

Theoretical Basis
Although traditional CycleGAN can transfer style between above-water bridge crack images and underwater environment images to generate underwater bridge crack images, the generated images suffer from edge blur, artifacts, and noise. To address these problems, this paper improves the network structure and fuses CBAM residual blocks into it. Experimental results show that the improved network better focuses on the crack information of the original image and preserves it as much as possible in the image generated after style transfer.

Generator Network Structure with Residual Connections
In the training of traditional CycleGAN, the convolution kernels continuously extract features from the input image; from a theoretical standpoint, each convolution introduces a new signal, and as these accumulate, the information of the original image is altered, leading to a loss of detail in the generated image. Therefore, this paper improves the original network, as shown in figure 1. Through residual connections, part of the encoder information is passed directly to the decoder output. This structure allows the input data to retain its own information through successive convolutions while keeping the residual property, combining the advantages of preserving original information and preventing network degradation. Accordingly, a 1×1 convolution kernel is applied to the input after each concatenation to reduce the channel dimension. The 1×1 convolution mixes and recombines information across channels, introducing additional nonlinear transformations that help the model learn features and increase the expressive power of the network.
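The concatenate-then-reduce step described above can be sketched in NumPy. This is a minimal illustration, not the paper's implementation; the function name, shapes, and weights are all hypothetical, and a 1×1 convolution is expressed as a pure channel-mixing matrix multiply.

```python
import numpy as np

def conv1x1(x, w):
    """Pointwise (1x1) convolution: mixes channels at every spatial
    position without touching spatial structure.
    x: feature map (C_in, H, W); w: kernel weights (C_out, C_in)."""
    return np.tensordot(w, x, axes=([1], [0]))  # -> (C_out, H, W)

# Skip connection: concatenate encoder features with decoder features
# along the channel axis, then reduce back with a 1x1 convolution.
rng = np.random.default_rng(0)
enc = rng.standard_normal((64, 32, 32))      # encoder features (C, H, W)
dec = rng.standard_normal((64, 32, 32))      # decoder features
fused = np.concatenate([enc, dec], axis=0)   # (128, 32, 32)
w = rng.standard_normal((64, 128)) * 0.01    # 1x1 kernel weights
out = conv1x1(fused, w)                      # reduced back to (64, 32, 32)
```

In a real generator the 1×1 convolution would be a learned `nn.Conv2d(128, 64, kernel_size=1)` followed by a nonlinearity; the sketch only shows why the operation halves the channel count while letting encoder information flow through.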

Residual Structure with an Integrated CBAM Attention Mechanism
To make the CycleGAN generator pay more attention to details, this paper introduces a fused CBAM residual block in place of the plain residual block. CBAM, the Convolutional Block Attention Module, combines channel and spatial attention; its network structure is shown in figure 2. As the figure shows, the convolutional input first passes through a channel attention mechanism to obtain a weighted result, which then passes through a spatial attention mechanism to produce the final weighted output. The network structure of the channel attention mechanism is shown in figure 3. First, global average pooling and global max pooling are applied to the feature maps to obtain the average and maximum values for each channel. These values are processed by a shared MLP that learns the relationships between channels, and the MLP output is passed through a sigmoid activation to generate channel attention weights. These weights are applied to the feature maps of each channel, emphasizing the information from important channels. The network structure of the spatial attention mechanism is shown in figure 4.
First, the input undergoes per-channel global max pooling and global average pooling. The results of these two operations are concatenated along the channel dimension, and a convolution reduces the result to a single channel. A spatial attention weight map is then generated through the sigmoid function, and this weight map is element-wise multiplied with the module's input feature to obtain the final output feature. The fused residual block incorporating CBAM designed in this study is illustrated in figure 5. The input is processed through a convolutional layer, a residual connection, and a CBAM attention module in parallel. This design preserves the contextual information provided by the attention mechanism while preventing network degradation through the additional residual connection. Formula (1) gives the mathematical representation of this design.
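The two attention stages and their fusion into a residual block can be sketched as follows. This is an illustrative NumPy approximation, not the paper's code: the 7×7 convolution of CBAM's spatial branch is replaced by a learned 2-to-1 pointwise mix, the block's convolution is a 1×1 channel mix, and the fused output form `x + CBAM(conv(x))` is only a plausible reading of figure 5, since formula (1) is not reproduced here.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(x, w1, w2):
    """x: (C, H, W). Shared two-layer MLP applied to global-avg and
    global-max pooled descriptors; summed and squashed to (0, 1)."""
    avg = x.mean(axis=(1, 2))                    # (C,)
    mx = x.max(axis=(1, 2))                      # (C,)
    mlp = lambda v: w2 @ np.maximum(w1 @ v, 0)   # shared MLP, ReLU hidden
    w = sigmoid(mlp(avg) + mlp(mx))              # (C,) channel weights
    return x * w[:, None, None]

def spatial_attention(x, a, b):
    """Channel-wise mean and max maps combined (a learned 2->1 mix
    standing in for CBAM's 7x7 conv) and squashed to (0, 1)."""
    avg = x.mean(axis=0)                         # (H, W)
    mx = x.max(axis=0)                           # (H, W)
    w = sigmoid(a * avg + b * mx)                # (H, W) spatial weights
    return x * w[None, :, :]

def cbam_residual_block(x, params):
    """Plausible fused block: output = input + CBAM(conv(input))."""
    w1, w2, wc, a, b = params
    y = np.tensordot(wc, x, axes=([1], [0]))     # 1x1 conv stand-in
    y = spatial_attention(channel_attention(y, w1, w2), a, b)
    return x + y                                 # residual connection

rng = np.random.default_rng(1)
x = rng.standard_normal((8, 4, 4))
params = (rng.standard_normal((4, 8)) * 0.1,    # w1: channel reduction
          rng.standard_normal((8, 4)) * 0.1,    # w2: channel expansion
          np.eye(8),                            # wc: 1x1 conv weights
          0.5, 0.5)                             # spatial mix coefficients
out = cbam_residual_block(x, params)
```

Note that both attention stages only rescale the feature map, so the block preserves the input's shape, which is what makes the additive residual path possible.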

Experimental Environment and Dataset
This paper builds the CycleGAN model on Windows 10 using the PyTorch 1.12.0 deep learning framework and Python 3.9.12. The GPU is an NVIDIA GeForce RTX 3060 with 12 GB of video memory.
The training set is taken from the Crack Detection dataset [11], totaling 1,500 photos, and the underwater dataset is taken from UIQS [12], also totaling 1,500 photos. An additional 400 photos are selected as the test set.
A linear learning rate decay strategy was adopted, with 200 epochs of training: the first 100 epochs kept the learning rate fixed at 0.0002, and over the last 100 epochs it decayed linearly to zero. A small learning rate helps prevent mode collapse during style transfer, hence the choice of 0.0002. The output image size is 256×256.
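The schedule above can be written as a small function (in PyTorch this would typically be wired up via `torch.optim.lr_scheduler.LambdaLR`; the plain-Python sketch below just shows the shape of the decay):

```python
def learning_rate(epoch, base_lr=2e-4, n_epochs=200, decay_start=100):
    """Linear decay schedule: constant for the first `decay_start`
    epochs, then decaying linearly to zero by epoch `n_epochs`."""
    if epoch < decay_start:
        return base_lr
    # fraction of the decay window already elapsed
    frac = (epoch - decay_start) / (n_epochs - decay_start)
    return base_lr * (1.0 - frac)
```

For example, the rate stays at 0.0002 through epoch 99, is half that at epoch 150, and reaches zero at epoch 200.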

Subjective Analysis of Results
To address the fogging and blur in underwater environment images, dark channel processing [13] and DeblurGAN [14] were adopted for dehazing and deblurring; the results are shown in figure 6. The original image is blurry in its details; for example, much of the texture on the rocks in figure 6(a) is lost. The DeblurGAN algorithm is therefore introduced to deblur the underwater images, yielding figure 6(b). After deblurring, the details are clearer, but shortcomings remain: overall, the colors are not delicate enough, i.e., fog is still present. Therefore, on top of deblurring, this paper further applies the dark channel algorithm for dehazing.
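A minimal NumPy sketch of dark-channel-prior dehazing is shown below. It is not the implementation used in the paper: it omits the guided-filter transmission refinement used in practice, and the patch size, `omega`, and `t0` values are conventional defaults rather than the paper's settings.

```python
import numpy as np

def dark_channel(img, patch=3):
    """Per-pixel minimum over RGB, then a local min filter of size
    `patch`. img: float array (H, W, 3) in [0, 1]."""
    mins = img.min(axis=2)
    h, w = mins.shape
    pad = patch // 2
    padded = np.pad(mins, pad, mode='edge')
    out = np.empty_like(mins)
    for i in range(h):
        for j in range(w):
            out[i, j] = padded[i:i + patch, j:j + patch].min()
    return out

def dehaze(img, omega=0.95, t0=0.1, patch=3):
    """Dark-channel-prior dehazing (He et al.-style), without the
    guided-filter refinement step."""
    # atmospheric light: mean colour of the brightest dark-channel pixels
    dc = dark_channel(img, patch)
    n = max(1, int(dc.size * 0.001))
    idx = np.unravel_index(np.argsort(dc, axis=None)[-n:], dc.shape)
    A = img[idx].mean(axis=0)                         # (3,)
    # transmission estimate and scene radiance recovery
    t = 1.0 - omega * dark_channel(img / A, patch)
    t = np.clip(t, t0, 1.0)
    return np.clip((img - A) / t[..., None] + A, 0.0, 1.0)
```

Usage is simply `clear = dehaze(hazy_image)` on a float RGB array scaled to [0, 1]; the `t0` floor prevents division blow-up in regions estimated as fully hazy.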
The resulting figure 6(c) surpasses the previous two images in texture and detail, and the image quality is visibly improved. Subjective comparison experiments were then conducted covering real underwater crack images, the outputs of the original CycleGAN on the original and improved datasets, and the results of our algorithm, as shown in figure 7. The ultimate objective of the style transfer is to render bridge cracks in the underwater environment, so preserving the bridge cracks is as important as matching the color variations of the underwater scene. Comparing figures 7(a) and 7(b), it is evident that the crack images generated by the original model are severely blurred and exhibit artifacts. To mitigate the artifacts in figure 7(b), dark channel dehazing and DeblurGAN deblurring were applied to the original dataset, yielding figure 7(c). Although the artifacts are eliminated, the cracks in the image remain blurred.
Therefore, this paper introduces an improved CycleGAN algorithm that fuses residual connections with CBAM attention residual blocks, yielding figure 7(d). As the figure shows, the proposed algorithm preserves the original crack information to the greatest extent while improving overall image quality.

Objective Analysis of Results
The results of the ablation experiments are shown in table 1. We adopt the FID score [15] as the primary metric, with PSNR and SSIM [16] as auxiliary metrics; a lower FID and higher PSNR/SSIM indicate better quality. Comparing experiments 1 and 2, adding only the CBAM attention mechanism worsened the FID score by 11%, indicating a greater difference between the feature distributions of the two image sets, while PSNR and SSIM decreased by 5% and 2%, respectively. Without residual connections, adding only the CBAM attention mechanism loses original information: the attention mechanism over-focuses on irrelevant areas while neglecting crucial regions, making training harder, potentially trapping the model in a local optimum, and ultimately degrading performance. Comparing experiments 1 and 3, adding only the improved residual connections worsened the FID score by 15% relative to the original, with PSNR and SSIM decreasing by 23% and 16%, respectively. Adding residual connections alone alleviates the loss of original information during feature extraction, but it also introduces information redundancy, which makes training harder to converge and degrades the final performance.
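The auxiliary metrics can be computed directly. The sketch below gives standard definitions of PSNR and a simplified single-window SSIM in NumPy (the reference SSIM uses a sliding Gaussian window, and FID additionally requires Inception features, so neither is reproduced exactly here):

```python
import numpy as np

def psnr(x, y, peak=1.0):
    """Peak signal-to-noise ratio in dB; higher means less distortion.
    x, y: float arrays with values in [0, peak]."""
    mse = np.mean((x - y) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

def ssim_global(x, y, peak=1.0):
    """Single-window (global) SSIM. The reference implementation
    averages SSIM over local Gaussian windows instead."""
    c1, c2 = (0.01 * peak) ** 2, (0.03 * peak) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))
```

Identical images give an SSIM of exactly 1, and a uniform error of 0.01 on a [0, 1] image gives a PSNR of 40 dB, which makes the functions easy to sanity-check.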
Comparing experiments 1 and 4, adding both the residual connections and the attention mechanism improves model performance the most: the FID score improves by 29%, and PSNR and SSIM increase by 9% and 7%, respectively. The residual connections and the attention mechanism complement each other: residual connections help information propagate through the network, while the attention mechanism helps the model focus on useful features, and together they further improve model performance.

Conclusion
Acquiring underwater bridge crack images is challenging, yet implementing bridge crack detection with deep learning requires a substantial dataset of crack images. This paper therefore introduces an algorithm for augmenting the underwater bridge crack image dataset. Since real underwater images often suffer from poor quality, the approach first applies dehazing and deblurring to enhance the clarity of the authentic underwater images. The algorithm then improves the generator of the original CycleGAN by incorporating residual connections and CBAM. With these enhancements, the improved CycleGAN generates underwater bridge crack images of higher quality than the original version.

Figure 1. Generator network structure with residual connections.
2023 4th International Conference on Mechanical Engineering and Materials Journal of Physics: Conference Series 2694 (2024) 012071

Figure 7. (a) Real captured ground cracks. (b) Output of the original CycleGAN trained on untreated underwater environment images, without dehazing and deblurring. (c) Output of the original CycleGAN trained on dehazed and deblurred underwater environment images. (d) Results of the improved CycleGAN.