Sea ice detection network for icebreakers in polar environments with attention-based DeepLabV3+ architecture

Shipborne sea ice detection aboard icebreakers plays a central role in polar navigation. The continuous evolution of deep learning semantic segmentation networks has advanced sea ice detection. At present, however, there are relatively few studies on shipboard sea ice detection, and the accuracy of polar sea ice detection is reduced by problems such as blur caused by sea fog and indistinct ice boundaries. In this study, a shipboard sea ice detection dataset is constructed, and a sea ice detection method that combines multi-branch attention feature alignment with multi-scale feature extraction is proposed. A heterogeneous receptive field enhancement atrous spatial pyramid pooling (ASPP) module is designed, and a feature alignment module based on the attention mechanism is constructed; together they strengthen the model’s extraction of sea ice features and improve its representation performance. Experimental results show that our approach detects sea ice more accurately and, to some extent, alleviates missed detections of new ice. It thus contributes to advancing shipborne sea ice detection in polar environments.


Introduction
To ensure safe and efficient maritime operations in polar environments, precise and reliable sea ice information is essential. Numerous scientific endeavors and practical applications rely on accurate sea ice detection across various temporal and spatial scales, including polar navigation, tourism, fisheries, and polar scientific expeditions. Shipborne sea ice detection for icebreakers plays a pivotal role in any polar expedition, because effective detection underpins the navigational assistance given to vessels that follow in polar waters. This process typically relies on dedicated ice experts to identify and report sea ice conditions, but with increasingly busy shipping routes and a growing number of ships in polar waters, the number of ice experts is no longer sufficient. Automated polar sea ice detection is therefore of considerable practical importance [1][2][3][4].
Traditional image processing methods have proven effective for sea ice detection under certain conditions [5][6]. However, the harsh polar environment presents a formidable challenge. Images of sea ice captured by systems onboard icebreakers are complex: variable lighting interacts with sea ice and water and gives rise to glare. Factors such as sea ice type, size, color, texture, and shape further complicate detection [7]. Developing a high-performance, precise, and robust shipborne sea ice detection algorithm is therefore challenging. Deep convolutional neural networks have become an established paradigm for modern semantic segmentation problems [8]. Prominent examples include the fully convolutional network (FCN) [9], SegNet [10], U-Net [11], PSPNet [12], and DeepLabV3+ [13]. Given the profusion of semantic segmentation networks, the challenge lies in selecting the most suitable architecture for a given task.
The advancement of deep learning offers an opportunity to automate sea ice detection efficiently. Numerous scholars have applied convolutional neural network-based semantic segmentation algorithms to sea ice detection, achieving better accuracy, robustness, and efficiency than traditional methods. Zhang et al. [14] proposed ICENet, a deep CNN that fuses position and channel attention features, to detect river ice in UAV footage of the Yellow River in China. Kim et al. [15] and Pedersen et al. [16] used cruise-acquired sea ice imagery for classification, partitioning the images into nine distinct ice categories. Kim et al. [17] enhanced the U-Net architecture, enabling automatic recognition of surface vessel ice features. Boulze et al. [18] employed Sentinel-1 Synthetic Aperture Radar (SAR) data with convolutional neural networks for remote sensing sea ice classification. Han et al. [19] proposed a sea ice image classification method based on heterogeneous data fusion and deep learning, fusing SAR and optical imagery to improve the precision of remote sensing sea ice classification. Gao et al. [20] evaluated accurate remote sensing observations of sea ice using reflected signals from the BeiDou Geostationary Earth Orbit (GEO) satellite. Gao et al. [21] introduced a convolutional-wavelet neural network-based approach for sea ice change detection, mitigating the influence of inherent speckle noise in multi-temporal SAR images. In summary, the majority of current sea ice detection algorithms rely on satellite remote sensing imagery rather than shipboard camera images. However, satellite-based sea ice detection faces challenges due to remoteness, year-round operational constraints, and low satellite communication reliability in high-latitude regions. Therefore, in this study, we concentrate on real-time and efficient sea ice detection using optical imagery acquired from shipboard cameras [22].
The scarcity of extensive sea ice imagery captured by shipboard cameras presents another challenge for deep learning-based sea ice detection. To the best of the authors' knowledge, a comprehensive dataset annotating various types of sea ice in polar environments is currently absent from the literature. Dowden et al. [23] pioneered this field by capturing images aboard the Nathaniel B. Palmer during a two-month Antarctic expedition, the first endeavor to classify sea ice using in-situ shipboard camera imagery.
The DeepLabV3+ network has a simple encoder-decoder architecture that integrates multi-scale feature extraction. It segments fine details and small targets in sea ice imagery well. To address the reduced detection accuracy in polar sea ice imagery, we devise an improved DeepLabV3+ that combines multi-branch attention feature alignment and multi-scale feature extraction for shipborne sea ice detection. The experimental results show that our method improves the accuracy of sea ice detection and, to a certain extent, alleviates the missed detection of new ice.

Dataset description
We have curated a shipborne sea ice image detection dataset to support this work. As shown in figure 1, it is a multi-category dataset derived from real scenes recorded aboard a polar icebreaker. The images were captured by a GoPro camera mounted on the Nathaniel B. Palmer icebreaker during a two-month expedition in the Ross Sea, Antarctica. The route map is shown in figure 2. The dataset consists of 720p and 4K HD images captured from fixed-position cameras on the icebreaker. The images were taken at regular intervals throughout the day, amounting to over 1600 images daily. Each day covers a range of conditions encountered during the icebreaker's voyage, from noon sunlight to gray skies and sunsets. On some days, precipitation on the lens blurs the whole image. The images predominantly contain ice formations, vessels, the ocean, and the sky.
Table 1. Image properties and parameters captured by Nathaniel B. Palmer.

Data pre-processing
To ensure the precision and viability of the sea ice detection dataset and to allow a more comprehensive evaluation of the proposed approach, we selected images that exemplify typical scenarios. We chose images of high clarity and distinctiveness covering a variety of ice coverages, ice types, lighting conditions, and environmental contexts often encountered during polar expeditions, which makes the dataset more comprehensive. Nighttime imagery was omitted because of its limited sample size. Approximately 50 images were chosen for each day and annotated at the pixel level using the labelme image annotation software. The pixel-level annotations assign the pixels of each image to six distinct classes, including ocean, sky, vessels, new ice, and multi-year ice, as shown in figure 3.

Improved DeepLabV3+
Building upon DeepLabV3+, we propose an enhanced method for sea ice detection. The primary objective is to address the missegmentation caused by the indistinct boundary between sea ice and the ocean. The method incorporates a new ASPP (Atrous Spatial Pyramid Pooling) module while retaining the fundamental encoder-decoder architecture of the base model. By integrating an attention mechanism module, the model improves the segmentation precision and efficiency of sea ice detection, as depicted in figure 4. During the decoding phase, an attention-based feature alignment module guides the alignment of high-level and low-level features, reducing the noise generated during their fusion and strengthening feature learning. Two branches of shallow-level features are selected from the backbone. These features are processed by the Convolutional Block Attention Module (CBAM), then upsampled and concatenated, which mines both channel and spatial information from the shallow-level features of the input images. The refined features, together with the deep-level feature maps, are then fed into the feature alignment module based on the attention mechanism (A-FAM), which uses cross-layer connections to exploit the information carried by shallow-level features and further enriches semantic and detailed image information. After feature refinement through a 3x3 convolution, bilinear interpolation upsampling restores the feature map to the original image size, mitigating the loss of feature information caused by excessive sampling strides.

Feature alignment module based on attention mechanism
In the decoding phase, DeepLabV3+ directly takes the deep-level feature map extracted by the ASPP module, upsamples it by a factor of four, and concatenates it with the shallow-level feature map. However, this approach introduces redundant spatial information and overlooks the alignment problem between deep and shallow features. Different channels of feature maps from different depths carry different feature information, and their relevance to the target differs significantly. The shallow-level feature map contains a wealth of fine detail that supports high-resolution segmentation of sea ice images and delineates clear ice edges. As the network deepens, feature maps carry more abstract semantic information, which is vital for classifying and identifying regions of pixels. Direct concatenation in such a scenario introduces noise and compromises subsequent feature learning.
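The baseline fusion described above can be sketched at the level of tensor shapes. This is only an illustration: the channel counts (256 deep, 48 shallow), the 512x512 input size, and the function names are assumptions chosen for the example, and nearest-neighbour repetition stands in for bilinear interpolation because only the shapes matter here.

```python
import numpy as np

def upsample_nearest(x, factor):
    """Upsample a (C, H, W) feature map by an integer factor.

    Nearest-neighbour repetition is a stand-in for the bilinear
    interpolation used in practice; it yields the same output shape.
    """
    return x.repeat(factor, axis=1).repeat(factor, axis=2)

def baseline_fuse(deep, shallow, factor=4):
    """Upsample the deep map, then concatenate with the shallow map on channels."""
    up = upsample_nearest(deep, factor)
    if up.shape[1:] != shallow.shape[1:]:
        raise ValueError("spatial sizes must match before concatenation")
    return np.concatenate([up, shallow], axis=0)

# Hypothetical sizes: deep features at 1/16 and shallow features at 1/4
# of a 512x512 input.
deep = np.zeros((256, 32, 32), dtype=np.float32)
shallow = np.zeros((48, 128, 128), dtype=np.float32)
fused = baseline_fuse(deep, shallow)
print(fused.shape)
```

Every channel of both inputs passes through with equal weight, which is the unweighted mixing that the attention-based alignment is meant to improve on.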
The design of the A-FAM module is inspired by Squeeze-and-Excitation Networks (SENet) [24]. In addition to a pathway that uses global average pooling to update the channel weights, it introduces a second pathway that employs global maximum pooling for channel attention. This addition serves to mitigate noise contamination in the sea ice feature representations. The architecture of A-FAM is depicted in figure 5.
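A minimal sketch of channel attention with both an average-pooling and a max-pooling pathway may make the idea concrete. It is not the A-FAM implementation: the shared bottleneck MLP, the summation of the two pathways, the reduction ratio `r`, and all names are assumptions made for illustration, and the structure in figure 5 may differ.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dual_pool_channel_attention(x, w1, w2):
    """Reweight the channels of a (C, H, W) feature map.

    w1 has shape (C // r, C) and w2 has shape (C, C // r). Sharing this
    bottleneck MLP between the two pathways and summing their outputs
    are assumptions of this sketch.
    """
    avg = x.mean(axis=(1, 2))   # global average pooling -> (C,)
    mx = x.max(axis=(1, 2))     # global max pooling -> (C,)

    def mlp(v):
        return w2 @ np.maximum(w1 @ v, 0.0)   # ReLU bottleneck

    weights = sigmoid(mlp(avg) + mlp(mx))     # per-channel weights in (0, 1)
    return x * weights[:, None, None], weights

rng = np.random.default_rng(0)
C, r = 8, 4   # hypothetical channel count and reduction ratio
x = rng.standard_normal((C, 16, 16))
w1 = rng.standard_normal((C // r, C)) * 0.1
w2 = rng.standard_normal((C, C // r)) * 0.1
out, weights = dual_pool_channel_attention(x, w1, w2)
print(out.shape, weights.shape)
```

The max-pooling pathway responds to the strongest activation in each channel, while the average pathway summarizes the whole map, so the two give complementary evidence for how much each channel should contribute.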

Experimental parameters
The model parameters were optimized using the Adam optimizer. Additionally, a combination of the Momentum optimizer and the Poly learning rate strategy was used, with momentum set to 0.9. Following the transfer learning approach, training ran for 150 epochs, of which the first third was the freezing phase and the remainder the thawing phase. The cross-entropy loss function was used for loss computation, and the Dice loss was added to calculate loss on a global scale.
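The schedule and loss described above can be sketched in plain Python. Only the 150 epochs and the one-third freezing phase come from the text. The poly power of 0.9, the base learning rate, the equal weighting of the two loss terms, and all function names are assumptions for illustration.

```python
import math

TOTAL_EPOCHS = 150                  # from the text
FREEZE_EPOCHS = TOTAL_EPOCHS // 3   # first third: backbone frozen

def backbone_frozen(epoch):
    """True while the 0-indexed epoch falls in the freezing phase."""
    return epoch < FREEZE_EPOCHS

def poly_lr(base_lr, epoch, total_epochs=TOTAL_EPOCHS, power=0.9):
    """Poly decay: base_lr * (1 - epoch / total_epochs) ** power.

    The power value is an assumption; the text does not state it.
    """
    return base_lr * (1.0 - epoch / total_epochs) ** power

def cross_entropy(probs, labels):
    """Mean negative log-likelihood; probs[i] is the class distribution of pixel i."""
    eps = 1e-12
    return -sum(math.log(p[y] + eps) for p, y in zip(probs, labels)) / len(labels)

def dice_loss(probs, labels, num_classes):
    """1 - mean soft Dice coefficient over classes, computed over all pixels."""
    eps = 1e-6
    total = 0.0
    for c in range(num_classes):
        inter = sum(p[c] for p, y in zip(probs, labels) if y == c)
        pred_sum = sum(p[c] for p in probs)
        true_sum = sum(1 for y in labels if y == c)
        total += (2.0 * inter + eps) / (pred_sum + true_sum + eps)
    return 1.0 - total / num_classes

def combined_loss(probs, labels, num_classes):
    """Unweighted sum of both terms (the weighting is an assumption)."""
    return cross_entropy(probs, labels) + dice_loss(probs, labels, num_classes)

# Two pixels, two classes, one confident and one uncertain prediction.
probs = [[0.9, 0.1], [0.4, 0.6]]
labels = [0, 1]
print(combined_loss(probs, labels, num_classes=2))
print(poly_lr(1e-3, epoch=75))
```

Cross-entropy scores each pixel independently, whereas the Dice term is computed from sums over all pixels of a class, which is one way to read "loss on a global scale".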

Evaluation metrics
In this study, the evaluation metrics for sea ice detection are accuracy, mean pixel accuracy (MPA), and mean intersection over union (MIoU). MIoU in particular quantifies the overlap between actual and predicted labels and serves as the decisive criterion for assessing sea ice detection performance.
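These three metrics are commonly derived from a pixel-level confusion matrix, as in the sketch below. The function names and the toy labels are illustrative assumptions, not the evaluation code used in this study.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, num_classes):
    """cm[i, j] counts pixels of true class i predicted as class j."""
    idx = num_classes * np.asarray(y_true).ravel() + np.asarray(y_pred).ravel()
    return np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)

def segmentation_metrics(cm):
    """Return (accuracy, MPA, MIoU) from a confusion matrix."""
    tp = np.diag(cm).astype(float)
    true_total = cm.sum(axis=1)   # pixels per ground-truth class
    pred_total = cm.sum(axis=0)   # pixels per predicted class
    accuracy = tp.sum() / cm.sum()
    with np.errstate(divide="ignore", invalid="ignore"):
        per_class_acc = tp / true_total
        iou = tp / (true_total + pred_total - tp)
    # nanmean skips classes that do not occur at all.
    return accuracy, np.nanmean(per_class_acc), np.nanmean(iou)

# Toy 2x3 label maps with three classes.
y_true = np.array([[0, 0, 1], [1, 2, 2]])
y_pred = np.array([[0, 1, 1], [1, 2, 0]])
cm = confusion_matrix(y_true, y_pred, num_classes=3)
accuracy, mpa, miou = segmentation_metrics(cm)
print(cm)
print(accuracy, mpa, miou)
```

MIoU penalizes both missed pixels and false alarms for every class, which is why it is typically stricter than accuracy or MPA on the same predictions.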

Result analysis
The loss and MIoU curves during training of the improved DeepLabV3+ model are shown in figure 6. The loss curve declines steeply in the first 5 epochs, rises briefly at the onset of the unfreezing phase, and then tapers down. Beyond 80 epochs of training the curve stabilizes, indicating that the model has converged. Similarly, the MIoU curve reaches approximately 70% within the first 5 epochs and then shows a slight upward trend, with a transient decline at the start of the unfreezing phase.
We visualized the prediction results. Figure 7 shows that the designed model detects sea ice well. Comparing the original image, the ground-truth map, and the predicted image shows that the method alleviates, to a certain extent, the inaccurate detection caused by the blurred boundary between new ice and ocean. The ablation results in table 2 show that HRFS-ASPP and A-FAM notably improve the capacity to extract features from sea ice imagery. In scheme 1, replacing the ASPP module with the HRFS-ASPP module yields substantial gains in Accuracy, MPA, and MIoU. This indicates that the model successfully fuses multi-scale information from deep sea ice features, strengthening inter-layer correlations and improving information utilization. In scheme 2, introducing the CBAM attention mechanism into the deep and shallow features slightly improves all three evaluation metrics. In scheme 3, introducing A-FAM after the integration of deep and shallow features again yields substantial improvements: MIoU increases by around 1.6%, indicating a stronger capacity for feature learning. In scheme 4, MIoU improves further, reaching 90.25%. Furthermore, Accuracy, MPA, and F1-score reach 94.06%, 94.03%, and 94.02%, respectively, the best among the improved versions. In summary, the quantitative results show that the model in scheme 4 improves sea ice detection accuracy.

Comparative experiments
To assess the performance of the improved DeepLabV3+ model on shipborne sea ice imagery and to validate the model improvements, this study compares our method with other models, including FCN-8S [25], SegNet, U-Net, PSPNet, and DeepLabV3+, on the sea ice detection dataset. The results are presented in table 3. FCN-8S exhibits the poorest detection performance, significantly trailing the other methods, with an Error Rate exceeding 10%. This indicates its failure to fully integrate semantic information across different layers. U-Net, PSPNet, and DeepLabV3+ achieve comparable detection results of 83.49%, 84.62%, and 85.80%, respectively. U-Net benefits from skip connections between feature maps at different levels, which improve the restoration of spatial information. PSPNet leverages a hierarchical global prior, incorporating multi-scale information from different subregions to improve detection precision. DeepLabV3+, with its ASPP module, effectively combines features of different scales, leading to improved detection accuracy. The results also support the choice of DeepLabV3+ as the foundational model for targeted enhancements in sea ice detection. Notably, our approach attains the best results, with all metrics surpassing those of the other models, confirming that the proposed model achieves robust detection performance on shipborne sea ice imagery.

Conclusion
This study investigates the detection of sea ice in images captured by an icebreaker's shipboard camera in the polar environment. We have devised a sea ice detection approach based on the DeepLabV3+ architecture that combines multi-branch attention feature alignment and multi-scale feature extraction. This method effectively exploits deep-level features in sea ice images and improves the precision of sea ice detection. We first created a sea ice detection dataset from images obtained aboard the Nathaniel B. Palmer icebreaker, consisting of original images paired with annotated labels. We then applied the proposed method to this dataset to evaluate and verify its effectiveness.
The results of the ablation experiments and the comparative analyses show that our method performs better in feature extraction from sea ice images. It improves the accuracy of sea ice detection, alleviates the missed detection of new ice to a certain extent, and mitigates the drop in detection accuracy caused by image blurring due to sea fog.
In future work, we will continue to investigate detection methods tailored to blurry sea ice images. We are considering introducing self-attention to strengthen feature extraction and counter the reduced accuracy caused by image blur, with a focus on improving new ice detection accuracy. We also intend to expand the dataset with navigation data from diverse polar environments to make it more representative. Finally, real-time dynamic detection of sea ice is a focal point for future research, and we will study algorithms for real-time detection of sea ice in shipborne images. In conclusion, this study contributes to advancing sea ice detection.

Figure 1. Example of sea ice imagery captured by a camera on an icebreaker in a polar environment.

Figure 2. The route of the Nathaniel B. Palmer icebreaker in the Ross Sea, Antarctica.

Figure 3. Sample image of the dataset: (a) original image, (b) ground-truth map.

Figure 4. Structural diagram of improved DeepLabV3+.
Within the encoder, the lightweight EfficientNetV2-S network is employed as the backbone. A heterogeneous receptive field splicing ASPP (HRFS-ASPP) module is proposed to replace the ASPP module. It interactively connects multiple convolutional branches with different dilation rates to improve the correlation of features between branches. Ordinary convolutions are replaced with depthwise separable dilated convolutions, which limits the growth in parameter count and computation caused by fusing different receptive fields and thereby accelerates training of the improved DeepLabV3+. The stacked feature maps from the branches are then fed into a self-attention-based feature enhancement module to obtain strengthened deep-level feature maps.
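The parameter saving from depthwise separable dilated convolutions can be checked with simple arithmetic. In the sketch below the channel count of 256 and the dilation rates 6, 12, and 18 are hypothetical values chosen for illustration (the rates used by HRFS-ASPP are not stated here), biases are ignored, and the function names are made up for the example.

```python
def standard_conv_params(c_in, c_out, k=3):
    """Weights in an ordinary k x k convolution (bias ignored)."""
    return k * k * c_in * c_out

def depthwise_separable_params(c_in, c_out, k=3):
    """Weights in a k x k depthwise conv followed by a 1 x 1 pointwise conv."""
    return k * k * c_in + c_in * c_out

def effective_kernel(k, dilation):
    """Spatial extent covered by a k x k kernel with the given dilation rate."""
    return k + (k - 1) * (dilation - 1)

c_in = c_out = 256   # hypothetical channel count
std = standard_conv_params(c_in, c_out)
sep = depthwise_separable_params(c_in, c_out)
print(std, sep, round(std / sep, 2))

# Dilation enlarges the receptive field without adding weights.
for rate in (6, 12, 18):   # hypothetical dilation rates
    print(rate, effective_kernel(3, rate))
```

Because dilation changes only the sampling positions of the kernel, branches with different receptive fields cost the same number of weights, and the separable factorization keeps each branch small.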

Figure 5. Structural diagram of feature alignment module based on attention mechanism (A-FAM).

Table 3. The detection performance evaluation index results of the model.