Improving polyp image segmentation by enhancing the U-Net architecture

Polyp recognition in medical imaging plays a pivotal role in the early detection and prevention of colorectal cancer. Semantic segmentation, particularly with deep learning models such as U-Net, has demonstrated promising results in polyp segmentation. However, the traditional U-Net structure sometimes struggles to accurately delineate the edges of polyps, which degrades overall segmentation performance. To address this issue, the current study proposes a modified U-Net framework equipped with an enhanced edge loss function designed to improve segmentation accuracy in polyp images. The aim is to raise the model's capacity to capture intricate details, specifically edges, an area where standard U-Net structures often falter. Experimental results underscore the effectiveness of the proposed approach in achieving superior edge segmentation and improved overall performance in polyp recognition. By tackling the challenges inherent to polyp edge segmentation, the modified U-Net model contributes to more precise diagnostic systems in medical imaging and, in turn, to advancements in the prevention and early detection of colorectal cancer.


Introduction
Colorectal cancer represents one of the most common forms of cancer globally, and its early detection is key to enhancing patient prognoses and survival rates. For examining the colon and identifying polyps, which are potential precursors to colorectal cancer, medical imaging techniques such as endoscopy and colonoscopy are widely employed [1]. Automated segmentation of polyps within these images can assist in making accurate diagnoses, planning effective treatments, and tracking patient progress. Semantic segmentation, a computer vision task that assigns a semantic label to each pixel in an image, has demonstrated encouraging results in the recognition and segmentation of polyps, as shown in Figure 1.
The U-Net architecture, a deep learning model extensively used for semantic segmentation, has proven effective in an array of medical imaging tasks [2]. This model capitalizes on a symmetric encoder-decoder structure with 'skip connections', which enable the capture of both local and global features while maintaining spatial information [3]. However, despite its success, traditional U-Net models can struggle to accurately capture the complex edges of polyps [4]. Precise localization and segmentation of polyp boundaries are imperative for accurate diagnosis, as irregularities and subtle variations in polyp shape and appearance can provide valuable diagnostic clues to medical professionals.
The importance of this research lies in its potential to propel the field of polyp recognition forward by enhancing the precision and dependability of semantic segmentation. Precise polyp segmentation can aid in detecting early-stage lesions, reducing false-positive rates, and assessing polyp malignancy. Furthermore, the proposed approach has the potential to streamline the workflow of medical practitioners by automating the laborious and error-prone process of manual segmentation. This not only improves efficiency but also reduces subjectivity in the interpretation of medical images [5].

Overview of polyp recognition in medical imaging
Numerous studies have explored different methodologies for polyp recognition and segmentation in medical imaging [6]. Traditional image processing techniques, such as thresholding, region growing, and edge detection, have been widely employed to detect and segment polyps. However, these methods often struggle to handle the inherent challenges associated with polyp shape variability, texture variations, and noise present in the images.

Introduction to semantic segmentation and U-Net
In recent years, deep learning techniques have gained significant attention in medical imaging, showcasing remarkable performance in polyp recognition tasks. Convolutional Neural Networks (CNNs) have been employed to automatically learn discriminative features from polyp images. Fully Convolutional Networks (FCNs) and U-Net, in particular, have emerged as popular choices for semantic segmentation tasks due to their ability to capture both local and global contextual information [7].

Limitations of U-Net in capturing polyp edges
Although U-Net has achieved notable success in various segmentation tasks, including polyp recognition, it may encounter challenges in accurately capturing the intricate edges of polyps. The primary reason for this limitation lies in the encoder-decoder architecture of U-Net, which involves multiple pooling and upsampling operations. These operations tend to cause information loss and result in blurred boundaries in the segmentation output [8]. Consequently, the precise localization of polyp edges becomes challenging, hindering the accurate representation of irregular and complex polyp shapes.
To address this limitation, researchers have proposed various techniques to enhance edge segmentation in semantic images. One approach involves the integration of additional modules or sub-networks specifically designed to enhance edge detection and localization. For instance, Dilated Convolutional Networks (DCN) have been utilized to capture multi-scale information and improve edge localization. These approaches have shown improvements in edge segmentation but may introduce additional complexity to the network architecture [9].

Existing techniques for improving edge segmentation in semantic images
Another strategy to address the edge segmentation issue is to incorporate additional loss functions that explicitly supervise the network's learning process towards capturing precise boundaries. By incorporating edge-aware loss functions, such as boundary loss or edge loss, the network can be encouraged to focus more on preserving edge details during training [10]. These techniques have demonstrated promising results in various image segmentation tasks, including medical applications, by improving the delineation of object boundaries.

Dataset preparation
The dataset used in this study was obtained from the Internet. Given that most medical datasets are not publicly available, and some published datasets contain limited data, this dataset adequately fulfills the experimental requirements with clear imagery and a large data volume. The dataset undergoes preprocessing before further analysis. The raw dataset consists of polyp images and corresponding masks, each stored as separate files. Essential preprocessing steps are carried out to facilitate data handling and ensure model compatibility. Initially, the dataset is read and loaded, with images and masks stored in an array structure. By convention, filenames are kept consistent between images and their corresponding masks, allowing convenient association based on filename indices. To ensure uniformity in the input data format, all images are resized to standardized dimensions of 256 by 256 pixels. Images are converted to the RGB color space to capture the full color information, while masks are converted to grayscale to effectively represent the polyp regions. The resizing process retains the aspect ratio of the images while adjusting them to the desired dimensions. To assess the performance and generalization capabilities of the proposed approach, the dataset is divided into training and validation subsets: 700 samples are allocated for training, and the remaining 300 samples are reserved for validation. This division guarantees a substantial training set while dedicating a significant portion of the dataset to an independent validation set for performance assessment.
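The filename-based pairing and the 700/300 split described above can be sketched as follows. This is a minimal illustration: the folder names and filename pattern are hypothetical, and actual image loading, resizing to 256x256, and color conversion would use a library such as Pillow.

```python
def pair_and_split(image_names, n_train=700):
    """Pair each image with its mask by shared filename (the convention
    described in the text) and split into training/validation subsets."""
    names = sorted(image_names)
    # By convention a mask file shares its image's filename, so each pair
    # is (images/<name>, masks/<name>); both directory names are hypothetical.
    pairs = [(f"images/{n}", f"masks/{n}") for n in names]
    return pairs[:n_train], pairs[n_train:]

# 1000 hypothetical filenames -> 700 training pairs, 300 validation pairs
all_names = [f"{i:04d}.png" for i in range(1000)]
train_pairs, val_pairs = pair_and_split(all_names)
```

Sorting before splitting keeps the association deterministic; in practice the pairs would typically be shuffled with a fixed seed before the split.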

Enhanced edge loss function
In an effort to explicitly guide the network to concentrate on accurate polyp edge capture, an enhanced edge loss function is incorporated into the proposed approach. This methodology adopts a weighted combination of the content loss and the edge loss to compute the total loss. The content loss assesses the overall segmentation accuracy, while the edge loss specifically targets edge localization.
The choice of weights assigned to these loss components is critical in balancing the influence of the content and edge information during training. To identify the optimal weight for the edge loss, experiments are conducted with different weight values, and the segmentation performance is evaluated. By systematically varying the weight parameter, the impact on the network's ability to accurately capture polyp edges can be analyzed. The goal is to pinpoint the weight value that produces the best segmentation results, striking a balance between overall content accuracy and precise edge localization. This process is demonstrated in Figure 2.
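A minimal sketch of such a weighted combination, in pure Python on small binary masks. The mean-squared content term and the neighbour-difference edge map used here are illustrative stand-ins for the paper's actual loss terms, which are not specified in detail; the `edge_weight=1.2, content_weight=1.0` defaults mirror the 1.2:1 ratio discussed later.

```python
def edge_map(mask):
    """Crude binary edge map: a pixel is an edge if any 4-neighbour differs."""
    h, w = len(mask), len(mask[0])
    edges = [[0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ni, nj = i + di, j + dj
                if 0 <= ni < h and 0 <= nj < w and mask[ni][nj] != mask[i][j]:
                    edges[i][j] = 1
    return edges

def combined_loss(pred, mask, edge_weight=1.2, content_weight=1.0):
    """Weighted sum of a content term (MSE over all pixels) and an edge
    term (MSE restricted to edge pixels of the ground-truth mask)."""
    h, w = len(mask), len(mask[0])
    edges = edge_map(mask)
    content = sum((pred[i][j] - mask[i][j]) ** 2
                  for i in range(h) for j in range(w)) / (h * w)
    n_edge = sum(map(sum, edges)) or 1  # avoid division by zero
    edge = sum((pred[i][j] - mask[i][j]) ** 2
               for i in range(h) for j in range(w) if edges[i][j]) / n_edge
    return edge_weight * edge + content_weight * content

mask = [[0, 0], [1, 1]]
perfect = combined_loss(mask, mask)                    # 0.0
noisy = combined_loss([[0.0, 0.0], [1.0, 0.5]], mask)  # penalizes the miss
```

Because errors on edge pixels are counted in both terms, the effective penalty on boundaries scales with the edge weight, which is exactly the knob the weight-ratio experiments vary.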

Training procedure
The modified U-Net with the enhanced edge loss function is trained on the prepared dataset.We utilize a suitable optimization algorithm, such as Adam or stochastic gradient descent (SGD), to minimize the combined segmentation loss and edge loss.The network is trained in an end-to-end manner, iteratively updating the model's parameters to optimize the overall objective function.
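For illustration, a single Adam update on a flat parameter list looks like the sketch below. This is a self-contained toy version with scalar parameters; in practice a framework optimizer (e.g. Adam in a deep learning library) performs these updates over all network weights with gradients from backpropagation of the combined loss.

```python
import math

def adam_step(params, grads, state, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: biased first/second moment estimates with
    bias correction, then a scaled gradient step."""
    state["t"] += 1
    t = state["t"]
    out = []
    for k, (p, g) in enumerate(zip(params, grads)):
        state["m"][k] = b1 * state["m"][k] + (1 - b1) * g        # 1st moment
        state["v"][k] = b2 * state["v"][k] + (1 - b2) * g * g    # 2nd moment
        m_hat = state["m"][k] / (1 - b1 ** t)                    # bias correction
        v_hat = state["v"][k] / (1 - b2 ** t)
        out.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
    return out

state = {"t": 0, "m": [0.0], "v": [0.0]}
params = adam_step([1.0], [2.0], state)  # first step moves by roughly lr
```

On the first step the bias-corrected moments recover the raw gradient magnitude, so the parameter moves by approximately the learning rate, regardless of the gradient's scale.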

Evaluation metrics
In the assessment of polyp segmentation performance within semantic images, pixel accuracy and Intersection over Union (IoU) serve as key evaluation metrics. Pixel accuracy quantifies the percentage of pixels correctly classified relative to the total number of pixels. This metric provides a comprehensive evaluation of segmentation accuracy, reflecting the alignment quality between the predicted segmentation masks and the actual annotations.
Intersection over Union, also known as the Jaccard index, measures the overlap between the predicted segmentation mask and the actual mask. This metric is calculated as the ratio of the intersection between the predicted and actual regions to the union of these regions. IoU is particularly effective for evaluating the quality of segmentation boundaries and assessing the spatial alignment between the predicted and actual masks. The combined use of pixel accuracy and IoU offers a holistic understanding of segmentation performance. While pixel accuracy provides a global measure of accuracy, IoU presents a more detailed evaluation of the localization and boundary delineation capabilities of the segmentation approach. In unison, these metrics allow for an assessment of the effectiveness of the enhanced U-Net framework with an improved edge loss function in accurately segmenting polyps in semantic images.
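Both metrics are straightforward to compute on binary masks; a minimal sketch:

```python
def pixel_accuracy(pred, truth):
    """Fraction of pixels whose predicted label matches the ground truth."""
    total = sum(len(row) for row in truth)
    correct = sum(p == t for pr, tr in zip(pred, truth)
                  for p, t in zip(pr, tr))
    return correct / total

def iou(pred, truth):
    """Intersection over Union (Jaccard index) for binary masks."""
    inter = sum(p and t for pr, tr in zip(pred, truth) for p, t in zip(pr, tr))
    union = sum(p or t for pr, tr in zip(pred, truth) for p, t in zip(pr, tr))
    return inter / union if union else 1.0  # empty masks: perfect agreement

pred  = [[1, 1], [0, 0]]
truth = [[1, 0], [0, 0]]
acc = pixel_accuracy(pred, truth)  # 3 of 4 pixels correct -> 0.75
jac = iou(pred, truth)             # intersection 1, union 2 -> 0.5
```

The example shows why IoU is the stricter metric: a single false-positive pixel drops IoU from 1.0 to 0.5, while pixel accuracy only falls to 0.75, since the many correctly predicted background pixels dominate it.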

Results and analysis
The weight parameter in the enhanced edge loss function plays a pivotal role in balancing the significance of edge information against overall content accuracy, as elucidated in Figure 3. Throughout the experiments, a variety of weight ratios were tested to assess their influence on segmentation performance. With the weight ratio set to 1.1:1, indicating a slightly heightened focus on edge information, the model assigned a similar degree of importance to both the edge and interior regions. Consequently, certain areas beyond the edges of the target segmentation regions were mistakenly identified as part of the target region, resulting in diminished segmentation performance relative to the reference model. When the weight ratio was adjusted to 1.2:1, there was a noticeable improvement in the lower half of the target segmentation area compared to the reference model. In ambiguous edge regions identified in the reference model, the edge loss function with this weight produced a crisper and more precise segmentation result. However, with a further increase of the weight ratio to 1.3:1 or beyond, there was a clear deterioration in segmentation quality: the disproportionate emphasis on edges caused the model to overlook the central part of the target region, leading to subpar segmentation outcomes.
These observations underscore the significance of choosing an appropriate weight ratio for the edge loss function: an equilibrium must be struck between accurately capturing edge information and preserving the integrity of the central target region. Across the experiments, the weight ratio of 1.2:1 emerged as the most suitable configuration, demonstrating improved edge localization while maintaining the integrity of the central target region. The examination of varying weight ratios offers insight into the behavior of the model and guides the selection of the weight parameter in the enhanced edge loss function. It emphasizes the need to adjust the weight ratio meticulously to optimize segmentation performance, avoiding both underemphasis and overemphasis on edges, either of which results in substandard segmentation outcomes.
In Figure 5, the loss value exhibits an initial decreasing trend during the early training epochs, indicating the model's convergence towards optimal parameter values. Concurrently, the accuracy value shows an overall increasing trend, reflecting the model's improved ability to make accurate predictions as training progresses. However, after many training epochs, an unexpected behavior appeared in the loss values: both the training loss and the validation loss experienced a sudden sharp increase, indicative of improper parameter settings in this specific experiment. This phenomenon had a cascading effect on the accuracy metric, leading to a significant drop in accuracy. In Figure 6, validation loss and training loss both decrease continuously while accuracy increases, which is the ideal situation. In Figure 7, the loss fluctuates greatly at the beginning and then tends to decline in general, while accuracy shows an overall upward trend; the accuracy nonetheless remains low. The accuracy of the best epoch does not significantly improve when the weight ratio is set at 1.1:1 compared to the baseline model. However, when the weight ratio is incremented to 1.2:1, a substantial improvement is observed, with the accuracy rate nearing 0.6. This progression indicates that assigning a higher weight to the edge loss yields superior segmentation performance relative to the original model.
Interestingly, with a further increase of the weight ratio to 1.3:1 and 1.4:1, the accuracy of the best epoch begins to recede. When the weight ratio reaches 1.4:1, the accuracy noticeably plummets, dipping below 0.3. These observations align with the performance analysis of the predicted masks under the various weight ratios, solidifying the relationship between the weight ratio and segmentation accuracy. This relationship is showcased in Table 1 and Figure 9.
Evaluation of the proposed approach also uses Intersection over Union (IoU) as an additional metric, complementing pixel accuracy. IoU measures the overlap between predicted segmentation masks and their corresponding ground truth masks. This section provides experimental results in the form of a table and a bar chart, demonstrating the IoU values obtained at different weight ratios. The table and bar chart illustrate the correlation between the weight ratio and the corresponding IoU values, providing insight into the predictive accuracy of the model. Reduced loss values and increased IoU signify superior accuracy in segmentation predictions. Upon analysis, it is clear that a significant accuracy improvement is achieved when the weight ratio is set at 1.2:1, in line with the preceding sections where this ratio was found to yield optimal segmentation results. However, when the weight ratio strays from this setting, model performance deteriorates. These findings underscore the importance of choosing a suitable weight ratio for the edge loss function to achieve top-tier accuracy in polyp segmentation. The tabular and graphical illustrations provide a vivid depiction of the correlation between the weight ratio and the resulting IoU values, further solidifying the efficacy of the proposed approach in boosting polyp segmentation accuracy.

Conclusion
In the exploration of polyp segmentation employing traditional U-Net neural networks, a persistent issue of blurred and indistinct edges in the resulting prediction masks was observed. To tackle this challenge, the hypothesis was formed that integrating an edge loss function might enhance the quality of segmentation results. A series of experiments were conducted to test this hypothesis, with careful examination of the resultant data and the produced prediction masks. The analyses endorse the efficacy of this approach, with the implementation of the edge loss function markedly enhancing edge delineation. The inclusion of the edge loss function substantively improved the quality of the segmented prediction masks, especially in terms of edge sharpness and smoothness. Visual inspection of the prediction masks demonstrated significant progress in accurately demarcating the intricate boundaries of the polyps. By focusing the model's attention on edge localization, the proposed method effectively addresses the issue of fuzzy edges often encountered in traditional U-Net-based polyp segmentation. The edge loss function acts as a regularization technique, compelling the network to emphasize the preservation of intricate edge details during training. These results underscore the effectiveness of this approach in enhancing the quality of polyp segmentation, particularly with regard to edge quality.
The significance of these findings is highlighted by their potential application in medical imaging, where precise edge localization is crucial for accurate diagnoses and treatment planning. In conclusion, this research validates that integrating an edge loss function within the U-Net framework successfully addresses the issue of blurred edges in polyp segmentation. This approach notably enhances edge sharpness and smoothness, leading to more precise and reliable segmentation results. These findings hold promise for advancing the field of computer vision in medical imaging and potentially improving the accuracy and effectiveness of polyp detection systems.

Figure 2. This flowchart describes the process of adding the loss function. (Photo/Picture credit: Original).

Figure 3. The top panel shows the result without the edge loss function; the panels below show the results with the loss function added at different weights. (Photo/Picture credit: Original).

Figure 4. The validation accuracy and validation loss of the model without the edge loss function (Photo/Picture credit: Original). In Figure 4, a slight oscillation is observed in the curves, which is a common characteristic of the training process. Overall, the loss value exhibits a decreasing trend, indicating the gradual convergence of the model during training. Simultaneously, the accuracy value shows an overall upward trend, signifying an improvement in the model's ability to make correct predictions. After multiple training epochs, the training loss continued to decrease while the validation loss started to increase. This indicates the onset of overfitting, a phenomenon where the model becomes overly specialized to the training data and fails to generalize effectively to unseen data.

Figure 5. The validation accuracy and validation loss of the model with the added edge loss function and an edge-to-content ratio of 1.1:1 (Photo/Picture credit: Original).

Figure 6. The validation accuracy and validation loss of the model with the added edge loss function and an edge-to-content ratio of 1.2:1 (Photo/Picture credit: Original).

Figure 7. The validation accuracy and validation loss of the model with the added edge loss function and an edge-to-content ratio of 1.3:1 (Photo/Picture credit: Original).

Figure 8. The validation accuracy and validation loss of the model with the added edge loss function and an edge-to-content ratio of 1.4:1 (Photo/Picture credit: Original). In Figure 8, the loss generally tends to decline, while accuracy generally presents an upward trend. The accuracy continues to fluctuate, which may reflect a fitting problem or an issue with the data itself. The accuracy is low, with the best epoch reaching only 0.23. Figures 4-8 show the prediction accuracy achieved across 50 individual training runs of the modified U-Net model after incorporating edge loss functions with the assorted weight ratios. Comparing these figures with Figure 4 enables an assessment of the impact of the varying weight ratios on the accuracy of the best-performing epoch. Analyzing Figures 5 to 8 reveals that the accuracy of the best epoch does not significantly improve when the weight ratio is set at 1.1:1 compared to the baseline model. However, when the weight ratio is incremented to 1.2:1, a substantial improvement is noticed, with the accuracy rate nearing 0.6. This progression indicates that attributing a higher weight to the edge loss results in superior segmentation performance relative to the original model. With a further increase of the weight ratio to 1.3:1 and 1.4:1, the accuracy of the best epoch begins to recede; when the weight ratio reaches 1.4:1, the accuracy noticeably plummets, dipping below 0.3.

Figure 9. A bar chart with IoU as the evaluation index [1].

Table 1. Weight adjustment and corresponding loss and IoU values.