Multi-scale reciprocal consistency learning for semi-supervised left atrial segmentation

Accurate and sufficient labels are crucial to building an excellent deep-learning framework for processing medical scans. However, the high cost of medical annotation greatly impedes this process. Semi-supervised learning has therefore shown great potential due to its efficient use of unlabeled data. We present a novel multi-scale reciprocal consistency network (MRC-Net) to utilize unlabeled data more efficiently for more accurate 3D left atrial segmentation. Our model consists of a common encoder and two independent decoders with inconsistent up-sampling. The decoders generate hidden-layer feature maps at different resolutions during the up-sampling process. Higher-resolution features typically contain local and detailed information, while lower-resolution features carry global or abstract information. We then apply consistency learning to these feature maps. By combining local and global semantic information, the model obtains a comprehensive understanding of the segmentation targets. The experimental results indicate that MRC-Net outperforms many semi-supervised learning methods, achieving more accurate segmentation by efficiently utilizing unlabeled data and providing a new approach to improving semi-supervised learning.


Introduction
Many patients with heart disease live under the constant threat of death. Because the disease greatly increases the likelihood of stroke, the risk of deterioration in a patient's condition is extremely high. Catheter ablation is the current standard treatment; however, its clinical results are often unsatisfactory, largely because the topology of the patient's left atrium (LA) is not modeled. It is therefore crucial to perform three-dimensional modeling and segmentation of imaging scans of the patient's lesions. Obtaining the topology of the LA assists clinicians in diagnosing diseases and making patient-specific treatment plans.
Recent deep learning-based approaches have shown outstanding performance and can provide clinical guidance for clinicians. However, because obtaining dense medical annotations is extremely expensive and laborious, many neural networks, which usually have millions of parameters, are insufficiently trained on medical segmentation tasks, leading to unsatisfactory results. Fortunately, some semi-supervised learning-based methods still obtain acceptable results on many tasks using a few labeled samples together with unlabeled samples, the latter being easy to acquire.
IOP Publishing doi:10.1088/1742-6596/2770/1/012019

At present, the most popular semi-supervised method is consistency regularization [6]. Its core idea is to minimize the discrepancy between the output for unlabeled data and the output for a perturbed version of it, based mainly on two assumptions. The first is the smoothness assumption: small input perturbations should not cause significant deviations in the model outputs. The second is the clustering assumption: samples of the same class tend to gather in the same cluster.
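The smoothness assumption can be sketched in a few lines of PyTorch. This is an illustrative toy, not the paper's model: the loss simply penalizes deviation between the prediction for an input and the prediction for a slightly perturbed copy of it.

```python
import torch
import torch.nn.functional as F

def consistency_loss(model, x, noise_std=0.1):
    # Smoothness assumption: a small input perturbation should not
    # significantly change the model's output.
    with torch.no_grad():
        target = torch.sigmoid(model(x))          # unperturbed prediction
    noisy = x + noise_std * torch.randn_like(x)   # small random perturbation
    pred = torch.sigmoid(model(noisy))
    return F.mse_loss(pred, target)               # discrepancy to minimize
```

Minimizing this term over unlabeled data regularizes the model without requiring any annotations.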
Many consistency regularization approaches have been proposed to build more efficient models for exploiting unlabeled data. Yu et al. [6] designed an uncertainty-aware framework that gradually learns from more reliable targets by filtering out high-uncertainty information, leading to significant improvements. Li et al. [2] designed an alternative constraint in which novel shape constraints are applied to the segmentation results; they also developed a multi-task deep network and introduced adversarial learning to guide the model toward more accurate segmentation. Luo et al. [4] minimize the discrepancy between each pyramid prediction and their average to enhance semi-supervised learning, and further introduce multi-scale uncertainty rectification to encourage the model to become more confident. Bai et al. [1] applied a copy-paste operation to the UA-MT model, pasting objects of different scales onto new background images to address imbalanced sample distribution.
The methods mentioned above have achieved remarkable improvements in the utilization of unlabeled data in consistency learning by introducing new constraints or new stimuli. However, they often overlook the importance of tightening the consistency connections. Epistemic uncertainty assesses whether an input lies within the distribution of data already encountered; areas with fuzzy or adhesive boundaries often yield high-uncertainty results, while increasing the training data reduces uncertainty. Therefore, reducing the model's uncertainty can, to some extent, simulate the generalization process of model training. Uncertainty exists not only in the final output of the model but also in the transmission of features through the hidden layers. By imposing constraints on uncertainty during this transmission, different influences can be exerted on the various stages of model training.
Based on these observations, we design multi-scale reciprocal consistency learning to enhance the utilization of unlabeled data. We set up two separate decoders, each of which outputs an independent feature map after up-sampling at each layer, and we model the epistemic uncertainty by comparing the discrepancies between these feature maps. The resulting uncertainty affects model training differently depending on the resolution and depth of its feature maps. For instance, deep uncertainty contains more high-level semantics, providing overall architectural information during training, while shallow uncertainty contains more high-resolution pixel-level detail, significantly improving the model's understanding of fine structures and boundary shapes. Our method substantially enhances the utilization of uncertainty and deepens the consistency of training. The model is extensively validated on the LA database, and the experimental data demonstrate that our MRC-Net greatly enhances the accuracy of medical image segmentation, outperforming several state-of-the-art consistency learning methods. The main contributions can be summarized as threefold:
1. We design a brand-new multi-scale reciprocal consistency network to make more efficient use of unlabeled data.
2. We pioneer a novel way to enhance consistency learning by performing consistency learning on multi-scale feature maps, thereby deepening the connectivity.
3. Our MRC-Net achieves a significant improvement on semi-supervised left atrial segmentation tasks.

Overall framework
Figure 1 shows the overall framework of the multi-scale reciprocal consistency network (MRC-Net).

Multi-scale reciprocal consistency learning
We set a shared encoder that extracts features at different resolutions from multiple hidden layers, which are then transferred through skip connections to two independent decoders. An extra dropout layer is added to the encoder's output, ensuring that the highest-level semantic features received by the two decoders are highly randomized. Meanwhile, because one decoder employs the original V-Net's up-sampling method while the other uses tri-linear interpolation, the feature maps output by the two decoders at the same layer continually differ as up-sampling progresses. By measuring these differences, epistemic uncertainty can be estimated within a single forward pass. Compared with conventional uncertainty estimation methods such as Monte Carlo dropout, which require multiple forward passes, this approach is significantly more efficient. We approximate epistemic uncertainty using the feature maps obtained from up-sampling at each layer of the two decoders. Furthermore, the proposed multi-scale reciprocal consistency learning exploits semantic information at different levels rather than only the final outputs. In medical image segmentation, high-resolution pixel information is crucial for accurately identifying subtle tissue structures and lesion boundaries, while high-level semantic information helps the model comprehend the spatial distribution of functional regions, organ structures, and pathological areas within the images. As the depth of the model increases, the high-level semantic content of the features tends to increase while the high-resolution pixel content decreases, and vice versa. Therefore, we use the uncertainty constructed from feature maps at different resolutions to learn both the overall structural semantics and the detailed boundary semantics. By employing this approach, we achieve precise localization and segmentation of different tissue types and pathological regions. UDA has proved the effectiveness of its sharpening function. In this way, we reduce the entropy of the outputs and suppress ambiguous predictions so that the soft pseudo labels provide more definitive guidance. The sharpening function is defined as:

P' = P^{1/σ} / (P^{1/σ} + (1 − P)^{1/σ})

where σ is utilized to control the sharpness of the function curve. By setting an appropriate σ, we can avoid a flat prediction distribution when labeled data are scarce. In addition, our model is regularized by entropy minimization, since a sharper probability distribution does not disturb the model training.

Loss function
We input all labeled and unlabeled data into model training. We then conduct consistency training between the soft pseudo label of one decoder and the probability map output by the other decoder. Additionally, we perform consistency learning between the feature maps output by each layer of the two decoders. In this way, we reduce the discrepancies both between the per-layer feature maps and between the final predictions, guiding the model to produce consistent probability maps in regions of high epistemic uncertainty. The probability consistency loss L_pc and the feature consistency loss L_fc are defined as:

L_pc = MSE(sharpen(P_A), P_B) + MSE(sharpen(P_B), P_A)
L_fc = (1/N) Σ_{i=1}^{N} MSE(F_i^A, F_i^B)
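A sketch of how these losses might be composed in PyTorch. The tensor names are hypothetical and the relative weighting is an assumption; only the reciprocal structure (each decoder's sharpened soft pseudo label supervising the other) and the per-layer MSE follow the text.

```python
import torch
import torch.nn.functional as F

def sharpen(p, sigma=0.1):
    # temperature sharpening: pushes probabilities toward 0 or 1
    q = p ** (1.0 / sigma)
    return q / (q + (1.0 - p) ** (1.0 / sigma))

def reciprocal_consistency(pa, pb, feats_a, feats_b, alpha=0.1):
    """Illustrative composition of the two consistency terms.

    pa, pb           : probability maps from the two decoders
    feats_a, feats_b : lists of same-shape per-layer feature maps
    """
    # L_pc: each decoder's sharpened soft pseudo label supervises the other
    l_pc = (F.mse_loss(pb, sharpen(pa).detach())
            + F.mse_loss(pa, sharpen(pb).detach()))
    # L_fc: mean per-layer feature-map discrepancy (epistemic-uncertainty proxy)
    l_fc = sum(F.mse_loss(a, b) for a, b in zip(feats_a, feats_b)) / len(feats_a)
    return l_pc + alpha * l_fc
```

The `.detach()` calls stop gradients from flowing into the pseudo-label branch, a common design choice in consistency training.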

Database
The MRC-Net is compared with other methods on the 2018 Atrial Segmentation Challenge dataset. Following [6], we use 80 scans for training and 20 for testing.

Implementation details and metrics
All networks are implemented in PyTorch 1.9.2 with CUDA 11.6 and Python 3.8.5, using an NVIDIA Tesla V100 GPU. For preprocessing, we crop the central area of the heart to remove redundant margins. The model is trained for 16,000 iterations with two annotated scans and two unannotated scans per iteration. The temperature constant of the sharpening function is set to 0.1.
During testing, we extract patches sequentially and recompose them into the final results. Random seeds are fixed so that all experiments run under the same conditions.
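The patch-wise testing procedure can be sketched as follows. This is an illustrative NumPy version with assumed patch size and stride (the paper does not state them), and overlapping predictions are averaged when recomposing the volume.

```python
import numpy as np

def sliding_window_predict(volume, patch, stride, predict_fn):
    """Sequentially extract patches, run predict_fn on each, and average
    overlapping predictions back into a full-volume result."""
    out = np.zeros(volume.shape, dtype=np.float32)
    cnt = np.zeros(volume.shape, dtype=np.float32)
    D, H, W = volume.shape
    pd, ph, pw = patch
    for z in range(0, max(D - pd, 0) + 1, stride):
        for y in range(0, max(H - ph, 0) + 1, stride):
            for x in range(0, max(W - pw, 0) + 1, stride):
                p = volume[z:z+pd, y:y+ph, x:x+pw]
                out[z:z+pd, y:y+ph, x:x+pw] += predict_fn(p)
                cnt[z:z+pd, y:y+ph, x:x+pw] += 1.0
    return out / np.maximum(cnt, 1.0)  # average overlapping predictions
```

In practice `predict_fn` would wrap the trained network's forward pass on a single patch.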

Figure 2.
Several examples of results obtained by UA-MT [6], SASSNet [2], DTC [3], MC-Net [5], and our MRC-Net on the LA dataset.

As shown in Figure 2, MRC-Net produces the best segmentation results among the compared methods. Our model handles challenging areas, where other models produce unclear or erroneous results, in greater detail. Compared with other models that tend to generate excessive false-positive regions, ours exhibits far fewer such occurrences, and its boundaries align more closely with the ground-truth labels. The model comprehensively captures both detailed semantic information and overall structural semantics. Table 1 shows that MRC-Net achieves great improvement. The most crucial Dice metric improves significantly when using 10% labeled data. With 20% of the data, the improvement shrinks slightly owing to proximity to the upper bound but still maintains a substantial lead over the other methods, and the remaining metrics also improve significantly. Without any additional post-processing, our model enhances the efficiency of utilizing unlabeled data through its structural design alone. The ablation studies (see Table 2) consist of several detailed experiments examining each component. We first take the original V-Net as a benchmark. Afterward, we incorporate additional components into the model: a V-Net with two independent decoders (V2D), a sharpening function (SF), and multi-scale reciprocal consistency (MR). The MR module yields the most significant improvement in model performance, demonstrating that our model successfully extracts sufficient information from multi-scale feature maps, leading to a clear understanding of the overall topology of the segmentation targets.
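For reference, the Dice metric reported in Table 1 is the standard Dice similarity coefficient. A minimal implementation, assuming binary NumPy masks, looks like:

```python
import numpy as np

def dice_coefficient(pred: np.ndarray, gt: np.ndarray) -> float:
    # Dice = 2|A ∩ B| / (|A| + |B|) for binary masks A and B
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    denom = pred.sum() + gt.sum()
    return 2.0 * inter / denom if denom else 1.0
```

A Dice of 1.0 means perfect overlap with the ground truth; 0.0 means no overlap at all.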

Conclusion
In this paper, we propose a multi-scale reciprocal consistency network (MRC-Net) for semi-supervised left atrial segmentation. Feature maps at different resolutions contain complementary semantic information; by imposing reciprocal consistency on them, the two decoders learn from each other's strengths and capture comprehensive semantic information. Experiments on the LA database show that MRC-Net greatly enhances the utilization of unlabeled data.
We adopt a V-Net as the backbone and additionally embed a similar decoder with a different up-sampling method, so that the shared encoder extracts multistage hidden representations. The two independent decoders output different feature maps owing to dropout and their different up-sampling methods, and we model the epistemic uncertainty by measuring the discrepancies between these maps. Ultimately, unannotated data are effectively utilized through multi-scale reciprocal consistency learning, while the annotated portion of the data refines the model through fully supervised learning.
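An illustrative PyTorch sketch of this dual-decoder idea. This is a toy, not the paper's exact V-Net: decoder A up-samples by transposed convolution, decoder B by tri-linear interpolation, and the voxel-wise feature discrepancy serves as the single-pass uncertainty proxy.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyDualDecoderNet(nn.Module):
    def __init__(self, ch=8):
        super().__init__()
        # shared encoder: one stride-2 conv standing in for the V-Net encoder
        self.enc = nn.Sequential(nn.Conv3d(1, ch, 3, stride=2, padding=1), nn.ReLU())
        self.drop = nn.Dropout3d(0.5)  # randomizes the shared bottleneck features
        # decoder A: transposed-convolution up-sampling (V-Net style)
        self.up_a = nn.ConvTranspose3d(ch, ch, kernel_size=2, stride=2)
        # decoder B: tri-linear interpolation followed by a conv
        self.conv_b = nn.Conv3d(ch, ch, 3, padding=1)
        self.head_a = nn.Conv3d(ch, 1, 1)
        self.head_b = nn.Conv3d(ch, 1, 1)

    def forward(self, x):
        z = self.drop(self.enc(x))
        fa = self.up_a(z)  # decoder-A feature map
        fb = self.conv_b(F.interpolate(z, scale_factor=2, mode="trilinear",
                                       align_corners=False))  # decoder-B feature map
        pa = torch.sigmoid(self.head_a(fa))
        pb = torch.sigmoid(self.head_b(fb))
        return (fa, fb), (pa, pb)

def uncertainty_map(fa, fb):
    # single-pass epistemic-uncertainty proxy: voxel-wise feature discrepancy
    return (fa - fb).pow(2).mean(dim=1, keepdim=True)
```

Because the two up-sampling paths differ architecturally, their feature maps disagree most where the input is ambiguous, which is what makes a single forward pass sufficient.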

Figure 1.
Diagram of our proposed multi-scale reciprocal consistency network.
For 3D medical images, we are provided with N_l annotated samples and N_u unannotated samples; the annotated samples come with ground truth. The output distribution for unannotated samples is often ineffective, especially when the problem is difficult or the available annotations are extremely limited, and such indecisive predictions have a bad effect when used as pseudo labels to guide model training. The sharpening function is therefore used to convert the output map into a soft pseudo label. The epistemic uncertainty is measured with the MSE loss, and N is the maximum number of layers used for feature mapping. The weights λ and α adjust the proportion of each loss term. Whereas only the fully supervised loss applies to labeled data, the other losses apply to the entire dataset. In addition, we implement an Exponential Moving Average (EMA) weighting, giving greater weight to fully supervised training in the early stages and gradually magnifying the proportion of consistency training as training progresses.
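The weighting schedule just described might look like the Gaussian ramp-up commonly used in consistency training. The exact schedule and constants below are assumptions for illustration, not the paper's stated values.

```python
import math

def consistency_weight(step: int, max_step: int = 16000, w_max: float = 1.0) -> float:
    # Near step 0 the weight is tiny, so the supervised loss dominates;
    # it rises smoothly toward w_max as training progresses.
    t = min(step, max_step) / max_step
    return w_max * math.exp(-5.0 * (1.0 - t) ** 2)
```

The total loss at each iteration would then be the supervised term plus `consistency_weight(step)` times the consistency terms.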

Table 1 .
Comparisons with five recent SOTA networks on the LA dataset.

Table 2 .
Ablation studies of our proposed MRC-Net on the LA dataset.