Adaptive convolutional sparsity with sub-band correlation in the NSCT domain for MRI image fusion

Objective. Multimodal medical image fusion (MMIF) technology merges diverse medical images with rich information, boosting diagnostic efficiency and accuracy. Owing to its global optimization and single-valued nature, convolutional sparse representation (CSR) outperforms the standard sparse representation (SR). To address the challenges of sensitivity to highly redundant dictionaries and robustness to misregistration, an adaptive convolutional sparsity scheme with measurement of the sub-band correlation in the non-subsampled contourlet transform (NSCT) domain is proposed for MMIF. Approach. The fusion scheme incorporates four main components: two-scale image decomposition, fusion of the detail layers, fusion of the base layers, and two-scale reconstruction. We solved a Tikhonov regularization optimization problem on the source images to obtain the base and detail layers. Then, the detail layers were sparsely decomposed using pre-trained dictionary filters to obtain initial coefficient maps. The sub-band correlation in the NSCT domain was used to refine the fused coefficient maps, and sparse reconstruction produced the fused detail layer. Meanwhile, the base layers were fused by averaging. The final fused image was obtained via two-scale reconstruction. Main results. Experimental validation on clinical image sets revealed that the proposed fusion scheme not only effectively eliminates the interference of partial misregistration, but also outperforms representative state-of-the-art fusion schemes in the preservation of structural and textural details according to both subjective visual and objective quality evaluations. Significance. The proposed fusion scheme is competitive due to its low-redundancy dictionary, robustness to misregistration, and better fusion performance.
This is achieved by training the dictionary with minimal samples through CSR to adaptively preserve overcompleteness for detail layers, and constructing fusion activity level with sub-band correlation in the NSCT domain to maintain CSR attributes. Additionally, ordering the NSCT for reverse sparse representation further enhances sub-band correlation to promote the preservation of structural and textural details.


Introduction
The diversity of imaging mechanisms in modern clinical practice is essential and extensively utilized in disease diagnosis and radiation therapy (Du et al 2016). Considering that a single form of imaging tends to be unable to effectively characterize the symptoms of different diseases, doctors generally need to diagnose a patient's condition by comprehensively analyzing different categories of organ/tissue information at the same layers. The main contributions of this study are as follows.
(1) The dictionary filters are trained directly from the source images through CSR, so features can be adaptively screened out with changes of sample types, which avoids the need for prior knowledge of externally precollected data and gives good generalization ability for other fusion types.
(2) The sub-band correlation is proposed to measure the fusion activity level at the global level in the NSCT-based MST domain, where both the sub-band correlation and the NSCT are shift-invariant so as to keep the attributes of CSR; this is beneficial for ameliorating the effect of partial misregistration on fusion performance.
(3) Furthermore, this is the first time that the multi-scale transform (MST) has been used to realize sparse reconstruction in reverse order, which not only increases the sparsity but also allows feature classification to enhance the correlation of the corresponding sub-bands at the same level; this further improves the characterization ability of the sub-band correlation and is conducive to raising the accuracy of the fusion result.
The remaining sections of this study are organized as follows. Section 2 presents a concise explanation of convolutional sparse representation theory. The fusion scheme is outlined in detail in section 3. Section 4 provides the experimental results and accompanying discussion. Finally, section 5 concludes the study.

Convolutional sparse representation theory
When utilizing sparse representation techniques for images, it is common practice to compute the representations of overlapping image patches independently; the corresponding objective function is defined as

$$\arg\min_{a} \frac{1}{2}\|y - Da\|_2^2 + \lambda R(a), \qquad (1)$$

where $y \in \mathbb{R}^m$ represents the vectorized form of an overlapping image patch, $D$ indicates an over-complete dictionary, $a$ represents the sparse vector, $\|\cdot\|_2^2$ represents the squared $\ell_2$ norm, and $\lambda$ represents the regularization parameter of the penalty term $R(a)$. This approach has proven to be highly effective in various applications. However, it leads to a multi-valued representation that is not optimized for the entire image.
In contrast, CSR techniques offer an alternative representation in a convolutional form. These techniques approximate an image $I_r$ by a sum of convolutions between coefficient maps $s_{r,m}$ and their corresponding dictionary filters $d_m$:

$$I_r \approx \sum_{m} d_m * s_{r,m}, \qquad (2)$$

where $r$ indexes the $r$th image sample and $*$ represents the convolution operator. To ensure an accurate representation, a constraint is imposed on the norms of the filters $d_m$, which avoids any scaling ambiguity between the filters and the coefficients. The conventional approach to convolutional sparse representation is iterative minimization with respect to the coefficients and the dictionary.
The sparse coding algorithm has a sparsity-inducing penalty of the form $\sum_m \|s_m\|_1$, i.e. the convolutional basis pursuit denoising (CBPDN) problem (Barajas-Solano et al 2019), defined as

$$\arg\min_{\{s_{r,m}\}} \frac{1}{2}\Big\| \sum_m d_m * s_{r,m} - I_r \Big\|_2^2 + \lambda \sum_m \|s_{r,m}\|_1. \qquad (3)$$

Equation (3) can be modified by incorporating an additional variable that is subject to a constraint, and iterations of the alternating-direction method of multipliers (ADMM) are performed to solve the following optimization:

$$\arg\min_{\{s_{r,m}\},\{y_{r,m}\}} \frac{1}{2}\Big\| \sum_m d_m * s_{r,m} - I_r \Big\|_2^2 + \lambda \sum_m \|y_{r,m}\|_1 \quad \text{s.t. } s_{r,m} = y_{r,m}. \qquad (4)$$

The corresponding dictionary update algorithm, such as the convolutional form of the method of optimal directions (MOD) (Cai et al 2016), is defined as

$$\arg\min_{\{d_m\}} \frac{1}{2}\sum_r \Big\| \sum_m d_m * s_{r,m} - I_r \Big\|_2^2 \quad \text{s.t. } d_m \in C_{PN}. \qquad (5)$$

To ensure that the filters obtained from the optimization in equation (5) have a suitably limited support in the spatial domain, the dictionary norm constraint set

$$C_{PN} = \{ x : (I - PP^{T})x = 0, \ \|x\|_2 = 1 \}$$

is utilized, where $P$ represents a zero-padding operator. Next, by incorporating a constrained auxiliary variable $g_m$ into equation (5), the ADMM algorithm iterates to solve the following optimization problem:

$$\arg\min_{\{d_m\},\{g_m\}} \frac{1}{2}\sum_r \Big\| \sum_m d_m * s_{r,m} - I_r \Big\|_2^2 + \sum_m \iota_{C_{PN}}(g_m) \quad \text{s.t. } d_m = g_m, \qquad (6)$$

where the indicator function $\iota_{C_{PN}}$ of the constraint set $C_{PN}$ is defined as

$$\iota_{C_{PN}}(x) = \begin{cases} 0 & \text{if } x \in C_{PN} \\ \infty & \text{if } x \notin C_{PN}. \end{cases} \qquad (7)$$

Given the iterative update algorithms for $s_{r,m}$ and $d_m$ (i.e. the ADMM algorithms for CBPDN and constrained MOD, respectively), the commonly employed approach to constructing a complete dictionary-learning algorithm is to alternate between sparse coding and dictionary updates. Prior to proceeding to the dictionary update, it is crucial to solve the sparse coding problem accurately. In this algorithm, a single iteration comprises the sequential updates in equations (3)-(6).
More details on the convolutional sparse representation algorithm are available in Wohlberg (2016).
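The sparse coding step can be made concrete with a minimal sketch. The code below approximately solves the CBPDN problem of equation (3) for a single image, but uses proximal gradient (ISTA) iterations instead of the ADMM solver described above; the function names and the ISTA substitution are our own, shown only to illustrate the objective, not Wohlberg's implementation.

```python
import numpy as np

def soft(x, t):
    """Soft-thresholding: the proximal operator of the l1 penalty."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def cbpdn_ista(image, filters, lam=0.01, n_iter=100):
    """Approximately minimise
        0.5*||sum_m d_m * s_m - I||_2^2 + lam * sum_m ||s_m||_1
    by ISTA; convolutions are circular and evaluated via the FFT."""
    shape = image.shape
    Df = [np.fft.fft2(d, s=shape) for d in filters]
    # Lipschitz constant of the data-term gradient gives a safe step size
    L = np.max(sum(np.abs(D) ** 2 for D in Df))
    step = 1.0 / L
    maps = [np.zeros(shape) for _ in filters]
    for _ in range(n_iter):
        recon = sum(np.real(np.fft.ifft2(np.fft.fft2(s) * D))
                    for s, D in zip(maps, Df))
        resid_f = np.fft.fft2(recon - image)
        for m, D in enumerate(Df):
            # adjoint of "convolve with d_m" applied to the residual
            grad = np.real(np.fft.ifft2(np.conj(D) * resid_f))
            maps[m] = soft(maps[m] - step * grad, step * lam)
    return maps
```

Each iteration costs a handful of FFTs, and the returned maps play the role of the coefficient maps $s_{r,m}$; the ADMM solver of Wohlberg (2016) reaches the same objective far faster in practice.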

Proposed fusion scheme
The general structure of the proposed scheme for multimodal medical image fusion is shown in figure 1. The dictionary filters $d_m$ (we take M = 32 as an example) were learned by the CSR algorithm (Wohlberg 2016) with the pre-registered source images as the training samples. Simultaneously, the source images were employed as test samples for evaluation. The proposed scheme comprised the following four main steps.

Two-scale image decomposition
To begin, each source image $I_r$ undergoes a decomposition process, separating it into a base layer $B_r$ and a detail layer $D_r$. The base layer is obtained by solving the following optimization problem:

$$B_r = \arg\min_{B} \|I_r - B\|_2^2 + \eta \left( \|g_x * B\|_2^2 + \|g_y * B\|_2^2 \right), \qquad (8)$$

where $g_x = [1, -1]$ and $g_y = [1, -1]^{T}$ are the horizontal and vertical gradient operators, respectively. This is a Tikhonov regularization problem, which can be solved efficiently with the fast Fourier transform (FFT). Given $B_r$, the detail layer is subsequently obtained via subtraction:

$$D_r = I_r - B_r. \qquad (9)$$

Many image fusion schemes, including (Li et al 2013b), have successfully applied this kind of two-scale decomposition approach.
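The Tikhonov problem above has a closed-form solution in the Fourier domain, which the following sketch illustrates (the smoothing weight `eta` is an assumed placeholder; the paper's exact value is not restated here):

```python
import numpy as np

def two_scale_decompose(image, eta=5.0):
    """Split `image` into base and detail layers by solving
        min_B ||I - B||^2 + eta*(||g_x * B||^2 + ||g_y * B||^2)
    in closed form with the FFT (circular boundary conditions)."""
    gx = np.array([[1.0, -1.0]])      # horizontal gradient operator
    gy = np.array([[1.0], [-1.0]])    # vertical gradient operator
    shape = image.shape
    Gx = np.fft.fft2(gx, s=shape)
    Gy = np.fft.fft2(gy, s=shape)
    # per-frequency denominator of the Tikhonov normal equations
    denom = 1.0 + eta * (np.abs(Gx) ** 2 + np.abs(Gy) ** 2)
    base = np.real(np.fft.ifft2(np.fft.fft2(image) / denom))
    detail = image - base             # detail layer via subtraction
    return base, detail
```

By construction `base + detail` reproduces the input exactly, and a constant image passes through as pure base layer since its gradients vanish.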

Fusion of the detail layers
To obtain the sparse coefficient maps $s_{D_r,m}$ for each detail layer $D_r$, the CSR model in equation (3) is solved with the method described in Wohlberg (2016). Since the MST essentially uses low-resolution components to analyze the approximate features of an image and high-resolution components to analyze the detailed features, different decomposition levels contain different characteristics of an image according to their scales and resolutions. Therefore, the MST can be understood as a feature classification process, and fusing similar features at the same decomposition level can improve the accuracy of the fusion result. Furthermore, Li conducted a comparative analysis of various MST methods and concluded that the NSCT yielded the best performance in multimodal medical image fusion (Li et al 2011). Thus, we first take the NSCT-based MST to decompose each coefficient map into different scales and resolutions:

$$\{ s_{D_r}^{L}, s_{D_r}^{H,1}, \ldots, s_{D_r}^{H,K} \} = f_{NSCT}^{K}(s_{D_r,m}), \qquad (10)$$

where $f_{NSCT}^{K}$ represents the NSCT decomposition operator with $K$ levels, $s_{D_r}^{L}$ represents the low-frequency sub-band component, and $s_{D_r}^{H,1}, \ldots, s_{D_r}^{H,K}$ are the high-frequency sub-band components with $2^K$ directions. The NSCT low-frequency sub-band components of the corresponding sparse coefficient maps are fused by using the averaging strategy:

$$s_{D_F}^{L} = \frac{1}{2}\left( s_{D_1}^{L} + s_{D_2}^{L} \right). \qquad (11)$$

The sub-band correlation strategy, i.e. the correlation of sub-bands in adjacent directions at the same scale of the sparse coefficient map, is used as the activity level measure for the NSCT high-frequency sub-band components $s_{D_r}^{H,k}$ ($k \in K$). The sub-band correlation is defined as

$$\mathrm{corr}\left( s_{D_r}^{H,k}, s_{D_r}^{H,k'} \right) = \frac{ \sum_{i=1}^{I}\sum_{j=1}^{J} s_{D_r}^{H,k}(i,j)\, s_{D_r}^{H,k'}(i,j) }{ \sqrt{ \sum_{i,j} \big( s_{D_r}^{H,k}(i,j) \big)^2 \sum_{i,j} \big( s_{D_r}^{H,k'}(i,j) \big)^2 } }, \qquad (12)$$

where $s_{D_r}^{H,k}$ and $s_{D_r}^{H,k'}$ denote the sub-bands in the $k$th and $k'$th directions, respectively, and $I$ and $J$ represent the sizes of the MST sub-band coefficient maps. For each direction, the high-frequency sub-band coefficients with the larger correlation-based activity level are selected to form the fused high-frequency sub-bands $s_{D_F}^{H,k}$. Once the fused NSCT low-frequency and high-frequency sub-bands are obtained, the reverse NSCT is employed to generate the fused sparse coefficient maps $s_{D_F,m}$. Finally, the fusion result of the detail layers is achieved using

$$D_F = \sum_m d_m * s_{D_F,m}. \qquad (13)$$
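The high-frequency fusion rule can be sketched as follows. Because the printed definition of the correlation measure is garbled in this copy of the paper, this code is an illustrative stand-in, not the authors' exact formula: it scores each directional sub-band by its normalised correlation with the adjacent-direction sub-bands at the same scale, and keeps, per direction, the source sub-band with the larger score (an assumed choose-max rule).

```python
import numpy as np

def subband_activity(subbands):
    """Activity level of each directional sub-band: the sum of its
    normalised correlations with the two adjacent-direction sub-bands
    at the same scale (directions treated cyclically)."""
    K = len(subbands)
    act = np.zeros(K)
    for k in range(K):
        for kp in ((k - 1) % K, (k + 1) % K):   # adjacent directions
            a, b = subbands[k], subbands[kp]
            denom = np.linalg.norm(a) * np.linalg.norm(b)
            if denom > 0:
                act[k] += np.abs(np.sum(a * b)) / denom
    return act

def fuse_highpass(subbands1, subbands2):
    """Per direction, keep the source sub-band with the larger
    correlation-based activity level (assumed choose-max rule)."""
    a1, a2 = subband_activity(subbands1), subband_activity(subbands2)
    return [s1 if x1 >= x2 else s2
            for s1, s2, x1, x2 in zip(subbands1, subbands2, a1, a2)]
```

Because the activity is a global correlation over shift-invariant sub-bands, small local misalignments perturb the score only slightly, which is the property the paper exploits for robustness to partial misregistration.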

Fusion of the base layers
Considering the typically prominent variations in brightness and texture between corresponding positions in multimodal medical source images, these differences must be addressed, while fusion results obtained with the conventional 'choose-max' strategy can cause visual inconsistencies when partial pixels change only slightly. Therefore, we adopt an averaging strategy for the fusion of the base layers:

$$B_F = \frac{1}{2}\left( B_1 + B_2 \right). \qquad (14)$$

Two-scale reconstruction
Once the fused detail layer $D_F$ and fused base layer $B_F$ are available, the final fused image $I_F$ can be reconstructed. This is accomplished by employing

$$I_F = B_F + D_F. \qquad (15)$$
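The last two steps are simple enough to state directly in code (function names are ours):

```python
import numpy as np

def fuse_base(base1, base2):
    """Averaging strategy for the base layers, equation (14)."""
    return 0.5 * (base1 + base2)

def reconstruct(base_fused, detail_fused):
    """Two-scale reconstruction: the fused image is the sum of the
    fused base and detail layers, equation (15)."""
    return base_fused + detail_fused
```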

Experimental results and analysis
In this section, we thoroughly assess the performance and demonstrate the effectiveness of our proposed multimodal medical image fusion scheme.A comparative evaluation is conducted in relation to other notable fusion techniques to highlight the advantages of our approach.
(1) Competitive ablation experiments were performed with the principle of single-variable control to verify the effectiveness of the proposed activity level measure.
(2) We analyzed the impact of misregistration on the fusion performance by comparing the proposed scheme with competitors around misregistered edge regions.
(3) Subjective visual evaluation and objective metrics were used to evaluate the performance of the proposed scheme.

Experimental setup
In the experiments, 116 pairs of multimodal medical images originally obtained from three image sets of the whole-brain atlas medical image (WBAMI) dataset ('http://med.harvard.edu/aanlib/home.html') were used for testing: 'Acute stroke' (28 pairs), 'Hypertensive encephalopathy' (28 pairs), and 'Multiple embolic infarctions' (60 pairs). Part of the source images, with a spatial resolution of 256 × 256, are shown in figure 2. Four popular objective metrics, namely the entropy (EN), the union entropy (Q_U) (Cao and Wang 2005), the mutual information (MI) (Qu et al 2002), and the structural similarity (SSIM) (Wang et al 2004), were used to evaluate the fusion performance; the default parameter settings can be found in the related publications. EN was employed to assess the information content present in the fused images. Q_U served to gauge the structural correlation between the fused images and the source images. MI indicated the amount of information transferred from the source images to the fused images. SSIM was used to characterize the structural similarity between the fused images and the source images. For each of the four metrics, a higher value indicates superior fusion performance.
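For reference, EN and MI can be computed directly from image histograms. The sketch below is minimal; the fusion-quality variant of MI (Qu et al 2002) sums the pairwise MI between the fused image and each source image.

```python
import numpy as np

def entropy(img, bins=256):
    """Shannon entropy (EN) of an image with values in [0, 256), in bits."""
    counts, _ = np.histogram(img, bins=bins, range=(0, 256))
    p = counts / counts.sum()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def mutual_information(a, b, bins=256):
    """Mutual information MI(a; b) from the joint histogram, in bits."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(),
                                 bins=bins, range=[[0, 256], [0, 256]])
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of a
    py = pxy.sum(axis=0, keepdims=True)   # marginal of b
    nz = pxy > 0
    return np.sum(pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz]))
```

A useful sanity check: the MI of an image with itself equals its entropy, since the joint histogram collapses onto the diagonal.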
Inspired by the CSR-based scheme (Wang et al 2021) and the CSMCA-based scheme (Liu et al 2019), the proposed scheme was primarily compared with these two schemes to verify its advantages. Meanwhile, competitive ablation experiments with different activity level measures were used to investigate the effectiveness of the proposed activity level measure. Furthermore, we took six representative medical image fusion methods as competitors, i.e. NSST-PAPCNN (Yin et al 2018), LRD (Li et al 2020), NSST-MSMG-PCNN (Tan et al 2020), NSCT-PC-LLE (Zhu et al 2019), CNN (Zhang et al 2020), and NSST-Proposed, where the NSST-Proposed scheme replaced NSCT with NSST in the proposed scheme. In the CSR-based scheme, the dictionary used for learning was obtained from a set of 50 high-quality natural images, each with a size of 256 × 256. In the CSMCA-based scheme, the dictionary was learned from 60 cartoon images and 60 texture images. By contrast, in the proposed scheme, pre-trained dictionary filters were obtained through convolutional sparse representation with the source images as training samples; the spatial dimensions of each dictionary filter were set to 8 × 8 and the number of dictionary filters was set to 32, in accordance with CSR and CSMCA. The parameter λ was set to 0.01, in accordance with (Wohlberg 2016). All of the experiments were conducted in MATLAB R2017a on a machine with a 3.3 GHz CPU, 16.0 GB of RAM, and the 64-bit Windows 7 operating system.

Ablation experiments with different activity level measures
Experimental verification was conducted to assess the impact on fusion performance of the proposed activity level measure with sub-band correlation in the NSCT domain; activity level measures with the pure sub-band correlation and with the L1-norm in the NSCT domain were used as the ablation competitors. Following the principle of single-variable control, the proposed activity level measure was compared with the pure sub-band correlation to confirm the influence of the NSCT-based multi-scale transform on the fusion performance of the proposed scheme, and with the L1-norm in the NSCT domain to confirm the influence of the sub-band correlation. The comparison of the fusion results with the pure sub-band correlation and with the proposed activity level measure showed that the latter retained much more structural information, as reflected at the position of the red arrow in figures 3(c) and (d). Thus, the NSCT-based multi-scale transform effectively promoted the retention of structural information in the proposed scheme. The comparison of the fusion results with the L1-norm in the NSCT domain and with the proposed activity level measure showed that the latter retained much more textural information, as reflected at the position of the green arrow in figures 3(d) and (e). Thus, the sub-band correlation effectively promoted the retention of textural information. In conclusion, the proposed activity level measure had advantages in retaining both structural and textural information.

The impact of misregistration on fusion performance
To verify that the proposed scheme was robust to partial misregistration, we took the same example as the multi-focus experiments described in Yu et al (2016), as shown in figure 4. Meanwhile, three fusion schemes (CSMCA, CSR, and sub-band correlation) were adopted as competitors of the proposed scheme. The window-based averaging strategy was used in CSMCA and CSR to make them insensitive to misregistration, but this strategy reduced the shift-invariance of the image transformation, which in turn produced ringing artifacts in the partially misregistered region, namely the pseudo-Gibbs phenomenon. Furthermore, it was difficult for the artificially controlled window size to attain ideal robustness to misregistration. Figures 4(c) and (d) show ringing artifacts at the position of the red arrow around the partially misregistered edge regions. In the shift-invariant sub-band correlation strategy, the correlation of adjacent sub-bands was employed with no artificial control to ameliorate the robustness to misregistration. As figure 4(e) verifies, fewer artifacts existed around the partially misregistered edge regions. By constructing the fusion activity level with sub-band correlation in the NSCT domain, both of which are shift-invariant, the proposed fusion scheme consistently preserves the convolutional sparse representation attributes of global optimization and a single-valued nature, ameliorating the effects of partial misregistration. As shown in figure 4(f), the proposed scheme outperformed the three competitors, particularly in the partially misregistered edge regions.

Comparison with other fusion schemes
Both subjective visual evaluations and objective metrics were employed to evaluate the effectiveness of the proposed scheme. We randomly selected eight groups of clinical multimodal medical images from the three image sets, and conducted detailed analysis focusing on the amplification of representative regions, as shown in figures 6-10, respectively. The fusion results of the proposed scheme and the competitors on three types of CT/MRI images, i.e. CT/MRI, CT/MR-T1, and CT/MR-T2, are shown in figures 6-8, and zoomed views of the selected areas in the red and green boxes are provided for a better comparison. In the field of medical imaging, computed tomography precisely captures hard tissues and structures (such as bones and implants) with high resolution, whereas magnetic resonance imaging demonstrates a more acute ability to depict soft tissues. The competing schemes exhibit varying degrees of loss of detail or contrast (see figures 6(b1)-(f1), (h1) and (b2)-(f2), (h2), for example). By contrast, the proposed scheme evidently outperforms the competitors in preserving the structural textures and intricate details, and NSST-Proposed, as an ablation experiment for the proposed scheme, clearly has deficiencies in maintaining effective information and preserving contrast resolution in medical imaging (see figures 6(h1)/(i1) and 7(h1)-(i1), for example).
For a more comprehensive evaluation, the fusion results of the proposed scheme and the competitors on two types of MR/SPECT and MR/PET images, i.e. MR-Gad/SPECT-T1 and MR-T2/PET-FDG, are shown in figures 9-10, and zoomed views of the selected areas in the red and green boxes are provided for a better comparison. The combined advantage of MRI and SPECT/PET is that MRI provides high-resolution anatomical structural information, while SPECT/PET reflects physiological metabolic processes; fusing the two provides doctors with more comprehensive and accurate diagnostic information. From the comparison experiments, we can conclude that the NSST-PAPCNN-, LRD-, and NSST-MSMG-PCNN-based schemes still suffer from local blurring of structural outlines on MR/SPECT and MR/PET images (see figures 9(c1)-(e1) and 10(c1)-(e1), for example). Compared to their subpar performance on CT/MRI, the CSR-, CSMCA-, CNN-, and NSST-Proposed-based schemes retain more effective information when dealing with MR/SPECT and MR/PET (see figures 10(a1), (b1), (g1), and (h1), for example). By contrast, the proposed scheme remains competitive in preserving the structural textures and intricate details (see figures 9(i2) and 10(i2), for example).
According to the subjective comparisons, the proposed fusion scheme not only retains the integral structural and detailed information of the source images, but is also more robust to artificial interference. Consequently, it can be concluded that the proposed scheme outshone its competitors in terms of subjective visual performance.

Objective quality evaluation
When evaluating image fusion, relying on a single metric may lack objectivity. Hence, it is crucial to conduct a thorough analysis that considers multiple evaluation metrics. The comprehensive objective assessment of the five datasets (CT/MRI, CT/MR-T1, CT/MR-T2, MR/SPECT, and MR/PET) is presented in table 1, which corresponds to the subjective fusion results shown in figures 6-10, respectively. In table 1, the best fusion performance is indicated by highlighting the highest value in each row, while the second and third highest values in each row are underlined. The MI index from multiple comparative experiments reveals that the proposed method not only offers competitive information retention in grayscale channels (CT/MRI, CT/MR-T1, CT/MR-T2), but also exhibits more prominent advantages in color channels (MR/SPECT, MR/PET). Conversely, the competitors (NSST-PAPCNN, NSST-MSMG-PCNN, and NSCT-PC-LLE) possess certain advantages in information representation in grayscale channels, yet their representation ability diminishes significantly in color channels, particularly for NSST-MSMG-PCNN. This conclusion is further supported by the subjective evaluations depicted in figure 9. The SSIM index from multiple comparative experiments reveals that the fusion results of the proposed method exceed those of the competitors in terms of structural similarity, particularly when compared to NSST-PAPCNN, NSST-MSMG-PCNN, and NSCT-PC-LLE. This conclusion is reinforced by the subjective evaluations regarding edge structure preservation in (a2)-(i2) of figures 6-10.
In order to reflect the fusion performance of the proposed scheme more objectively, the overall statistical objective fusion results obtained for the 116 pairs of multimodal medical images are shown in figure 11, and the calculated average scores for each metric across all test examples are displayed in parentheses. As a whole, when considering the average results for the four metrics, the proposed scheme outperformed its seven competitors in all three clinical categories of 'Acute stroke', 'Hypertensive encephalopathy', and 'Multiple embolic infarctions'. Meanwhile, a competitive ablation experiment with the NSST-Proposed scheme was carried out to verify that, compared with NSST, NSCT is more suitable for feature characterization in medical imaging. Among the competitors, the proposed scheme demonstrated the most prominent advantages in terms of the MI metric. This suggests that the activity level measure in the proposed scheme played a vital role in retaining the effective information of the source images. Therefore, the subjective analysis and objective evaluation indicate that the proposed scheme has significant advantages.
We conducted a computational efficiency comparison between the proposed scheme and the other eight competitors. The average CPU running times of the different schemes when merging 116 pairs of source images with a size of 256 × 256 pixels are listed in table 2. The NSCT-PC-LLE-based scheme ran quickly because it used a multi-scale sample transformation to decompose and reconstruct images. The computation of the CSMCA-based scheme was time-consuming, since both the base layer and the detail layer involved sparse coding operations. The computational complexity of the proposed scheme was slightly higher than that of the NSST-Proposed scheme: the non-subsampled shearlet transform used by NSST does not restrict the number of directions and does not need to invert the directional filter bank, so its computational efficiency is higher than that of the NSCT. To summarize, despite its elevated algorithmic complexity, the proposed method demonstrates robust subjective and objective performance in terms of texture information retention and structural similarity in both grayscale and color channels. This indicates its applicability and worthiness for multimodal medical image fusion.

Conclusion
In this study, the main motivation was to address the challenges of sensitivity to a highly redundant dictionary and robustness to misregistration while ensuring improved fusion performance, as these issues plague conventional CSR-based MMIF schemes. An adaptive convolutional sparsity fusion scheme with the measurement of sub-band correlation in the NSCT domain was proposed. The fusion scheme consisted of four main components: two-scale image decomposition, fusion of the detail layers, fusion of the base layers, and two-scale reconstruction. Firstly, we solved a Tikhonov regularization optimization problem to obtain the base and detail layers. Secondly, the detail layers were sparsely decomposed with pre-trained dictionary filters to obtain sparse coefficient maps, the structured fusion activity level measure with sub-band correlation in the NSCT domain was employed to obtain the fused sparse coefficient maps, and sparse reconstruction was then employed to obtain the fused detail layer. Meanwhile, the corresponding base layers were fused using an averaging strategy. Ultimately, the fused images were obtained through two-scale reconstruction.
Experimental results across different clinical medical image categories further confirmed the effectiveness of the proposed scheme in preserving structural textures and intricate details. However, the deficiencies of reduced brightness information and the low EN and Q_U scores indicate that follow-up work should focus on improving the contrast of the fusion results while maintaining complete information retention.

Figure 1. The framework of the proposed fusion scheme.

Figure 2. Source images used in the experiments.

Figure 3. The impacts of NSCT and sub-band correlation on the fusion performance of the proposed scheme: (a) source image I1, (b) source image I2, (c) sub-band correlation, (d) sub-band correlation with NSCT, (e) L1-norm with NSCT.

Figure 6. Fusion results of CT/MRI image pairs with different fusion schemes.

Figure 7. Fusion results of CT/MR-T1 image pairs with different fusion schemes.

Figure 8. Fusion results of CT/MR-T2 image pairs with different fusion schemes.

Figure 9. Fusion results of MR-Gad/SPECT-T1 image pairs with different fusion schemes.

Figure 10. Fusion results of MR-T2/PET-FDG image pairs with different fusion schemes.

Figure 11. Overall objective statistical results of the different fusion schemes.

Table 1. Objective assessment of the different fusion schemes.

Table 2. CPU running times of the different schemes when merging two source images with a size of 256 × 256 pixels.