Overcoming Catastrophic Forgetting with Detail-Degradation Decoupling Networks for Super-Resolution

Super-resolution reconstruction is a crucial technology that can significantly enhance image resolution and reconstruct high-frequency details. Existing methods are typically fine-tuned for each application target, which lets them adapt to new data domains. However, how well a fine-tuned model still generalizes to its original data domain remains unclear and warrants further study. To address this problem, we analyzed the forgetting behavior of super-resolution methods under different degradation kernels, scenes, and modalities. We then introduce Detail-Degradation Decoupling Networks, which separate the inverse of the degradation process from the recovery of high-frequency details in super-resolution. During incremental learning, the network parameters can be adjusted according to the characteristics of the task. Experimental results demonstrate that our approach significantly reduces catastrophic forgetting in super-resolution reconstruction and improves the network's generalization and robustness.


Introduction
With the development of image super-resolution (SR) reconstruction, significant progress has been made in various fields [1][2]. Owing to the swift advancement of deep learning, deep-learning-driven techniques have outperformed conventional approaches by leveraging training datasets to learn the correlation between low-resolution (LR) and high-resolution (HR) images. This enables them to generate super-resolution images through end-to-end mapping. As a result, these methods have demonstrated superior performance in image reconstruction.
Dong et al. [3] designed the first convolutional neural network for super-resolution, a simple three-layer structure, demonstrating the impressive capabilities of deep learning in this area. Later, thanks to the introduction of residual learning, a series of deeper and more complex architectures were proposed, including VDSR [4], EDSR [5], and DRN [6]. These approaches progressively enhance the network's non-linear fitting ability by increasing its depth and width and by incorporating more elaborate design structures.
However, using L1 or L2 loss functions to measure the difference between output and ground truth can lead the network to generate blurrier results [7][8], because the super-resolution problem is ill-posed. Some works introduce adversarial learning to push the output distribution toward the distribution of the training dataset, effectively alleviating this problem. It is worth noting, however, that training data is only a sparse sampling of the real world, so fitting the training distribution may not generalize well to other domains. Consequently, many widely used methods, such as Real-ESRGAN [9] and BSRGAN [10], require separate fine-tuning and separate weight checkpoints for different application tasks. Due to concerns about data privacy or security, old data is often unavailable, so fine-tuning can only be performed on new datasets. We found that the fine-tuned parameters often deteriorate severely on old tasks, resulting in catastrophic forgetting.
To address this issue, we first studied catastrophic forgetting in super-resolution. We constructed new training and testing datasets and analyzed performance changes under different scenes, modalities, and blur kernels. We then designed a novel neural network consisting of two branches, a detail branch and a degradation branch, and trained them separately with different learning strategies and loss functions. Experimental results show that our method effectively reduces the degree of catastrophic forgetting and improves the network's generalization ability and robustness.

Catastrophic forgetting
Currently, catastrophic forgetting has been extensively studied in classification tasks but has not yet been examined for image restoration. We therefore conducted a study of catastrophic forgetting in super-resolution.
First, because RRDBNet has demonstrated strong performance across multiple works, we adopted it as the baseline model. We collected a dataset spanning multiple modalities and scenes for our experiments, including natural scenes, thermal infrared, and anime. The natural-scene images were further divided into four common scenes (Urban, Plant, Animal, and Other). Figure 1 shows the experimental results using only L1 loss. As shown in Figure 1(a), reconstruction performance did not decrease significantly in the incremental-learning experiments across scenes, and even increased on some datasets. This phenomenon also holds, to some extent, across modalities: as shown in Figure 1(b), there is still no significant performance loss when moving from visible light to the anime dataset. For the blur kernels, we used bicubic, bilinear, and nearest-neighbor downsampling instead of complex randomized kernels. As shown in Figure 1(c), performance degraded significantly this time, which indicates that under L1 or L2 loss the network tends to learn the characteristics of the degradation kernel rather than rely on the image structure itself. This observation motivated our method.
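The fixed-kernel setting above can be sketched in a few lines. The snippet below is an illustrative toy, not our exact data pipeline: `make_lr` is a hypothetical helper, and "bilinear" is approximated here by box averaging over each block rather than true bilinear filtering.

```python
import numpy as np

def make_lr(img, scale, mode="nearest"):
    """Synthesize a low-resolution image from a 2-D array `img` using a
    fixed, deterministic kernel (a stand-in for the bicubic / bilinear /
    nearest kernels in the forgetting experiments)."""
    h, w = img.shape
    assert h % scale == 0 and w % scale == 0
    if mode == "nearest":
        # nearest-neighbor decimation: keep every `scale`-th pixel
        return img[::scale, ::scale]
    if mode == "bilinear":
        # approximate antialiased bilinear by averaging each scale x scale block
        return img.reshape(h // scale, scale, w // scale, scale).mean(axis=(1, 3))
    raise ValueError(f"unsupported mode: {mode}")
```

Because each kernel leaves a distinct statistical fingerprint on the LR image, a network trained with L1 loss can latch onto that fingerprint instead of the underlying image structure.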
In computer vision, it is often desirable to reconstruct images that are as sharp as possible. To this end, generative adversarial networks (GANs) are commonly used to learn and reproduce the data distribution. We repeated the experiment of Figure 1 with a GAN, and the results are shown in Figure 2. Because the details produced during restoration actually come from the data distribution the network learned from the training dataset, we observe significant degradation in Figure 2. For a more intuitive judgment, the visual effects of this degradation are shown in Figure 3. The change is not visually drastic, because the global residual structure commonly used in super-resolution networks passes through the low-frequency information of the input image, which partially masks the degradation. On closer inspection, however, the reconstruction errors in image details increase significantly, which is reflected in three core metrics: PSNR, SSIM, and LPIPS. We conducted the same experiments across modalities and observed even more severe degradation, as shown in Figure 2.
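Of the three metrics tracked above, PSNR is simple enough to state inline; a minimal reference implementation follows (SSIM and LPIPS require their own, more involved implementations and are omitted here).

```python
import numpy as np

def psnr(ref, out, max_val=255.0):
    """Peak signal-to-noise ratio in dB between a reference image `ref`
    and a reconstruction `out`, both arrays with values in [0, max_val]."""
    mse = np.mean((ref.astype(np.float64) - out.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)
```

Higher PSNR means a smaller pixel-wise error; the drop in this value after incremental learning is one of the signals of forgetting reported above.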

Detail-degradation decoupling networks for super-resolution
According to the study in Section 2.1, the mapping learned by a super-resolution network varies with the loss function used. Specifically, training with L1 loss tends to make the network learn the inverse of the degradation process, whereas adversarial loss encourages it to learn the dataset's data distribution. Drawing inspiration from this phenomenon, we divide the network into two branches: a Degradation Branch, which estimates the inverse of the degradation, and a Detail Branch, which reconstructs high-frequency information that may have been lost. The overall architecture is shown in Figure 4; the two branches are explained below.
Degradation Branch: It has been shown that a lightweight neural network can estimate the inverse degradation process well, at least for bicubic downsampling. We therefore designed a U-Net-like network as the Degradation Branch, which uses multi-level skip connections to improve feature-map reuse and thus enhance reconstruction performance. Compared with many large networks, it is more efficient and has fewer weight parameters.
Detail Branch: Because the super-resolution problem is ill-posed, learning the data distribution of the dataset to reconstruct high-frequency information is difficult. We therefore adopt the more complex RRDBNet as the Detail Branch. However, since some recoverable high-frequency details no longer need to be handled by this branch, we appropriately reduce its depth and width.
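The two-branch decomposition can be sketched in PyTorch as follows. This is a deliberately tiny, hypothetical skeleton: the real Degradation Branch is a multi-level U-Net and the real Detail Branch a slimmed RRDBNet, whereas here each is reduced to a few convolutions, with channel counts and block depths chosen arbitrarily for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DegradationBranch(nn.Module):
    """Lightweight branch inverting the degradation (U-Net stand-in)."""
    def __init__(self, ch=32):
        super().__init__()
        self.enc = nn.Conv2d(3, ch, 3, stride=2, padding=1)
        self.mid = nn.Conv2d(ch, ch, 3, padding=1)
        self.dec = nn.Conv2d(ch, 3, 3, padding=1)

    def forward(self, x):
        e = F.relu(self.enc(x))
        m = F.relu(self.mid(e)) + e  # skip connection reusing encoder features
        return self.dec(F.interpolate(m, scale_factor=2, mode="nearest"))

class DetailBranch(nn.Module):
    """Reduced-capacity residual stack standing in for a slimmed RRDBNet."""
    def __init__(self, ch=32, n_blocks=2):
        super().__init__()
        self.head = nn.Conv2d(3, ch, 3, padding=1)
        self.body = nn.ModuleList(
            nn.Conv2d(ch, ch, 3, padding=1) for _ in range(n_blocks))
        self.tail = nn.Conv2d(ch, 3, 3, padding=1)

    def forward(self, x):
        f = self.head(x)
        for conv in self.body:
            f = f + F.relu(conv(f))  # simplified residual block
        return self.tail(f)

class DDDNet(nn.Module):
    """Detail-Degradation Decoupling sketch: a global residual on the
    upscaled input plus the two decoupled branch corrections."""
    def __init__(self, scale=2):
        super().__init__()
        self.scale = scale
        self.deg = DegradationBranch()
        self.det = DetailBranch()

    def forward(self, lr):
        base = F.interpolate(lr, scale_factor=self.scale,
                             mode="bicubic", align_corners=False)
        return base + self.deg(base) + self.det(base)
```

The point of the structure is that `deg` and `det` can be trained, frozen, or updated independently during incremental learning.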

Loss function and training strategy
Based on the designed network, we developed a corresponding training strategy, shown in Figure 5. First, the Detail Branch is temporarily removed, and we train the Degradation Branch alone with L1 loss. This lets it learn the inverse of the degradation process without coupling to the data distribution. Next, we freeze those parameters, add the Detail Branch, and incorporate the adversarial loss.
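The two-stage schedule amounts to toggling `requires_grad` on the two branches. The sketch below assumes a model exposing `deg` and `det` sub-modules; both the `TwoBranchToy` class and the `set_stage` helper are illustrative names, not part of the actual codebase.

```python
import torch.nn as nn

class TwoBranchToy(nn.Module):
    """Minimal stand-in with `deg`/`det` sub-modules mirroring Figure 5."""
    def __init__(self):
        super().__init__()
        self.deg = nn.Conv2d(3, 3, 3, padding=1)  # degradation branch
        self.det = nn.Conv2d(3, 3, 3, padding=1)  # detail branch

def set_stage(model, stage):
    """Stage 1: train only the degradation branch (L1 loss).
    Stage 2: freeze it and train the detail branch (adversarial loss)."""
    for p in model.deg.parameters():
        p.requires_grad = (stage == 1)
    for p in model.det.parameters():
        p.requires_grad = (stage == 2)
```

Freezing the Degradation Branch in stage 2 is what keeps the inverse-degradation mapping decoupled from the data distribution learned adversarially.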
where θ_g and θ_d are the weight parameters of the generator and discriminator, respectively. For generative adversarial networks, the learned data distribution comes from the implicit knowledge extracted by the old discriminator. Therefore, to retain prior knowledge, we add a distillation loss to the conventional discriminator loss. The proposed synthesis loss is

L_d(θ_gd, θ_go) = L_adv(θ_gd) + λ L_sim(θ_gd, θ_go),   (2)

where L_adv is the basic discriminator loss, computed by feeding the generated image and the real image separately to the discriminator and evaluating their authenticity, and L_sim(θ_gd, θ_go) is the loss that evaluates the similarity between the feature vectors φ of the new and old discriminators, with the output vector taken from the last layer. For L_sim, we adopt the KL divergence.
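A minimal sketch of the KL-based distillation term and the synthesis loss of Eq. (2) is shown below. The softmax normalization used to turn last-layer discriminator features into distributions, and the weight value `lam`, are our illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def l_sim(feat_new, feat_old):
    """KL divergence between (softmax-normalized) last-layer feature
    vectors of the new and old discriminators."""
    log_p_new = F.log_softmax(feat_new, dim=-1)
    log_p_old = F.log_softmax(feat_old.detach(), dim=-1)  # old model is fixed
    return F.kl_div(log_p_new, log_p_old,
                    log_target=True, reduction="batchmean")

def synthesis_loss(l_adv, feat_new, feat_old, lam=0.1):
    """Eq. (2): L_d = L_adv + lambda * L_sim."""
    return l_adv + lam * l_sim(feat_new, feat_old)
```

Matching the new discriminator's features to the frozen old ones is what distills the prior knowledge and counteracts forgetting on the old dataset.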

Experimental environment
We trained the proposed method on a hardware setup comprising an Intel i9-9900K processor, 32 GB of RAM, and a GeForce RTX 3090 GPU, using the PyCharm development environment and the PyTorch 1.12 deep learning framework.

Detail-degradation decoupling networks for super-resolution
We evaluated the proposed method, and the outcomes are shown in Figure 6. We divided the dataset of four scenes into 12 subsets and conducted incremental-learning experiments on them sequentially. Our method shows significant improvement over fine-tuning as the number of scenes and modalities increases, although a gap to joint training remains. Qualitative results of incremental learning are shown in Figure 7. Our method still preserves a considerable amount of detail after incremental learning, whereas fine-tuning suffers severe distortion in texture and color. In terms of both visual perception and evaluation metrics, our method effectively alleviates the forgetting of the super-resolution network under incremental learning.

Conclusion
To address the forgetting problem in incremental learning for super-resolution reconstruction, we propose a Detail-Degradation Decoupling Network. By decomposing super-resolution into an inverse-degradation process and a high-frequency detail-recovery process, the network parameters can be updated according to the task's properties during incremental learning. We also propose a new loss function and training strategy that distill the data distribution and prior knowledge from the discriminator trained on the old dataset, reducing catastrophic forgetting. Experimental results show that our method effectively reduces catastrophic forgetting and achieves better performance.

Figure 1. Visualization of the metric changes in incremental learning under the L1 loss function.

Figure 2. Visualization of the metric changes in incremental learning with GANs.

Figure 3. Visualization of the degradation phenomenon in incremental learning.

Figure 4. The architecture of the proposed neural network.

Figure 5. Schematic diagram of the loss function and training strategy.

Figure 6. Visualization of the changes in performance metrics for incremental learning.