Deep Unrolling Networks with Recurrent Momentum Acceleration for Nonlinear Inverse Problems

Combining the strengths of model-based iterative algorithms and data-driven deep learning solutions, deep unrolling networks (DuNets) have become a popular tool to solve inverse imaging problems. While DuNets have been successfully applied to many linear inverse problems, nonlinear problems tend to impair the performance of the method. Inspired by momentum acceleration techniques that are often used in optimization algorithms, we propose a recurrent momentum acceleration (RMA) framework that uses a long short-term memory recurrent neural network (LSTM-RNN) to simulate the momentum acceleration process. The RMA module leverages the ability of the LSTM-RNN to learn and retain knowledge from the previous gradients. We apply RMA to two popular DuNets -- the learned proximal gradient descent (LPGD) and the learned primal-dual (LPD) methods, resulting in LPGD-RMA and LPD-RMA respectively. We provide experimental results on two nonlinear inverse problems: a nonlinear deconvolution problem, and an electrical impedance tomography problem with limited boundary measurements. In the first experiment we have observed that the improvement due to RMA largely increases with respect to the nonlinearity of the problem. The results of the second example further demonstrate that the RMA schemes can significantly improve the performance of DuNets in strongly ill-posed problems.


Introduction
Many image processing tasks can be cast as an inverse problem, i.e., to recover an unknown image x from indirect measurements y y = A(x) + ϵ, where x ∈ X, y ∈ Y , A represents a forward measurement operator and ϵ is the observation noise.Problems that can be formulated with Eq. (1) include denoising [18], compressive sensing [37], computed tomography reconstruction [2,3], phase retrieval [29], optical diffraction tomography [30], electrical impedance tomography [28,10,33] and so on.
A common challenge in solving the inverse problems is that they are typically ill-posed and as such regularization techniques are often used to ensure a unique and stable solution.Classical regularization techniques introduce an explicit regularization term in the formulation, yielding a variational regularization problem that can be solved with iterative algorithms, such as the alternating direction method of multipliers (ADMM) [5] and the primal-dual hybrid gradient (PDHG) method [6].Although such methods can obtain well-posed solutions of the inverse problems, they have a number of limitations: most notably, inaccuracy regularizing assumption, need for parameter tuning, and mathematical inflexibility.A possible remedy to these limitations is to implicitly regularize the inverse problem by replacing certain modules in the iterative procedure with deep neural networks [1,2,37,34], a type of method commonly referred to as the deep unrolling networks (DuNets).
DuNets, pioneered by Gregor and LeCun in [13], have achieved great empirical success in the field of inverse problems and image processing [39,26,32] in the past few years.DuNets combine traditional model-based optimization algorithms with learning-based deep neural networks, yielding an interpretable and efficient deep learning framework for solving inverse imaging problems.It should be noted that most of the aforementioned studies have focused on solving inverse problems with linear or linearized forward operators.On the other hand, there has been relatively little research on applying DuNets to nonlinear inverse problems, such as optical diffraction tomography and electrical impedance tomography (EIT).This paper attempts to bridge this gap by applying DuNets to the nonlinear inverse problems.In such problems, nonlinearity may pose additional difficulty for the DuNet methods, as the gradient of the forward operator varies significantly as the iterations proceed.
Intuitively, the performance of DuNets may be improved if the previous gradient information is included.In this work we will draw on momentum acceleration (MA), an acceleration strategy commonly used in optimization algorithms, and the recurrent neural networks (RNN) techniques to improve the performance of the deep unrolling networks.Specifically, we propose a recurrent momentum acceleration (RMA) module that utilizes a long short-term memory recurrent neural network (LSTM-RNN) to represent the momentum acceleration term.The RMA module exploits the LSTM-RNN's capacity to remember previous inputs over extended periods and learn from them, thereby providing information from the entire gradient history.In particular we apply it to two popular DuNets: the learned proximal gradient descent (LPGD) [25] and the learned primal-dual (LPD) [2].We refer to the resulting algorithms as LPGD-RMA and LPD-RMA respectively.It should be noted that several existing works propose to use the iteration history to improve the performance of the unrolling algorithms.For example, [2] extends the state space to allow the algorithm some "memory" between the iterations, and [17] develops a history-cognizant unrolling of the proximal gradient descent where the outputs of all the previous regularization units are used.As a comparison, our RMA method employs the previous gradients that are combined via a flexible RNN model learned from data.
The remainder of this paper is organized as follows.In section 2, we review two widely used types of deep unrolling models: the LPGD and the LPD methods.In section 3, we present our RMA formulation, and incorporate it with both LPGD and LPD, yielding LPGD-RMA and LPD-RMA.Numerical experiments performed on two nonlinear inverse problems are reported in section 4. Finally section 5 offers some concluding remarks.

Deep unrolling networks
We start with the variational formulation for solving inverse problems of the form (1).
These methods seek to solve the following minimization problem that includes a data where λ is the regularization parameter balancing R against D. The regularizer R encodes prior information on x representing desired solution properties.Common regularization functions include Tikhonov regularization, total variation (TV), wavelets, and sparsity promoting dictionary [4], to name a few.Besides Tikhonov, all the aforementioned regularization functions are non-smooth, and as a result Eq. ( 2) is typically solved by the first-order algorithms, such as the proximal gradient descent (PGD) algorithm [11], the variable splitting scheme [5] and the primaldual hybrid gradient (PDHG) method [6].These algorithms typically involve rather expensive and complex iterations.The basic idea of DuNets is to use a learned operator, represented by a deep neural network, to model the iterations.We here focus on two archetypes of the deep unrolling models: the LPGD method with both shared [25,24] and independent weights [15,7], and the LPD method [2].

Learned proximal gradient descent method
Starting from an initial value x 0 , PGD performs the following iterates until convergence: where α t is the step size and the proximal gradient descent P λR (•) is defined by Simply speaking, the LPGD method replaces the proximal operators P λR with a neural network Ψ θt [15,7,36], and thus it allows one to learn how to effectively combine the previous update with the gradient update direction instead of a pre-determined updating scheme.The complete algorithm of LPGD is outlined in Algorithm 1.In the standard LPGD algorithm, the network in each iteration uses its own weights θ t , and a popular variant of the method is to use shared weights over all the networks, i.e., restricting θ 1 = ... = θ T .We refer to [25,24] for more details.

Learned primal-dual method
Adler and Öktem first introduced the partially learned primal-dual approach as an extension of iterative deep neural networks in [1] and further elaborated it into the LPD approach in [2].The method is based on PDHG, another popular algorithm for solving the non-differentiable optimization problem (2).The PDHG iteration is given by where σ, τ, γ are predefined parameters, D * denotes the Fenchel conjugate of D, and [∂A (x t )] * is the adjoint of the Fréchet derivative of A at point x t .The LPD method is built upon Eq. ( 4), and the main idea is to replace prox σD * and prox τ R with neural network models that are learned from data.Once the models are learned, the reconstruction proceeds via the following iterations: The complete LPD algorithm is given in Algorithm 2. It is important to note that the LPD method often enlarges both the primal and dual spaces to allow some "memory" between iterations [2].In particular, it defines t , . . ., x t , u t , . . ., u corresponds to dual and primal networks, which have different learned parameters but with the same architecture for each iteration.A typical initialization is , where 0 is the zero element in the primal or dual space.We refer to [2] for details on the LPD method.

Deep unrolling networks with momentum acceleration
The conventional DuNets, exemplified by LPGD and LPD, only use the current gradient, ignoring a large amount of historical gradient data.As has been discussed, these methods can be improved by adopting the momentum acceleration (MA) strategies that are frequently used in optimization methods.In this section we will present such momentum accelerated DuNet methods.

Momentum acceleration methods
We here discuss the conventional explicit MA scheme and the one based on RNN.

Explicit momentum acceleration Momentum-based acceleration methods, like
Nesterov's accelerated gradient [27] and adaptive moment estimation [20], are wellestablished algorithms for speeding up the optimization procedure and have vast applications in machine learning [31].The classical gradient descent with MA utilizes the previous "velocity" v t−1 at each iteration to perform extrapolation and generates the new update as: where g t−1 is the gradient of the objective function evaluated at x t−1 , η is the step size, and γ ∈ [0, 1) is the momentum coefficient controlling the relative contribution of the current gradient and the previous velocity.Eq (6) can be rewritten as: which shows that the current velocity is essentially a weighted average of all the gradients (assuming v 0 = 0).As mentioned above, the momentum coefficient γ controls how much information from previous iterations is used to compute the new velocity v t , and therefore needs to be chosen carefully for a good performance of the method.However, the optimal value for the parameter is problem-specific and typically requires manual tuning [31,23].

Momentum acceleration via RNN
The conventional momentum method utilizes a fixed formula (a linear combination of all the gradients) to calculate the present velocity v t .In this section we introduce a more flexible scheme that uses the recurrent neural networks to learn the velocity term [16], which we refer to as the RMA method.In particular we use the LSTM based RNN, which is briefly described as follows.At each time step t we employ a neural network to compute the "velocity" v t .The neural network has three inputs and three outputs.These inputs include the current gradient input g t−1 , the cell-state c t−1 (carrying long-memory information) and the hidden state h t−1 (carrying short-memory information), where the latter two inputs are both inherited from the previous steps.The outputs of the network are v t , h t and c t .We formally write this neural network model as, where ϑ is the neural network parameters, and leave the details of it in Appendix A. As one can see this network integrates the current gradient g t−1 and the information from previous step h t−1 and c t−1 to produce the velocity v t that is used in DuNets.Finally we note that other RNN models such as the Gated Recurrent Unit (GRU) [8] can also be used here.In our numerical experiments we have tested both LSTM and GRU, and found no significant difference between the performances of the two approaches, which is consistent with some existing works, e.g., [9,12].As such, we only report our experimental results with LSTM-RNN in this work.

LPGD and LPD with MA
Inserting the MA module into the DuNets algorithms is rather straightforward.In this section, we use LPGD and LPD as examples, while noting that they can be implemented in other DuNets in a similar manner.
LPGD The explicit MA can be easily incorporated with LPGD.A notable difference is that in the unrolling methods, g t is not the gradient of the objective function, which may be nondifferentiable or not explicitly available.In LPGD, g t is taken to be the gradient of the data fidelity term only, i.e., g t = ∇ x t−1 D(A(x t−1 ), y).The main idea here is to replace g t−1 in Alg. 1 by v t calculated via Eq.(6a), yielding the LPGD-MA method (Alg.3).Similarly by inserting the LSTM model into Alg.(1), we obtain the LPGD-RMA algorithm (Alg.4), as illustrated in Figure 1(a).Finally, recall that LPGD also has a shared-weights version (referred to as LPGDSW), and correspondingly we have LPGDSW-MA and LPGDSW-RMA, which will also be tested in our numerical experiments. 4: 5: end for LPD The integration of the MA schemes and LPD is a bit different.Namely, in LPGD, the velocity is constructed based on the gradient of the data fidelity term, while in LPD, we build it upon g t−1 = ∂A(x t .By inserting the explicit MA formula (6a) into Alg. 2 we obtain the LPD-MA method (Alg.5).The LPD-RMA method, depicted in Figure 1(b), can be constructed similarly: one simply replaces the explicit MA formula in Alg. 2 with the RMA module Eq. ( 8), and the complete algorithm is outlined in Alg. 6.

Experiments and Results
In this section, we present our numerical experiments on two nonlinear inverse problems: a nonlinear deconvolution and an EIT image reconstruction.

Implementation details
To make a fair comparison, for various DuNet methods, we adjust the number of unrolled iterations to ensure that all the methods have approximately the same of number of training parameters.Specifically, for the LPGD-type of methods, we set the unrolling iterations to 20 for LPGD-RMA and to 43 for both LPGD and LPGD-MA.The outputs of the proximal operator unit are first concatenated with the estimated direction from the RMA module and then combined using a convolutional layer with a 3 × 3 kernel size and 32 output channels before being fed to the subsequent block.The primal subnetwork consists of two convolutional layers of kernel size 3 × 3 and 32 output channels.The convolutional layers are followed by a parametric rectified linear units (PReLU) activation function.The output convolutional layer is designed to match a desired number of channels and does not include any nonlinear activation function.In LPD-RMA, we use 10 unrolling iterations, while in other LPD methods, we adjust it to 22.The number of data that persists between the iterates be N primal = 5, N dual = 5.
The primal subnetwork Γ θ d i is the same as that used in LPGD-based methods.The dual subnetwork consists of one convolutional layer of kernel size 3 × 3 and output channels 32, and the other setting is the same as the primal subnetwork.
All networks are trained end-to-end using Adam optimizer [20] to minimize the empirical loss (9).We use a learning rate schedule according to the cosine annealing, i.e., the learning rate in step t is where the initial learning rate ζ 0 is set to be 10 −3 .We also let the parameter β 2 of the ADAM optimizer [20] to be 0.99 and keep all other parameters as default.We perform global gradient norm clipping [38], limiting the gradient norms to 1 to improve training stability.We use a batch size of 32 for the nonlinear convolution example and 1 for the EIT problem.For the DuNets-MA methods, we choose γ = 0.9 and η = 10 −3 .We train all models with 20 epochs and keep a set of trainable parameters that achieve minimal validation losses.We do not enforce any constraint on the trainable parameters during training.
All experiments are run on an Intel Xeon Golden 6248 CPU and an NVIDIA Tesla V100 GPU.The nonlinear deconvolution example is run entirely on the GPU.We use the ℓ 2 loss function on the outputs from all the stages.Specifically, given the paired samples {x i , y i }, i = 1, . . ., N , the training objective is defined as: Here, xi is the reconstruction, and Θ presents the set of trainable parameters.

A nonlinear deconvolution problem
4.2.1.Problem setting We consider a nonlinear deconvolution problem which is constructed largely following [42].
n elements, the forward problem is defined as Here ′ is the first-order Volterra kernel, which contains the coefficients of the Volterra series' linear part.The second-order Volterra kernel, denoted as W 2 , is structured as follows: It is important to note that the parameter a controls the degree of nonlinearity in the deconvolution problem.

Training and testing datasets
We assume that the unknown x is on 53 mesh grid points, and meanwhile we choose the nonlinear kernel with size 9 and stride 4, and the dimension of the observed data y is 12.The first-order and second-order Volterra kernels in Eq. ( 10) are derived using the methods described in Section 3 of [21].We consider four sets of experiments with different coefficients in Eq. ( 10): a = 0, 1, 2, 4.
We generate the ground truth by sampling from a TV prior (see Chapter 3.3 in [19]) and then obtain the observation data by (10).We employ 12000 randomly generated paired samples, where 10000 pairs are used as the training set, and the remaining 1000/1000 pairs are used as the validation/test sets.

Results and discussion
Benefit of the RMA scheme We assess the performance of RMA with the three unrolling methods LPGDSW, LPGD and LPD, and in each method we implement the following three different cases: without acceleration, with the conventional MA module and with the RMA module; as such there are 9 schemes implemented in total.All hyper-parameters in the tested methods are manually tuned for optimal performance or automatically chosen as described in the aforementioned references.
Table 1 demonstrates the performance of each method in terms of the mean-square error (MSE) on four different settings: a = 0, 1, 2, 4. The visual comparison can be found in Figure 2. We summarize our findings as the following: (i) when a = 0, the MSE values of each type of DuNets method are almost the same, which is not surprising as the gradient of the forward operator is constant; (ii) when a > 0, DuNets with RMA outperform the state-of-the-art methods by a rather large margin (e.g., LPD-RMA outperforms the LPD method by 8.0%, 12.0%, and 16.0% in terms of MSE for a = 1, 2, 4 respectively), suggesting that the RMA module can significantly improve the performance of DuNets, especially for problems that are highly nonlinear; (iii) the conventional MA module can also improve the performance, but RMA clearly outperforms it in all the nonlinear cases; (iv) LPD-RMA method consistently achieves the best results in terms of MSE.
Next we will test the sensitivity of the RMA module with respect to both the network structure and the data size.Since the LPGD-type methods are significantly outperformed by the LPD-type ones in this example, we only use the LPD-type methods in these tests.
Sensitivity to the RMA structure We discuss here the choice for the architecture of the RMA modules, i.e., the number of hidden layers L and the hidden size n of the LSTM layer.To avoid overfitting, we limit the ranges of the hidden layers as L ∈ {1, 2, 3} and the hidden size as n ∈ {30, 50, 70}.Table 2 demonstrates the results of LPD-RMA  trained with different network structures in the setting where a = 1.We observe that in all these settings the LPD-RMA yields similar results, indicating that the algorithm is rather robust provided that the parameter values are within a reasonably defined range.
With extensive numerical tests, we have found that a reasonable choice of L may be L ∈ {1, 2} in moderate dimensions, and n can be chosen to be approximately the same as the dimensionality of the unknown variables.We have also tested the methods for a = 2, 4, where the results are qualitatively similar to those shown in Table 2, and hence are omitted here.reconstructing the inner conductivity distribution of a medium from a set of voltages registered on the boundary of the domain by a series of electrodes [41].

Sensitivity to data size
In this example, we consider a bounded domain Ω ⊆ R 2 with a boundary ∂Ω containing certain conducting materials whose electrical conductivity is defined by a positive spatial function σ(x) ∈ L ∞ (Ω).Next, we assume that L different electrical where Γ( Γ) is the boundary ∂Ω with (without) electrodes, e is the outer normal direction at the boundary, V l is the voltage measured by the l-th electrode E l when current I l is applied, and z l is the contact impedance.
Numerically, the object domain Ω discretized into n S subdomains {τ j } n S j=1 and σ is constant over each of them.One injects a current at a fixed frequency through a pair of electrodes attached to the boundary and measures the voltage differences on the remaining electrode pairs.This process is repeated over all electrodes, and the resulting data is represented as a vector denoted by y ∈ R n Y where n Y is the number of measurements.We can define a mapping F : R n S → R n Y representing the discrete version of the forward operator: where η is a zero-mean Gaussian-distributed measurement noise.The EIT problem aims to estimate the static conductivity σ from measurements y.The EIT problem is widely considered to be challenging due to its severe ill-posedness, largely caused by the highly non-linear dependence of the boundary currents on the conductivity.

Training and Testing Datasets We run numerical tests on a set of synthetic 2D
experiments to evaluate the performance of the reconstruction methods.In the circular boundary ring, L = 16 electrodes are equally spaced and located.The conductivity of the background liquid is set to be σ 0 = 1.0Ωm −1 .Measurements are simulated by using pyEIT [22], a Python-based package for EIT.For each simulated conductivity phantom, the forward EIT problem ( 11) is solved using FEM with approximately 1342 triangular elements.We explore the following two typical cases to test the methods: • Case 1: the anomalies consist of two random circles with radii generated from the uniform distribution U (−0.6, 0.6) and the conductivity values are 0.5 and 2 respectively in each circle; • Case 2: the anomalies consist of four random circles with radii generated according to U (−0.55, 0.55) and the conductivity values are 0.3, 0.5, 1.5, and 2.0.
We perform the DuNet methods with three different training sample sizes 50, 200 and 400, and 20 testing samples to evaluate the performance of the methods.We report that the testing time is approximately 19 seconds per sample for all DuNets.

Results and discussion
In the numerical experiments, we use the same set of unrolling schemes as in Section 4.2, and in addition we also implement the regularized Gauss-Newton (GN) method and the primal-dual interior point method with total variation regularizer (PDIPM-TV) [40].The parameters in both GN and PDIPM-TV are optimally tuned.We calculate the mean and standard deviation of MSE over ten independent runs with different training data sizes, and provide the results for Case 1 in Table 3 and those for Case 2 in Table 4.In what follows our discussion is focused on three aspects of the experimental results, (i) benefit of the RMA module, (ii) behaviors in low-data regimes, (iii) robustness against the number of inclusions in the conductivity area, and (iv) performance against the number of unrolling iterations.
Benefit of RMA scheme First we note that five of the ten runs of the baseline LPGD (with no momentum acceleration) fail to converge within 20 epochs in both cases, and as such, we omit the results of the method in all the tables and figures.All the other algorithms can reasonably capture the inclusions' shape and position in all the ten runs.
We highlight that, according to Tables 3 and 4 the methods with the RMA module achieve the best performance in all but one test (LPGDSW method with 50 samples for Case 1) where the standard MA has the best results.In contrast, the effect of the standard MA module is not consistent: for example it results in worse performance than the baseline approach (without momentum acceleration) in the LPD method for case 1.The learning-based RMA module provides a more effective implicit regularizer than standard DuNets by utilizing previous gradient information.As such, we can see that DuNets-RMA achieves improved and more stable performance relative to standard DuNets.Moreover, we want to compare the proposed methods with the conventional optimization based approach.To do so we provide the reconstruction results of four testing samples in Figure 4 (for Case 1) and Figure 5 (for Case 2).For simplicity we only provide the results of the DuNets with RMA, which is compared with those obtained by GN and PDIPM-TV methods.It can be seen from both figures that all DuNets approaches with the RMA module can yield rather accurate reconstruction for all the inclusions, and the quality of the images is clearly better than those of GN and PDIPM-TV (note that the parameters in GN and PDIPM-TV are manually tuned for the best MSE results).Furthermore, the inclusions near the boundary are better recovered than those near the center, which confirms that inclusions far away from the boundary are more difficult to reconstruct since the boundary data are not sensitive to them [35,14].Data size Next we examine the performance of the DuNets methods with respect to the data size.For this purpose we visualize the results in Tables 3 and 4 with Figs. 6   and 7.One observes that MSE is decreasing with more training data used in all the methods with RMA, which is not the case for the baseline methods and those with MA.
For example, in Case 1, the MSE results of LPGDSW become evidently higher when the data size changes from 200 to 400.These results demonstrate that the RMA module can increase the stability of the DuNet methods with respect the data size.

Number of inclusions
We now study how the methods perform with different numbers of inclusions by comparing the results in Case 1 (2 inclusions) and Case 2 (4 inclusions).
We can see that the MSE results in Case 2 are generally higher than those in Case 1, indicating that more inclusions make the problems more challenging for all methods.
Nevertheless, the RMA module considerably improves the performance of DuNets in both cases, an evidence that RMA is rather robust against the number of inclusions.In       Then we compute the average MSE value from ten independent runs over the 20 test datasets and present the results in Figure 8.We can see from the figure that, when T is less than 10, an increase in T significantly enhances model performance in terms of the MSE value; however, when T is greater than 10, this enhancement diminishes and the MSE value slightly rises.One possible reason for this phenomenon is that an increase in T leads to a higher number of training parameters, making the model more prone to overfitting.Therefore, we should select an appropriate T that represents a good balance between model performance and complexity.

Conclusion
In this work, we propose a method for improving the performance of DuNets in nonlinear inverse problems.In particular, we apply RMA to two popular DuNets: LPGD and LPD respectively.We provide numerical experiments that can demonstrate the performance of the proposed method.We expect that the proposed method can be extended to other unrolling algorithms and applied to a wide range of nonlinear inverse problems.cell proceeds as follows.First LSTM computes a candidate cell state ct by combining h l t−1 and z l−1 t (with z 1 t = g t−1 ), as: ct = tanh W hc h l t−1 + W gc z l−1 t + b c , and it then generates a forget gate x t , an input gate i t , and an output gate o t via the sigmoid function σ(•): The forget gate is used to filter the information inherited from c t−1 , and the input gate is used to filter the candidate cell state at t. Then we compute the cell state c t via, which serves as a memory reserving information from the previous iterations, and the hidden representation h l t as, where ⊗ denotes the element-wise product.Finally the output of the LSTM model v t at time t is calculated as: which is used to replace the gradient in the standard DuNets methods.This is a brief introduction to LSTM tailored for our own purposes, and readers who are interested in more details of the method may consult [16,12].

Figure 1 :
Figure 1: The model architectures of DuNets-RMA.The RMA module is constructed via a deep LSTM-RNN, denoted as Ξ ϑ .

= 4 Figure 2 :
Figure 2: Deconvolution results and their corresponding MSE values for all DuNets-RMA methods.
figure shows that LPD-RMA is considerably more data-efficient than LPD and LPD-MA.Interestingly after the data size increases to over 50%, the use of the conventional MA module can not improve the performance of LPD, while LPD-RMA consistently achieves the best accuracy across the whole range.

Figure 3 :
Figure 3: The MSE results plotted against the data size for a = 1.

Figure 6 :
Figure 6: The MSE results plotted against the number of training data for Case 1.The solid line represents the average of 10 tests, and the shade around the solid line depicts the one standard deviation.

Figure 7 :
Figure 7: The MSE results plotted against the number of training data for Case 2. The solid line represents the average of 10 tests, and the shade around the solid line depicts the one standard deviation.

Figure 8 :
Figure 8: The MSE results plotted against the number of unrolled iterations.
The forward and adjoint operators in the EIT experiments are run on the CPU as the pyEIT toolbox used is not computationally parallelizable and runs faster on the CPU.The training time for a single epoch is approximately 4 minutes in the nonlinear convolution example with 10000 training samples, and 60 minutes for the EIT example with 400 training samples.The code for implementing all experiments is available at https://github.com/zhouqp631/DuNets-RMA.git.

Table 1 :
MSE results of the DuNets methods under different a values.The result of LPGD is not reported as it fails to converge.The best results are indicated in orange color.

Table 2 :
Mean MSE values of the LPD-RMA models with L = 1, 2, 3 and n = 30, 50, 70.Evaluation is done via repeating the experiment 10 times.The number of trainable parameters is also reported below the MSE value in parentheses.

Table 3 :
The average MSE values of the DuNets methods for Case 1, with the associated standard deviations in parentheses.The average MSE for the GN approach is 10.8E-03 (± 3.16E-03), and for the PDIPM-TV method, it is 5.24E-03 (± 2.78E-03).The best MSE results are indicated in orange color.

Table 4 :
The average MSE values of the DuNets methods for Case 2, with the associated standard deviations in parentheses.The average MSE for the GN approach is 12.6E-03 (± 1.0E-03), and for the PDIPM-TV method, it is 7.67E-03 (± 0.36E-03).The best results are indicated in orange color.