Application of a mild data-driven technique to Lippmann–Schwinger inverse scattering in variable-exponent Lebesgue spaces for microwave imaging

A mild data-driven approach for microwave imaging is considered in this paper. In particular, the developed technique relies upon a Newton-type inversion scheme in variable-exponent Lebesgue spaces, modified by including a data-driven operator that enforces the available a-priori information about the class of targets to be investigated. In this way, the performance of the method is improved, and problems related to possible convergence to local minima are mitigated. The effectiveness of the approach has been evaluated through numerical simulations involving the detection of defects inside (partially) known objects, showing good results.


Introduction
Microwave imaging is nowadays considered a very effective tool for non-destructive diagnostics in several applicative areas, including civil engineering, subsurface prospecting, industrial quality control, and biomedical imaging [1][2][3][4]. Indeed, in principle, it allows retrieving some physical parameters (e.g. the dielectric properties) of the scenario under test in a noninvasive way, starting from measurements of the electromagnetic field scattered by the targets when illuminated by a proper incident field. It is worth remarking that, since the frequencies adopted for the inspection are similar to those of other widespread equipment (e.g. communication devices), it is possible to realize quite cheap and portable apparatuses. Moreover, thanks to the use of non-ionizing radiation, such systems are safe for the users, and potentially usable for continuous monitoring.
However, the dielectric properties of the targets under test are related to the measured scattered-field data through a non-linear operator, which is well known to lead to an ill-posed inverse problem [5][6][7]. To address this issue, several approaches have been developed in recent years. In particular, they can be customarily subdivided into two main classes: qualitative and quantitative techniques. On the one hand, qualitative approaches aim at providing a reconstruction of only some features of the scenario under test, such as the position and dimension of targets. In these cases, linearized formulations and/or simplified models are often adopted [8][9][10][11][12][13][14][15], although regularization is still needed to face the ill-posedness of the problem. On the other hand, quantitative techniques may allow retrieving the full distributions of the dielectric properties (e.g. dielectric permittivity and electric conductivity) of targets by solving the full non-linear problem. In this framework, both stochastic and deterministic techniques have been developed [16][17][18][19][20][21][22][23][24][25][26]. In stochastic techniques, the inverse problem is recast as a minimization problem, and approaches of this kind have the advantage of being able to reach the global minimum. However, they usually require quite large computational resources, making their use in practical applications not always feasible. Deterministic approaches are instead much less computationally demanding, but they usually suffer from premature convergence to local minima. Consequently, it is often necessary to include some a-priori information (e.g. in the form of suitable initial guesses or specific constraints) in order to strengthen the convergence to the actual solution.
Recently, data-driven approaches, such as those based on surrogate or learning-based models, have also been acquiring increasing interest in the field of inverse problems. Indeed, they allow including information directly learned from data into the inversion procedure. In this framework, deep and convolutional neural networks have been proposed by several authors for solving microwave imaging problems, both considering 'black box' approaches and by trying to include some physical insights into the procedure [27][28][29][30][31][32][33][34]. Moreover, methodologies that merge classical inversion schemes with data-driven terms are emerging. In this way, a rigorous modelling of the physics of the inverse problem is maintained, and the a-priori information may be effectively inserted through a set of proper 'examples'.
In this work, a mild data-driven approach [35] is introduced for the first time in the framework of variable-exponent Lebesgue-space inversion techniques [36, 37]. In particular, following the main idea in [35], the proposal allows including a data-driven regularization term able to add some a-priori information about the class of targets to be inspected. In such a way, the main drawback of the approach in [36, 37], which is related to the strong dependence of the solution upon the initial and reference profiles, is overcome, leading to enhanced reconstructions. The performance of the approach has been studied here through numerical simulations.
The paper is organized as follows. The developed data-driven regularized inverse-scattering method is discussed in section 2. In particular, after the Lippmann-Schwinger formulation of the inverse scattering problem of section 2.1, the algorithm is introduced in Hilbert spaces and then extended to Banach spaces in sections 2.2 and 2.3, respectively. The framework of variable-exponent Lebesgue-space inversion is introduced in section 2.4, and the mild data-driven approach is summarized in section 2.5. Section 3 reports a numerical evaluation of the reconstruction performance of the method. Finally, conclusions are drawn in section 4.

Data-driven regularized inverse scattering
Physics-informed learning uses some model knowledge in neural networks to achieve a better correspondence between data and solutions. Indeed, the physical model allows enforcing the a-priori relationships between them. The developed approach, following the idea in [35], reverses this paradigm, because now some information about known pairs of data and solutions is inserted into the physics model, making the whole solving scheme more efficient thanks to the a-priori data-driven content.
The inverse scattering physics integral formulation and the data-driven enhancement are summarized in the following subsections, together with the extension to our specific variable-exponent Lebesgue space setting.

Inverse scattering integral formulation
The inverse scattering problem is formulated with reference to the geometrical configuration shown in figure 1. In particular, it is assumed that the investigation domain D, which is a square region on the xy plane where the targets of interest are assumed to lie, is sequentially illuminated from M different positions by ideal line-current sources located at r^(m)_Tx, m = 1, . . ., M. In this way, a multiview configuration with M views is obtained. In each view, the source is described by a current density J^(m)(r) = I δ(r − r^(m)_Tx) ẑ, where δ(·) is the two-dimensional Dirac delta on the xy plane and I an arbitrary current value. Time-harmonic sources and fields at the angular frequency ω are assumed (the e^(jωt) time dependence is then always dropped).
In the absence of targets inside D (i.e. free-space conditions) the above source term gives rise to a z-polarized transverse-magnetic (TM-z) incident electric field whose z-component is

E^(m)_inc,z(r) = −jωμ_0 I g_0(r, r^(m)_Tx)    (1)

where g_0(r, r') = (j/4) H^(2)_0(k_0 |r − r'|) represents the free-space Green's function, H^(2)_0 being the zeroth-order Hankel function of the second kind and k_0 = ω √(μ_0 ε_0), with μ_0 and ε_0 the magnetic permeability and dielectric permittivity of vacuum, respectively [1]. Assuming that cylindrical targets are present in D, with uniform geometric and dielectric properties along the z axis, the total electric field is also z-polarized and satisfies the Lippmann-Schwinger integral equation

E^(m)_tot,z(r) = E^(m)_inc,z(r) + k_0^2 ∫_D g_0(r, r') χ(r') E^(m)_tot,z(r') dr'    (2)

with the contrast function

χ(r) = ε_r(r) − 1 − j σ(r)/(ω ε_0)    (3)

describing the deviation of the (complex) dielectric properties of the targets, of relative permittivity ε_r and conductivity σ, from those of vacuum. Defining the scattered field as E^(m)_sc,z(r) = E^(m)_tot,z(r) − E^(m)_inc,z(r), it results from (2) that its z-component is given by

E^(m)_sc,z(r) = k_0^2 ∫_D g_0(r, r') χ(r') E^(m)_tot,z(r') dr'.    (4)

For each position of the source, E^(m)_sc,z(r) is measured for r ∈ M^(m), i.e. in a set of points outside D denoted as the measurement domain for the mth view. In this way, (4) is considered as the data equation, since it links the scattered field E^(m)_sc,z(r), r ∈ M^(m), with the contrast function χ(r), r ∈ D. However, the link is nonlinear due to the presence of E^(m)_tot,z inside the integral, which in turn depends on χ. By introducing the linear scattering operator

(A^(m)_M v)(r) = k_0^2 ∫_D g_0(r, r') v(r') dr',  r ∈ M^(m),    (5)

equation (4), evaluated at r ∈ M^(m), can be rewritten as

E^(m)_sc,z = A^(m)_M (χ E^(m)_tot,z).    (6)

In order to solve the inverse problem and find the unknown values of χ(r), r ∈ D, the total field E^(m)_tot,z(r), r ∈ D, is computed based on (2). Such an equation, by formally exploiting the linear scattering operator A_D defined as

(A_D v)(r) = k_0^2 ∫_D g_0(r, r') v(r') dr',  r ∈ D,    (7)

can also be written as

E^(m)_tot,z = E^(m)_inc,z + A_D (χ E^(m)_tot,z).    (8)

Consequently, E^(m)_tot,z(r), r ∈ D, from (8) can be substituted inside (6), obtaining

E^(m)_sc,z = A^(m)_M (χ (I − A_D χ)^(−1) E^(m)_inc,z) ≜ N^(m)(χ)    (9)

(χ being understood here as the corresponding multiplication operator), where N^(m) is a nonlinear operator mapping the contrast function on the scattered field for the mth view. Since scattered-field measurements are available for all the M views, it is convenient to combine the collected data in M^(m), m = 1, . . ., M, for the solution of the inverse problem. To perform this operation, let us define a data vector as

y ≜ [E^(1)_sc,z, E^(2)_sc,z, . . ., E^(M)_sc,z]^T    (10)

where T indicates transposition, and a multiview scattering operator as

F(χ) ≜ [N^(1)(χ), N^(2)(χ), . . ., N^(M)(χ)]^T.    (11)

By adopting the above definitions, the inverse scattering problem at hand can be compactly formulated as

F(χ) = y    (12)

where the contrast function χ ∈ X inside D is mapped on the scattered field y ∈ Y measured in M^(1) ∪ M^(2) ∪ . . . ∪ M^(M) by means of the nonlinear operator F : X → Y. The functional equation (12) should be inverted to retrieve χ, which is the goal of the algorithms presented in the following paragraphs.
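In a discretized setting, the chain from the state equation (8) to the data equation (6) and the stacked data vector (10) can be sketched in a few lines. The snippet below is only an illustration with random stand-in matrices (G_D, G_M and all sizes are ours, not the paper's method-of-moments implementation):

```python
import numpy as np

def forward_view(chi, e_inc, G_D, G_M):
    """Sketch of the m-th view forward map: solve the (discretized) state
    equation (8) for the total field, then apply the data equation (6).
    G_D and G_M play the role of the discretized operators A_D and A_M^(m);
    all names and sizes here are illustrative stand-ins."""
    n = chi.size
    # State equation (8): E_tot = E_inc + A_D(chi * E_tot)
    e_tot = np.linalg.solve(np.eye(n) - G_D @ np.diag(chi), e_inc)
    # Data equation (6): E_sc = A_M(chi * E_tot)
    return G_M @ (chi * e_tot)

# Stack the M views into the data vector y of equation (10).
rng = np.random.default_rng(0)
n_cells, n_meas, n_views = 50, 12, 4
G_D = 0.01 * (rng.standard_normal((n_cells, n_cells))
              + 1j * rng.standard_normal((n_cells, n_cells)))
G_M = rng.standard_normal((n_meas, n_cells)) + 1j * rng.standard_normal((n_meas, n_cells))
chi = np.zeros(n_cells)
chi[10:20] = 0.5                         # a small "scatterer" in the domain
e_incs = [rng.standard_normal(n_cells) + 0j for _ in range(n_views)]
y = np.concatenate([forward_view(chi, e, G_D, G_M) for e in e_incs])
```

Note that the nonlinearity of N^(m) in (9) is entirely contained in the linear solve for the total field, which depends on χ.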

Basic data-driven iterative regularization
From a purely mathematical point of view, the inverse scattering problem leads to the solution of the non-linear functional equation (12), which is well known to be ill-posed, where both the vector spaces X (of the unknown) and Y (of the data) are Banach spaces. In the framework of variational regularization, the basic iterative method for its solution concerns the minimization of the convex functional Λ : X → R defined as

Λ(χ) = (1/2) ∥F(χ) − y∥^2_Y    (13)

generally called cost or residual functional. If both X and Y are Hilbert spaces, the gradient-descent iterative minimization, that is, the iteration

χ_(n+1) = χ_n − λ_n ∇_χ Λ(χ_n)    (14)

is the most popular regularization algorithm, where λ_n > 0 is a proper step length and ∇_χ denotes the gradient of Λ w.r.t. χ. By straightforward computation of the gradient by the chain rule, the iteration scheme leads to the Landweber method for non-linear functional equations, i.e.,

χ_(n+1) = χ_n − λ_n F′*_(χ_n) (F(χ_n) − y)    (15)

where F′_(χ_n) : X → Y is the Fréchet derivative of F at χ_n, and F′*_(χ_n) : Y → X is its adjoint. The Landweber method requires an early stop of the iterations, as any implicit iterative regularization method, to avoid large errors due to increasing data-noise amplification. Implicit regularization means that regularization is achieved without any penalty term, by just performing a relatively small number of iterations, according to the semi-convergence behavior [38].
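The semi-convergence behavior of the Landweber iteration (15) can be illustrated on a linear toy problem, for which the Fréchet derivative is the operator itself and the adjoint is its transpose. All values below (spectrum, noise level, iteration count) are illustrative choices of ours:

```python
import numpy as np

# Semi-convergence of the Landweber iteration (15) on a linear toy
# problem F(chi) = A chi, for which F' = A and its adjoint is A^T.
rng = np.random.default_rng(1)
n = 40
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
s = 0.9 ** np.arange(n)                  # fast-decaying spectrum: ill-posed
A = (U * s) @ U.T                        # symmetric operator, singular values s
chi_true = np.sin(np.linspace(0.0, 3.0 * np.pi, n))
y = A @ chi_true + 0.1 * rng.standard_normal(n)   # noisy data

lam = 1.0 / s.max() ** 2                 # step length below 2 / ||A||^2
chi = np.zeros(n)
errors = []
for _ in range(5000):
    chi = chi - lam * A.T @ (A @ chi - y)         # Landweber step (15)
    errors.append(np.linalg.norm(chi - chi_true))
best = int(np.argmin(errors))            # early-stopping index
```

The reconstruction error first decreases and then grows again as the iterations start fitting the noise, which is exactly why an early stop acts as the regularization.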
On the other hand, the minimization often incorporates a regularization penalty term R : X → R^+, so that the convex functional becomes

Λ(χ) = (1/2) ∥F(χ) − y∥^2_Y + R(χ)    (16)

where R measures the 'non-regularity' of the solution. Generally, R is based on a-priori information about its properties (such as closeness to a fixed target profile, or its magnitude, smoothness, sparsity, or special statistical distributions of expected solutions). Among these regularization terms, recent proposals involve the exploitation of a previously acquired set of N_D pairs of scatterer/scattered data, that is, the pairs of known scattered fields ỹ^(i) produced by known scatterer objects χ̃^(i), called expert data [35].
Basically, these pairs allow building a data-driven system-identification operator D : X → Y, which maps each χ̃^(i) to ỹ^(i) by means of a prescribed approximation rule, such as multivariate polynomial interpolation or any (nonlinear) regression technique. The operator D is thus included in a penalty term

R(χ) = (1/2) ∥D(χ) − y∥^2_Y    (17)

leading to a minimization scheme which extends (14) by involving the expert-data knowledge as

χ_(n+1) = χ_n − λ_n ∇_χ Λ(χ_n) − γ_n ∇_χ R(χ_n)    (18)

where λ_n > 0 and γ_n > 0 are two weighting step lengths. Iteration (18) can be equivalently written as

χ_(n+1) = χ_n − λ_n F′*_(χ_n) (F(χ_n) − y) − γ_n D′*_(χ_n) (D(χ_n) − y)    (19)

where the latter term is the gradient of R at χ_n. Under the mild assumptions of local continuity of the Fréchet derivatives F′_(χ_n) and D′_(χ_n), with the model operator F satisfying the classical tangent cone condition, the convergence of (19) to a solution χ† of (12) has been proven for noise-free data [35], w.r.t. any initial guess χ_0 in a proper neighborhood of χ†. The positive step lengths λ_n and γ_n, also called relaxation parameters, are related to the uniform boundedness of the Lipschitz constants of both F and D, that is, of the quantities F′_(χ_n) and D′_(χ_n) in the same neighborhood of χ†, and to the magnitude of the decreasing residual sequence (13), Λ(χ_n), n = 0, 1, 2, 3, … [35].
In virtue of (16), the proposed method is based on a linear modeling of the expert-data operator D, so that D′_(χ_n) = D at any χ_n ∈ X, and on an enhanced local inversion of the forward operator F via a polynomial approximation of the inverse. Hence

χ_(n+1) = χ_n + Ψ_r(F′*_(χ_n) F′_(χ_n)) F′*_(χ_n) (y − F(χ_n)) + γ_n D* (y − D χ_n)    (20)

where Ψ_r is an r-degree polynomial which approximates the (generalized) inverse of F′*_(χ_n) F′_(χ_n) and implicitly incorporates the role of the step length λ_n to ensure convergence. The rationale behind this idea is the same as in [39], where the Tikhonov regularization operator is approximated by an operator polynomial. More specifically, from the partial sums of the basic power series of terms (1 − λt)^i for the inversion of any real value 0 < t < 2/λ, we consider the r-degree polynomial approximation of the inverse

Ψ_r(t) = λ Σ_(i=0)^(r) (1 − λt)^i ≈ 1/t    (21)

in the classical spectral resolution of Hermitian operators [40], where the constant λ guarantees the convergence for r → +∞. By simple application of the recursive Horner scheme, the evaluation of the operatorial polynomial (21) leads to the following algorithm, for a fixed initial guess χ_0 ∈ X (where the no-information choice χ_0 = 0 can generally be used).
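The scalar behavior of the truncated power series (21) is easy to check numerically. The snippet below (illustrative, names are ours) evaluates the partial sums with Horner's scheme and shows how the truncation degree trades accuracy of the approximate inverse against stability:

```python
def poly_inverse(t, lam, r):
    """Evaluate Psi_r(t) = lam * sum_{i=0}^{r} (1 - lam*t)^i via Horner's
    scheme; for 0 < t < 2/lam it converges to 1/t as r grows."""
    acc = 1.0
    for _ in range(r):
        acc = 1.0 + (1.0 - lam * t) * acc
    return lam * acc

# Small truncation degree r = rough but stable approximate inverse;
# large r = accurate but (in the operator setting) noise-amplifying.
approx = poly_inverse(0.5, 1.0, 60)      # close to 1/0.5 = 2
```

Applied to the spectrum of the Hermitian operator F′* F′, the same partial sums yield the operatorial polynomial used in the inner loop of the algorithm below.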
Algorithm I. Hilbert Spaces Mild Data-Driven Regularization.
FOR n := 0 TO N_n − 1
  h := λ F′*_(χ_n) (y − F(χ_n))
  FOR k := 1 TO N_k − 1
    h := λ F′*_(χ_n) (y − F(χ_n)) + h − λ F′*_(χ_n) F′_(χ_n) h
  END FOR
  χ_(n+1) := χ_n + h + γ_n D* (y − D χ_n)
END FOR

If a single inner step is done throughout, that is N_k ≡ 1, it is simple to see that Algorithm I coincides with the basic Landweber-type method (19), since the operator polynomial reduces to the constant λ. Anyway, the number of inner iterations N_k is usually small and identifies the degree of approximation of the generalized inverse of the least-squares linearization operator F′*_(χ_n) F′_(χ_n). Indeed, due to the ill-posedness of the Lippmann-Schwinger integral operator F, its linearization F′_(χ_n) at the point χ_n is ill-posed too, so that using too accurate an approximation of the generalized inverse of F′*_(χ_n) F′_(χ_n) gives rise to noise amplification and instability. In other words, the degree N_k − 1 of the polynomial Ψ_(N_k−1) acts as a regularization parameter for the solution, w.r.t. h_n ∈ X, of any linearized outer equation

F′*_(χ_n) F′_(χ_n) h_n = F′*_(χ_n) (y − F(χ_n)).

It is interesting to notice that the outer correction step, performed after the inner loop, involves only the data-driven operator D. The number N_n of outer iterations can be controlled by the discrepancy principle, i.e., the a-posteriori rule that stops the iteration at the first index n such that

∥F(χ_n) − y∥_Y ⩽ τ η

where η = ∥y − y_exact∥ is (an estimate of) the amount of noise on the data and τ > 1 is a fixed relaxation value to be empirically chosen. Within this stopping rule, the basic scheme (19) is proven to be a regularization method (see [35], theorem 2.6), that is, (19) converges to a solution χ† of the equation F(χ) = y_exact as the noise vanishes, i.e. as η → 0. We can reason that the same property holds for Algorithm I, since it holds for the Tikhonov regularization [39] without the outer correction step. The numerical complexity of Algorithm I is basically the same as that of the one-step Landweber iteration (15), since the expensive evaluation of the scattering operator F(χ_n) and of its derivative F′_(χ_n) is performed only once per outer iteration, as for any step of (15).
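The structure described above (outer linearization, inner Horner steps, outer data-driven correction) can be sketched numerically. The snippet below is only our illustrative reading of Algorithm I on a linear toy problem; all operators, dimensions, and parameter values are hypothetical stand-ins, not the paper's setup:

```python
import numpy as np

def mild_data_driven(F, Fprime, D, y, lam, gam, n_outer, n_inner, x0):
    """Our illustrative reading of Algorithm I (all choices hypothetical):
    each outer step approximately inverts the linearized normal equation
    by n_inner Horner steps of the truncated power series (21), and then
    applies a correction built on the linear data-driven operator D."""
    x = x0.copy()
    for _ in range(n_outer):
        A = Fprime(x)                        # Frechet derivative at x (a matrix here)
        r = y - F(x)                         # outer residual
        h = np.zeros_like(x)
        for _ in range(n_inner):             # Horner: h <- lam*A^T r + (I - lam*A^T A) h
            h = lam * A.T @ r + h - lam * A.T @ (A @ h)
        x = x + h + gam * D.T @ (y - D @ x)  # outer data-driven correction
    return x

# Linear toy problem: F(x) = A x, so the derivative is constant.
rng = np.random.default_rng(2)
A = rng.standard_normal((30, 20)) / 6.0
x_true = rng.standard_normal(20)
y = A @ x_true
D = A + 0.01 * rng.standard_normal((30, 20))  # crude "expert" surrogate of F
x_hat = mild_data_driven(lambda x: A @ x, lambda x: A, D, y,
                         lam=0.5, gam=0.05, n_outer=200, n_inner=3,
                         x0=np.zeros(20))
```

With n_inner = 1 the inner loop reduces to a single Landweber-type step, consistently with the discussion above.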
A proper choice of the initial guess χ_0 ∈ X may enhance the reconstruction accuracy and reduce the number of outer iterations N_n, if it is close enough to χ† [36, 37, 42]. Anyway, the null initial guess χ_0 = 0 (i.e. no a-priori information) can be successfully used in the developed approach, as shown in the following. Indeed, for N_k = 1 it leads to the first iteration χ_1 = (λ_0 F′*_0 + γ_0 D*) y, which corresponds to the action of the two adjoint operators (which can be thought of as the simplest approximations of the corresponding inverses) on the data y.
In classical deblurring problems based on forward integral operators, this χ 1 corresponds to the so-called reblurred data.

Data-driven iterative regularization in Banach spaces
Solving the functional equation (12) in Hilbert spaces generally results in over-smoothed solutions, as well as in bad reconstructions of edges and sparse patterns, such as small isolated scatterers, cracks, or inclusions [40]. To overcome such issues, we propose to iteratively minimize the residual functional in non-standard Banach spaces.
We start by recalling that, in any non-Hilbertian Banach space B, the (sub)gradient of the squared-norm functional is a nonlinear operator, called the (normalized) duality map, which plays a key role in minimization procedures in Banach spaces. Specifically, for a uniformly smooth Banach space B, the duality map J_B : B → B* is the nonlinear single-valued operator such that ⟨J_B(v), v⟩ = ∥v∥^2_B and ∥J_B(v)∥_(B*) = ∥v∥_B for any v ∈ B [40]. In our setting, since X, Y and their corresponding duals X*, Y* are not necessarily isomorphic, the adjoint operator must be considered only in its original form F′*_(χ_n) : Y* → X*, so that the algorithm requires using the duality maps J_X : X → X*, J_Y : Y → Y*, and J_(X*) : X* → X** ≅ X, for a reflexive space X, as follows [43].
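The defining property of the duality map can be verified numerically in the familiar fixed-exponent sequence space l^p, using the classical expression recalled later in (30) (the finite-dimensional sketch below, with our own names, stands in for the function-space setting):

```python
import numpy as np

def duality_map_lp(v, p):
    """Duality map of the (real) sequence space l^p, 1 < p < 2:
    J(v) = ||v||^{2-p} |v|^{p-1} sign(v), cf. equation (30)."""
    norm = np.linalg.norm(v, ord=p)
    return norm ** (2 - p) * np.abs(v) ** (p - 1) * np.sign(v)

v = np.array([0.5, -1.2, 2.0, -0.1])
p = 1.4
Jv = duality_map_lp(v, p)
```

The pairing ⟨J(v), v⟩ equals ∥v∥^2 and the dual norm of J(v) (with conjugate exponent p/(p − 1)) equals ∥v∥, which is exactly the defining property stated above.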
Algorithm II. Banach Spaces Data-Driven Regularization.
Each inner minimization step is computed in X*, while each outer data-driven correction step is computed in X. For this reason, this algorithm belongs to the class of the so-called dual methods [43]. Anyway, as for the basic Algorithm I, the evaluations of F(χ_n) and F′_(χ_n) are performed only at each new outer iteration.
Duality maps are strictly required for the formal consistency of iterative regularization in Banach spaces, as introduced in [40]. Anyway, by recalling that J_Y(v) = ∂((1/2)∥v∥^2_Y), the chain rule yields

F′*_(χ_n) J_Y(F(χ_n) − y) = ∂((1/2)∥F(χ) − y∥^2_Y) |_(χ=χ_n)

which shows the role of the duality map J_Y, since it allows the minimization of the outer nth-step functional Λ_n(χ) = (1/2)∥F(χ) − y∥^2_Y. On the contrary, the role of the duality maps J_X and J_(X*) is not evident, because neither appears in the latter subdifferential. Basically, J_X and J_(X*) are related to the 'geometry' induced by the norms of the Banach space X and of its dual X*. More specifically, in [44] it is proven that the inner iteration can be exactly read as a proximal step with respect to the Bregman distance ∆_X : X × X → R^+ of the Banach space X, defined as

∆_X(χ_a, χ_b) = (1/2)∥χ_a∥^2_X − (1/2)∥χ_b∥^2_X − ⟨J_X(χ_b), χ_a − χ_b⟩

for any χ_a, χ_b ∈ X, with ⟨χ*, χ⟩ = χ*(χ) denoting the pairing between any χ* ∈ X* and χ ∈ X. In this respect, each single inner iteration consists of solving the minimization problem

min_χ  λ_n ⟨∂Λ_n(χ^(k)_(n+1)), χ⟩ + ∆_X(χ, χ^(k)_(n+1)).

The inner iteration thus corresponds to the computation of a point which decreases ⟨∂Λ_n(χ^(k)_(n+1)), χ⟩ (i.e. which locally minimizes Λ_n) and simultaneously is close (i.e. proximal) to the previous inner iterate χ^(k)_(n+1) w.r.t. the Bregman distance ∆_X. Hence the step size λ_n can be thought of as the weight which balances the two terms. This gives a simple heuristic meaning to the role of the duality maps of X and X*. Explicit expressions for all the duality maps in our applicative framework will be given in section 2.4.

The variable-exponent Lebesgue spaces
The minimization of ( 16) depends on the functional spaces X and Y, as expected, since their norms induce different geometries, that is, different level sets and gradient directions.As evidence of these dependencies, both the algorithms of the previous section require the application of the specific duality maps of X, Y and of their duals.
In the last twenty years, the L p Lebesgue spaces of p-integrable functions defined in a measure space D, which are characterized by the L p -norm ∥u∥ L p = (∫ D |u (r)| p dr) 1/p , for 1 ⩽ p < +∞, have been widely studied and applied to inverse problems in variational formulation [40].L p spaces, with 1 < p < 2, indeed allow reducing over-smoothing and enhancing the localization of small scatterers.
More recently, variable-exponent Lebesgue spaces L^p(·) have been considered for inverse scattering problems [36, 37]. In L^p(·) spaces, the exponent p(·) is not constant, but rather a function, that is, it may vary depending on the point of D. This flexibility allows using different exponents in different regions of the investigation domain, to promote sparsity in the background, with values of p close to 1, as well as to enhance the reconstruction of smooth variations of the contrast function, with values of p close to 2. The computation of the norm in L^p(·) spaces requires the solution of a 1D optimization problem, since the so-called Luxemburg norm [45] is defined as

∥u∥_(L^p(·)) = inf{λ > 0 : ρ_p(·)(u/λ) ⩽ 1}    (27)

where ρ_p(·) is the so-called modulus,

ρ_p(·)(u) = ∫_D |u(r)|^p(r) dr.    (28)

Hence, the modulus ρ_p(·) is the natural extension of the p-power of the conventional fixed-exponent L^p norm, since if p(·) = p is constant then ρ_p(u) = ∥u∥^p_(L^p). The duality map J_(L^p(·)) in variable-exponent Lebesgue spaces has been introduced in [36] and analytically proven in [44]; its pointwise expression (29) involves the exponent function p(r), the Luxemburg norm of the argument, and the quantity sign(v) = v/|v| when v ≠ 0, and zero otherwise.
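The 1D optimization defining the Luxemburg norm can be sketched by bisection on the modulus, which is monotonically decreasing in the scaling parameter λ. In the snippet below (names and the discrete cells standing in for D are ours), the norm reduces to the classical L^p norm when the exponent is constant:

```python
import numpy as np

def modulus(u, p, cell_areas):
    """Discrete modulus rho_{p(.)}(u): sum over the cells of D of |u|^{p(r)}."""
    return float(np.sum(cell_areas * np.abs(u) ** p))

def luxemburg_norm(u, p, cell_areas):
    """Luxemburg norm: the smallest lambda > 0 such that
    rho_{p(.)}(u / lambda) <= 1, found by bisection
    (the modulus decreases as lambda grows)."""
    if not np.any(u):
        return 0.0
    lo = hi = 1.0
    while modulus(u / hi, p, cell_areas) > 1.0:   # expand until rho <= 1
        hi *= 2.0
    while modulus(u / lo, p, cell_areas) < 1.0:   # shrink until rho >= 1
        lo /= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if modulus(u / mid, p, cell_areas) > 1.0:
            lo = mid
        else:
            hi = mid
    return hi

norm_u = luxemburg_norm(np.array([3.0, 4.0]), np.full(2, 2.0), np.ones(2))
```

For the constant exponent p = 2 the result coincides with the Euclidean norm, and the norm remains positively homogeneous even for a genuinely variable exponent.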
In our proposal, a variable-exponent Lebesgue space L^p(·) is considered as the space X of the contrast functions, while a conventional (i.e. fixed-exponent) Lebesgue space L^p_av is chosen as the space Y of the multiview observations. Here p_av denotes the average of the exponent function p(r) of L^p(·). For L^p_av, we recall that the classical duality map is [40]

J_(L^p_av)(v) = ∥v∥^(2−p_av)_(L^p_av) |v|^(p_av−1) sign(v).    (30)

It is worth noting that, as expected for consistency, (29) reduces to (30) if p(r) = p_av. The exponent function p(r), with 1 < p(r) ⩽ 2, is the key parameter of the minimization algorithm. Following [36], it is modified at each iteration, to achieve an adaptive strategy. Specifically, fixed an a-priori range [p_min, p_max] ⊆ (1, 2], the exponent function p^(n+1)(r) at iteration n + 1 is computed from the magnitude of the current reconstruction as

p^(n+1)(r) = p_min + (p_max − p_min) |χ_n(r)| / max_(r'∈D) |χ_n(r')|.    (31)

Recalling that in the background (that is, in the regions without scatterers) the contrast function vanishes, this choice gives values of p close to p_min ≈ 1 in the background (to promote sparsity therein), whilst values close to p_max ≈ 2 are obtained in the regions inside the strongest scatterers (to obtain a smooth reconstruction of the contrast function therein). This behavior is enhanced at each iteration, since the last retrieved contrast function χ_n is used as the new weighting distribution in (31), as shown in the numerical examples of section 3.
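An illustrative implementation of the adaptive exponent update (31) is given below; the affine rule is our reading of the strategy of [36] and should be checked against that reference:

```python
import numpy as np

def update_exponent(chi, p_min=1.4, p_max=2.0):
    """Adaptive exponent map of equation (31): p close to p_min where the
    current contrast is small (background, sparsity-promoting) and close
    to p_max where it is large (inside strong scatterers). Illustrative
    reading of [36]."""
    mag = np.abs(chi)
    peak = mag.max()
    if peak == 0.0:
        return np.full_like(mag, p_min)   # empty guess: sparsest setting
    return p_min + (p_max - p_min) * mag / peak

p_new = update_exponent(np.array([0.0, 0.5, 1.0]))
```

The map is recomputed at every outer iteration from the latest reconstruction, so the exponent distribution sharpens together with the contrast estimate.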
Regarding the space Y, the exponent p_av of L^p_av is then updated at each new outer iteration n, too, by evaluating the average of p^(n+1)(r) over D. This ensures a uniform magnitude quantification of the measured scattered field w.r.t. the contrast function, as generally happens when conventional fixed-exponent Lebesgue spaces with the same exponent p are used for both X and Y.

The data-driven operator
The expert data (χ̃^(i), ỹ^(i)) ∈ X × Y, for i = 1, . . ., N_D, represent the information base for the computation of the data-driven map D : X → Y. In general, when searching for a map D which links pairs of data, there are two key issues: the choice of its model framework and the choice of the metric which measures the misfit between any D(χ̃^(i)) and ỹ^(i). Regarding the former, the simplest option is a linear model, so that the map can be represented by a matrix, possibly with low dimensions or low rank, to provide fast computation. Regarding the latter, the simplest statistical choice is the Hilbertian squared distance.
Following [35], D is then chosen, among all the linear bounded operators from X to Y, i.e. inside the vector space B(X, Y), as one which minimizes the squared-distance functional s : B(X, Y) → R^+ defined as

s(D) = Σ_(i=1)^(N_D) ∥D χ̃^(i) − ỹ^(i)∥^2_Y.    (32)

In the finite-dimensional 2D case, where the domain X of the contrast functions is discretized into N_X pixels and the multiview observation domain into N_Y samples, the space B(X, Y) can be identified with the space of the N_Y × N_X matrices. Hence, an argmin solution of (32) can be straightforwardly computed by imposing that s(D) = 0, that is, in matrix form,

D C = M    (33)

where C is the N_X × N_D matrix containing the contrast-function expert data, C = [χ̃^(1), χ̃^(2), . . ., χ̃^(N_D)], and M is the N_Y × N_D matrix containing the multiview-observation expert data, M = [ỹ^(1), ỹ^(2), . . ., ỹ^(N_D)]. By simple linear-algebra arguments, the lowest-rank matrix which satisfies (33) is given by

D = M C†    (34)

where the N_D × N_X matrix C† is the Moore-Penrose generalized inverse of C. If the matrix dimensions are not large, C† can be computed by direct solvers based on the singular value decomposition (SVD) of C. Otherwise, iterative approximations can be used, involving just a small number of singular values.
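In matrix form, the construction (34) is a one-liner. The snippet below checks it on synthetic expert data generated by a hidden linear surrogate (all names, sizes, and the surrogate itself are illustrative, not the paper's data):

```python
import numpy as np

# Build the linear data-driven operator D = M C^+ of equation (34) from
# N_D expert pairs: columns of C are contrasts, columns of M observations.
# The "hidden" surrogate A_true is only used here to synthesize the data.
rng = np.random.default_rng(3)
N_X, N_Y, N_D = 60, 25, 100
C = rng.standard_normal((N_X, N_D))          # contrast expert data
A_true = rng.standard_normal((N_Y, N_X))     # hidden linear scattering surrogate
M = A_true @ C                               # corresponding observations
D = M @ np.linalg.pinv(C)                    # Moore-Penrose solution of D C = M
```

When N_D ⩾ N_X and C has full row rank, as in this sketch, the fit D C = M is exact and D recovers the hidden surrogate; with fewer or noisy expert pairs, D = M C† is the minimum-norm least-squares fit instead.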

Numerical results
The method has been validated through numerical simulations involving the detection of defects inside a host target.Indeed, in this scenario, the proposed mild data-driven approach can be effectively exploited to include a-priori knowledge about the host within which the unknown inclusion may be present.
In particular, figure 1 shows the geometry of the adopted configuration. The investigation domain is a square of side L = 2λ_0 (λ_0 being the free-space wavelength), discretized into N_X = 1600 square subdomains in the numerical implementation of the inversion procedure. The host structure is a cylinder with a square cross-section of side L_host = 1.25λ_0, centered at the origin, and with dielectric permittivity ε_host = 2ε_0. Inside the host, a void (i.e. with dielectric permittivity ε_i = ε_0) square cylindrical inclusion of side L_i = 0.25λ_0 is positioned at (0.2, 0.2)λ_0. The adopted measurement setup consists of M = 36 antennas with operating frequency f = 3 GHz, modeled as line-current sources located at points r_l equally spaced on a circumference of diameter d = 5λ_0. The antennas work as a multiview and multistatic system, that is, one antenna transmits at a time and the remaining M − 1 positions are occupied by ideal measurement points.
The simulated incident- and total-field data have been obtained through a custom computational code based on the method of moments [46] (using a finer mesh to avoid inverse crimes). Moreover, in order to simulate a more realistic configuration, additive white Gaussian noise (AWGN) has been added to the numerically computed data for all the configurations presented in this paper. More in detail, the total electric-field data have been corrupted with a complex AWGN with zero mean value and variance σ^2, corresponding to a signal-to-noise ratio defined as [47]

SNR = 10 log_10 [ P / (M(M − 1) σ^2) ]    (35)

with P = Σ_(m=1,...,M) Σ_(l=1,...,M; l≠m) |E^(m)_tot,z(r_l)|^2 the total power of the received field samples. In particular, a value SNR = 25 dB, corresponding to a normalized standard deviation σ_N = σ/√(P/(M(M − 1))) = 5%, has been considered when not otherwise specified.
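The noise model can be sketched as follows. The normalization below is our illustrative implementation of the stated relation between the SNR and the normalized standard deviation σ_N (function and variable names are ours):

```python
import numpy as np

def add_awgn(E, snr_db, rng):
    """Corrupt the stacked field samples E (complex array) with circular
    complex AWGN at the prescribed SNR: sigma^2 = P / (N * 10^(SNR/10)),
    with P the total signal power over the N samples (our normalization,
    consistent with sigma_N = 10^(-SNR/20))."""
    N = E.size
    P = np.sum(np.abs(E) ** 2)
    sigma2 = P / (N * 10.0 ** (snr_db / 10.0))
    noise = np.sqrt(sigma2 / 2.0) * (rng.standard_normal(N)
                                     + 1j * rng.standard_normal(N))
    return E + noise.reshape(E.shape)

# SNR = 25 dB corresponds to a normalized standard deviation of about
# 5.6%, which the text rounds to 5%.
sigma_n = 10.0 ** (-25.0 / 20.0)
```

Splitting the variance evenly between the real and imaginary parts yields a circular complex Gaussian whose total variance per sample is σ^2, matching the definition used above.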
The data-driven operator D has been created using a dataset of N_D = 100 expert data (χ̃^(i), F(χ̃^(i))), i = 1, . . ., N_D, in order to enforce the a-priori knowledge of the host object. In particular, the distribution of the contrast function χ̃^(i) is obtained by considering a square-section cylinder of side L_host = 1.25λ_0 and random permittivity ε_host in the interval [1.5, 2.5]ε_0 (to account for some uncertainty, as usual in practical applications), whereas F(χ̃^(i)) is obtained by numerically simulating the corresponding scattered-field data.
The parameters of the inversion method are: p_min = 1.4, p_max = 2.0, N_k = 5, and N_n^max = 50. The results are evaluated by considering the optimal solution having the lowest value of the normalized root mean square error in the whole investigation domain (e_NRMSE), defined as

e_NRMSE = ∥χ* − χ∥_(L^2(D)) / ∥χ∥_(L^2(D))    (36)

where χ*(r) and χ(r) are the reconstructed and actual contrast functions for r ∈ D, respectively. In all cases, the inversion procedure is initialized with an empty investigation domain.
The data-driven weight γ_n has been set empirically, whereas, following [48], the step length λ_n has been set heuristically. In order to study the performance of the method, relative reconstruction errors e_R have also been considered, computed as the mean relative deviation between the reconstructed and actual distributions over a region R containing N_R discretization elements. In particular, three different regions are considered: R_hi, which is the region composed of the host and the inclusion; R_incl, which is the inclusion region; and R_tot, which is the whole investigation domain D.

Effect of the data-driven weight, γ 0
First, the performance of the method with respect to the data-driven weighting step length, γ_n, has been studied. In particular, the initial value γ_0 has been varied in the range γ_0 ∈ [0.01, 5] × 10^−6. Figure 2 shows the behavior of the normalized root mean square error, e_NRMSE, versus γ_0. In particular, figure 2(a) reports a coarse discretization of the search interval, from which it can be seen that the reconstruction error bottoms out around γ_0 = 10^−6 and then rises again.
To further assess the behavior of the error versus the data-driven weighting factor, figure 2(b) reports a zoom over the range γ_0 ∈ [0.3, 2] × 10^−6. In this case, too, the optimal value of the parameter γ_0 turns out to be 10^−6. Figure 3 shows the actual distribution of the relative dielectric permittivity, whereas figure 4 provides the reconstructed distribution corresponding to such an optimal value. As can be noticed, the host is correctly reconstructed, both in terms of shape and of value of the dielectric permittivity. Moreover, the inclusion is correctly identified.
In figure 5, the behaviors of the functionals Λ(χ) and R(χ), along with their ratio Λ(χ)/R(χ), are shown in logarithmic scale for γ_0 = {3, 1, 0.3} × 10^−6. Moreover, the normalized root mean square errors in the whole investigation domain, e_NRMSE, versus the number of outer iterations are also reported.
In figures 6 and 7, the corresponding reconstructed relative-dielectric-permittivity distributions for γ_0 = {0.3, 1, 2} × 10^−6 are also depicted. As can be observed, by increasing the data-driven term weight γ_0, the ratio Λ(χ)/R(χ) tends to increase in the first iterations, indicating a greater influence of the data-driven term on the iterations. For low values of γ_0, the influence of the data-driven term is too weak: for γ_0 = 0.3 × 10^−6, the residual and the NRMSE converge more slowly (figures 5(a) and (c)) than in the optimal case with γ_0 = 10^−6 (figures 5(d) and (f)), and a worse reconstruction is achieved, in which the edge of the host region is less visible (figure 6). Conversely, for too large values of γ_0, the data-driven term forces the solution in the first iterations (figure 7), leading to a worse solution and a higher NRMSE value (figure 5(i)).

Variation of noise level on data
The method has been assessed in the presence of different levels of noise on the electric-field data, using the same configuration as in section 3.1. Specifically, the signal-to-noise ratio has been set equal to SNR ∈ {5, 10, 15, 25} dB (corresponding to normalized standard deviations σ_N ∈ {56%, 32%, 18%, 5%}). The parameters of the inversion procedure are the same as in section 3.1, with the data-driven weight set to γ_0 = 10^−6. Figure 8 shows the reconstructed distributions of the relative dielectric permittivity for SNR = {5, 10, 15} dB, while table 1 reports the reconstruction errors for all the considered cases. According to (36) and (39), e_NRMSE denotes the relative L^2 error in the whole investigation domain, while e_R_incl and e_R_hi denote the relative reconstruction errors in the inclusion and in the host-with-inclusion regions, respectively. As expected, all the errors increase when the SNR becomes lower (and thus the standard deviation of the noise is higher). In particular, also considering the images in figure 8, it can be observed that the reconstruction of the host medium remains acceptable in all the considered cases. However, for very high levels of noise, e.g. when SNR = 5 dB (σ_N = 56%), the retrieved inclusion starts to be significantly underestimated (although it is still visible, albeit barely, in the considered case).

Variation of host dimension, L host
The behavior of the method as the host size changes has then been studied, and the data-driven method has been compared with a Newton scheme in variable-exponent spaces without the data-driven term [43]. First, the method has been evaluated by considering a smaller host size, i.e., L = 0.75λ 0 . In this case, the data-driven matrix D is constructed by considering a host of side L host = 0.75λ 0 and random permittivity ε host in the interval [1.5, 2.5] ε 0 . The inclusion has been located at (0.1, 0.1) λ 0 (so that in all cases it is completely contained inside the host). The weight of the data-driven term has been set equal to γ 0 = 10 −6 . All the other parameters of the method are unchanged with respect to the case presented previously. Second, the performance of the method for objects of intermediate size has been evaluated, namely L = λ 0 . For this purpose, the method was run with the same parameters and, in addition, a dataset obtained with a host of side L = λ 0 and random permittivity ε host in the interval [1.5, 2.5] ε 0 has been used.
Figures 9(a)-(c) and (d)-(i) show the true and reconstructed distributions of the relative dielectric permittivity for the considered host dimensions (together with the reference case considered before), obtained by using both the data-driven approach and the traditional method. As can be seen, in all cases the traditional method provides worse results. In the first two cases, even if the inclusion is correctly detected (especially for L = λ 0 ), significant artifacts appear in the host, making the identification of the inclusion region difficult. For L = 1.25λ 0 , the reconstruction does not even allow a good detection of the host with the inclusion. Conversely, the data-driven method allows the correct detection and characterization of the inclusion, and the host is reconstructed more accurately, especially in its shape, in all cases. In addition, figure 10 shows the NRMSE in the whole investigation domain, e NRMSE , as well as the relative reconstruction errors in the host-with-inclusion region, e R hi , and in the inclusion-only region, e R incl , for the three cases L = {0.75, 1, 1.25} λ 0 . As can be observed both by comparing the reconstructions (figure 9) and the NRMSE in the whole investigation domain (figure 10), the data-driven approach allows for a reconstruction improvement in all cases. Moreover, by looking at the relative errors in the inclusion and in the host-with-inclusion region (figure 10), we can observe that the mild data-driven approach enables a better identification of the inclusion and reconstruction of the whole host region for dimensions L < λ 0 . For L = λ 0 , e R incl is similar for the two methods, even though more artifacts are visible in the host with the traditional method, leading to a higher e R hi . Finally, for L > λ 0 , the relative errors of the mild data-driven approach become significantly smaller: in fact, the reconstruction provided by the traditional method is no longer able to identify the inclusion.

Extended expert dataset
Subsequently, the behavior of the method has been studied considering a dataset with hosts of variable size, in order to account for an increased uncertainty about the host object. For this purpose, the matrix D has been created by considering square-section cylinders of random side in the interval L ∈ [0.75, 1.25] λ 0 and random permittivity ε host in the interval [1.5, 2.5] ε 0 . All the method parameters are the same as in the previous cases. Then, an object of dimension L = λ 0 with the inclusion located at (0.1, 0.1) λ 0 has been simulated and reconstructed.
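The paper does not report code for building the expert dataset, but the sampling step described above can be sketched as follows: square hosts with side drawn uniformly in [0.75, 1.25] λ 0 and relative permittivity uniform in [1.5, 2.5], rasterized on the investigation-domain grid. The grid resolution, domain size, and function name below are our own assumptions for illustration; the actual operator D in the paper is then built from such expert profiles.

```python
import numpy as np

def sample_host_profiles(n_samples, grid_n=64, domain_size=2.0, rng=None):
    """Sample permittivity profiles of centered square hosts with random side
    (uniform in [0.75, 1.25], in wavelengths) and random relative permittivity
    (uniform in [1.5, 2.5]) on a grid_n x grid_n grid covering a square domain
    of side domain_size (in wavelengths). Returns a (grid_n*grid_n, n_samples)
    matrix whose columns are the flattened profiles: the raw material from
    which a data-driven matrix D can be assembled."""
    rng = rng or np.random.default_rng(0)
    # Cell-center coordinates of the investigation domain (in wavelengths)
    x = (np.arange(grid_n) + 0.5) * domain_size / grid_n - domain_size / 2
    X, Y = np.meshgrid(x, x)
    cols = []
    for _ in range(n_samples):
        side = rng.uniform(0.75, 1.25)      # host side, in units of lambda_0
        eps_host = rng.uniform(1.5, 2.5)    # relative permittivity of the host
        inside = (np.abs(X) <= side / 2) & (np.abs(Y) <= side / 2)
        cols.append(np.where(inside, eps_host, 1.0).ravel())  # background: vacuum
    return np.column_stack(cols)
```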
Figure 11 shows the reconstructed relative dielectric permittivity; the actual relative dielectric permittivity is the same as in figure 9(b). Although there are some artifacts in the background, the obtained reconstruction is quite accurate, since both the host and the inclusion are detected. Moreover, the method has been compared with the Newton approach in variable-exponent spaces without the data-driven term: table 2 shows the whole-domain NRMSE (e NRMSE ), as well as the relative errors in the host with inclusion (e R hi ) and in the inclusion alone (e R incl ), obtained with the two approaches. By comparing figure 11 with figure 9(d) and looking at the error values shown in table 2, it can be seen that the data-driven approach allows a more accurate reconstruction, especially with regard to the host.

Effect of the initial guess, χ 0
In this section, the effect of the initial guess has been investigated. To this aim, the first N ig = 20 samples extracted from the training dataset have been used as initial dielectric profiles for the inversion procedure, i.e., χ 0 = χ (i) , i = 1, . . ., N ig . It is worth remarking that some of these starting points differ significantly from the actual host target (i.e., an incomplete knowledge of the host is assumed). All the other parameters are the same as in the previous section. Table 3 reports the mean values and the standard deviations of the relative reconstruction errors for the inexact-Newton method without data-driven term and with the mild data-driven approach. As can be observed, the errors are on average higher than those in table 2. This is expected, since in some cases the considered initial guesses are significantly different from the actual profile and may thus produce a worse reconstruction. However, when the initial profile is close enough to the actual one, the errors are comparable to or lower than those in table 2, as can be seen from the values of the standard deviations. It is worth noting that the mild data-driven approach always provides significantly lower mean errors, as well as lower standard deviations. Consequently, such an approach is less affected by the choice of the initial guess.
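The aggregation producing the entries of table 3 can be sketched as follows. The inversion routine itself is not reproduced (it appears only as a placeholder in a comment); the sketch merely shows how the per-run errors would be summarized into mean and standard deviation, with names of our own choosing.

```python
import numpy as np

def summarize_runs(error_list):
    """Mean and standard deviation of reconstruction errors collected over
    N_ig inversion runs started from different initial guesses."""
    e = np.asarray(error_list, dtype=float)
    return float(e.mean()), float(e.std())

# Hypothetical driver (the actual inversion routine is not shown here):
# errors = [nrmse(run_inversion(chi0), eps_true) for chi0 in dataset[:N_ig]]
# mean_e, std_e = summarize_runs(errors)
```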

Behavior for different host shapes
The behavior of the method when different shapes of the host structure are taken into account has also been assessed. To this end, the same dataset as in section 3.4 has been used for the construction of the data-driven map D, whereas the method has been tested by considering hosts with rectangular and circular cross sections. It is worth noting that, since such shapes are not included in the training set, a significant reduction of the reconstruction performance is expected. Considering the different shape of the host structure, a lower data-driven weight has been adopted, i.e., γ 0 = 5 × 10 −8 . All the other parameters are the same as in section 3.4. At first, the case of a host with rectangular cross section with dimensions a = λ 0 and b = 1.4λ 0 has been considered. The actual distribution of the relative dielectric permittivity is reported in figure 12, and the reconstructed one is shown in figure 13. As can be seen, the method is able to obtain a reliable reconstruction of the host shape, even though it is not included in the training set, despite the presence of some artifacts inside the structure. Additionally, the inclusion, although slightly overestimated, is correctly localized. The relative reconstruction errors for this configuration are e NRMSE = 0.393, e R incl = 0.589, and e R hi = 0.157. The corresponding errors obtained with the bare inexact-Newton method without data-driven approach are e NRMSE = 0.410, e R incl = 0.610, and e R hi = 0.158, showing that, also in this case, the proposed approach allows obtaining better results.
Then, the behavior of the method in the presence of a host with circular cross section of radius r = 0.7λ 0 has been studied.The actual and reconstructed distributions of the relative dielectric permittivity are shown in figures 14 and 15, respectively.
Also in this case, the approach is able to correctly retrieve the external shape of the host, although artifacts induced by the training set are present. Nevertheless, the presence of the inclusion is still correctly identified. The relative reconstruction errors in this case are e NRMSE = 0.393, e R incl = 0.579, and e R hi = 0.165. Again, the application of the bare inexact-Newton method without data-driven approach would result in a worse dielectric reconstruction, with corresponding errors e NRMSE = 0.419, e R incl = 0.657, and e R hi = 0.183.

Conclusion
A mild data-driven inversion technique for microwave imaging applications has been considered in this paper. The approach relies upon a Newton-type scheme developed in the framework of variable-exponent Lebesgue spaces, which has been extended by including an additional regularization term built using a set of expert data. In this way, the available a-priori information about the inspected scenario can be included in an effective way through a set of proper 'examples'. This additional information helps in mitigating the strong dependence of the classical approach upon the initial and reference solutions, since the classical scheme may converge to false reconstructions unless a proper initial guess is selected (especially when strong scatterers and complex targets are considered). The effectiveness of the approach has been evaluated by considering the problem of identifying inclusions inside a known host (possibly with some degree of uncertainty), showing very good reconstruction capabilities. Further work will be aimed at extending the approach to deal with more accurate data-driven terms (e.g., not based on linear approximations), as well as at experimental validation.

Figure 1. Geometry of the considered scenario.

Figure 3. Actual distribution of the relative dielectric permittivity, εr, in the investigation domain.

Figure 4. Reconstructed distribution of the relative dielectric permittivity, εr, in the investigation domain along with the true profile of the host and the inclusion (white line).

Figure 6. Reconstructed distribution of the relative dielectric permittivity, εr, in the investigation domain along with the true profile of the host and the inclusion (white line) for γ 0 = 0.3 × 10 −6 .

Figure 7. Reconstructed distribution of the relative dielectric permittivity, εr, in the investigation domain along with the true profile of the host and the inclusion (white line) for γ 0 = 2 × 10 −6 .

Figure 8. Reconstructed relative dielectric permittivity, εr, in the investigation domain along with the true profiles of the host and the inclusion (white line) for (a) SNR = 5 dB, (b) SNR = 10 dB and (c) SNR = 15 dB.

Figure 9. Actual relative dielectric permittivity, εr, in the investigation domain for host side length (a) L = 0.75λ 0 , (b) L = λ 0 and (c) L = 1.25λ 0 . Reconstructed relative dielectric permittivity, εr, in the investigation domain along with the true profiles of the host and the inclusion (white line) with the mild data-driven method for host side length (d) L = 0.75λ 0 , (e) L = λ 0 and (f) L = 1.25λ 0 . Reconstructed relative dielectric permittivity, εr, in the investigation domain along with the host and the inclusion shape (white line) with the bare inexact-Newton method without data-driven approach for host side length (g) L = 0.75λ 0 , (h) L = λ 0 and (i) L = 1.25λ 0 .

Figure 10. Relative reconstruction errors versus the host side length, L, with the mild data-driven (MDD) and no-data-driven inexact-Newton (IN) approaches.

Figure 11. Reconstructed relative dielectric permittivity, εr, in the investigation domain along with the host and the inclusion shape (white line) with the mild data-driven approach for L = λ 0 .

Figure 12. Actual relative dielectric permittivity, εr, in the investigation domain for a rectangular host with a = λ 0 and b = 1.4λ 0 .

Figure 13. Reconstructed relative dielectric permittivity, εr, in the investigation domain along with the host and the inclusion shape (white line) with the mild data-driven approach for a rectangular host with a = λ 0 and b = 1.4λ 0 .

Figure 14. Actual relative dielectric permittivity, εr, in the investigation domain for a circular host with r = 0.7λ 0 .

Figure 15. Reconstructed relative dielectric permittivity, εr, in the investigation domain along with the host and the inclusion shape (white line) with the mild data-driven approach for a circular host with r = 0.7λ 0 .

Table 1. Reconstruction errors for different levels of noise on the data.

Table 2. Relative reconstruction errors. Comparison with the standard variable-exponent technique without data-driven term.

Table 3. Relative reconstruction errors (mean value and standard deviation) with initial guess taken from the dataset. Comparison with the standard variable-exponent technique without data-driven term.