A Direct Sampling-Based Deep Learning Approach for Inverse Medium Scattering Problems

In this work, we focus on the inverse medium scattering problem (IMSP), which aims to recover unknown scatterers based on measured scattered data. Motivated by the efficient direct sampling method (DSM) introduced in [23], we propose a novel direct sampling-based deep learning approach (DSM-DL)for reconstructing inhomogeneous scatterers. In particular, we use the U-Net neural network to learn the relation between the index functions and the true contrasts. Our proposed DSM-DL is computationally efficient, robust to noise, easy to implement, and able to naturally incorporate multiple measured data to achieve high-quality reconstructions. Some representative tests are carried out with varying numbers of incident waves and different noise levels to evaluate the performance of the proposed method. The results demonstrate the promising benefits of combining deep learning techniques with the DSM for IMSP.


Introduction
Inverse medium scattering problems (IMSP) aim to recover the geometry and physical properties of unknown scatterers in the domain of interest from the measured scattered fields.This inverse problem has wide practical applications such as biomedical imaging, geophysical exploration, remote sensing, nondestructive evaluation, and so on (see, e.g., [5,30,52]).Suppose that a bounded domain of interest Ω is occupied by some inhomogeneous medium scatterers in the homogeneous background space R N (N = 2 or 3).With the incident field u i , the total field u = u i + u s satisfies the following Helmholtz equation [17] ∆u + k 2 ε r (x)u(x) = 0 in R N , (1) where k is the wavenumber, and the scattered field can be expressed asymptotically as [17]: where x = x x 2 ∈ S N −1 and u ∞ is called the far-field pattern of u s .The induced current density I(x) is defined as I(x) = η(x)u(x) with the contrast η(x) = ε r (x) − 1, where ε r (x) = 1 in the homogeneous medium.Then the scattered field u s on the measurement surface S satisfies [17] where G(x, y) is the free-space Green's function for the scattering problem which is given by where the function H (1) 0 refers to the Hankel function of the first kind with order 0. The IMSP is then to recover ε r from noisy measurements of the scattered field u s or the far-field pattern u ∞ , corresponding to some incident fields.
The IMSP is known to be severely ill-posed and highly non-linear, especially very challenging in its numerical solutions when the measurement data is available only at a limited aperture.Various numerical algorithms have been developed for solving inverse medium scattering problems, such as iterative methods like recursive linearization methods [4], contrast source-type inversion methods [47,48], iteratively regularized Gauss-Newton method [31], least-squares methods [25], and subspace optimization methods [9], which can obtain satisfactory reconstructions while they are generally very time-consuming.Non-iterative methods such as reverse time migrations [8], singular sources methods [40], factorization methods [29,41] and sampling-type methods [6,12,20,23,39] are fast while their reconstructions may not be accurate.We refer to [7,10,17] for more extensive reviews on both theoretical analyses and numerical algorithms on inverse scattering problems.
In recent years, many deep learning methods have already been applied to solve inverse scattering problems.Using conventional convolutional neural networks to directly approximate the relationship between the near scattered fields and the true contrasts was shown to be only feasible for simple scatterers [50], hence, incorporating deep learning methods with conventional numerical methods for inverse scattering problems has become a popular direction.By exploiting the inherent low-rank property of the scattering problems, a sophisticated network architecture called SwitchNet was developed in [28] for both forward and inverse scattering problems.The backpropagation, and dominant current CNN schemes were proposed and compared in [50] for solving inverse medium scattering problems.A Two-step enhanced deep learning approach invented in [51] consists of an initial guess step and a refinement step by introducing two convolutional neural networks.An iterative reconstruction algorithm called Learned Combined Algorithm was proposed in [35] to reconstruct the inhomogeneous media from the far-field data by repeatedly applying the Landweber approach, the iteratively regularized Gauss-Newton method, and the deep neural projector.In [18], by employing a few parameters to represent the star-shaped scatterers, the authors designed a fully connected neural network for the inverse obstacle problem of recovering an impenetrable scatterer from measured scattered data with only a single incident field.For more deep learning-based approaches to inverse scattering problems, we refer to the review article [11] and references therein for some recent developments.On the other hand, some deep learning-based approaches have also been developed for solving general inverse problems.A partially learned approach was proposed in [1], where the "gradient" component in an iterative scheme is learned using a convolutional network.The NETT [33] combines deep learning with a Tikhonov regularization scheme, where a regularizer is defined and learned by a deep neural network.Several sophisticated neural operators, e.g., [36,37,46], have been mainly designed to learn forward operators, and their extensions for solving inverse problems by using reversed input-output, as well as applying Tikhonov regularization to a trained forward model were studied in a recent survey [45].A novel and influential framework called Physics-informed neural network (PINN) was invented in [42] for solving forward and inverse problems involving nonlinear PDEs, where the solutions are approximated by neural networks and the loss function is to enforce the neural networks to satisfy the strong PDE equations.We refer to the surveys [3,38] for more deep learning concepts on various inverse problems.
Among the sampling-type methods, the direct sampling method (DSM) proposed in [23] can directly reconstruct the shapes and locations of unknown scatterers using one or a few incident waves, which is computationally efficient, easy to implement, and highly stable to large noise.The method is also independent of a priori knowledge of the geometry and physical properties of the unknown scatterers.The main idea of the DSM is to design an index function that takes large values for points inside the inhomogeneous scatterers, thus providing a shape estimate.The DSMs have also been developed for some other important inverse problems, such as inverse scattering problems using far-field data [34], inverse electromagnetic scattering [14,24], electrical impedance tomography [16], inverse elastic scattering problems [26], diffusive optical tomography [15], inversion of the radon transform [13].However, the DSMs have some limitations.Firstly, like most non-iterative methods, they can not provide very accurate geometric reconstructions, especially for complicated scatterers.Secondly, they are mainly focused on cases of one or two incident waves, which may limit their applications to more complicated cases.Thirdly, the DSMs can only estimate the locations and shapes of the scatterers, while it is not able to estimate physical properties (e.g., the permittivity and conductivity), which is also partly due to very limited measurement data used by DSMs.To address these limitations, we combine the deep learning method with the DSM proposed in [23], in which the network is used to approximate the relation between the index functions and the true contrasts.The numerical results show that this new approach can fully make use of the measurement data from multiple incident fields so that it can robustly get better reconstruction and even recover the physical properties of complicated scatterers.The trained networks also have some generosity to recover some scatterers with geometries and physical properties that are quite different from those used for the training data.In addition, the proposed DSM-DL also inherits the advantages of the classical DSM, i.e., it is computationally efficient, robust to noise, and it can provide reasonable reconstruction with one or two incident waves.
Combining deep learning and sampling-type methods for solving inverse scattering problems was also studied recently in [32] by using the orthogonality sampling method.However, it only considered one incident wave and is mainly focused on recovering the shapes and locations of the scatterers.In addition, it requires both the Cauchy data and the scattered field data.As a comparison, our proposed method can make use of a general number of incident waves, with only scattered field data or far-field pattern data available, to recover more complicated scatterers.As we shall see from the numerical results, it can also recover the physical properties of the scatterers.The deep learning methods were also combined with sampling methods to tackle electrical impedance tomography [19] and diffusive optical tomography [27], in which the Cauchy difference functions are the inputs of the neural network and computation of several forward problems is required, while in our method, the inputs of the neural network are the index functions that can be computed with simple scalar product.We shall point out that the general framework of the proposed DSM-DL can also be directly applied to many other important inverse problems where the DSMs have been developed.
The rest of this paper is organized as follows.In Section 2, we review the original DSM from [23] for inverse medium scattering problems and present some analysis of properties of index functions.The development of the proposed DSM-DL and the architecture of the neural network, as well as several strategies for data augmentations are introduced in Section 3. Several representative numerical experiments are carried out to validate the advantages of the DSM-DL in Section 4, and some concluding remarks are given in Section 5.

Direct sampling method
The direct sampling method [23] is a very simple technique for estimating the shapes and locations of the unknown medium scatterers when the measurement data are available from limited incident fields without any iteration.Specifically, denote B(0, R) as the open ball with radius R in R 2 or R 3 and let S = ∂B(0, R) be the circular curve in R 2 or spherical surface in R 3 , and Ω be the bounded domain of interest that is occupied by some inhomogeneous medium inclusions, the direct sampling method employs an index function of the form: for any point x ∈ Ω or its normalized form If Φ(x) attains an extreme value at a point x, the point lies most likely within the support of the scatterers, whereas if Φ(x) takes small values, the point x likely lies outside the support.The detailed derivation of the index function can be found in [23].An example of the true contrast and the computed index function is presented in Fig. 1.The computation of index function Φ only involves evaluating the scalar product of the measured data with the Green's function; thus the computation is very cheap.Furthermore, the highly noisy data is expected to be represented by high Fourier modes, while the Green's function G(x, y) is a very smooth function on the circular curve/spherical surface and is dominated by the low-frequency Fourier modes.Thus, the noises are roughly orthogonal to the Green's function G(x, y) and have little contribution to the index function.Consequently, the DSM is expected to be very robust to highly noisy data.
We are now going to present some properties of the index functions, which can be regarded as some motivation of our subsequent strategy to combine the deep learning strategy with the DSM.Theorem 1.Let S = ∂B(0, R) be the circular curve in R 2 or the spherical surface in R 3 , and Ω be an open domain that is contained in the interior of S. Define the operator T : Then since G(x, y) is analytic for x = y, we have thus by continuity of the function T (u) in B(0, R), we have We recall that the above single layer potential is an injective operator if S is a circular curve or a spherical surface [2], we then have The above theorem shows that the output T (u) does not lose any information of u.Next, we study the relation between the index functions associated with the far-field and near-field patterns.
For the reconstruction using the far-field pattern, the DSM with the index function ( 6) for the near-field pattern can be generalized to a DSM with the following index function for the far-field pattern [34]: or its normalized form Φ∞ (x where u ∞ is the far-field data and By the asymptotic behavior of the scattered field (2) and the following asymptotic behavior of the Green's function: we can derive that, when S = ∂B(0, R), the index function will be where the constant C N only depends on the dimension N .Thus, the index function for the far-field data is the limit of the index function for the scattered field as the radius R tends to infinity.However, despite the effectiveness and robustness of the DSM, we note that it also has the potential to be improved and generalized in a few aspects.Firstly, it focuses mainly on the reconstruction with one or two incident waves which may limit its applications to more complicated cases.Studying efficient strategies to make use of multiple data fully is important since data from more incident waves usually contain more inherent information on the distribution of the scatterers which can improve the robustness of the reconstruction.Secondly, the direct sampling method can only roughly predict the shapes and locations of the scatterers and is unable to recover the physical properties which are essential in many applications.

Direct sampling-based deep learning approach (DSM-DL)
To obtain reconstruction with high accuracy and strong robustness, one can either design more effective algorithms or make use of more measurement data.The latter is usually necessary for scatterers with complicated shapes.Suppose there are N i incident waves, then we can compute N i index functions Φ p (x) defined in ( 5) with p = 1, 2, • • • , N i .Based on Theorem 1 it is reasonable to expect that more index functions contain more information about the scatterers.To fully utilize the multiple index functions from many incidences, we should better understand the relation between the index functions and the true contrasts.For fixed N i incident fields {u i p } Ni p=1 , we define an operator A as where W ⊂ L 2 (Ω) is the set of all possible ε r , i.e., ε r (x) ≥ 0 for x ∈ Ω.The above operator is well-defined and well-posed as it is simply a combination of the forward scattering problem and the convolution in (5).Now, we are ready to introduce a new inverse problem that we would like to solve: For fixed incident fields {u i p } Ni p=1 with corresponding index functions {Φ p } Ni p=1 computed by (5), we aim at recovering the true contrast ε r .In other words, our goal is to find an operator A inv such that However, it is extremely difficult to theoretically derive an explicit form that can systematically and effectively incorporate multiple index functions to approximate the true contrasts.Although some simple strategies such as taking the maximum, average, or product of all index functions can be applied, they may not always give satisfactory results.In this work, we employ a deep learning method to approximate the relationship between the index functions and the true contrasts which is considered as a black-box.Next, we introduce our new direct sampling-based deep learning approach (DSM-DL), for which the N i index functions Φ p (x) for p = 1, 2, • • • , N i defined in (5) are the inputs of the neural network while the outputs are the approximate contrast ε r , i.e., where Ni are the scattered fields corresponding to N i incident fields, G Θ denotes the neural network, DSM refers to the output of the classical DSM, and Θ denotes the free parameters to be learned /* I l refers to the index set in the l th batch, τq refers to learning rate in the q th epoch.*/ 7 end 8 end in the network.By introducing a metric Loss(x, y) to measure the distance between any two images x and y, we can iteratively update the free parameters Θ as follows: where τ is the learning rate and I denote the index set for the training data.We summarize the DSM-DL scheme in Algorithm 1.The explicit loss function will be given in equation (35) in the numerical experiments.Specifically, if we discretize the index functions and the true contrasts on N d × N d grid points in Ω for 2D problems, then the input to the neural network is a tensor with the dimension N i × N d × N d , while the output of the neural network is a tensor with the dimension 1 × N d × N d , i.e., the index functions are put into different input channels of the network.We choose the well-known U-Net [43] as the neural network in our numerical experiments, the detailed descriptions of the U-Net and the reasons for choosing it will be given in subsection 3.2.Overall, the DSM-DL has the architecture as presented in Fig. 2, which consists of a DSM and a neural network that takes the index functions as the inputs and outputs an approximation of the true contrast.
In the DSM-DL scheme, we do not use the normalized index function as it requires additional computation and does not provide better reconstructions based on our numerical experience.Instead, we scale all the inputs by multiplying a constant 2/C to improve the network stability in the numerical experiments, where Remark 1.Based on Theorem 1, it is also reasonable to use the phased index functions as the inputs of the neural network which we have observed can indeed yield some satisfactory results.However, our numerical experiments indicate that using index functions in absolute-valued form can often achieve better reconstructions.
Remark 2. The average index function Φ ave := 1 Ni Ni p=1 Φ p may provide better reconstruction estimates than the single incident field index function, and one might suggest using it as the input to the neural network.However, the average index function is likely to lose some of the information of the scattered fields as the average operation is not injective.And our numerical experience shows that using multi-channel input {Φ p } Ni p=1 can significantly outperform using Φ ave as input.

Rotation and symmetry properties of index functions
We are now going to present some nice properties of the index functions, including the rotation and symmetry properties, which can be used not only for data augmentations [44], i.e., increasing the amount of data from and the rotation operator Furthermore, for an incident direction d = (cos(θ d ), sin(θ d )) and a point x = (r cos(α), r sin(α)) ∈ R 2 , we define S d (x) as the symmetric point of x with respect to the incident direction d (which should be understood as a line across the origin), i.e., And the symmetry operator on Suppose that Φ(x; ε r , d) is the index function corresponding to the incident field e ikx•d and the inhomogeneous media ε r .We emphasize that for a fixed incident direction d, if ε r2 = R φ (ε r1 ) (or ε r2 = S q (ε r1 ) with q = d), we usually can not derive the identity Φ(x; R φ (ε r ), d) = R φ (Φ(x; ε r , d)) (or Φ(x; S q (ε r ), d) = S q (Φ(x; ε r , d))) for their corresponding index functions.Thus, unlike problems such as image denoising, we can not apply arbitrary rotation and symmetry operators on existing data to increase the amount of observation data.Instead, we have the following important results.
Theorem 2. Suppose that Φ(x; ε r , d) is the index function corresponding to the incident field e ikx•d and the inhomogeneous medium ε r .Then it holds that and for any φ ∈ [0, 2π), Proof.We only prove the identity (24), as the argument is the same for identity (23).By considering the rotation operator to the coordinate system, we note that applying a plane wave with direction d to the medium R φ (ε r ) is equivalent to applying a plane wave with direction R −φ (d) to the medium ε r , and we have the following relation: Thus by definition of R φ , we have

Data augmentations in DSM-DL
We now present our strategies for data augmentations based on Theorem 2. Let us first consider the case of N i = 1, i.e., only a single incident field is used.The identity (24) in Theorem 2 shows that if we know the index function Φ(x; ε r , d) for the medium ε r with incident direction d, we can obtain the index function Φ(x; S d (ε r ), d) for the medium S d (ε r ).Thus, we can apply symmetry operator S d on the medium ε r to increase the amount of observation data. For Thus, if we know the index functions {Φ(x; ε r , d i )|i = 1, 2, • • • , N i } for the medium ε r , we can obtain the index functions {Φ(x; R φj (ε r ), The rotation and symmetry properties of the index functions can not only be used for data augmentations but can also provide a new way to compute the approximation of the true contrasts.Note that the input of the neural network G Θ has the following form: and Thus Then by taking an average over the reconstruction with respect to different φ j , we have where Φ d R φ j (εr) can be computed from Φ d εr by equation (27).From our numerical experience, we have observed that the averaging approximation (31) can generally produce more accurate reconstructions of the true medium ε r than the scheme (29).

U-Net and advantages of the DSM-DL algorithm
The U-Net [43] is a popular convolutional neural network that was originally designed for biomedical image segmentation.The architecture of the U-Net that we shall use in our DSM-DL algorithm is shown in Fig. 3.We can see that the architecture of the U-Net consists of two paths, forming a U-shaped structure.The lefthand contracting path is designed to extract feature maps at different levels, whereas the right-hand expansive path reconstructs the image by deconvolutions of the corresponding features.The main compositions in the U-Net consist of convolutional layers, batch normalization, ReLU, max pooling, and skip connections.We now provide some more detailed descriptions of these components, from which we can also see the advantages and motivations for using U-Net as the network in our DSM-DL algorithm.
• The convolutional layers process the data only for its receptive field with convolution operations.Two important main features of the convolutional layers are weight sharing and local operation.Compared with fully connected layers, the convolutional layers have fewer free parameters and are more efficient in terms of memory and complexity.Furthermore, the convolutional layers take into account the spatial relationships between individual features, making them very appropriate for data with a gridded topology.Since the size of inputs (index functions) in the DSM-DL scheme is usually large, especially for the cases of multiple incidence fields and 3D problems, and the index functions are close to the true contrasts, which indicates that their relations are likely to be (mainly) local.Therefore, the convolutional layers would be a very appropriate choice for the DSM-DL scheme.
• The batch normalization proposed in [22] is a method used to accelerate and stabilize the training of the deep neural networks through normalization of the layers' inputs by re-scaling and re-centering.
The batch normalization transfers the layer's input X to Y as The mean E[X] and standard deviation Var[X] above are calculated for each dimension, while α and β are free parameters to be learned.
• The activation functions enable the neural networks to approximate nonlinear mappings.The ReLU activation function (rectified linear unit) is defined as the piecewise linear function Since the true contrasts are images that are usually piece-wise constants, and the relative permittivity of the homogeneous background is equal to 1, thus the ReLU is very suitable for the proposed DSM-DL.
• The max poolings reduce the spatial information while they can significantly increase the effective receptive field of the neural network, and the up-sampling convolutions can increase the resolution of the outputs.
• Skip connections are able to transfer features from the contracting path to the expansive path so that the lost spatial information during down-sampling can be recovered, which is very suited to the DSM-DL since the index functions and the true contrasts have similar shapes.In addition, the skip connections can also stabilize the training and convergence as it alleviates the vanishing gradient problem.Before we present our numerical results, we shall list main advantages of the proposed DSM-DL due to the DSM and the structure of the network that we use.
• The implementation is simple and rather straightforward.The evaluation of the index functions is very cheap and naturally parallel.The forward pass of the neural network is also highly parallel since the main compositions of the U-Net are CNNs.In our numerical experiments, each test is completed within 1 second, after the network is trained.
• The DSM-DL is highly robust to noise.This is mainly because the index functions (the inputs of the neural network) are robust to noise, i.e., the noise is smoothed in the DSM process.• The neural network is easy to train since the index functions and the true contrasts which are the input and output of the network have similar shapes and are defined in the same domain.In this sense, the forward pass of the neural network can be considered as a refinement process.
• Data from multiple incident fields can be fully incorporated.From the numerical results, we can see that our method can be applied to the case of an arbitrary number of incident fields, and using data from more incident fields can significantly improve the accuracy, robustness, and generalization capacity.
• As we shall see from the upcoming numerical tests for reconstructing Latin letters "D", "S" and "M", the number of the receivers and the radius of the measurement surface for the testing data are not necessary to be consistent with those for the training data.This is due to the convergence property (15) of the index functions.
• The DSM-DL is applicable and robust for the reconstruction with limited aperture data.Our numerical results indicate that the DSM-DL can still work satisfactorily if the scattered fields are only available on part of S.

Numerical results
In this section, we present numerical results using a circle dataset and a modified MNIST dataset for inverse medium scattering problems in 2D to illustrate the accuracy and robustness of the DSM-DL scheme in reconstructing some unknown scatterers from near-field scattered fields.The proposed scheme can be directly applied to 3D problems and far-field problems by using the index functions (11).

Numerical setup
In the numerical tests, we consider a square sampling region Ω = [−1, 1] 2 and discretize it into 64 × 64 pixels.We take N i sources and N r = 32 receivers that are equally distributed on a circle of radius 3 centered at the origin and will consider N i = 1, 2, 4, 8 and 16.The scattered fields are generated numerically using the method of moments.The noisy measured scattered field is then generated pointwise by the formula where δ refers to the relative noise level, ζ r (x) and ζ i (x) refer to the real and imaginary parts of the noise which follow the standard normal distribution.The wavelength is λ = 0.75 and k = 2π/λ.To use the data augmentation strategy that we discussed in section 3.1.2,we consider the following loss function to train the neural networks: for N i > 1, where Φ Rπ(εr) can be computed from Φ εr by ( 27), and the rotation operator R π should be replaced by the symmetry operator S d1 for N i = 1.We only use R π and S d1 for data augmentations in the numerical experiments, so that the amount of augmentation is the same for all N i , although we can increase more data for N i > 2. The TV refers to the total variation which is commonly used as the regularizer in image reconstruction, and SSIM [49] is the structural similarity function which is a common metric for measuring the perceptual distance between two images and was also added to the loss function in [21].α 1 , α 2 are the corresponding weights, and we set α 1 = 0.01, α 2 = 0.05 for the circle example, α 1 = 0.05, α 2 = 0.05 for the MNIST example.The loss function is computed averagely on the training data for each iteration, and the Adam optimizer is used to update the learnable parameters.5% Gaussian noise is imposed on the scattered field for the training data, while we test the trained network with different high noise levels.We choose the network which achieves the smallest validation error during training as the final model.In the training stage, a server with GeForce GTX 1080Ti and 256 GiB of system RAM is used.It takes about 0.5 hours for the training of each example, and it takes less than 1.0 seconds for reconstructing the scatterers using the trained networks.
To reconstruct the true medium ε r , we employ the following average form: To quantify the quality of the reconstruction, we compute the relative L2 error via

Circle dataset example
In the first example, we simulate the scatters by employing a dataset of circles.The number of circles in each image is randomly set between 1 and 3, the radius is taken from the uniform distribution U (0.15, 0.4), and the relative permittivity is taken from U (1.5, 2.0).The overlapping is allowed between circles.We use 3000 images as the training data, 200 images as the validation data, and 200 images as the testing data to compute the testing error.The batch size is taken as 6 and we use a total of 30 epochs to train the neural networks.As we use the loss function (35), we actually have 6000 training data and the batch size is 12.The learning rate starts at 0.001 and decreases by a factor of 0.5 every 3 epochs.The trained networks are then used to test the following tests.

Tests with testing data from the circle dataset
In Fig. 4, reconstructed images of four tests from the testing data with noise levels δ = 15% and δ = 40% are presented.From the numerical results, we can see that when the number of incidences is small, the trained networks can provide very satisfactory results if the circles are well separated, while it can only estimate the locations and the rough shape of scatterers if they are overlapped.This is reasonable as the interaction between scatterers is strong if they are very close to each other.The reconstructions can be further improved if we use more incident fields, and the trained networks with N i = 8 or 16 can obtain very high-quality results for all tests.The average relative error and SSIM for the testing data are presented in Table 1 and Fig. 5. From Fig. 5, we can intuitively observe the impacts of the noise level and the number of incidences on the performance.An important finding is that as N i increases, the error gaps between different noise levels become smaller, which means that multiple measurement data can make the reconstruction more robust to noise.

Tests with a four-circles example
To examine the generality of the trained networks, we consider the reconstruction of scatters of four circles, which is beyond the distribution of the training data.Based on the reconstructions shown in Fig. 6, an additional scatterer is mispredicted in the reconstructed profile with δ = 15% for N i = 1.It also shows that the noises have a significant impact on the reconstruction with small N i , while better reconstruction and generality capability can be observed by using more incident fields.

Tests with the "Austria ring"
Furthermore, we consider a rather special test called "Austria ring", where the profile is also out of the distribution of the training data as shown in Fig. 7 which is a well-known challenging profile in the community of inverse medium scattering problems.From Fig. 7, with very limited incident waves, it is reasonable to see that the DSM-DL can hardly recover the "Austria ring".Moreover, as we have more measurement data, the accuracy and robustness of the reconstruction are improved gradually.This shows that using multiple data is very important for recovering complicated scatterers.

Tests with the high-contrast scatterers
In this example, a high-contrast circle dataset is used to train and test the proposed neural network.The relative permittivity of the scatterers is sampled from U (3.5, 4.0), while other settings are the same as the previous circle dataset example.Thus, this inverse problem is much more ill-posed and nonlinear compared to the previous ones.Some reconstructions from the testing data are shown in Fig. 8, and the relative L2 testing error and SSIM are listed in Table 1.For small N i , especially for the case of three scatterers in the domain, the reconstructed scatterers are distorted and there are some unexpected artifacts.The reconstruction quality can be significantly improved as we have more data and is quite satisfactory with at least 4 incidences fields which may be because the classical DSM can be applied to the high-contrast case.

MNIST database example
In this example, the unknown inhomogeneous medium is modeled by a modification of the well-known MNIST dataset.In the MNIST dataset, the resolution of each image is 28 × 28 and the pixel values range from 0   to 1.We rescale the size to be 64 × 64 and apply a threshold with value 0.5 so that the pixel values only take 0 and 1.To represent multiple scatterers and enhance the diversity of the data, we randomly rotate the digits and also randomly add a circle with a radius from U (0.1, 0.3) to the domain of interest Ω.The relative permittivities of the digits and the circles are randomly chosen from U (1.5, 2.5).In the examples of the MNIST database, which is more complicated than the circle dataset, we use 10000 images as the training data, 200 images as the validation data, and 200 images as the testing data.The batch size is set as 10 and we use a total of 20 epochs to train the models.As the loss function ( 35) is applied, we actually have 20000 training data and the batch size is 20.The learning rate starts at 0.001 and decreases by a factor of 0.5 every 2 epoch.

Tests with testing data
In Fig. 9, reconstructions of four images from the MNIST dataset are presented.When N i is small, it is difficult for the trained networks to recover the physical properties, and noises have a large impact on the reconstruction, but the DSM-DL can still provide some reasonable reconstruction for the shapes and locations of the scatterers.The error and SSIM for different noises and incidences are presented in Table 2 and Fig. 10, we can also see significant improvement as N i becomes larger.

Tests with "Austria ring"
We further test the generalization ability of the models using two different "Austria profiles" which are out of the distribution of the training parameters.For the first "Austria profile", the relative permittivity of the two small circles is set to be 3.0 (Note that the range in the training data is from 1.5 to 2.5) and the relative permittivity of the ring is set to be 1.5.For the second "Austria profile", the relative permittivities of the three scatters are 1.5, 2.0 and 2.5, respectively, while for each example in the training data, the scatterers only take two different relative permittivities.Thus, the two "Austria profiles" are out of the distribution of the training parameters, and are very challenging for the trained networks.The results are shown in Fig. 11, which also show the benefits of using more incident fields.It can be also observed that when multiple scatterers exist, the DSM-DL can provide reasonable reconstruction for the strong scatterers (the yellow small circles), while it is distorted for weak scatterers with small N i .This is mainly because the strong scatterers contribute to the dominant part of the scattered field.In addition, by comparing Fig. 7 and Fig. 11, we can see that the trained models from the MNIST dataset have better generalization than the trained models from the circle dataset; this is reasonable since the MNIST dataset is more diverse and we use more training data for the MNIST example.

4.3.3
Tests with Latin letters "D", "S" and "M" with high noise levels In this example, we use the network trained by the MNIST dataset with N i = 16 to reconstruct Latin letters "D", "S" and "M" with high noise levels 50% and 100%.In addition, different from the configuration for the training data, we use 60 receivers equally distributed on a circular curve with a radius 5 centered at the origin to collect the data for the scattered field.As shown in equation ( 15), the index function is convergent as the radius R tends to infinity, which explains the new configuration can still achieve good reconstructions.Based on the results shown in Fig. 12, it can be seen that the proposed DSM-DL is very robust to high-level noise as the noise is smoothed in the DSM process.We can also observe that the reconstruction of the thinner part of the letters is more likely to be distorted, which is mainly because the thinner part contributes less to the scattered field.

Tests with measured data from half circular curve
We recall that the inner product in equation ( 5) is defined on the whole circular curve S, thus the original DSM requires enough receivers and it is not applicable if the receivers are all located on a half circular curve Γ := {(r, θ) r = 3; 0 ≤ θ < π} ⊂ S.However, based on the upcoming numerical experiments, it is exciting to see that the DSM-DL can be applied to the limited aperture case by considering the following function   The reconstructions for an image from the testing data and an "Austria Ring" are presented in Fig. 13, the relative L2 testing error and SSIM are presented in Table 2.The results indicate that although the original DSM is not applicable in this case, the DSM-DL can still achieve very satisfactory results.However, the performance is not as good as that of using full measurement on the whole surface, this is reasonable since fewer measurement data is used.

Conclusions
In this work, we propose a novel approach for solving inverse medium scattering problems, building upon the classical DSM method introduced in [23].Our method combines deep learning with direct sampling to approximate and learn the relationship between index functions and true contrasts.The proposed DSM-DL has several attractive features.Firstly, our approach is highly efficient and parallelizable, while being robust to noise.Secondly, our deep neural network is able to fully leverage multiple scattering data, leading to significant improvements in reconstruction quality and noise resilience.Thirdly, even with only one incident wave, our method can produce satisfactory results, particularly for simple scatterer distributions.Numerical experiments confirm these features.
With the framework of the proposed DSM-DL in this paper, the DSM-DL can also be generalized to many other inverse problems for which proper DSMs have been developed, for example, in inverse obstacle scattering, inverse electromagnetic scattering, inverse elastic scattering, electrical impedance tomography, and inversion of radon transform.In general, DSM-DL offers a promising methodology for tackling various inverse problems.

Figure 1 :
Figure 1: An example of the index functions by DSM (left) and the true contrast (right)

Figure 2 :
Figure 2: The architecture of the direct sampling-based deep learning approach

Figure 3 :
Figure 3: The architecture of the U-Net used in the numerical experiments

Figure 4 :
Figure 4: Image reconstructions from measured scattered fields with 15% and 40% Gaussian noises by using the networks trained by the circle dataset, where the relative permittivity is between 1.5 and 2.0.From left to right: the ground-truth images, the reconstruction with 1,2,4,8, and 16 incident fields.

Figure 5 :Figure 6 :Figure 7 :Figure 8 :
Figure 5: The relative L2 testing error and SSIM for the networks trained by the circle dataset with different noise levels and number of incidences.

)Figure 9 :
Figure 9: Reconstructed images from scattered fields with 15% and 40% Gaussian noises by using the networks trained by the MNIST dataset, where the relative permittivity is between 1.5 and 2.5.From left to right: the ground-truth images, the reconstruction with 1,2,4,8, and 16 incident fields.

Figure 10 :Figure 11 :
Figure 10: The relative L2 testing error and SSIM for the trained networks from the MNIST dataset with different noise levels and numbers of incidences.

Figure 12 :
Figure12: Image reconstructions of Latin letters "D", "S" and "M" with 50% and 100% Gaussian noises by the networks trained by the MNIST dataset with N i = 16.We use 60 receivers equally distributed on a circular curve with a radius 5 centered at the origin to collect the scattered field data.

Figure 13 :
Figure 13: Image reconstructions of an example from testing data and an "Austria Ring" with 15% and 40% Gaussian noises in the scattered fields by the networks trained by MNIST dataset.From left to right: the ground-truth images, the reconstruction with 1,2,4,8, and 16 incident fields.

Table 1 :
The relative L2 testing error and SSIM for circle and high contrast circle examples with different noise levels and number of incidences.

Table 2 :
Relative L2 testing error and SSIM for different examples with different noise levels and number of incidences, where the example MNIST Γ refers to that the scattered fields are measured on the half circular curve Γ.