Adaptive regularisation for ensemble Kalman inversion

Marco Iglesias; Yuchen Yang

doi:10.1088/1361-6420/abd29b

1. Introduction

Ensemble Kalman inversion (EKI) [1–5] is a collection of algorithms that import ideas from the ensemble Kalman filter (EnKF) developed in [6]. While EnKF was devised for assimilating data into models for numerical weather prediction and oceanography, EKI has been applied to solve parameter identification problems arising from multiple disciplines. The initial versions of EKI were proposed for the calibration of oil-reservoir models [7, 8], and then transferred in [1, 2] to more generic PDE-constrained inverse problems settings. The development of EKI as an iterative solver for parameter identification problems has lead to numerous application including the calibration of climate [9], turbulent flow [10], finance [11] and biomedical [12] models. EKI has been used for imaging and non-destructive testing including electrical impedance tomography (EIT) [13], seismic inversion [14], characterisation of thermophysical properties of walls [15] and composite materials [16]. More recently, EKI has also been applied for the solution of machine learning tasks [17].

There is clear promise for the potential use of EKI as a practical and operational tool for PDE-constrained calibration and imaging arising from multiple applications in science and engineering. However, most EKI algorithms still rely on the appropriate selection of user-defined parameters that control the stability, accuracy and computational efficiency of EKI. Unfortunately, the lack of a general theory for the convergence of EKI means that there is no principled approach to select these parameter in an optimal fashion. The seamless version of EKI proposed in [18] and further developed in [19–22] has provided enormous theoretical advances for understanding EKI within the more general frameworks of stochastic differential equations (SDEs). However, to the best of our knowledge, the selection of those crucial EKI parameters are still chosen empirically. Among those parameters, the choice of the regularisation (inflation) parameter in EKI, or alternatively, the step size from the discretisation of the SDE formulation of EKI often depends on additional tuning parameters which can only be informed after problem-specific numerical testing which maybe computationally intensive.

The aim of this work is to introduce a simple, yet computationally efficient, regularisation strategy for EKI that does not rely on further tuning parameters. Our main focus is large-scale/high resolution imaging/identification settings in which there are two fundamental challenges: (i) the forward model is computationally costly and (ii) the unknown must be parameterised in a highly-complex manner so that key features from the truth can be extracted via EKI. Both challenges are intertwined since the latter means that EKI requires large ensembles and more iterations to achieve accurate identifications; thus the total computational cost of EKI can become unfeasible. An efficient and robust regularisation strategy within EKI is a key requirement in these practical settings.

1.1. The inverse problem framework with ensemble Kalman inversion

We work on a generic setting where the properties that we wish to identify are functions which, in turn, play the role of inputs (e.g. coefficients) of partial different equations (PDEs) describing the underlying experiment/process. The observable variables in these experiment/processes are functionals of the solution of these PDEs. Under this inverse problem setting, the forward problem can be written in terms of a nonlinear operator $\mathcal{F}:\mathcal{X}\to {\mathbb{R}}^{M}$ that maps physical properties from an admissible space $\mathcal{X}$ to the space of observable quantities ${\mathbb{R}}^{M}$ . Our work is focussed on using EKI for solving the classical (deterministic) inverse problem that consists of estimating the underlying physical property of the medium, denoted by κ^†, given noisy measurements of $\mathcal{F}\left({\kappa }^{{\dagger}}\right)$ , which we assume are given by

$\begin{equation}y=\mathcal{F}\left({\kappa }^{{\dagger}}\right)+\eta ,\end{equation} \tag{ 1 }$

where η is the unknown measurement error. We further assume η is drawn from a centred Gaussian measure with known covariance Γ. In order to address this inverse problem, we define a class of suitable parameterisations $\mathcal{P}:\mathcal{H}\to \mathcal{X}$ that enable us to characterise our estimate of the unknown as $\kappa =\mathcal{P}\left(u\right)$ , i.e. in terms of an input (function) $u\in \mathcal{H}$ which we calibrate within the EKI framework. We pose the parameterised (in terms of u) inversion via

$\begin{equation}{u}^{{\ast}}=\underset{u\in {\mathcal{S}}_{0}}{\text{arg min}}\enspace {\Phi}\left(u;y\right),\end{equation} \tag{ 2 }$

where ${\mathcal{S}}_{0}\subset \mathcal{H}$ is a user-defined subspace of admissible solutions and Φ(u; y) is the functional defined by

$\begin{equation}{\Phi}\left(u;y\right)\equiv \frac{1}{2}{\Vert}{{\Gamma}}^{-1/2}\left(y-\mathcal{G}\left(u\right)\right){{\Vert}}^{2},\end{equation} \tag{ 3 }$

where $\mathcal{G}=\mathcal{F}{\circ}\mathcal{P}$ is the forward (parameter-to-output) map and ${\Vert}\cdot {\Vert}$ denotes the M-dimensional Euclidean norm. We use EKI to approximate (2) and to provide an estimate of κ^† via ${\kappa }^{{\ast}}=\mathcal{P}\left({u}^{{\ast}}\right)$ .

For the PDE-constrained identification problems that we wish to address, $\mathcal{G}$ is often a compact operator which leads to the ill-posedness of (2) in the sense of stability [23, 24]. Although numerous regularisation techniques [23, 25] can be used to address ill-posedness, most of them require the computation of the Frechét derivative of the forward map, as well as the corresponding adjoint operator. This constitutes a substantial limitation in many practical applications where the map $\mathcal{F}$ is only accessible via commercial software simulations with no adjoint built-in functionalities. Such a practical limitation has given rise to a large body of work on EKI techniques that stem from EnKF [6] and which, in turn, do not require derivatives of the forward map.

While there are several versions of EKI algorithms to approximate (2), here we focus on the classical, perturbed-observation EKI displayed in algorithm 1. This generic version of EKI involves selecting an initial ensemble of J particles ${\left\{{u}_{0}^{\left(j\right)}\right\}}_{j=1}^{J}\subset \mathcal{H}$ . Then, each particle is iteratively updated according to the following formula

$\begin{equation}{u}_{n+1}^{\left(j\right)}={u}_{n}^{\left(j\right)}+{\mathcal{C}}_{n}^{u\mathcal{G}}{\left({\mathcal{C}}_{n}^{\mathcal{G}\mathcal{G}}+{\alpha }_{n}{\Gamma}\right)}^{-1}\left(y+\sqrt{{\alpha }_{n}}{\xi }_{n}-\mathcal{G}\left({u}_{n}^{\left(j\right)}\right)\right),\end{equation} \tag{ 4 }$

where α_n is a tuning (regularisation) parameter, ξ_n ∼ N(0, Γ) is perturbation of the data, and ${\mathcal{C}}_{n}^{u\mathcal{G}}$ and ${\mathcal{C}}_{n}^{\mathcal{G}\mathcal{G}}$ are empirical covariances defined in (16) and (17). The running estimate of the unknown is obtained via the ensemble mean

$\begin{equation}{\bar{u}}_{n+1}\equiv \frac{1}{J}\sum\limits _{j=1}^{J}{u}_{n+1}^{\left(j\right)}.\end{equation} \tag{ 5 }$

Using informal arguments, it can be shown (see appendix A and the work in [2]) that the ensemble mean ${\bar{u}}_{n+1}$ can be seen as an ensemble approximation of

$\begin{align}\hfill {m}_{n+1}=& \underset{u\in {\mathcal{S}}_{0}}{\text{arg min}}\left\{\frac{1}{2}{{\Vert}{{\Gamma}}^{-1/2}\left(\right.y-\mathcal{G}\left({m}_{n}\right)-D\mathcal{G}\left({m}_{n}\right)\left(u-{m}_{n}\right){\Vert}}^{2}+\frac{{\alpha }_{n}}{2}{{\Vert}{\mathcal{C}}_{n}^{-1/2}\left(u-{m}_{n}\right){\Vert}}_{\mathcal{H}}^{2}\right\}\hfill \end{align} \tag{ 6 }$

where $D\mathcal{G}$ denotes the Frechét derivative of $\mathcal{G}$ , ${\mathcal{C}}_{n}$ is a covariance operator that we define in appendix A, and ${\mathcal{S}}_{0}\equiv \text{span}{\left\{{u}_{0}^{\left(j\right)}\right\}}_{j=1}^{J}$ is the subspace generated by the initial ensemble. If ${\mathcal{C}}_{n}$ is the identity operator, (6) is the standard Levenberg–Marquardt (LM) scheme [26] applied for the computation of (2). Note that (6) can also be interpreted as iterative Tikhonov regularisation applied to the linearisation of $\mathcal{G}$ .

The link between EKI and (6) is very useful because (i) it motivates EKI as a derivative-free solver for (2) and (ii) it reveals the role of α_n in (4) as a Tikhonov regularisation parameter. According to the theory in [27], α_n must be carefully selected, together with the stopping criteria, in order to ensure the stability of the LM scheme. The approach for selecting α_n in the LM proposed in [27] has been adapted to the EKI framework in [2, 16], and subsequently used in [3, 12, 14, 28–30]. As we discuss in the next section, this approach relies on tuning parameters that, unless carefully chosen, can lead to unnecessary large number of iterations n*. Since the main computational cost of algorithm 1 is n* × J, it is clear that a large n* is detrimental to the computational efficiency of EKI.

1.2. EKI as a Gaussian approximation in the Bayesian tempering setting

Although the goal of most of the existing applications of EKI is to solve the deterministic problem in (2), the role of EKI within the Bayesian setting for parameter identification can be useful for identifying suitable choices of the regularisation parameter α_n. In the Bayesian setting we put a prior measure, ${\mu }_{0}\left(u\right)=\mathbb{P}\left(u\right)$ , on the unknown u that we wish to infer. Given measurements, y, the Bayesian inverse problem consists of approximating the posterior $\mu \left(u\right)\equiv \mathbb{P}\left(u\vert y\right)$ which, from Bayes' rule [31] is given by

$\begin{equation}\mu \left(\mathrm{d}u\right)\propto {\mu }_{0}\left(\mathrm{d}u\right)\mathrm{exp}\left[-\frac{1}{2}{{\Vert}{{\Gamma}}^{-1/2}\left(y-\mathcal{G}\left(u\right)\right){\Vert}}^{2}\right],\end{equation} \tag{ 7 }$

where we have made the standard assumption that $y=\mathcal{G}\left(u\right)+\eta$ with η ∼ N(0, Γ). Modern computational approaches [28, 32–35] for high-dimensional Bayesian inverse problems use the tempering approach that consists of introducing N intermediate measures ${\left\{{\mu }_{{t}_{n}}\right\}}_{n=1}^{N}$ between the prior and the posterior. These measures are defined by

$\begin{equation}{\mu }_{{t}_{n}}\left(\mathrm{d}u\right)\propto {\mu }_{{t}_{0}}\left(\mathrm{d}u\right)\mathrm{exp}\left[-\frac{{t}_{n}}{2}{{\Vert}{{\Gamma}}^{-1/2}\left(y-\mathcal{G}\left(u\right)\right){\Vert}}^{2}\right],\end{equation} \tag{ 8 }$

where ${\left\{{t}_{n}\right\}}_{n=1}^{N}$ are tempering parameters that satisfy:

$\begin{equation}{t}_{0}\equiv 0{< }{t}_{1}{< }{t}_{2}{< }\cdots {< }{t}_{N}{< }{t}_{N+1}\equiv 1.\end{equation} \tag{ 9 }$

Note that n = 0 and n = N + 1 in (8) yields the prior ( ${\mu }_{{t}_{0}}={\mu }_{0}$ ) and posterior ( ${\mu }_{{t}_{N+1}}=\mu$ ), respectively. From expression (8) we obtain the following recursive formula for the intermediate measures:

$\begin{equation}\frac{{\mu }_{{t}_{n+1}}\left(\mathrm{d}u\right)}{{\mu }_{{t}_{n}}\left(\mathrm{d}u\right)}\propto \mathrm{exp}\left[-\frac{1}{2}{{\Vert}{\left({\alpha }_{n}{\Gamma}\right)}^{-1/2}\left(y-\mathcal{G}\left(u\right)\right){\Vert}}^{2}\right],\end{equation} \tag{ 10 }$

where

$\begin{equation}{\alpha }_{n}^{-1}={t}_{n+1}-{t}_{n}.\end{equation} \tag{ 11 }$

From (9) it follows that

$\begin{equation}\sum\limits _{n=0}^{N}{\alpha }_{n}^{-1}=1.\end{equation} \tag{ 12 }$

In (11) we employ the same notation that we use for the regularisation parameter in EKI (see equation (4)), because the ensemble computed at the nth iteration of EKI, is an ensemble approximation of a Gaussian measure that, in turn, approximates¹ the intermediate distribution ${\mu }_{{t}_{n}}$ in (8) (see [16] and appendix A.2). Therefore, in the Bayesian setting, controlling the regularisation parameter α_n in EKI means to gradually transition between prior and posterior in order to facilitate more accurate sampling of the intermediate measures.

The link between EKI and the Bayesian tempering setting has been recently explored in [16], where the selection of α_n is borrowed from the adaptive-tempering sequential Monte Carlo (SMC) method of [33]. In this approach, α_n is selected based on user-defined threshold to the effective sample size (ESS) which, in SMC methods, is used to determine the quality in the population of particles that approximate each intermediate measure ${\mu }_{{t}_{n}}$ . However, this selection of α_n involves the computation of the likelihood between consecutive measures (10). When the number of observations is large and/or a the measurement noise is small, this likelihood can take very small values unless large α_n's are chosen. Consequently, many iterations may be needed to satisfy condition (12). Furthermore, the approach of [16] requires the aforementioned user-defined threshold on the ESS which may substantially affect the efficiency of EKI.

1.3. Our contribution

In this work we propose a novel adaptive choice of α_n in EKI that does not require any tuning parameters. This new selection of α_n depends on the number of observations, M, as well as the values of the least-squares functional (3) for the ensemble of particles, i.e.

$\begin{equation}{{\Phi}}_{n}\equiv {\left\{{\Phi}\left({u}_{n}^{\left(j\right)};y\right)\right\}}_{j=1}^{J}={\left\{\frac{1}{2}{{\Vert}{{\Gamma}}^{-1/2}\left(y-\mathcal{G}\left({u}_{n}^{\left(j\right)}\right)\right){\Vert}}^{2}\right\}}_{j=1}^{J}.\end{equation} \tag{ 13 }$

More specifically, we select ${\alpha }_{n}^{-1}$ via

$\begin{equation}{\alpha }_{n}^{-1}=\mathrm{min}\left\{\mathrm{max}\left\{\frac{M}{2\langle {{\Phi}\rangle }_{n}},\sqrt{\frac{M}{2\langle {\Phi},{{\Phi}\rangle }_{n}}}\right\},1-{t}_{n}\right\},\end{equation} \tag{ 14 }$

where ${\left\langle {\Phi}\right\rangle }_{n}$ and ${\left\langle {\Phi},{\Phi}\right\rangle }_{n}$ denote the empirical mean and variance of Φ_n, and

$\begin{equation}{t}_{n}=\begin{cases}\sum\limits _{j=0}^{n-1}{\alpha }_{j}^{-1}\quad \hfill & \text{if}\enspace n{\geqslant}1,\hfill \\ 0\quad \hfill & \text{if}\enspace n=0.\hfill \end{cases}\end{equation} \tag{ 15 }$

Note that in (14) we require that ${\alpha }_{n}^{-1}{\leqslant}1-{t}_{n}$ in order to enforce constraint (12) from the tempering/annealing setting using N = n* intermediate measures. More specifically, we propose to stop the EKI algorithm at the iteration, n*, such that

$\begin{equation*}{\alpha }_{{n}^{{\ast}}}^{-1}=1-{t}_{{n}^{{\ast}}}\end{equation*}$

which, from (15), yields constraint (12). Noting that the selection of ${\alpha }_{n}^{-1}$ in (14) is determined by the data misfit of the particles, we refer to our regularisation strategy as the data misfit controller (DMC).

Algorithm 1. Generic EKI (with perturbed observations).

We motivate our DMC from the Bayesian perspective of EKI in which ${\alpha }_{n}^{-1}$ is the difference between consecutive tempering/annealing parameters (see equation (11)). We show that the selection of ${\alpha }_{n}^{-1}$ in (14) can be obtained by controlling, via imposing a threshold, the symmetrised Kullback–Leibler divergence (or Jeffreys' divergence) between two consecutive tempering measures. We then use a statistical discrepancy principle to select this threshold.

Jeffreys' divergence has been employed to design efficient selection of the step size between tempering measures when using MCMC to approximate the target posterior. For example, in the context of the tempered transitions method, the work of [36] uses Jeffreys' divergence to select step sizes that increase acceptance rates. Jeffreys' divergence has also been used for path sampling: a method commonly used to estimate normalisation constants (e.g. for Bayesian model selection). The work of [37], for example, shows that by controlling the Jeffreys' divergence between tempering measures, the error in the path sampling estimator can be minimised. Both the works in [36, 37], as well as various references therein, employ continuous tempering in the finite-dimensional setting in order to characterise Jeffreys' divergence. Here we extend some of those results to our infinite-dimensional framework for EKI.

We encode our regularisation strategy for EKI in an algorithm (EKI-DMC) that we test via EIT with the complete electrode model (CEM). We show that the choice of α_n via the DMC that we import from the Bayesian formulation, leads to a stable, robust and computationally efficient EKI algorithm for solving classical (deterministic) inverse problems posed via (2). We demonstrate the computational and practical advantages of EKI-DMC over existing approaches in which α_n is borrowed from the theory for the LM scheme. We conduct test with two parameterisations of the unknown conductivity that allows to consider both continuous and piece-wise constant conductivities. In particular, we investigate the performance EKI-DMC to infer regions characterised via a level-set function which is, in turn, parameterised with an anisotropic Whittle–Matern (WM) field.

In section 2 we further review the literature on existing approaches for the selection of α_n for the classical EKI framework. The proposed DMC (14) is motivated in section 3, where we use continuous tempering to derive approximations to Jeffreys' divergence between tempering measures (subsection 3.2), as well as a statistical discrepancy principle to (subsection 3.3) for selecting the threshold on Jeffreys' divergence. In section 4 we provide a numerical investigation of the performance of the proposed EKI-DMC algorithm for EIT. In section 5 we discuss final remarks and conclusions. For completeness in appendix A we motivate the classical EKI from the Bayesian tempering scheme, and from which we also draw links between the LM algorithm and EKI. Some technical proofs are included in appendix B.

2. Literature review

We discuss some existing regularisation strategies for EKI within the inversion setting posed in terms of the unregularised least-squares formulation of (2), and which leads to the classical EKI formulation in (4). We highlight that there is a new alternative EKI methodology and algorithms proposed in [4, 5] that arise from regularising (2) with a Tikhonov-like term. In addition, the review is focussed on PDE-constrained inversion; for a review of modern Kalman methods in the context of the data assimilation framework we refer the reader to the recent work on [38] and references therein.

2.1. EKI as an iterative solver for identification problems

The initial versions of EKI [1] for generic identification problems proposed to use the classical EnKF [6] update formula² as an iterative solver for (2) by introducing an artificial dynamical system. For various PDE-constrained identification problems, the work of [1] numerically showed that this early version of EKI approximated well the solutions of (2) (with ${\mathcal{S}}_{0}=\text{span}{\left\{{u}_{0}^{\left(j\right)}\right\}}_{j=1}^{J}$ ) within the first few iterations. However, they noted the algorithm became unstable if it was allowed to iterate after the data misfit (2) had reached the noise level δ defined by

$\begin{equation}\delta ={\Vert}{{\Gamma}}^{-1/2}\left(\right.y-\mathcal{G}\left({u}^{{\dagger}}\right){\Vert},\end{equation} \tag{ 18 }$

where u^† is the truth. This lack of stability led to the work of [2] where links between EKI and the regularising LM scheme of [27] were first established and used to develop a regularising version of EKI. For the LM scheme, the work of [27] ensures that, under certain assumptions of the forward map $\mathcal{G}$ , the scheme in (6) converges to the solution of (2) (as δ → 0) provided that (i) the regularisation parameter α_n satisfies

$\begin{align}\hfill & \rho {\Vert}{{\Gamma}}^{-1/2}\left(y-\mathcal{G}\left({m}_{n}\right)\right){\Vert}{\leqslant}{\alpha }_{n}{\Vert}{{\Gamma}}^{1/2}\left(D\mathcal{G}\left({m}_{n}\right){\mathcal{C}}_{n}D{\mathcal{G}}^{{\ast}}\left({m}_{n}\right)+{\alpha }_{n}{\Gamma}\right)\left(y-\mathcal{G}\left({m}_{n}\right)\right){\Vert},\hfill \end{align} \tag{ 19 }$

where ρ < 1 is a tuning parameter, and (ii) that the algorithm is terminated at an iteration level n* determined by the following discrepancy principle

$\begin{equation}{\Vert}{{\Gamma}}^{-1/2}\left(y-\mathcal{G}\left({m}_{{n}^{{\ast}}}\right)\right){\Vert}{\leqslant}\tau \delta {< }{\Vert}{{\Gamma}}^{-1/2}\left(y-\mathcal{G}\left({m}_{n}\right)\right){\Vert}\quad 0{\leqslant}n{< }{n}^{{\ast}},\end{equation} \tag{ 20 }$

where τ is another tuning parameter that must satisfy τ > 1/ρ. In [2], these regularisation strategies from LM were adapted for the selection of α_n in EKI via using derivative-free Gaussian approximations in (19) and (20). We refer to the approach from [2] as EKI-LM (see algorithm 2).

Algorithm 2. (EKI-LM) EKI with LM selection of α_n and stopping criteria.

The numerical results of [2] showed that EKI-LM enabled stability and accuracy for sufficiently large ensembles. Further work that has explored EKI-LM can be found in [3, 28, 29] as well as some practical applications including seismic tomography [14], modelling of intracraneal pressure [12] and time fractional diffusion inverse problems [30].

Despite of addressing stability in EKI, the approach EKI-LM suffers from a potential practical limitation that arises from the fact that it relies on the tuning parameters ρ and τ in (21). Larger ρ's yield larger α_n and in turn more iterations to converge. Smaller ρ's, while desirable for computational efficiency, result in larger τ's which can lead to larger data misfit and possible loss of accuracy from stopping too early (via (22)). Selecting these parameters in a computationally optimal fashion becomes more crucial when EKI is combined with complex parameterisations of the unknown. As we discuss in subsection 4.2, there are cases for which, instead of using EKI directly on the physical property that we wish to infer, we need to parameterise the unknown to be able to capture properties that are not necessarily encoded in the initial ensemble. For example, in [2] the LM approach for EKI was applied with a level-set parameterisation of the unknown in order to characterise discontinuous properties (i.e. discontinuous conductivity in the context of EIT). In comparison to the simpler case in which EKI directly estimates a physical property of interest, [2] found that not only they needed a larger ensemble to achieve converge, but also more EKI iterations. The application of EKI-LM with level-set parameterisations for seismic imaging in [14] reported up to 40 iterations to achieve convergence. The numerical results of [3] also show that when EKI is combined with various others parameterisations of the unknown, the number of iterations of EKI can become large even for simple 1D and 2D forward modelling settings. We aim at addressing these very same issues with the selection of α_n that we propose in (14) and that we motivate in section 3 using the Bayesian perspective of EKI discussed earlier.

2.2. EKI as Gaussian approximation in linearised Bayesian tempering

Although the Bayesian perspective of EKI as a Gaussian approximation within the tempering setting was initially mentioned in [18] and further developed in [16], the early work of [7, 39] established a strong link between EKI and the Bayesian setting which led exactly to the same EKI scheme from algorithm 1. However, instead of using tempering, they used a heuristic approach in which the data was inverted multiple times with noise inflated by $\sqrt{{\alpha }_{n}}$ . From algebraic manipulations they figured out that, in order to accurately sample from the posterior in the linear-Gaussian case, their inflation parameter α_n must satisfy (12) with N = n*, where n* is the total number of EKI iterations. Note that, in the tempering setting, (12) is simply a consequence of the definition of α_n in (11) as well as the definitions of t₀ and t_N+1 in (9). Moreover, in the general Bayesian tempering settings, expression (12) holds for the general nonlinear case and without any assumptions on the prior. Nonetheless, those considerations for the linear-Gaussian case from [39] enabled them to (i) justify the use of (12) in the nonlinear case and (ii) to propose a simple selection of α_n given by α_n = n* with n* selected a priori (which trivially satisfies (12)). This approach, referred to as ensemble smoother with multiple data assimilation, has been popularised in petroleum engineering applications (see [40] and references therein). However, from the insight gained from the link between EKI and the LM scheme, this selection of α_n was discouraged in [41] since the stability of EKI, as shown in [2, 41], requires α_n to be large at the beginning of the iterations, and gradually decrease as the data misfit approaches the noise level. Further versions of ES-MDA [42] adopted selections of α_n similar to those initially proposed in [2] based on the LM scheme.

As discussed in the previous section, a selection of α_n based on the adaptive-tempering SMC method of [33] is proposed in [16]. More specifically, α_n is selected so that

$\begin{equation}{\left[\sum\limits _{j=1}^{J}{\left({\mathcal{W}}_{n}^{\left(j\right)}\left[{\alpha }_{n}\right]\right)}^{2}\right]}^{-1}={J}_{{\ast}},\end{equation} \tag{ 23 }$

where J_* is a tuning user-defined parameter and

$\begin{equation}{\mathcal{W}}_{n}^{\left(j\right)}\left[{\alpha }_{n}\right]=\frac{\mathrm{exp}\left[-\frac{1}{2}{{\Vert}{\left({\alpha }_{n}{\Gamma}\right)}^{-1/2}\left(y-\mathcal{G}\left({u}_{n}^{\left(j\right)}\right)\right){\Vert}}^{2}\right]}{{\sum }_{s=1}^{J}\mathrm{exp}\left[-\frac{1}{2}{{\Vert}{\left({\alpha }_{n}{\Gamma}\right)}^{-1/2}\left(y-\mathcal{G}\left({u}_{n}^{\left(s\right)}\right)\right){\Vert}}^{2}\right]}.\end{equation} \tag{ 24 }$

The left-hand side of (23) is the ESS of the ensemble approximation of ${\mu }_{{t}_{n}}$ . For further details we refer the reader to [16], where the selection of α_n according to (23) was implemented in a batch-sequential EKI framework to sequentially solve an inverse problem that arises in resin transfer moulding. The same EKI methodology was applied in [15] for parameter identification of the heat equation, including identification of thermal conductivity and heat capacitance given boundary measurement of heat flux. Both time-dependent applications tackled in [15, 16] involved the inversion of small number of measurements (e.g. <30) at each observation time, and with relatively large noise informed by measurement protocols. However, as discussed in the previous section, the selection of α_n via solving (23) becomes problematic for large number of observations and/or small observational noise, since (24) requires the computation of the likelihood (10). Another limitation is that the efficiency of the approach used in [15, 16] relies on the tuning parameter J_* in (23).

2.2.1. EKI as a discretisation of stochastic differential equations

The pioneering work of [18] has shown that EKI can be derived as a discretisation of an SDE system for the ensemble of particles in EKI. This so-called continuous-time limit or seamless formulation of EKI has led to further theoretical understanding of the EKI framework [19, 20, 22]. In addition, using alternative discretisation schemes of the seamless formulation of EKI, leads to new EKI algorithms in both the context of optimisation [17, 19] and sampling within the Bayesian approach [21].

In the seamless formulation of EKI, the regularisation parameter α_n from the classical EKI becomes the inverse of the mesh-size/discretisation step. The work of [17] proposes to choose this parameter according to

$\begin{equation}{\alpha }_{n}^{-1}=\frac{{\alpha }_{0}^{-1}}{{{\Vert}U{\Vert}}_{\mathrm{F}}+{\epsilon}},\end{equation} \tag{ 25 }$

where α₀ and are user defined parameters, ${{\Vert}\cdot {\Vert}}_{\mathrm{F}}$ denotes the Frobenius norm and U is a matrix with entries ${U}_{j,k}={\left(\mathcal{G}\left({u}_{n}^{\left(k\right)}\right)-{\bar{\mathcal{G}}}_{n}\right)}^{T}{{\Gamma}}^{-1}\left(\mathcal{G}\left({u}_{n}^{\left(j\right)}\right)-y\right)$ . To the best of our knowledge, this selection of α_n has only been used for new different EKI algorithms including the ones arising from forward-Euler [17] and the implicit split-step method [21]. While (25) is also a perfectly valid choice of α_n for classical EKI, we emphasise that it relies on the choice of the tuning parameters α₀ and .

3. The proposed regularisation framework for EKI

The aim of this section is to motivate the new approach (DMC) that we propose via (14) to select ${\alpha }_{n}^{-1}$ for the classical EKI setting given in (4). We motivate our approach using tempering within the Bayesian setting for inverse problems in which ${\alpha }_{n}^{-1}$ is the step size between consecutive tempering measures. After introducing some notation and assumptions in subsection 3.1, in subsection 3.2 we define tempering in the continuous setting which we require in order to compute an approximation of Jeffreys' divergence between two consecutive measures. Under this approximation, we further show that our selection of ${\alpha }_{n}^{-1}$ (14) controls Jeffreys' divergence. The control parameter is then selected following a heuristic approach based on a discrepancy principle that we introduced in subsection 3.3.

3.1. Notation, definitions and assumptions

We assume that the space of inputs, $\mathcal{H}$ , is a real-valued separable Hilbert space. The Borel-sigma algebra over $\mathcal{H}$ is denoted by $\mathcal{B}\left(\mathcal{H}\right)$ . In this section we assume that the prior is a Gaussian measure ${\mu }_{0}=\mathcal{N}\left(m,\mathcal{C}\right):\mathcal{B}\left(\mathcal{H}\right)\to \left[0,1\right]$ on $\left(\mathcal{H},\mathcal{B}\left(\mathcal{H}\right)\right)$ , satisfying ${\mu }_{0}\left(\mathcal{H}\right)=1$ . Given two probability measures $\hat{\mu }$ and μ on a measurable space $\left(\mathcal{Y},{\Sigma}\right)$ , if $\hat{\mu }$ is absolutely continuous with respect to μ, we define the Kullback–Leibler divergence of $\hat{\mu }$ with respect to μ by [43],

$\begin{equation*}{\mathrm{D}}_{\text{KL}}\left(\hat{\mu }{\Vert}\mu \right){:=}{\int }_{\mathcal{Y}}\mathrm{log}\left(\frac{\mathrm{d}\hat{\mu }}{\mathrm{d}\mu }\right)\mathrm{d}\hat{\mu },\end{equation*}$

where $\frac{\mathrm{d}\hat{\mu }}{\mathrm{d}\mu }$ denotes the Radon–Nikodym derivative of $\hat{\mu }$ with respect to μ. For two equivalent probability measures $\hat{\mu }$ and μ (i.e. $\hat{\mu }\ll \mu$ and $\mu \ll \hat{\mu }$ ) we define the symmetrized KL divergence or Jeffrey's divergence:

$\begin{equation}{\mathrm{D}}_{\text{KL},2}\left(\mu ,\hat{\mu }\right){:=}{\mathrm{D}}_{\text{KL}}\left(\hat{\mu }{\Vert}\mu \right)+{\mathrm{D}}_{\text{KL}}\left(\mu {\Vert}\hat{\mu }\right).\end{equation} \tag{ 26 }$

We consider the following assumption on the forward operator:

Assumption 1. Assume that the forward map $\mathcal{G}:\mathcal{H}\to {\mathbb{R}}^{M}$ satisfies the following two conditions:

(a)
For every > 0 there is an $\beta =\beta \left({\epsilon}\right)\in \mathbb{R}$ such that, for all $u\in \mathcal{H}$ ,
$\begin{equation*}{\Vert}{{\Gamma}}^{-1/2}\mathcal{G}\left(u\right){\Vert}{\leqslant}\mathrm{exp}\left({\epsilon}{\Vert}u{{\Vert}}_{\mathcal{H}}^{2}+\beta \right)\end{equation*}$
(b)
For every r > 0 there is an K = K(r) > 0 such that,
$\begin{equation*}{\Vert}{{\Gamma}}^{-1/2}\left(\mathcal{G}\left({u}_{1}\right)-\mathcal{G}\left({u}_{2}\right)\right){\Vert}{\leqslant}K{\Vert}{u}_{1}-{u}_{2}{{\Vert}}_{\mathcal{H}}\end{equation*}$
for all ${u}_{1},{u}_{2}\in \mathcal{H}$ with $\mathrm{max}\left\{{\Vert}{u}_{1}{{\Vert}}_{\mathcal{H}},{\Vert}{u}_{2}{{\Vert}}_{\mathcal{H}}\right\}{< }r$ .

This assumption was used in [[31], theorem 4.1] to establish well-posedness of the posterior in the infinite-dimensional framework. Let us recall the least-square functional Φ(u; y) defined in (3). For any $y\in {\mathbb{R}}^{M}$ , Fernique's theorem and condition (a) in assumption 1 implies that Φ(⋅; y)^m is square integrable, for any m ⩾ 1, with respect to the prior ${\mu }_{0}=\mathcal{N}\left(m,\mathcal{C}\right)$ . Integrability of Φ(⋅; y)^m is used through the rest of this section and for the proofs in appendix B.

3.2. Controlling Jeffreys' divergence between successive tempering measures

In this section we show that the data-misfit controller (14) ensures that the level of information between any two consecutive tempering measures is approximately bounded by a user-defined threshold θ > 0. We measure the level of information between these measures via Jeffreys' divergence defined in (26). Our aim is to show that the selection of ${\alpha }_{n}^{-1}={t}_{n+1}-{t}_{n}$ via (14) yields

$\begin{equation}{\mathrm{D}}_{\text{KL},2}\left({\mu }_{{t}_{n+1}},{\mu }_{{t}_{n}}\right){\leqslant}\theta ,\quad n\in \left\{1,\dots ,N\right\}\end{equation} \tag{ 27 }$

In subsection 3.3 we give heuristic motivations for the choice of θ based on a statistical discrepancy principle.

Our first step is the explicit characterisation of ${\mathrm{D}}_{\text{KL},2}\left({\mu }_{{t}_{n+1}},{\mu }_{{t}_{n}}\right)$ which we latter approximate in order to propose our adaptive selection of ${\alpha }_{n}^{-1}$ . This approximation, require us to work on the continuous tempering setting which we now introduce.

3.2.1. Continuous tempering

For t ∈ [0, 1], we define measure μ_t such that the Radon–Nikodym derivative of μ_t with respect to the prior ${\mu }_{0}=\mathcal{N}\left(m,\mathcal{C}\right)$ satisfies, for almost every $u\in \mathcal{H}$ ,

$\begin{equation}\frac{\mathrm{d}{\mu }_{t}}{\mathrm{d}{\mu }_{0}}\left(u\right)={N}_{t}^{-1}\enspace \mathrm{exp}\left(-t{\Phi}\left(u;y\right)\right),\end{equation} \tag{ 28 }$

where N_t is the normalising constant defined by

$\begin{equation}{N}_{t}={\int }_{\mathcal{H}}\mathrm{exp}\left(-t{\Phi}\left(u;y\right)\right){\mu }_{0}\left(\mathrm{d}u\right).\end{equation} \tag{ 29 }$

The following lemma establishes the well-posedness of {μ_t|t ∈ [0, 1]}. The proof can be found in appendix B.1.

Lemma 1. The probability measure μ_t from (28) exists on the probability space $\left(\mathcal{H},\mathcal{B}\left(\mathcal{H}\right),{\mu }_{0}\right)$ and is equivalent to μ₀, i.e. μ_t ≪ μ₀ and μ₀ ≪ μ_t.

Using integrability of the data misfit with respect to the prior, and the equivalence proven in the previous lemma, let us define the mean and variance of the data misfit:

$\begin{equation}{\left\langle {\Phi}\right\rangle }_{t}{:=}{\int }_{\mathcal{H}}{\Phi}\left(w;y\right){\mu }_{t}\left(\mathrm{d}w\right)\quad {\left\langle {\Phi},{\Phi}\right\rangle }_{t}{:=}{\int }_{\mathcal{H}}{\left({\Phi}\left(w;y\right)-{\left\langle {\Phi}\right\rangle }_{t}\right)}^{2}{\mu }_{t}\left(\mathrm{d}w\right)\end{equation} \tag{ 30 }$

for all t ∈ [0, 1].

It is not difficult to see that, given 0 = t₀ < t₁ <⋯< t_N+1 = 1, lemma 1 also establishes the equivalence of the probability measures $\left\{{\mu }_{{t}_{n}}:n=0,1,\dots ,N+1\right\}$ . Furthermore, from the integrability of the data misfit, it follows that the Jeffreys' divergence between consecutive measures is well-defined and, using its definition in (26), it can be written as

$\begin{equation}{\mathrm{D}}_{\text{KL},2}\left({\mu }_{{t}_{n}},{\mu }_{{t}_{n+1}}\right)={\alpha }_{n}^{-1}\left(\langle {{\Phi}\rangle }_{{t}_{n}}-\langle {{\Phi}\rangle }_{{t}_{n+1}}\right).\end{equation} \tag{ 31 }$

The finite-dimensional version of (31) was proven in [[37], proposition 3.2] in the context of path sampling.

Let us note that, since Φ ⩾ 0,

$\begin{equation}{\mathrm{D}}_{\text{KL},2}\left({\mu }_{{t}_{n}},{\mu }_{{t}_{n+1}}\right){\leqslant}{\alpha }_{n}^{-1}{\left\langle {\Phi}\right\rangle }_{{t}_{n}}.\end{equation} \tag{ 32 }$

3.2.2. Approximating ${\mathrm{D}}_{\text{KL},2}\left({\mu }_{{t}_{n}},{\mu }_{{t}_{n+1}}\right)$

In practice, the expression for ${\mathrm{D}}_{\text{KL},2}\left({\mu }_{{t}_{n}},{\mu }_{{t}_{n+1}}\right)$ provided in (31) cannot be used in (27) to find ${\alpha }_{n}^{-1}$ since, at time t_n, measure ${\mu }_{{t}_{n+1}}$ is unknown. Moreover, since ${t}_{n+1}={t}_{n}+{\alpha }_{n}^{-1}$ , measure ${\mu }_{{t}_{n+1}}$ depends on the choice of α_n. In order to propose a practical choice of ${\alpha }_{n}^{-1}$ via (31), let us now discuss some approximations. Note that if ${\left\langle {\Phi}\right\rangle }_{{t}_{n+1}}\ll {\left\langle {\Phi}\right\rangle }_{{t}_{n}}$ , then the term ${\left\langle {\Phi}\right\rangle }_{{t}_{n}}{\geqslant}0$ in the right-hand side of (31) can be neglected and hence

$\begin{equation}{\mathrm{D}}_{\text{KL},2}\left({\mu }_{{t}_{n}},{\mu }_{{t}_{n+1}}\right)\approx {\alpha }_{n}^{-1}\langle {{\Phi}\rangle }_{{t}_{n}}{\leqslant}\theta {\Rightarrow}{\alpha }_{n}^{-1}{\leqslant}\frac{\theta }{\langle {{\Phi}\rangle }_{{t}_{n}}}.\end{equation} \tag{ 33 }$

However, approximating ${\mathrm{D}}_{\text{KL},2}\left({\mu }_{{t}_{n}},{\mu }_{{t}_{n+1}}\right)$ with its upper bound in (32) may not be accurate when ${\left\langle {\Phi}\right\rangle }_{{t}_{n+1}}\approx {\left\langle {\Phi}\right\rangle }_{{t}_{n}}$ which is likely to happen when ${\alpha }_{n}^{-1}$ is small. We address this case via a first order approximation of ${\left\langle {\Phi}\right\rangle }_{t}$ (as a function of t), around t = t_n. We first need the following proposition that we prove in appendix B.2 and which extends, to the present infinite-dimensional setting, the results from [[36], section 2] for tuning tempered transitions.

Proposition 1. For any bounded t ⩾ 0, let u_t ∼ μ_t be an $\mathcal{H}$ -valued random variable, where μ_t is the probability measure determined via formula (28). For any square integrable functional $g:\mathcal{H}\to \mathbb{R}$ , the expected value $\mathbb{E}\left\{g\left({u}_{t}\right)\right\}$ , as a function of t, is differentiable and its derivative $\frac{\mathrm{d}}{\mathrm{d}t}\mathbb{E}\left\{g\left({u}_{t}\right)\right\}$ satisfies:

$\begin{equation}\frac{\mathrm{d}}{\mathrm{d}t}\mathbb{E}\left\{g\left({u}_{t}\right)\right\}=-\mathrm{C}\mathrm{o}\mathrm{v}\left\{g\left({u}_{t}\right),{\Phi}\left({u}_{t};y\right)\right\},\end{equation} \tag{ 34 }$

where Cov denotes covariance between two random variables.

The following corollary is a simple consequence of theorem 1 with g(⋅) = Φ(⋅, y).

Corollary 1. For any t ∈ [0, 1], let μ_t denote the probability measure determined via the tempering setting (28). The mean ${\left\langle {\Phi}\right\rangle }_{t}$ defined in (30), as a function of t ∈ [0, 1], is differentiable. Moreover, its derivative is given by

$\begin{equation*}\frac{\mathrm{d}}{\mathrm{d}t}\langle {{\Phi}\rangle }_{t}=-\langle {\Phi},{{\Phi}\rangle }_{t}.\end{equation*}$

We now come back to the case in which ${\alpha }_{n}^{-1}={t}_{n+1}-{t}_{n}$ needs to be small so that (27) can be satisfied for a given threshold θ > 0. Using corollary 1 we have

$\begin{equation*}{\left\langle {\Phi}\right\rangle }_{{t}_{n+1}}\approx {\left\langle {\Phi}\right\rangle }_{{t}_{n}}+{\alpha }_{n}^{-1}{\frac{\mathrm{d}}{\mathrm{d}t{\left\langle {\Phi}\right\rangle }_{{t}_{n}}={\left\langle {\Phi}\right\rangle }_{{t}_{n}}-{\alpha }_{n}}}^{-1}{\left\langle {\Phi},{\Phi}\right\rangle }_{{t}_{n}}.\end{equation*}$

Thus,

$\begin{equation}{\mathrm{D}}_{\text{KL},2}\left({\mu }_{{t}_{n}},{\mu }_{{t}_{n+1}}\right)\approx {\alpha }_{n}^{-2}\langle {\Phi},{{\Phi}\rangle }_{{t}_{n}}{\leqslant}\theta {\Rightarrow}{\alpha }_{n}^{-1}{\leqslant}\sqrt{\frac{\theta }{\langle {\Phi},{{\Phi}\rangle }_{{t}_{n}}}}.\end{equation} \tag{ 35 }$

We postulate that the approximation used in (35) is reasonable provided that it honours inequality (32), i.e.

$\begin{equation*}{\mathrm{D}}_{\text{KL},2}\left({\mu }_{{t}_{n}},{\mu }_{{t}_{n+1}}\right)\approx {\alpha }_{n}^{-2}\langle {\Phi},{{\Phi}\rangle }_{{t}_{n}}{\leqslant}{\alpha }_{n}^{-1}\langle {{\Phi}\rangle }_{{t}_{n}}.\end{equation*}$

Whenever this condition is not satisfied, we approximate ${\mathrm{D}}_{\text{KL},2}\left({\mu }_{{t}_{n}},{\mu }_{{t}_{n+1}}\right)$ via its upper bound as in (33). In summary, we propose the following approximation

$\begin{equation}{\mathrm{D}}_{\text{KL},2}\left({\mu }_{{t}_{n}},{\mu }_{{t}_{n+1}}\right)\approx \mathrm{min}\left\{{\alpha }_{n}^{-1}\langle {{\Phi}\rangle }_{{t}_{n}},{\alpha }_{n}^{-2}\langle {\Phi},{{\Phi}\rangle }_{{t}_{n}}\right\}.\end{equation} \tag{ 36 }$

We define ${\alpha }_{n}^{-1}$ :

$\begin{equation}{\alpha }_{n}^{-1}=\mathrm{min}\left\{\mathrm{max}\left\{\frac{\theta }{\langle {{\Phi}\rangle }_{{t}_{n}}},\sqrt{\frac{\theta }{\langle {\Phi},{{\Phi}\rangle }_{{t}_{n}}}}\right\},1-{t}_{n}\right\},\end{equation} \tag{ 37 }$

which satisfies (27) using the approximation (36), while we ensure that ${\alpha }_{n}^{-1}{\leqslant}1-{t}_{n}$ in order to satisfy (12) with n* = N; hence n* defines the natural stopping iteration level.

Expression (37) has the same form as the DMC in (14) with (i) ${\left\langle {\Phi}\right\rangle }_{{t}_{n}}$ and ${\left\langle {\Phi},{\Phi}\right\rangle }_{{t}_{n}}$ approximated with empirical mean and variance from the ensemble of particles (recall ${u}_{n}^{\left(j\right)}$ are approximate samples of ${\mu }_{{t}_{n}}$ ), and (ii) the choice θ = M/2 which we motivate from heuristic arguments in the next subsection.

3.3. A statistical discrepancy principle

According to Morozov's discrepancy principle [44], if $u\in \mathcal{H}$ satisfies

$\begin{equation}{\Vert}{{\Gamma}}^{-1/2}\left(y-\mathcal{G}\left(u\right)\right){\Vert}{\leqslant}\delta ,\end{equation} \tag{ 38 }$

where δ is the noise level defined in (18), then u is an acceptable estimate of the classical (deterministic) inverse problem (1). In the context of iterative regularisation [25], a form of the discrepancy principle is often used as an early stopping rule (e.g. see expression (20)) to avoid instabilities. In the Bayesian setting, however, the noise level is a random variable. In fact, under the assumption of additive centred Gaussian noise (1), the squared noise level δ² (see (18)) is a realisation of a chi-square random variable with M degrees of freedom. Therefore, its mean and variance are:

$\begin{equation*}\mathbb{E}\left\{{\delta }^{2}\right\}=M,\quad \mathrm{Var}\left\{{\delta }^{2}\right\}=2M.\end{equation*}$

Inspired by the classical discrepancy principle, we then postulate that an acceptable estimate, u, must satisfy either one of the following criteria

C1 (accuracy)

$\begin{equation}\mathbb{E}\left\{{{\Vert}{{\Gamma}}^{-1/2}\left(y-\mathcal{G}\left(u\right)\right){\Vert}}^{2}\right\}{\leqslant}\mathbb{E}\left\{{\delta }^{2}\right\}\equiv M.\end{equation} \tag{ 39 }$

C2 (uncertainty)

$\begin{equation}\mathrm{Var}\left\{{{\Vert}{{\Gamma}}^{-1/2}\left(y-\mathcal{G}\left(u\right)\right){\Vert}}^{2}\right\}{\leqslant}\mathrm{Var}\left\{{\delta }^{2}\right\}\equiv 2M.\end{equation} \tag{ 40 }$

We invoke this statistical discrepancy principle to determine the step size ${\alpha }_{n}^{-1}$ of the nth Bayesian sub-problem (41) that arises from transition between tempering measures ${\mu }_{{t}_{n}}$ and ${\mu }_{{t}_{n}}$ . More specifically, let us consider the following remark.

Algorithm 3. EKI with DMC (EKI-DMC).

Remark 1. We can view (10) as an iterative application of Bayes rule, where at the nth iteration level we have a prior ${\mu }_{{t}_{n}}\left(u\right)$ and a likelihood defined by the observational model

$\begin{equation}y=\mathcal{G}\left(u\right)+\sqrt{{\alpha }_{n}}\eta ,\quad \eta \sim N\left(0,{\Gamma}\right).\end{equation} \tag{ 41 }$

In other words, (10) defines a sequence of Bayesian inverse problems similar to the original one³ but with a Gaussian error $\sqrt{{\alpha }_{n}}\eta$ that has a covariance matrix Γ inflated by α_n. We can then think of ${\mu }_{{t}_{n+1}}$ given by (10) as the distribution of u|y under the observational model (41) and prior ${\mu }_{{t}_{n}}$ .

We apply (39) and (40) to the sub-problem in remark 1. To this end, we use α_nΓ and ${u}_{n}\sim {\mu }_{{t}_{n}}$ instead of Γ and u in formulas (39) and (40) to obtain

$\begin{equation}2{\alpha }_{n}^{-1}{\left\langle {\Phi}\right\rangle }_{{t}_{n}}=\mathbb{E}\left\{{{\Vert}{\left({\alpha }_{n}{\Gamma}\right)}^{-1/2}\left(y-\mathcal{G}\left({u}_{n}\right)\right){\Vert}}^{2}\right\}{\leqslant}M\end{equation} \tag{ 42 }$

$\begin{equation}4{\alpha }_{n}^{-2}{\left\langle {\Phi},{\Phi}\right\rangle }_{{t}_{n}}=\mathrm{Var}\left\{{{\Vert}{\left({\alpha }_{n}{\Gamma}\right)}^{-1/2}\left(y-\mathcal{G}\left({u}_{n}\right)\right){\Vert}}^{2}\right\}{\leqslant}2M,\end{equation} \tag{ 43 }$

where we have used the definition of Φ in (3) as well as (30). By choosing ${\alpha }_{n}^{-1}$ via (37) with θ = M/2, we enforce that one of the above criteria (42) or (43) are satisfied, while ensuring that ${t}_{n+1}={t}_{n}+{\alpha }_{n}^{-1}{\leqslant}1$ as required. In other words, the proposed statistical discrepancy principle applied to each intermediate Bayesian sub-problem arising from tempering, yields the DMC (14) which, from the previous subsection, controls Jeffreys' divergence with threshold θ = M/2. We incorporate the data-misfit controller for the selection of ${\alpha }_{n}^{-1}$ in EKI; we summarise the scheme in algorithm 3 (EKI-DMC).

Remark 2. For the applications that we discuss in section 4, we have find that the mean of the data misfit ${\left\langle {\Phi}\right\rangle }_{{t}_{n}}$ is considerably smaller than the standard deviation ${\left\langle {\Phi},{\Phi}\right\rangle }_{{t}_{n}}$ . Therefore,

$\begin{equation}\frac{\langle {{\Phi}\rangle }_{{t}_{n}}^{2}}{\langle {\Phi},{{\Phi}\rangle }_{{t}_{n}}}{\leqslant}1{< }\frac{M}{2},{\Rightarrow}\frac{M}{2\langle {{\Phi}\rangle }_{{t}_{n}}}{ >}\sqrt{\frac{M}{2\langle {\Phi},{{\Phi}\rangle }_{{t}_{n}}}}.\end{equation} \tag{ 44 }$

Hence, in this case, ${\alpha }_{n}^{-1}$ is given by condition (42) rather than condition (43). Intuitively, (44) is likely to occur in problems where the prior is sufficiently wide so that it comprises the truth. However, for a prior that is centred too far from the truth and with small variance, we would then expect for ${\left\langle {\Phi}\right\rangle }_{{t}_{n}}$ to be larger than ${\left\langle {\Phi},{\Phi}\right\rangle }_{{t}_{n}}$ . In this case, DMC selects ${\alpha }_{n}^{-1}$ according to (43) in order to allow for larger steps in the tempering settings.

Remark 3. It is not difficult to see that, except for the case in which ${\left\langle {\Phi},{\Phi}\right\rangle }_{{t}_{n}}\theta ={\left\langle {\Phi}\right\rangle }_{{t}_{n}}^{2}$ , the selection of ${\alpha }_{n}^{-1}$ via (37) satisfies only one of the criteria (42) and (43) while violating the other one. As pointed by one of the anonymous referees, if we instead select ${\alpha }_{n}^{-1}$ via

$\begin{equation}{\alpha }_{n}^{-1}=\mathrm{min}\left\{\frac{\theta }{\langle {{\Phi}\rangle }_{{t}_{n}}},\sqrt{\frac{\theta }{\langle {\Phi},{{\Phi}\rangle }_{{t}_{n}}}},1-{t}_{n}\right\},\end{equation} \tag{ 45 }$

then both criteria (42) and (43) will be satisfied. While (45) is indeed a valid choice of ${\alpha }_{n}^{-1}$ , this is a suboptimal choice for the numerical experiments that we discuss in section 4. From the observations we made in remark 2, using (45) will lead to smaller ${\alpha }_{n}^{-1}$ 's than those obtained by (37) and, thus, to many more iterations which makes the EKI algorithm more costly.

4. Numerical testing

In this section we test the performance of EKI-DMC in the context of EIT with the CEM that we introduce in subsection 4.1. We use two parameterisations of the unknown that we introduce in subsection 4.2. Implementation aspects and measures of performance are discussed in subsections 4.3 and 4.4. In subsection 4.5 and 4.6 we discuss numerical results which includes comparing the performance of EKI-DMC and EKI-LM.

4.1. Complete electrode model

Given electric currents ${\left\{{I}_{k}\right\}}_{k=1}^{{m}_{e}}$ injected through a set of surface electrodes ${\left\{{e}_{k}\right\}}_{k=1}^{{m}_{e}}$ placed on the boundary, ∂D, of D, the CEM consist of finding $\left[v,{\left\{{V}_{k}\right\}}_{k=1}^{{m}_{e}}\right]$ where v is the voltage in D and ${\left\{{V}_{k}\right\}}_{k=1}^{{m}_{e}}$ are the voltages on the electrodes. The dependent variables $\left[v,{\left\{{V}_{k}\right\}}_{k=1}^{{m}_{e}}\right]$ are given by the solution to [45]:

$\begin{equation}\nabla \cdot \kappa \nabla v=0\quad \text{in}\enspace D,\end{equation} \tag{ 46 }$

$\begin{equation}v+{z}_{k}\kappa \nabla v\cdot \mathbf{n}={V}_{k}\quad \text{on}\enspace {e}_{k},k=1,\dots ,{m}_{e},\end{equation} \tag{ 47 }$

$\begin{equation}\nabla v\cdot \mathbf{n}=0\quad \text{on}\enspace \partial D{\backslash}{\cup }_{k=1}^{{m}_{e}}{e}_{k},\end{equation} \tag{ 48 }$

$\begin{equation}{\int }_{{e}_{k}}\kappa \nabla v\cdot \mathbf{n}\enspace \mathrm{d}s={I}_{k}\quad k=1,\dots ,{m}_{e},\end{equation} \tag{ 49 }$

where κ is the electric conductivity of D and ${\left\{{z}_{k}\right\}}_{k=1}^{{m}_{e}}$ are the electrodes' contact impedances. We consider an experimental setting consisting of n_p current patterns ${I}_{1}={\left\{{I}_{1,k}\right\}}_{k=1}^{{m}_{e}},\dots {I}_{{n}_{p}}={\left\{{I}_{{n}_{p},k}\right\}}_{k=1}^{{m}_{e}}$ . For each of these patters ${\left\{{I}_{j,k}\right\}}_{k=1}^{{m}_{e}}$ , we denote by ${\left\{{V}_{j,k}\right\}}_{k=1}^{{m}_{e}}$ the prediction of voltages at the electrodes defined by the CEM (46)–(49). For simplicity we assume that the contact impedances of the electrodes are known. We define the map

$\begin{equation*}\mathcal{F}\left(\kappa \right)=V\equiv \left[{\left\{{V}_{1,k}\right\}}_{k=1}^{{m}_{e}},\dots ,{\left\{{V}_{{n}_{p},k}\right\}}_{k=1}^{{m}_{e}}\right],\end{equation*}$

that for every conductivity, produces voltage measurements. EIT consist of finding the conductivity κ^† of D given measurements of ${V}^{{\dagger}}=\mathcal{F}\left({\kappa }^{{\dagger}}\right)$ . For a review of the EIT problem we refer the reader to [46].

4.2. Parameterisations of EKI

In this subsection we introduce two maps, ${\mathcal{P}}_{1}$ and ${\mathcal{P}}_{2}$ , that we use to parameterise the unknown physical property κ^† that we wish to estimate using the EKI framework introduced earlier.

4.2.1. Parameterisation ${\mathcal{P}}_{1}$ . Smooth properties

Let us first discuss the case in which prior knowledge suggest that the unknown physical property κ^† is a smooth function. For simplicity we only consider the 2D case that we use in the numerical experiments of subsection 4.5 and 4.6 but we emphasise that the approach can be used for 1D and 3D settings. Let us introduce a WM parameterisation of the unknown κ(x) that we wish to infer via EKI. The WM parameterisation involves a positive smoothness parameter denoted by ν, an amplitude scale σ, and two intrinsic length scales, L₁ and L₂, along the horizontal and vertical direction, respectively. Given ${\Theta}=\left(\nu ,\sigma ,{L}_{1},{L}_{2}\right)\in {\mathbb{R}}_{+}{\times}\mathbb{R}{\times}{\mathbb{R}}_{+}^{2}$ , we define an operator ${\mathcal{W}}_{{\Theta}}$ that maps every ω ∈ H⁻¹⁻(D) (with > 0 arbitrary) to ${\Psi}={\mathcal{W}}_{{\Theta}}\omega$ , that satisfies the following fractional PDE in the domain D

$\begin{equation}{\left(\mathbb{I}-\nabla \cdot \mathrm{diag}\left({L}_{1}^{2},{L}_{2}^{2}\right)\nabla \right)}^{\left(\nu +1\right)/2}{\Psi}=4{\sigma }^{2}\pi \frac{{\Gamma}\left(\nu +1\right)}{{\Gamma}\left(\nu \right)}\sqrt{{L}_{1}{L}_{2}}\omega ,\end{equation} \tag{ 50 }$

with Robin boundary conditions (BCs) on ∂D:

$\begin{equation}{\Psi}-{\zeta }_{R}\mathrm{d}\mathrm{i}\mathrm{a}\mathrm{g}\left({L}_{1}^{2},{L}_{2}^{2}\right)\nabla {\Psi}\cdot \mathbf{n}=0.\end{equation} \tag{ 51 }$

where ζ_R is a tuning parameter. In (50) Γ denotes the gamma function, $\mathbb{I}$ is the identity operator, and

$\begin{equation*}\mathrm{diag}\left({L}_{1}^{2},{L}_{2}^{2}\right)\equiv \left(\begin{matrix}\hfill {L}_{1}^{2}\hfill & \hfill 0\hfill \\ \hfill 0\hfill & \hfill {L}_{2}^{2}\hfill \end{matrix}\right).\end{equation*}$

The proposed WM parameterisation of κ is defined by

$\begin{equation}\kappa ={\mathcal{P}}_{1}\left(\lambda ,{\Theta},\omega \right)\equiv \lambda \enspace \mathrm{exp}\left({\mathcal{W}}_{{\Theta}}\omega \right),\end{equation} \tag{ 52 }$

where λ a positive scaling factor for κ. Note that (52) enforces that the electrical conductivity is positive. We can succinctly write (52) as,

$\begin{equation}\kappa ={\mathcal{P}}_{1}\left(u\right),\quad \text{where}\;u=\left(\lambda ,{\Theta},\omega \right)=\left(\lambda ,\nu ,\sigma ,{L}_{1},{L}_{2},\omega \right).\end{equation} \tag{ 53 }$

The motivation behind this parameterisation comes from the theory of Gaussian random fields (GRFs). In particular, the work of [47] that shows that if ω ∈ H⁻¹⁺ is Gaussian white noise (i.e. $\omega \sim N\left(0,\mathbb{I}\right)$ ) then

$\begin{equation*}\mathrm{log}\enspace \kappa =\mathrm{log}\left(\lambda \right)+{\mathcal{W}}_{{\Theta}}\omega \sim N\left(\mathrm{log}\enspace \lambda ,{C}_{{\Theta}}\right),\end{equation*}$

where C_Θ is the covariance operator induced by the Matern autocorrelation function defined by

$\begin{equation*}{\text{ACF}}_{{\Theta}}\left(x\right)={\sigma }^{2}\frac{1}{{2}^{\nu -1}{\Gamma}\left(\nu \right)}{{\Vert}x{\Vert}}_{{L}_{1},{L}_{2}}^{\nu }{K}_{\nu }\left({{\Vert}x{\Vert}}_{{L}_{1},{L}_{2}}\right),\end{equation*}$

where K_ν is the modified Bessel function of the second kind of order ν, and

$\begin{equation*}{{\Vert}x{\Vert}}_{{L}_{1},{L}_{2}}\equiv \sqrt{\frac{{x}_{1}^{2}}{{L}_{1}^{2}}+\frac{{x}_{2}^{2}}{{L}_{2}^{2}}}\end{equation*}$

It can also be shown that if log κ ∼ N(log λ, C_Θ), then almost surely log κ ∈ H^ν−(D) [48] which further shows the role of the smoothness parameter ν.

Our choice for the BCs in (51) follows from the work of [49] that shows that Robin BCs are better suited, compared to Neumman and Dirichlet, to alleviate (via the appropriate choice of ζ_R) undesirable boundary effects which arise from the discretisation of GRFs. Let us reiterate that our goal here is to introduce parameterisations for EKI and then suitable initial ensembles on the parameters. Whether the initial ensemble yields Gaussian (or log-Gaussian) properties is not our main concern. However, it would not be advisable to select parameterisations that, for example, restrict the values of the physical property near the boundary which is something that we would expect if ζ_R = 0 in (51).

The WM parameterisation in (52) allows us to incorporate the smoothness and lengthscales of the underlying field as part of the unknown that we can estimate with EKI. In contrast to the work of [3] where the lengthscales are estimated under isotropic assumptions, here we consider the anisotropic case. We focus on vertical/horizontal anisotropy but a rotation matrix can be further introduced in (50) to characterise properties with an arbitrary (and unknown) direction of anisotropy [49]

A straightforward approach to generate the initial ensemble for EKI with the parameterisation from (52) is to specify (hyper prior) densities, π_λ, ${\pi }_{{L}_{1}}$ , ${\pi }_{{L}_{2}}$ , π_σ and π_ν, and produce samples:

$\begin{equation}{u}_{0}^{\left(j\right)}\equiv \left({\lambda }^{\left(j\right)},{\nu }^{\left(j\right)},{\sigma }^{\left(j\right)},{L}_{1}^{\left(j\right)},{L}_{2}^{\left(j\right)},{\omega }^{\left(j\right)}\right),\sim {\pi }_{\lambda }\otimes {\pi }_{\nu }\otimes {\pi }_{\sigma }\otimes {\pi }_{{L}_{1}}\otimes {\pi }_{{L}_{2}}\otimes N\left(0,\mathbb{I}\right)\end{equation} \tag{ 54 }$

Remark 4. (KL expansion). For some geometries of D, the eigenfunctions and eigenvalues of a Matern covariance C_Θ are available in closed form [48]. In this case, an equivalent formulation of the WM parameterisations can be defined in terms of the spectral decomposition of C_Θ (i.e. KL expansion under the prior). Of course, for computational purposes WM parameterisations can be defined on a larger (simple) domain that encloses D and simply restrict the physical property to the domain of interest. While the spectral/KL approach is more standard, the approach we use here based on the work of [47] allows to naturally define the parametrisation in (53) via (50).

Remark 5. (smoothness parameter and amplitude scales). For simplicity, in this work the smoothness parameter ν and the amplitude scales σ in the WM are fixed, i.e. we leave these parameters our of the inversion. For the experiments that we discuss in the subsequent sections, the spatial variability in the unknown can be captured quite effectively only via ω(x). However, we recognise that including a variable σ can be beneficial in some cases as reported in [49]. Similarly, in the context of the experiments reported later, where the aim is mainly to recover medium anomalies, we find that sensible (fixed) choices of ν are sufficient, provided that the lengthscales are properly estimated via EKI. Nevertheless, as shown in [3], EKI can be used to estimate the smoothness parameter ν.

4.2.2. Parameterisation ${\mathcal{P}}_{2}$ . Piecewise-constant functions

In order to characterise piecewise-smooth functions we use the level-set approach initially proposed in [50] for deterministic inverse problems and, more recently, for fully Bayesian [48, 51] and EKI [2] settings. Our main modelling assumptions is that the unknown property has a background value potentially heterogeneous, and that possible anomalies/defects consist of regions with (also possibly heterogeneous) higher/lower values than those in the background field. More specifically, let us first define the parameterisation

$\begin{equation}\kappa \left(x\right)=\mathcal{H}\left({\left\{{\kappa }_{\iota }\right\}}_{\iota \in \left\{\mathrm{l},\mathrm{b},\mathrm{h}\right\}},f\right)\equiv \begin{cases}_{\mathrm{l}}\quad f\left(x\right){\leqslant}{\zeta }_{1}\hfill \\ {\kappa }_{\mathrm{b}}\quad {\zeta }_{1}{< }f\left(x\right){\leqslant}{\zeta }_{2}\hfill \\ {\kappa }_{\mathrm{h}}\quad f\left(x\right){ >}{\zeta }_{2},\hfill \end{cases}\end{equation} \tag{ 55 }$

where κ_b, κ_l and κ_h denote the background, low-value and high-value fields that characterise the unknown physical property. For simplicity we assume these variables take only constant values. The level-set function denoted by f(x) determines the background Ω_b ≡ {x : ζ₁ < f(x) ⩽ ζ₂} as well as the region of low Ω_l ≡ {x : f(x) ⩽ ζ₁} and high Ω_h ≡ {x ∈ f(x) > ζ₂} values. We assume that f is a smooth fields with some variability that we enforce via a second level of parameterisation in terms of ${\mathcal{P}}_{1}$ introduced earlier. More specifically, we consider

$\begin{equation}f=\mathrm{log}\enspace {\lambda }_{f}+{\mathcal{W}}_{{{\Theta}}_{f}}{\omega }_{f}=\mathrm{log}\left({\mathcal{P}}_{1}\left({u}_{f}\right)\right),\end{equation} \tag{ 56 }$

where ${\mathcal{P}}_{1}$ is defined according to (52) and

$\begin{equation*}{u}_{f}=\left({\lambda }_{f},{\nu }_{f},{\sigma }_{f},{L}_{1,f},{L}_{2,f},{\omega }_{f}\right).\end{equation*}$

Combining (55) with (56) we can write

$\begin{equation}\kappa ={\mathcal{P}}_{2}\left(u\right),\quad \text{with}\;u\equiv \left\{{\left\{{\kappa }_{\iota }\right\}}_{\iota \in \left\{\mathrm{l},\mathrm{b},\mathrm{h}\right\}},{u}_{f}\right\},\end{equation} \tag{ 57 }$

where

$\begin{align}\hfill {\mathcal{P}}_{2}\left(u\right)& =\mathcal{H}\left({\left\{{\kappa }_{\iota }\right\}}_{\iota \in \left\{\mathrm{l},\mathrm{b},\mathrm{h}\right\}},\mathrm{log}\left({\mathcal{P}}_{1}\left({u}_{f}\right)\right)\right)\hfill \\ \hfill & =\mathcal{H}\left({\left\{{\kappa }_{\iota }\right\}}_{\iota \in \left\{\mathrm{l},\mathrm{b},\mathrm{h}\right\}},\mathrm{log}\left({\mathcal{P}}_{1}\left({\lambda }_{f},{\nu }_{f},{\sigma }_{f},{L}_{1,f},{L}_{2,f},{\omega }_{f}\right)\right)\right).\hfill \end{align} \tag{ 58 }$

The selection of the initial ensemble for u_f can be done similarly to the one in (54). This parameterisation can be extended for the case in which ${\left\{{\kappa }_{\iota }\right\}}_{\iota \in \left\{\mathrm{l},\mathrm{b},\mathrm{h}\right\}}$ are unknown functions via using ${\mathcal{P}}_{1}$ to parameterise each of these functions.

4.3. Implementation

For the numerical experiments discussed in the following subsections, algorithms 2 and 3 are implemented in MATLAB. Let us recall that for these algorithms we need to construct the forward map $\mathcal{G}=\mathcal{F}{\circ}\mathcal{P}$ where $\mathcal{F}$ is the operator induced by the CEM and $\mathcal{P}$ is any of the parameterisations defined earlier. The numerical implementation of the CEM from subsection 4.1 is conducted using MATLAB software EIDORS [52]. The experimental setting consists of (i) a circular domain of unit radius centred at the origin, (ii) 16 surface electrodes with contact impedances of values 0.01 Ohms, (iii) an adjacent injection pattern with an electric current of 0.1 Amps, and (iv) measurements at each electrode. All these parameters are assumed known and fixed for the inversions of this section. Synthetic data are generated using the mesh from figure 1 (right) with 9216 elements, while a coarser mesh of 7744 elements is used for inversions. The total number of measurements is M = 16² = 256.

**Figure 1.** True log conductivity log(κ^†) for **Exp_EIT**₁ (left) and **Exp_EIT**₂ (middle). Right: mesh for the generation with synthetic data for all EIT experiments.
Download figure:
Standard image High-resolution image

For the evaluation of the WM parameterisation $\kappa ={\mathcal{P}}_{1}\left(u\right)$ , we solve (50) and (51) using the techniques from [47] and which restrict us to the cases in which $\nu \in \mathbb{N}$ . The discretisation of the operator $\mathbb{I}-\nabla \cdot \mathrm{diag}\left({L}_{1},{L}_{2}\right)\nabla$ with BCs from (51) is performed via cell-centred finite differences. The PDE in (50) and (51) is solve in a square domain, equal to, or enclosing D (the domain of definition for the PDE encoded in $\mathcal{F}$ ). The actual field κ(x) that we pass into the CEM is an interpolation of $\mathcal{P}\left(u\right)$ on D. Parameterisations ${\mathcal{P}}_{2}$ is based on the truncation of the level-set function, f(x) so the implementation is straightforward once all the fields κ_ι (ι ∈ l, b, h) and f(x) are computed. Given these construction of $\mathcal{G}$ , the rest of the steps in algorithms 2 and 3 are computed in a straightforward manner.

4.4. Measures of performance for EKI

Given the ensemble $\left\{{u}_{n}^{\left(j\right)}\right\}$ computed via EKI at the n iteration of the scheme (n = 0 corresponds to the prior ensemble), our estimate of the unknown property κ^† is given by

$\begin{equation}{\kappa }_{n}\equiv \mathcal{P}\left({\bar{u}}_{n}\right)=\mathcal{P}\left(\frac{1}{J}\sum\limits _{j=1}^{J}{u}_{n}^{\left(j\right)}\right).\end{equation} \tag{ 59 }$

where $\mathcal{P}$ is either ${\mathcal{P}}_{1}$ or ${\mathcal{P}}_{2}$ defined above. We measure the accuracy in terms of the relative error with respect to (w.r.t.) the truth defined by

$\begin{equation}{\mathcal{E}}_{n}=\frac{{{\Vert}{\kappa }_{n}-{\kappa }^{{\dagger}}{\Vert}}_{{L}^{2}\left(D\right)}}{{{\Vert}{\kappa }^{{\dagger}}{\Vert}}_{{L}^{2}\left(D\right)}}.\end{equation} \tag{ 60 }$

We often visualise some transformed ensemble members (mainly for the initial ensemble n = 0), i.e.

$\begin{equation}{\kappa }_{n}^{\left(j\right)}\equiv \mathcal{P}\left({u}_{n}^{\left(j\right)}\right),\quad j\in \left\{1,\dots ,J\right\},\end{equation} \tag{ 61 }$

but note that our estimate κ_n in (59) does not involve taking the average of the particles in (61); this would be particularly detrimental for ${\mathcal{P}}_{2}$ since averaging (61) will not preserve discontinuities. For these two parameterisations we also visualise an estimate of the level-set function given by

$\begin{equation}{f}_{n}\equiv \mathrm{log}\enspace {\mathcal{P}}_{1}\left({\bar{u}}_{f,n}\right)\end{equation} \tag{ 62 }$

We additionally monitor the following data-misfit quantities

$\begin{equation}{\mathcal{DM}}_{1,n}={\Vert}{{\Gamma}}^{-1/2}\left[y-\frac{1}{J}\sum\limits _{j=1}^{J}\mathcal{G}\left({u}_{n}^{\left(j\right)}\right)\right]{\Vert}\end{equation} \tag{ 63 }$

$\begin{equation}{\mathcal{DM}}_{2,n}={\Vert}{{\Gamma}}^{-1/2}\left(y-\mathcal{G}\left({\bar{u}}_{n}\right)\right){\Vert}\end{equation} \tag{ 64 }$

$\begin{equation}{\mathcal{DM}}_{3,n}={\left[\frac{1}{J}\sum\limits _{j=1}^{J}{{\Vert}{{\Gamma}}^{-1/2}\left(y-\mathcal{G}\left({u}_{n}^{\left(j\right)}\right)\right){\Vert}}^{2}\right]}^{1/2}.\end{equation} \tag{ 65 }$

4.5. Numerical experiment Exp_EIT₁. Continuous conductivity

For the first series of experiments the true conductivity, κ^†, is a C^∞(D) function that we specify analytically; the plot of κ^† is displayed in figure 1 (left). Synthetic data are constructed via y = V^† + η where ${V}^{{\dagger}}=\mathcal{F}\left({\kappa }^{{\dagger}}\right)$ is computed with the CEM and η is a realisation from N(0, Γ). We chose Γ = diag(γ₁, ..., γ_M) where

$\begin{equation}\begin{gathered}{\gamma }_{m}={\left(1{0}^{-2}\vert {V}_{m}^{{\dagger}}\vert \right)}^{2}+{\left(1{0}^{-3}\left\vert \right.\mathrm{max}{\left\{{V}_{m}^{{\dagger}}\right\}}_{m=1}^{M}-\mathrm{min}{\left\{{V}_{m}^{{\dagger}}\right\}}_{m=1}^{M}\left\vert \right.\right)}^{2}\\ m=1,\dots ,M.\end{gathered}\end{equation} \tag{ 66 }$

The first term in the right-hand side of (66) corresponds to adding 1% Gaussian noise. The second term is added simply to avoid small variances from very small voltage (noise-free) measurements.

For this experiment we use the parameterisation of smooth functions, $\kappa ={\mathcal{P}}_{1}\left(u\right)$ , from (52) with u = (λ, L₁, L₂, ω). Note that we have removed ν and σ from the inversion as we discussed in remark 5; we use fixed values ν = 3 and σ = 1.5. The unknown consist of 3 scalars and 1 function, ω(x), that we discretise on a 100 × 100 grid. Upon discretisation, the dimension of the unknown is $\mathrm{dim}\left(\mathcal{U}\right)=10\enspace 003$ . We follow the discussion of subsection 4.2.1 (see equation (54)) for the selection of the initial ensemble. More specifically, we select J particles ${u}_{0}^{\left(j\right)}=\left({\lambda }^{\left(j\right)},{L}_{1}^{\left(j\right)},{L}_{2}^{\left(j\right)},{\omega }^{\left(j\right)}\right)\sim {\mu }_{0}$ where we define the following prior

$\begin{equation}{\mu }_{0}\equiv U\left[5{\times}1{0}^{-3},1\right]\otimes U\left[0.15,0.6\right]\otimes U\left[0.15,0.6\right]\otimes N\left(0,\mathbb{I}\right)\end{equation} \tag{ 67 }$

where U[a, b] denotes the uniform distribution on the interval [a, b].

In figure 2 (top) we show plots for the logarithm of ${\kappa }_{0}^{\left(j\right)}={\mathcal{P}}_{1}\left({u}_{0}^{\left(j\right)}\right)$ for five of members of the initial ensemble. Note that our choice of smoothness parameter ν = 3 produces an initial ensemble that is quite smooth (recall from subsection 4.2.1 that the draws belong to H³⁻(D)). We also observe substantial differences in the degree of anisotropy which arises from our selection of a reasonably wide distribution of intrinsic lengthscales that we use to produce the initial ensemble.

**Figure 2.** **Exp_EIT**₁. Top row: logarithm of five members from the prior ensemble ${\left\{{\kappa }_{0}^{\left(j\right)}\right\}}_{j=1}^{J}$ . Bottom row: logarithm of five realisations from the final (converged) ensemble ${\left\{{\kappa }_{{n}^{{\ast}}}^{\left(j\right)}\right\}}_{j=1}^{J}$ .
Download figure:
Standard image High-resolution image

4.5.1. Results from the inversion using EKI-DMC with various choices of J

Synthetic data and initial ensembles produced as described earlier are used as inputs for EKI-DMC (algorithm 3) with different choices of ensemble size J: 100, 200, 400, 800. For each choice of J, we conduct 30 experiments with different random selections of the initial ensemble. The plots of (log) ${\kappa }_{{n}^{{\ast}}}={\mathcal{P}}_{1}\left({\bar{u}}_{{n}^{{\ast}}}\right)$ (i.e. upon convergence) from one of these experiments, computed for each J, are displayed in figure 3 together with the truth (right panel). In figure 4 (left) we show boxplots of the relative error w.r.t. the truth ${\mathcal{E}}_{{n}^{{\ast}}}$ (60), computed at the final iteration, from the set of 30 experiments conducted for each J. Boxplots of the data misfits defined in (63)–(65) are shown in the right panel of figure 4 where the mean noise level, approximated via $\delta =\sqrt{M}$ , is indicated via the horizontal dotted red line. The average (over the 30 experiments) number of iterations to converge, n*, is displayed in table 1. These experiments suggest that the choice of J = 200 provides a reasonable value of the data misfit (i.e. around the noise level). Furthermore, the average relative error w.r.t. the truth for J = 200 (approximately 30%) does not improve substantially as we increase J. Given these considerations, we select J = 200 for subsequent runs of Exp _ EIT₁.

**Figure 3.** **Exp** _ **EIT**₁. Logarithm of ${\kappa }_{{n}^{{\ast}}}\equiv {\mathcal{P}}_{1}\left({\bar{u}}_{{n}^{{\ast}}}\right)$ computed via the **EKI-DMC** with ensemble size (from left to right) J = 100, 200, 400, 800. Right panel shows the log of the truth.
Download figure:
Standard image High-resolution image

**Figure 3.** **Exp** _ **EIT**₁. Logarithm of ${\kappa }_{{n}^{{\ast}}}\equiv {\mathcal{P}}_{1}\left({\bar{u}}_{{n}^{{\ast}}}\right)$ computed via the **EKI-DMC** with ensemble size (from left to right) J = 100, 200, 400, 800. Right panel shows the log of the truth.
Download figure:
Standard image High-resolution image

**Figure 4.** **Exp**_EIT₁. Error with respect to the truth (left), ${\mathcal{E}}_{{n}^{{\ast}}}$ (see (60)), and data misfits from (63)–(65) (right) computed at the final iteration n* via **EKI-DMC**. The noise level estimated by $\delta =\sqrt{M}$ is indicated with the dotted red-line in the right panel.
Download figure:
Standard image High-resolution image

**Figure 4.** **Exp**_EIT₁. Error with respect to the truth (left), ${\mathcal{E}}_{{n}^{{\ast}}}$ (see (60)), and data misfits from (63)–(65) (right) computed at the final iteration n* via **EKI-DMC**. The noise level estimated by $\delta =\sqrt{M}$ is indicated with the dotted red-line in the right panel.
Download figure:
Standard image High-resolution image

Table 1. Number of iterations n* for EKI using the DMC (EKI-DMC) with various choices of J.

	Exp_EIT₁	Exp_EIT₂
DM J = 100	10.00 ± 0.53	13.20 ± 2.50
DM J = 200	10.00 ± 0.00	12.83 ± 0.59
DM J = 400	10.30 ± 0.47	14.53 ± 0.63
DM J = 800	11.03 ± 0.18	16.87 ± 0.51

4.5.2. Further results from one run of EKI-DMC with J = 200

In figure 2 (bottom) we displayed 5 members of the final (converged) ensemble of (log) ${\kappa }_{{n}^{{\ast}}}^{\left(j\right)}$ corresponding to the initial ensemble from figure 2 (top). Plots of the log of ${\kappa }_{n}={\mathcal{P}}_{1}\left({\bar{u}}_{n}\right)$ (see equation (59)), at some of the intermediate iterations 1 ⩽ n < n* = 10 can be found in figure 5. In figure 6 (right) we plot, as a function of n, the values of the means $\bar{\lambda }$ , ${\bar{L}}_{1}$ , and ${\bar{L}}_{2}$ , of the ensembles ${\left\{{\lambda }^{\left(j\right)}\right\}}_{j=1}^{J}$ , ${\left\{{L}_{1}^{\left(j\right)}\right\}}_{j=1}^{J}$ and ${\left\{{L}_{2}^{\left(j\right)}\right\}}_{j=1}^{J}$ , respectively⁴ . Note that EKI produces a larger lengthscale in the vertical direction. This comes as no surprize since the truth κ^† (see figure 1 (left)) has two inclusions of lower conductivity with larger correlation along the vertical direction. Finally, in figure 6 (right and middle) we display, for each of the 30 runs, the relative error w.r.t. the truth as well as the (log) data misfit (63), as a function of the iteration number n. We note that EKI-DMC is very robust and accurate across ensembles.

**Figure 5.** **Exp**_EIT₁. Logarithm of ${\kappa }_{n}={\mathcal{P}}_{1}\left({\bar{u}}_{n}\right)$ computed via the **EKI-DMC** at various intermediate iterations n (1 ⩽ n ⩽ n*) computed using one ensemble of size J = 200.
Download figure:
Standard image High-resolution image

**Figure 5.** **Exp**_EIT₁. Logarithm of ${\kappa }_{n}={\mathcal{P}}_{1}\left({\bar{u}}_{n}\right)$ computed via the **EKI-DMC** at various intermediate iterations n (1 ⩽ n ⩽ n*) computed using one ensemble of size J = 200.
Download figure:
Standard image High-resolution image

**Figure 6.** **Exp**_EIT₁. Plots of the relative error w.r.t. the truth (left), data misfit ${\mathcal{DM}}_{1,n}$ (middle) and means ${\bar{L}}_{1}$ , ${\bar{L}}_{2}$ and $\bar{\lambda }$ (right) as a function of n. The left and middle panels show results from 30 runs.
Download figure:
Standard image High-resolution image

**Figure 6.** **Exp**_EIT₁. Plots of the relative error w.r.t. the truth (left), data misfit ${\mathcal{DM}}_{1,n}$ (middle) and means ${\bar{L}}_{1}$ , ${\bar{L}}_{2}$ and $\bar{\lambda }$ (right) as a function of n. The left and middle panels show results from 30 runs.
Download figure:
Standard image High-resolution image

4.5.3. Comparison between EKI-DMC and EKI-LM

We compare the performance of algorithms 2 and 3 using the same set of 30 initial ensembles for each algorithm using J = 200. In [2, 3], this selection of J was sufficient to provide stable and accurate estimates for EIT. We consider different choices of the input ρ in EKI-LM and, for simplicity, we set τ = 1/ρ + 10⁻⁶. In figure 7 we show boxplots of the error w.r.t. the truth (right) and the data misfit ${\mathcal{DM}}_{1,{n}^{{\ast}}}$ (left) obtained with several choices of ρ for EKI-LM; the results from EKI-DMC for J = 200 are also included in these plots. Similar behaviour is observed for ${\mathcal{DM}}_{2,{n}^{{\ast}}}$ and ${\mathcal{DM}}_{3,{n}^{{\ast}}}$ and so these plots are omitted. The number of iterations for EKI-LM to achieve convergence, n*, is displayed in table 2. For one of the 30 runs, in figure 8 we show the plots of the log of ${\kappa }_{{n}^{{\ast}}}={\mathcal{P}}_{1}\left({\bar{u}}_{{n}^{{\ast}}}\right)$ computed with both algorithms using the selections of parameters described above.

**Figure 7.** **Exp**_EIT₁. Error with respect to the truth (left), ${\mathcal{E}}_{{n}^{{\ast}}}$ (see (60)), and data misfit ${\mathcal{DM}}_{1,{n}^{{\ast}}}$ (63) (right) computed at the final iteration n* using **EKI-DMC** and **EKI-LM** with various choices of ρ. The noise level estimated by $\delta =\sqrt{M}$ is indicated with the dotted red-line in the right panel.
Download figure:
Standard image High-resolution image

Table 2. Number of iterations n* for EKI using the LM approach (EKI-LM) with various choices of ρ.

	Exp_EIT₁	Exp_EIT₂
LM ρ = 0.5	8.57 ± 0.50	10.30 ± 0.79
LM ρ = 0.6	10.53 ± 0.51	14.10 ± 1.79
LM ρ = 0.7	14.40 ± 0.56	20.03 ± 1.45
LM ρ = 0.8	21.80 ± 0.76	32.53 ± 2.65

**Figure 8.** **Exp_EIT**₁. Logarithm of ${\kappa }_{{n}^{{\ast}}}={\mathcal{P}}_{1}\left({\bar{u}}_{{n}^{{\ast}}}\right)$ computed using the same initial ensemble (J = 200) with **EKI-LM** for several choices of ρ. In the top-right panel with display the log of ${\kappa }_{{n}^{{\ast}}}$ that we obtain with **EKI-DMC**.
Download figure:
Standard image High-resolution image

**Figure 8.** **Exp_EIT**₁. Logarithm of ${\kappa }_{{n}^{{\ast}}}={\mathcal{P}}_{1}\left({\bar{u}}_{{n}^{{\ast}}}\right)$ computed using the same initial ensemble (J = 200) with **EKI-LM** for several choices of ρ. In the top-right panel with display the log of ${\kappa }_{{n}^{{\ast}}}$ that we obtain with **EKI-DMC**.
Download figure:
Standard image High-resolution image

While EKI-DMC does not depend on any tuning parameter, the results above show that EKI-LM is highly dependent on the selection of ρ (which is specified a priori). From table 2 we see that, for ρ = 0.6, the computational cost of EKI-LM is similar to the cost of EKI-DMC; i.e. convergence in approximately 10 iterations (in average). For this ρ, the average error computed with EKI-LM (≈38%) is, however, larger than the error obtained via EKI-DMC (≈30.5%). Although the improvement in accuracy of EKI-DMC over EKI-LM may not be overly impressive, from figure 8 (computed from one run) we note that the area of high conductivity is not accurately captured by EKI-LM with ρ < 0.8. For these experiments, if instead of ρ = 0.6 we choose, say ρ = 0.8 (see table 2), the computational cost of EKI-LM doubles without improving its accuracy with respect to EKI-DMC.

Furthermore, from figure 7 we also observe that both the data-misfit ${\mathcal{DM}}_{1,{n}^{{\ast}}}$ and the error w.r.t. the truth obtained via EKI-LM decreases as we increase ρ. This behaviour is expected since large ρ implies smaller τ and, hence, smaller data misfits upon convergence (see stopping criteria for EKI-LM in equation (22)). Even though the measures of performance of EKI-LM (see figure 7) seem to approach those of EKI-DMC for increasing ρ, experiments (not shown) conducted with even larger ρ's, (i.e. ρ ⩾ 0.9) do not yield the same level of accuracy that we achieve with EKI-LM. This is due to the fact that the stopping criteria in EKI-LM does not allow the algorithm to iterate once ${\mathcal{DM}}_{1,{n}^{{\ast}}}$ is smaller than $\tau \delta =\tau \sqrt{M}$ (recall τ > 1/ρ). Hover, for this particular problem, it seems that iterating below this threshold yields more accurate estimates. Indeed, we note that for the estimates obtained by EKI-DMC, the value of ${\mathcal{DM}}_{1,{n}^{{\ast}}}$ is smaller than the τδ for all the 30 runs. We conclude that, for this particular problem setting, the stopping criteria in EKI-LM is not optimal.

In order to further understand the potential limitation of the stopping criteria in EKI-LM as proposed in [2], in figure 9 we display the behaviour of log α_n as a function of n, computed from one run (with same initial ensemble) of EKI-DMC and EKI-LM with ρ = 0.6. As mentioned above, for this particular selection of ρ, both EKI-LM and EKI-DMC converged in 10 iterations so the cost of running both algorithms is the same. We find that the selection of α_n via EKI-LM yields quite large values of α_n compared to those obtained via EKI-DMC. Larger α_n's, as discussed in subsection 2.1, means slower convergence. Although we can obtain smaller α_n's using EKI-LM with smaller ρ's, this will result in larger τ's and, consequently, earlier termination with is detrimental to the accuracy of the algorithm.

The aforementioned limitation of the stopping criteria in EKI-LM can be addressed, for example, by adopting the same stopping rule that we use in EKI-DMC. In other words, we may replace (22) by imposing that the α_n's in EKI-LM satisfy (12). Such an approach has been used, for example, in [42] in the context of history matching of petroleum reservoirs. We have repeated our experiments (not shown) using (12) as stopping rule for EKI-LM which show that, indeed, using (12) enable us to run the algorithm beyond the noise level and, hence, obtain more accurate estimates at a lower computational cost. However, the role of ρ still determines the performance of EKI-LM and, while (12) allows us to use smaller ρ's (thus smaller α_n's and faster convergence), for some of these small values the algorithm loses stability.

In summary, we may find problem-specific tuning parameters ρ and τ in EKI-LM, or modifications of EKI-LM by adopting (12) as stopping rule, that will yield comparable level of accuracy and even similar computational cost to those achieved by EKI-DMC. However, finding optimal stopping criteria and/or tuning parameters may require thorough numerical testing which is often computationally intensive and unsuitable for practical large-scale applications.

4.6. Numerical experiment Exp_EIT₂. Discontinuous (piecewise constant) conductivity

The true conductivity for the experiments of this section is the piecewise constant function with plot displayed in figure 1 (middle). The background, low and high conductivity regions have constant values:

$\begin{equation}{\kappa }_{\mathrm{b}}^{{\dagger}}=0.125,\quad {\kappa }_{\mathrm{l}}^{{\dagger}}=0.025,\quad {\kappa }_{\mathrm{h}}^{{\dagger}}=1.0,\end{equation} \tag{ 68 }$

respectively. Synthetic data are generated in the same way as described in the previous subsection, and with noise covariance from (66).

In figure 10 we show the plots of (log) ${\kappa }_{{n}^{{\ast}}}$ computed from one run with EKI-DMC using the same parameterisation (for continuous functions) and initial ensemble from Exp_EIT₁. While these results show that we can identify the three different regions of different conductivity, the reconstruction of the interface between these regions is quite inaccurate because of smoothness enforced by the parameterisation used within EKI. We overcome this limitation by means of a level-set parameterisations within the EKI framework.

4.6.1. Level-set parameterisation and prior ensemble

For the series of experiment in this subsection we test EKI with the level-set parameterisation $\kappa ={\mathcal{P}}_{2}\left(u\right)$ from (57). As discussed in remark 5 we remove σ_f and ν_f from the inversion. Here we use the values ν_f = 2 and σ_f = 0.5. In addition, we also leave λ_f out of the inversion; we take λ_f = 1. This selection is made for simplicity since we expect to capture all the variability of the level-set function via the term ${\mathcal{W}}_{{{\Theta}}_{f}}{\omega }_{f}$ in (56). The unknown reduces to

$\begin{equation*}u=\left({\kappa }_{\mathrm{l}},{\kappa }_{\mathrm{b}},{\kappa }_{\mathrm{h}},{L}_{1,f},{L}_{2,f},{\omega }_{f}\right),\end{equation*}$

which consists of 5 scalars and one function, ω_f, which we discretise also on a 100 × 100 grid. Hence, the dimension of the unknown u is $\mathrm{dim}\left(\mathcal{U}\right)=10\enspace 005$ . We select the initial ensemble according to

$\begin{align}\hfill {u}_{0}^{\left(j\right)}\sim & \;U\left[0.015,0.075\right]\otimes U\left[0.1,0.4\right]\otimes U\left[0.65,1.1\right]\hfill \\ \hfill & \otimes U\left[0.15,0.6\right]\otimes U\left[0.15,0.6\right]\otimes N\left(0,\mathbb{I}\right),\hfill \end{align} \tag{ 69 }$

From (56) we know that this selection produces a centred ensemble of initial level-set functions. It is worth noticing from (69) that there is no overlapping among the support of the distributions for the conductivity values (i.e. κ_l, κ_b and κ_h) on each region. Furthermore, each of these intervals contain the true values form (68). Therefore, we work under the assumption that (i) there is clear contrast between the unknown values on each region and (ii) we have good knowledge of the range of possible values for the unknown conductivity in each of those regions.

In figure 11 we show the plots of some $\mathrm{log}\enspace {\kappa }_{0}^{\left(j\right)}=\mathrm{log}\enspace {\mathcal{P}}_{2}\left({u}_{0}^{\left(j\right)}\right)$ (top panels) and the corresponding level-set functions ${f}_{0}^{\left(j\right)}=\mathrm{log}\left({\mathcal{P}}_{1}\left({u}_{f,0}^{\left(j\right)}\right)\right)$ (bottom panels) from five ensemble members. Our selection ν_f = 2 imposes smoothness in the level-set function and, in turn, in the interface between the three regions of low, high and background conductivity. From figure 11 (bottom) we notice that the ensemble of level-sets displays anisotropy induced by randomising the intrinsic lengthscales in the level-set function. This variability can be seen in the corresponding interface between regions of different conductivities (top row figure 11). Note that the values of conductivity within each region are variable across particles.

4.6.2. Results from several choices of J in EKI-DMC

In figure 12 we show boxplots of the relative error w.r.t. the truth as well as data misfits (63)–(65). As before, these are results from 30 EKI-DMC runs using different initial ensembles for each choice of J. We can clearly see a decrease in the error w.r.t. the truth as we increase J, while the data misfits ${\mathcal{DM}}_{1,n}$ and ${\mathcal{DM}}_{2,n}$ achieve values close to the noise level (indicated via dotted red line) for J ⩾ 200. We see that, again, J = 200 is a good compromise between accuracy and cost. Using J = 400 will double the cost with a marginal improvement in accuracy. The choice of J = 200 also yields reasonable visual results as we can verify from figure 13 where, for one run of EKI-DMC, we display the log of ${\kappa }_{{n}^{{\ast}}}={\mathcal{P}}_{2}\left({\bar{u}}_{{n}^{{\ast}}}\right)$ (top panels) and the level-set function ${f}_{{n}^{{\ast}}}=\mathrm{log}\left({\mathcal{P}}_{1}\left({\bar{u}}_{f,{n}^{{\ast}}}\right)\right)$ (bottom panels).

**Figure 13.** **Exp**_EIT₂. Plots of (log) ${\kappa }_{{n}^{{\ast}}}$ (top row) and the corresponding level-set function ${f}_{{n}^{{\ast}}}$ (bottom row) computed with **EKI-DMC** with different choice of ensemble size J. Top-right panel shows log(κ^†).
Download figure:
Standard image High-resolution image

Although we note that the average data misfit ${\mathcal{DM}}_{3,{n}^{{\ast}}}$ in figure 12 seems to increase with J, experiments (not shown) with larger J suggest that ${\mathcal{DM}}_{3,{n}^{{\ast}}}$ eventually stabilises. The increase in ${\mathcal{DM}}_{3,{n}^{{\ast}}}$ can be attributed to the fact that larger J's produces a better spread/coverage of the distribution of particles; some of these particles yield a larger data misfit within EKI⁵ . The ensemble mean ${\bar{u}}_{n}$ , however, is quite accurate (see figure 13, for J = 200) and so the corresponding ${\kappa }_{n}={\mathcal{P}}_{2}\left({\bar{u}}_{n}\right)$ yields reasonable values of the ${\mathcal{DM}}_{2,{n}^{{\ast}}}$ (i.e. close to the noise level).

4.6.3. Results from one run of EKI-DMC with J = 200

Figure 14 shows some members from the final (converged) ensemble corresponding to the initial ensemble from figure 11; the top panels are $\mathrm{log}\enspace {\kappa }_{{n}^{{\ast}}}^{\left(j\right)}={\mathcal{P}}_{2}\left({u}_{{n}^{{\ast}}}^{\left(j\right)}\right)$ while the bottom panels show the level-set functions ${f}_{{n}^{{\ast}}}^{\left(j\right)}=\mathrm{log}\left({\mathcal{P}}_{1}\left({u}_{f,{n}^{{\ast}}}^{\left(j\right)}\right)\right)$ . This figure shows that there is, indeed, significant variability across particles of the converged ensemble ${\kappa }_{{n}^{{\ast}}}^{\left(j\right)}$ and, in turn, possible large spread in the values of the data misfit obtained using each of these conductivities (hence potentially large values of ${\mathcal{DM}}_{3,n}$ ). The average error w.r.t. the truth and (log) data misfit ${\mathcal{DM}}_{1,n}$ are shown in figure 15, as a function of the iteration number n (these are results from our 30 runs). Note that, in contrast to the previous experiment, the error displays more variability across ensembles.

**Figure 14.** **Exp**_EIT₂. Five members of the converged ensemble of (log) ${\left\{{\kappa }_{{n}^{{\ast}}}^{\left(j\right)}\right\}}_{j=1}^{J}$ (top row) and their corresponding level-set set function ${\left\{{f}_{{n}^{{\ast}}}^{\left(j\right)}\right\}}_{j=1}^{J}$ (bottom row).
Download figure:
Standard image High-resolution image

**Figure 15.** **Exp**_EIT₂. Plots of the relative error w.r.t. the truth (left) and data misfit ${\mathcal{DM}}_{1,n}$ (right) computed from 30 runs with **EKI-DMC**.
Download figure:
Standard image High-resolution image

Plots of log κ_n and the level-set function f_n, at some of the intermediate iterations 1 ⩽ n < n*, are displayed in the top and bottom panels of figure 16, respectively. We can see that EKI not only estimates the shape (via the level-set function) of the regions with different conductivity but also the conductivity values on each region. In figure 17 (top) we plot, as a function of n, the values of ${\bar{L}}_{1,f}$ , ${\bar{L}}_{2,f}$ , ${\bar{\lambda }}_{\mathrm{l}}$ , ${\bar{\lambda }}_{\mathrm{b}}$ and ${\bar{\lambda }}_{\mathrm{h}}$ , i.e. the means of the ensembles ${\left\{{L}_{1,f}^{\left(j\right)}\right\}}_{j=1}^{J}$ , ${\left\{{L}_{2,f}^{\left(j\right)}\right\}}_{j=1}^{J}$ , ${\left\{{\lambda }_{\mathrm{l}}^{\left(j\right)}\right\}}_{j=1}^{J}$ , ${\left\{{\lambda }_{\mathrm{b}}^{\left(j\right)}\right\}}_{j=1}^{J}$ , and ${\left\{{\lambda }_{\mathrm{h}}^{\left(j\right)}\right\}}_{j=1}^{J}$ (we reiterate that these variables are updated at each iteration of EKI). The ensemble mean for the intrinsic lengthscale of the level-set function in the vertical direction is larger than the horizontal one. Again, this is due to the presence of the regions of low conductivity which are longer in the vertical direction. From the middle-right panel we can see that the true conductivity in the background region ${\bar{\lambda }}_{\mathrm{b}}^{{\dagger}}$ is recovered quite accurately. In contrast, the mean values ${\bar{\lambda }}_{\mathrm{l}}$ and ${\bar{\lambda }}_{\mathrm{h}}$ do not seem to vary much with respect to the mean.

**Figure 16.** **Exp**_EIT₂. Logarithm of ${\kappa }_{n}\left.\right)$ (top) and the corresponding level-set f_n (bottom) computed with one run of **EKI-DMC** (with J = 200) at various intermediate iterations n (1 ⩽ n ⩽ n*).
Download figure:
Standard image High-resolution image

**Figure 17.** **Exp**_EIT₂. Top: plots of the means ${\bar{L}}_{1,f}$ and ${\bar{L}}_{2,f}$ (left) as well as the mean conductivity values ${\bar{\lambda }}_{\mathrm{l}}$ (left-middle), ${\bar{\lambda }}_{\mathrm{b}}$ (middle-right) and ${\bar{\lambda }}_{\mathrm{h}}$ (right) corresponding to the low, background and high conductivity regions. Bottom: densities from the initial and final (converged) ensemble of L_1,f, L_2,f, λ_l, λ_b, λ_h.
Download figure:
Standard image High-resolution image

**Figure 17.** **Exp**_EIT₂. Top: plots of the means ${\bar{L}}_{1,f}$ and ${\bar{L}}_{2,f}$ (left) as well as the mean conductivity values ${\bar{\lambda }}_{\mathrm{l}}$ (left-middle), ${\bar{\lambda }}_{\mathrm{b}}$ (middle-right) and ${\bar{\lambda }}_{\mathrm{h}}$ (right) corresponding to the low, background and high conductivity regions. Bottom: densities from the initial and final (converged) ensemble of L_1,f, L_2,f, λ_l, λ_b, λ_h.
Download figure:
Standard image High-resolution image

Although we are mainly focus on the deterministic case here, to further appreciate the accuracy of the inversion for these variables, in figure 17 (bottom) we show their probability densities approximated from the initial and final (converged) ensembles. For most of these variables we can see that the converged ensemble has a much smaller variance compared to the initial one. For the conductivity values we note that the lower and background values are identified accurately with the ensemble mean; the higher value is captured in the tail of the final ensemble.

4.6.4. Comparison between EKI-DMC and EKI-LM

In figure 18 we compare the performance of EKI-DMC and EKI-LM using 30 different initial ensembles. For EKI-LM we explore different choices of ρ. We note that EKI-DMC outperforms EKI-LM for all our choices of ρ. From table 2 note that for ρ > 0.7 the computational cost of EKI-LM is approximately twice the cost of EKI-DMC and the cost even triple if we choose ρ = 0.8. The plots of log κ_n from one run with the three algorithms (same initial ensemble) are shown in figure 19, where we see that all these runs perform well. Similarly conclusions to those in Exp_EIT₁ are also drawn for this case. Namely, EKI-DMC is more accurate than EKI-LM in the chosen metrics although visually we achieve good performance from both algorithms and inputs. Experiments (not shown) indicate that substantial improvement in the performance of EKI-LM can be achieved by using the same stopping rule given by (12) instead of (22). However, some choices of ρ can result in inaccurate estimates in terms of error w.r.t. the truth. The main advantage of EKI-DMC over EKI-LM is that the former does not rely on tuning parameters whose optimal selection is crucial for the stability and accuracy of the scheme.

**Figure 19.** **Exp**_EIT₂. Logarithm of ${\kappa }_{{n}^{{\ast}}}$ computed using the same initial ensemble (J = 200) with **EKI-LM** using different selections of the parameter ρ. In the right panel with display the log of ${\kappa }_{{n}^{{\ast}}}$ that we obtain using **EKI-DMC**.
Download figure:
Standard image High-resolution image

4.7. Further discussions

Let us recall that the proposed stopping criteria for EKI given by (12) comes naturally from the Bayesian tempering scheme. The tempering parameter t_N+1 = 1 in (8) corresponds to the posterior μ_N+1 = μ. From the deterministic perspective in which we at aim approximating (2), this stopping criteria might seem rather arbitrary, and one may argue that further iterations of the EKI algorithm (i.e. for t_N+1 > 1) may produce more accurate approximations of (2). Our numerical examples show that the stopping criteria from (12) yields values of the data misfit which are consistent with the discrepancy principle (38) where, as stated earlier, the noise level δ is approximated by $\sqrt{M}=16$ . In particular, from the experiments with multiple runs (figures 6 and 15) we see that the (log) data misfit approaches the (log) noise level log δ ≈ 2.77 as we reach the stopping criteria given by (12). In fact, for these experiments we note that the error w.r.t. the truth, ${\mathcal{E}}_{n}$ , stabilises before we reach the stopping criteria. Hence, earlier stopping (i.e. t_N+1 < 1) could have avoided one or two unnecessary iterations. In contrast, experiments (not shown) confirm that additional iterations (i.e. for t_N+1 > 1) not only do not decrease ${\mathcal{E}}_{n}$ , but they will eventually lead to an increase in ${\mathcal{E}}_{n}$ and hence loss of accuracy. These observations align with existing work suggesting that the appropriate early stopping is crucial for the stability of EKI. As discussed in subsection 2.1, the work in [1, 2] has shown that the error with respect to the truth increase with the number of iterations once the data misfit takes values below the noise level (i.e. the discrepancy principle is not satisfied).

Similarly, even though we motivate the selection of α_n in the DMC by controlling (via Jeffrey's divergence) the transitions between consecutive tempering measures μ_n+1 and μ_n, the aforementioned results from figures 6 and 15 constitute numerical evidence that DMC leads to a stable/regularised EKI algorithm that addresses the ill-posedness of the classical inverse problem (2). Intuitively, controlling the transitions between consecutive measures yields stable transitions between the statistics of these measures which are, in turn, approximated with Gaussians using EKI (see appendix A). Therefore, one may expect that, for sufficiently large ensemble size, EKI also leads to stable computation of the ensemble mean. Nonetheless, theoretical work beyond the scope of this manuscript is needed in order to deepen our understanding of the link between the transition of tempering measures and the regularisation properties of EKI-DMC that arises from the proposed selection of α_n and the stopping criteria induced by contraint (12).

5. Conclusions

We introduced the DMC: a new adaptive regularisation strategy within the classical EKI setting. This led to an algorithm, EKI-DMC, that in contrast to existing EKI approaches, does not require any tuning parameters. Although we focus on the solution of deterministic identification problem, the proposed DMC is motivated from the Bayesian perspective of EKI within the tempering setting, where the inverse of the regularisation parameter, ${\alpha }_{n}^{-1}$ , controls the transition between two consecutive intermediate measures. The Bayesian tempering setting provides us with a condition that these parameters must satisfy ( ${\sum }_{n=1}^{{n}^{{\ast}}}{\alpha }_{n}^{-1}=1$ ) to bridge the prior and posterior. We encode this condition for the termination of EKI-DMC together with our new method for choosing α_n. We show that the selection satisfies a heuristic statistical discrepancy principle which also controls Jeffreys' divergence between consecutive measures.

We applied EKI-DMC for the solution of EIT with the CEM in which the conductivity was parameterised via different maps that enabled us, via the EKI framework applied to the appropriate parameters, to characterise both smooth and piece-wise constants conductivities. WM fields were at the core of these parameterisations which included the intrinsic lengthscales as inputs that we estimate within EKI. For the piece-wise constant case, we use a truncated WF field (the level-set function) to characterise discontinuities between different regions. As with any other EKI algorithms, our results show that the performance of EKI-DMC relies on reasonable choices of ensemble size, J. For sufficiently large choices of J, our experiments show that EKI-DMC is quite robust and capable of producing accurate identification of physical properties suitably parameterised.

We conducted a performance comparison between EKI-DMC and the EKI-LM approach of [2]. In most cases EKI-DMC outperforms the accuracy of EKI-LM in terms of error w.r.t. the truth and data misfit, but we noted that suitable choices of tuning parameters and possible modifications to the stopping rule can produce comparable performance to EKI-DMC. We recognise that similar performance can also be achieved by using different choices of α_n in EKI including those discussed in section 2. We reiterate that, as with EKI-LM and its variants, most regularisation approaches for EKI rely on the choice of additional tuning parameters which, of course, can be tuned to display comparable or even better performance to EKI-DMC. However, there is no principled approach for the optimal selection of those parameters. Optimal tuning parameters can only be informed via careful numerical investigations on the given problem-specific setting. Our results show that EKI-DMC is a robust self-tuning regularisation strategy of EKI, ideally suited for practical and operational settings for which finding optimal choices of tuning parameters in existing EKI approaches may not be computationally feasible.

Acknowledgments

This work was supported by the Engineering and Physical Sciences Research Council [Grant No. EP/P006701/1]; through the EPSRC Future Composites Manufacturing Research Hub. We would like to thank the two anonymous reviewers for their insightful feedback and suggestions.

Appendix A.: Motivation of EKI from the Bayesian tempering approach

We introduce a series of approximations and Gaussian assumptions that lead to the EKI algorithm (algorithm 1).

A.1. Linearisation and Gaussian approximations

Suppose that the collection of tempering parameters ${\left\{{t}_{n}\right\}}_{n=1}^{N}$ in (9) have been specified. Our objective now is to construct a sequence of Gaussian approximations of each measure μ_n in (8). To this end, let ${\nu }_{0}=N\left({m}_{0},{\mathcal{C}}_{0}\right)$ be a Gaussian approximation of the prior μ₀, and let us denote by $D\mathcal{G}$ the Frechet derivative of $\mathcal{G}$ . We recursively construct a sequence of Gaussian approximations ${\left\{{\nu }_{n}=N\left({m}_{n},{\mathcal{C}}_{n}\right)\right\}}_{n=1}^{N+1}$ of ${\left\{{\mu }_{n}\right\}}_{n=1}^{N+1}$ via the following expression

$\begin{equation}{\nu }_{n+1}\left(\mathrm{d}u\right)\propto {\nu }_{n}\left(\mathrm{d}u\right)\mathrm{exp}\left[-\frac{1}{2}{{\Vert}{\left({\alpha }_{n}{\Gamma}\right)}^{-1/2}\left(y-{\mathcal{G}}_{n}-D{\mathcal{G}}_{n}\left(u-{m}_{n}\right)\right){\Vert}}^{2}\right],\end{equation} \tag{ A.1 }$

where for ease in the notation we have defined ${\mathcal{G}}_{n}\equiv \mathcal{G}\left({m}_{n}\right)$ and $D{\mathcal{G}}_{n}\equiv D\mathcal{G}\left({m}_{n}\right)$ . We note the right-hand side of (A.1) involves the linearisation of the forward map around the mean of ${\nu }_{n}=N\left({m}_{n},{\mathcal{C}}_{n}\right)$ .

Recursive formulas for the mean and covariance of ${\nu }_{n}=N\left({m}_{n},{\mathcal{C}}_{n}\right)$ can be obtained by completing the square in (A.1) (see theorem 6.20 in [31]). Indeed, since ${\nu }_{n}=N\left({m}_{n},{\mathcal{C}}_{n}\right)$ and the model $L\left(u\right)\equiv D{\mathcal{G}}_{n}\left(u-{m}_{n}\right)$ is linear, then ${\nu }_{n+1}=N\left({m}_{n+1},{\mathcal{C}}_{n+1}\right)$ with

$\begin{equation}{m}_{n+1}={m}_{n}+{\mathcal{C}}_{n}D{\mathcal{G}}_{n}^{{\ast}}{\left(D{\mathcal{G}}_{n}{\mathcal{C}}_{n}D{\mathcal{G}}_{n}^{{\ast}}+{\alpha }_{n}{\Gamma}\right)}^{-1}\left(y-{\mathcal{G}}_{n}\right)\end{equation} \tag{ A.2 }$

$\begin{equation}{\mathcal{C}}_{n+1}={\mathcal{C}}_{n}-{\mathcal{C}}_{n}D{\mathcal{G}}_{n}^{{\ast}}{\left(D{\mathcal{G}}_{n}{\mathcal{C}}_{n}D{\mathcal{G}}_{n}^{{\ast}}+{\alpha }_{n}{\Gamma}\right)}^{-1}D{\mathcal{G}}_{n}{\mathcal{C}}_{n},\end{equation} \tag{ A.3 }$

where $D{\mathcal{G}}_{n}^{{\ast}}$ denotes the adjoint of $D\mathcal{G}$ evaluated at m_n.

Remark 6. (LM from linearised Bayesian tempering). Using standard arguments (see for example lemma 3.1 in [2]) it can be shown that m_n in (A.2) satisfies (6). In the case where ${\mathcal{C}}_{n}$ is the identity operator, (6) yields the standard LM iterative scheme [26]. For the modified version in (6), we note that the recursive formula for the mean involves introducing the precision operator for u in the regularisation term in the right-hand side of (6).

Remark 7. (the linear-Gaussian case). Note that, if μ₀ = ν₀ (i.e. the prior is Gaussian) and the forward map $\mathcal{G}$ is linear, then we have that $D{\mathcal{G}}_{n}\left(u-{m}_{n}\right)=\mathcal{G}\left(u-{m}_{n}\right)=\mathcal{G}\left(u\right)-{\mathcal{G}}_{n}$ . Hence, (A.1) and (10) coincide and so μ_n = ν_n for all n = 0, ..., N + 1. In particular, in the linear-Gaussian case the final measure ${\nu }_{n}^{{\ast}}$ coincides with the posterior (i.e. ν_N+1 = μ_N+1 = μ)

A.2. Derivative-free ensemble approximation

We now introduce further approximations that will lead to the EKI algorithm under consideration. Let us denote by u_n a random variable such that ${u}_{n}\sim {\nu }_{n}=N\left({m}_{n},{\mathcal{C}}_{n}\right)$ . Denote by ${\mathbb{E}}_{n}$ expectation with respect to ν_n. Let us consider the first order approximation

$\begin{equation}\mathcal{G}\left({u}_{n}\right)\simeq {\mathcal{G}}_{n}+D{\mathcal{G}}_{n}\left({u}_{n}-{m}_{n}\right)\end{equation} \tag{ A.4 }$

that we used in (A.1) to define our Gaussian approximations for the tempering scheme introduced in appendix A.1. From (A.4) it follows that

$\begin{equation}{\mathbb{E}}_{n}\left[\mathcal{G}\left({u}_{n}\right)\right]\simeq {\mathcal{G}}_{n},\quad \mathcal{G}\left({u}_{n}\right)-{\mathbb{E}}_{n}\left[\mathcal{G}\left({u}_{n}\right)\right]\simeq D{\mathcal{G}}_{n}\left({u}_{n}-{m}_{n}\right).\end{equation} \tag{ A.5 }$

Hence,

$\begin{align}\hfill {\text{Cov}}_{n}\left({u}_{n},\mathcal{G}\left({u}_{n}\right)\right)& \equiv {\mathbb{E}}_{n}\left[\left({u}_{n}-{m}_{n}\right)\otimes \left(\mathcal{G}\left({u}_{n}\right)-{\mathbb{E}}_{n}\left[\mathcal{G}\left({u}_{n}\right)\right]\right)\right]\hfill \\ \hfill & \simeq {\mathcal{C}}_{n}D{\mathcal{G}}_{n}^{{\ast}},\hfill \end{align} \tag{ A.6 }$

$\begin{align}\hfill {\text{Cov}}_{n}\left(\mathcal{G}\left({u}_{n}\right)\right)& \equiv {\mathbb{E}}_{n}\left[\left(\mathcal{G}\left({u}_{n}\right)-{\mathbb{E}}_{n}\left[\mathcal{G}\left({u}_{n}\right)\right]\right)\otimes \left(\mathcal{G}\left({u}_{n}\right)-{\mathbb{E}}_{n}\left[\mathcal{G}\left({u}_{n}\right)\right]\right)\right]\hfill \\ \hfill & \simeq D{\mathcal{G}}_{n}{\mathcal{C}}_{n}D{\mathcal{G}}_{n}^{{\ast}},\hfill \end{align} \tag{ A.7 }$

where we have used the fact that ${\mathbb{E}}_{n}\left[\left({u}_{n}-{m}_{n}\right)\otimes \left({u}_{n}-{m}_{n}\right)\right]={\mathcal{C}}_{n}$ . If we use approximations (A.5)–(A.7) in (A.2) and (A.3) we note that

$\begin{align}\hfill {m}_{n+1}& \simeq {\tilde {m}}_{n+1}\equiv {m}_{n}+{\text{Cov}}_{n}\left({u}_{n},\mathcal{G}\left({u}_{n}\right)\right){\left({\text{Cov}}_{n}\left(\mathcal{G}\left({u}_{n}\right)\right)+{\alpha }_{n}{\Gamma}\right)}^{-1}\left(y-{\mathbb{E}}_{n}\left[\mathcal{G}\left({u}_{n}\right)\right]\right)\hfill \end{align} \tag{ A.8 }$

$\begin{align}\hfill {\mathcal{C}}_{n+1}& \simeq {\tilde {\mathcal{C}}}_{n+1}\equiv {\mathcal{C}}_{n}-{\text{Cov}}_{n}\left({u}_{n},\mathcal{G}\left({u}_{n}\right)\right){\left({\text{Cov}}_{n}\left(\mathcal{G}\left({u}_{n}\right)\right)+{\alpha }_{n}{\Gamma}\right)}^{-1}\enspace {\text{Cov}}_{n}\left(\mathcal{G}\left({u}_{n}\right),{u}_{n}\right).\hfill \end{align} \tag{ A.9 }$

These approximations to the mean and covariance of the sequence of approximate measures ${\left\{{\nu }_{n}\right\}}_{n=1}^{N+1}$ do not involve derivatives of the forward map. However, the covariance and cross-covariance that appear in (A.8) and (A.9) cannot be computed in closed form. This issue is overcome by using particle approximations of each approximate measure ${\nu }_{n}=N\left({m}_{n},{\mathcal{C}}_{n}\right)$ . In other words, we consider approximations

$\begin{equation}{\nu }_{n}^{J}\left({u}_{n}\right)=\frac{1}{J}\sum\limits _{j=1}^{J}\delta \left({u}_{n}-{u}_{n}^{\left(j\right)}\right),\end{equation} \tag{ A.10 }$

where we assume that ${u}_{n}^{\left(j\right)}\sim N\left({m}_{n},{\mathcal{C}}_{n}\right)$ . The classical EKI update formula (4) for the ensemble of particles ${\left\{{u}_{n}^{\left(j\right)}\right\}}_{j=1}^{J}$ is defined in such a way, that the corresponding ensemble mean and covariance approximate those in (A.8) and (A.9) as J → ∞. To see this more clearly, let us note that the ensemble approximation of ${\mathbb{E}}_{n}\left[{u}_{n}\right]$ and ${\mathbb{E}}_{n}\left[\mathcal{G}\left({u}_{n}\right)\right]$ are ${\bar{u}}_{n}$ and ${\bar{\mathcal{G}}}_{n}$ defined in (5). The ensemble approximations of ${\text{Cov}}_{n}\left(\mathcal{G}\left({u}_{n}\right)\right)$ , ${\text{Cov}}_{n}\left({u}_{n},\mathcal{G}\left({u}_{n}\right)\right)$ and ${\mathcal{C}}_{n}$ , denoted by ${C}_{n}^{\mathcal{G}\mathcal{G}}$ , ${C}_{n}^{u\mathcal{G}}$ and ${\mathcal{C}}_{n}^{uu}$ , are defined by (16), (17) and

$\begin{equation}{\mathcal{C}}_{n}^{uu}\equiv \frac{1}{J-1}\sum\limits _{j=1}^{J}\left({u}_{n}^{\left(j\right)}-{\bar{u}}_{n}\right)\otimes \left({u}_{n}^{\left(j\right)}-{\bar{u}}_{n}\right),\end{equation} \tag{ A.11 }$

respectively. From (4) we note that

$\begin{equation}{\bar{u}}_{n+1}={\bar{u}}_{n}+{\mathcal{C}}_{n}^{u\mathcal{G}}{\left({\mathcal{C}}_{n}^{\mathcal{G}\mathcal{G}}+{\alpha }_{n}{\Gamma}\right)}^{-1}\left(y+\sqrt{{\alpha }_{n}}{\bar{\xi }}_{n}-{\bar{\mathcal{G}}}_{n}\right).\end{equation} \tag{ A.12 }$

It can be shown (see for example [53] for a rigorous proof in finite dimensions) that

$\begin{align}\hfill {\bar{u}}_{n+1}\to {\tilde {m}}_{n+1}& \equiv {m}_{n}+{\text{Cov}}_{n}\left({u}_{n},\mathcal{G}\left({u}_{n}\right)\right){\left({\text{Cov}}_{n}\left(\mathcal{G}\left({u}_{n}\right)\right)+{\alpha }_{n}{\Gamma}\right)}^{-1}\hfill \\ \hfill & \quad {\times}\left(y-{\mathbb{E}}_{n}\left[\mathcal{G}\left({u}_{n}\right)\right]\right)\simeq {m}_{n+1}\hfill \end{align} \tag{ A.13 }$

$\begin{align}\hfill {\mathcal{C}}_{n+1}^{uu}\to {\tilde {\mathcal{C}}}_{n+1}& \equiv {\mathcal{C}}_{n}-{\text{Cov}}_{n}\left({u}_{n},\mathcal{G}\left({u}_{n}\right)\right){\left({\text{Cov}}_{n}\left(\mathcal{G}\left({u}_{n}\right)\right)+{\alpha }_{n}{\Gamma}\right)}^{-1}\enspace {\text{Cov}}_{n}\hfill \\ \hfill & \quad {\times}\left(\mathcal{G}\left({u}_{n}\right),{u}_{n}\right)\simeq {\mathcal{C}}_{n+1}.\hfill \end{align} \tag{ A.14 }$

as J → ∞. Moreover, for the particles in (4) we have

$\begin{equation*}{\nu }_{n+1}^{J}\left({u}_{n+1}\right)=\frac{1}{J}\sum\limits _{j=1}^{J}\delta \left({u}_{n+1}-{u}_{n+1}^{\left(j\right)}\right)\to {\tilde {\nu }}_{n+1}\equiv N\left({\tilde {m}}_{n+1},{\tilde {\mathcal{C}}}_{n+1}\right).\end{equation*}$

The development above (informally) shows that if ${\nu }_{n}^{J}\simeq {\nu }_{n}=N\left({m}_{n},{\mathcal{C}}_{n}\right)$ , then ${\tilde {\nu }}_{n+1}^{J}\simeq {\tilde {\nu }}_{n+1}=N\left({\tilde {m}}_{n+1},{\tilde {\mathcal{C}}}_{n+1}\right)$ . Furthermore, if (A.5)–(A.7) are accurate enough, then $N\left({\tilde {m}}_{n+1},{\tilde {\mathcal{C}}}_{n+1}\right)\simeq {\nu }_{n+1}=N\left({m}_{n+1},{\mathcal{C}}_{n+1}\right)$ and so the EKI ensemble ${\tilde {\nu }}_{n+1}^{J}$ approximates ν_n+1. Recall that the ν_n's are Gaussian approximations of the tempered distributions μ_n. Hence the regularisation parameter α_n in EKI is the inverse of the difference between consecutive tempering parameters (see equation (11)) which is, in turn, a Gaussian approximation of μ_n+1.

Remark 8. (squared-root EKI). It is worth noticing that other approaches can be used to approximate $N\left({\tilde {m}}_{n+1},{\tilde {\mathcal{C}}}_{n+1}\right)$ above. This includes the so-called ensemble square-root formulations [54] in which the particles are cleverly updated so that their sample mean and covariance coincide (exactly) with ${\tilde {m}}_{n+1}$ and ${\tilde {\mathcal{C}}}_{n+1}$ . While these has been shown to be beneficial for very small samples (i.e. <50), we note that $N\left({\tilde {m}}_{n+1},{\tilde {\mathcal{C}}}_{n+1}\right)$ does not, in general, coincideswith ${\nu }_{n+1}=N\left({m}_{n+1},{\mathcal{C}}_{n+1}\right)$ (unless $\mathcal{G}$ is linear).

Remark 9. (EKI as a derivative-free approximation of LM). For sufficiently large J, the mean of the ensemble ${\bar{u}}_{n+1}$ approximates ${\tilde {m}}_{n+1}$ and so m_n+1 which is, in turn, the iteration of the LM scheme in (6) (see remark 9). Therefore, we can interpret EKI as a derivative-free approximation of the LM scheme constrained to the subspace generated by the initial ensemble ${\left\{{u}_{0}^{\left(j\right)}\right\}}_{j=1}^{J}$ . More specifically, the ensemble mean ${\bar{u}}_{n}$ define by the recursive formula (5) satisfies the following subspace invariance property (see theorem 2.1 in [1])

$\begin{equation*}{\bar{u}}_{n}\in {\mathcal{S}}_{0}\equiv \text{span}{\left\{{u}_{0}^{\left(j\right)}\right\}}_{j=1}^{J}\end{equation*}$

for all $n\in \mathbb{N}$ . We expect EKI to produce approximate solutions to (4) within the subspace defined above. While the numerical experiments from [2] provides evidence of such a claim, to the best of our knowledge, the convergence of EKI in this context is still an open problem.

Appendix B.: Lemmas, theorems, and proofs

Let $y\in {\mathbb{R}}^{M}$ fixed. We recall that from assumption 1 on the forward map $\mathcal{G}$ , the data misfit ${\Phi}\left(\cdot ,y\right):\mathcal{H}\to \left[0,+\infty \right)$ is a square integrable functional on the probability space $\left(\mathcal{H},\mathcal{B}\left(\mathcal{H}\right),{\mu }_{0}\right)$ , i.e.

$\begin{equation}{\int }_{\mathcal{H}}\vert {\Phi}\left(u;y\right){\vert }^{2}{\mu }_{0}\left(\mathrm{d}u\right){< }\infty .\end{equation} \tag{ B.1 }$

This implies that ${\Phi}\left(\cdot ,y\right):\mathcal{H}\to \left[0,+\infty \right)$ is absolutely integrable functional on the probability space $\left(\mathcal{H},\mathcal{B}\left(\mathcal{H}\right),{\mu }_{0}\right)$ , i.e.

$\begin{equation}{\int }_{\mathcal{H}}\vert {\Phi}\left(u;y\right)\vert {\mu }_{0}\left(\mathrm{d}u\right){\leqslant}{M}_{0}{< }\infty .\end{equation} \tag{ B.2 }$

B.1. Proof of lemma 1

Proof. Assumption 1 allows us to use the techniques from [[31], theorems 4.1 and 4.2] and [[55], theorem 3.4] to show that μ_t ≪ μ₀. We now prove μ₀ ≪ μ_t, i.e. $\forall \enspace \mathcal{X}\in \mathcal{B}\left(\mathcal{H}\right)$ , ${\mu }_{t}\left(\mathcal{X}\right)=0\enspace {\Rightarrow}\enspace {\mu }_{0}\left(\mathcal{X}\right)=0$ . We prove this statement by contradiction. Assume that there exists $\mathcal{X}\in \mathcal{B}\left(\mathcal{H}\right)$ such that ${\mu }_{t}\left(\mathcal{X}\right)=0$ and ${\mu }_{0}\left(\mathcal{X}\right){ >}0$ . Since ${\mu }_{0}\left(\mathcal{X}\right){ >}0$ , we can define an another probability measure, ${\nu }_{\mathcal{X}}$ , on the measurable space $\left(\mathcal{X},\mathcal{B}\left(\mathcal{X}\right)\right)$ , such that for all $\mathcal{S}\in \mathcal{B}\left(\mathcal{X}\right)$ ,

$\begin{equation}{\nu }_{\mathcal{X}}\left(\mathcal{S}\right){:=}\frac{{\mu }_{0}\left(\mathcal{S}\right)}{{\mu }_{0}\left(\mathcal{X}\right)}.\end{equation} \tag{ B.3 }$

Using μ_t ≪ μ₀ and expressions (28) and (B.3) we have

$\begin{align*}\hfill {\mu }_{t}\left(\mathcal{X}\right){:=}& {\int }_{\mathcal{X}}{\mu }_{t}\left(\mathrm{d}u\right)={\int }_{\mathcal{X}}\frac{\mathrm{d}{\mu }_{t}}{\mathrm{d}{\mu }_{0}}\left(u\right){\mu }_{0}\left(\mathrm{d}u\right)\hfill \\ \hfill & =\frac{1}{{N}_{t}}{\int }_{\mathcal{X}}\mathrm{exp}\left(-t{\Phi}\left(u;y\right)\right){\mu }_{0}\left(\mathrm{d}u\right)\hfill \\ \hfill & =\frac{{\mu }_{0}\left(\mathcal{X}\right)}{{N}_{t}}\left({\int }_{\mathcal{X}}\mathrm{exp}\left(-t{\Phi}\left(u;y\right)\right){\nu }_{\mathcal{X}}\left(\mathrm{d}u\right)\right).\hfill \end{align*}$

We use Jensen's inequality, formula (B.3) as well as (B.2) to obtain

$\begin{align*}\hfill {\mu }_{t}\left(\mathcal{X}\right)& {\geqslant}\frac{{\mu }_{0}\left(\mathcal{X}\right)}{{N}_{t}}\enspace \mathrm{exp}\left(-t{\int }_{\mathcal{X}}{\Phi}\left(u;y\right){\nu }_{\mathcal{X}}\left(\mathrm{d}u\right)\right)\hfill \\ \hfill & =\frac{{\mu }_{0}\left(\mathcal{X}\right)}{{N}_{t}}\enspace \mathrm{exp}\left(-\frac{t}{{\mu }_{0}\left(\mathcal{X}\right)}{\int }_{\mathcal{X}}{\Phi}\left(u;y\right){\mu }_{0}\left(\mathrm{d}u\right)\right)\hfill \\ \hfill & {\geqslant}\frac{{\mu }_{0}\left(\mathcal{X}\right)}{{N}_{t}}\enspace \mathrm{exp}\left(-\frac{t{M}_{0}}{{\mu }_{0}\left(\mathcal{X}\right)}\right){ >}0\hfill \end{align*}$

which contradicts our assumption that ${\mu }_{t}\left(\mathcal{X}\right)=0$ . □

B.2. Proof of theorem 1

Let us first prove:

Theorem 1. For any bounded t ⩾ 0, the normalisation constant N_t defined in formula (29), as a function of t, is differentiable and its derivative $\frac{\mathrm{d}{N}_{t}}{\mathrm{d}t}$ is given by

$\begin{equation}\frac{1}{{N}_{t}}\frac{\mathrm{d}{N}_{t}}{\mathrm{d}t}=-{\int }_{\mathcal{H}}{\Phi}\left(u;y\right){\mu }_{t}\left(\mathrm{d}u\right),\end{equation} \tag{ B.4 }$

where μ_t is the probability measure determined via formula (28).

Proof. For any $u\in \mathcal{H}$ and for any bounded t ⩾ 0, let us define

$\begin{equation*}{f}_{u}\left(t\right){:=}\mathrm{exp}\left(-t{\Phi}\left(u;y\right)\right).\end{equation*}$

Then, N_t defined in formula (29) can be rewritten as

$\begin{equation*}{N}_{t}={\int }_{\mathcal{H}}{f}_{u}\left(t\right){\mu }_{0}\left(\mathrm{d}u\right).\end{equation*}$

Notice that the derivative of f_u is bounded by Φ(u; y) for any t ⩾ 0, since

$\begin{equation*}\left\vert \right.\frac{\mathrm{d}{f}_{u}}{\mathrm{d}t}\left(t\right)\left\vert \right.=\left\vert \right.-{\Phi}\left(u;y\right)\mathrm{exp}\left(-t{\Phi}\left(u;y\right)\right)\left\vert \right.{\leqslant}{\Phi}\left(u;y\right).\end{equation*}$

Recall that Φ(⋅; y) is absolutely integrable. Thus, according to the dominated convergence theorem, the derivative of N_t can be calculated by

$\begin{align*}\hfill \frac{\mathrm{d}{N}_{t}}{\mathrm{d}t}& =\frac{\mathrm{d}}{\mathrm{d}t}\left({\int }_{\mathcal{H}}{f}_{u}\left(t\right){\mu }_{0}\left(\mathrm{d}u\right)\right)={\int }_{\mathcal{H}}\frac{\mathrm{d}{f}_{u}}{\mathrm{d}t}{\mu }_{0}\left(\mathrm{d}u\right)\hfill \\ \hfill & =-{\int }_{\mathcal{H}}{\Phi}\left(u;y\right)\mathrm{exp}\left(-t{\Phi}\left(u;y\right)\right){\mu }_{0}\left(\mathrm{d}u\right).\hfill \end{align*}$

From the proof of lemma 1 we know N_t is strictly greater than 0 for any bounded t ⩾ 0. Therefore, N_t' can be divided by N_t,

$\begin{equation*}\frac{1}{{N}_{t}}\frac{\mathrm{d}{N}_{t}}{\mathrm{d}t}=-\frac{1}{{N}_{t}}{\int }_{\mathcal{H}}{\Phi}\left(u;y\right)\mathrm{exp}\left(-t{\Phi}\left(u;y\right)\right){\mu }_{0}\left(\mathrm{d}u\right).\end{equation*}$

Expression (B.4) follows from lemma 1 which establishes the equivalence between μ_t and μ₀. □

Expression (B.4) can be used to characterise N_t by integrating on both sides, arriving at

$\begin{equation*}\mathrm{log}\enspace {N}_{t}-\mathrm{log}\enspace {N}_{0}=-{\int }_{0}^{t}{\left\langle {\Phi}\right\rangle }_{t}\mathrm{d}t,\end{equation*}$

where we have used the notation introduced in (30). This extends, to the infinite-dimensional setting, the characterisation of the normalising constant obtained, for example, in [36].

We now prove theorem 1:

Proof. $\mathbb{E}\left\{g\left({u}_{t}\right)\right\}$ can be rewritten by changing the measure from μ_t to μ₀ (recall μ_t ≪ μ₀),

$\begin{align}\hfill \mathbb{E}\left\{g\left({u}_{t}\right)\right\}& {:=}{\int }_{\mathcal{H}}g\left(u\right){\mu }_{t}\left(\mathrm{d}u\right)={\int }_{\mathcal{H}}g\left(u\right)\frac{\mathrm{d}{\mu }_{t}}{\mathrm{d}{\mu }_{0}}\left(u\right){\mu }_{0}\left(\mathrm{d}u\right)\hfill \\ \hfill & \hfill =\frac{1}{{N}_{t}}{\int }_{\mathcal{H}}g\left(u\right)\mathrm{exp}\left(-t{\Phi}\left(u;y\right)\right){\mu }_{0}\left(\mathrm{d}u\right)=\frac{1}{{N}_{t}}{\int }_{\mathcal{H}}{f}_{u}\left(t\right){\mu }_{0}\left(\mathrm{d}u\right),\end{align}$

where f_u(t) is defined by

$\begin{equation*}{f}_{u}\left(t\right){:=}g\left(u\right)\mathrm{exp}\left(-t{\Phi}\left(u;y\right)\right)\end{equation*}$

for any $u\in \mathcal{H}$ and any bounded t ⩾ 0. By applying the chain rule, the derivative of $\mathbb{E}\left\{g\left({u}_{t}\right)\right\}$ is given by

$\begin{align}\hfill \frac{\mathrm{d}}{\mathrm{d}t}\mathbb{E}\left\{g\left({u}_{t}\right)\right\}& =\frac{\mathrm{d}}{\mathrm{d}t}\left(\frac{1}{{N}_{t}}{\int }_{\mathcal{H}}{f}_{u}\left(t\right){\mu }_{0}\left(\mathrm{d}u\right)\right)\hfill \\ \hfill & =-\frac{\frac{\mathrm{d}{N}_{t}}{\mathrm{d}t}{\int }_{\mathcal{H}}{f}_{u}\left(t\right){\mu }_{0}\left(\mathrm{d}u\right)}{{N}_{t}^{2}}+\frac{\frac{\mathrm{d}}{\mathrm{d}t}\left({\int }_{\mathcal{H}}{f}_{u}\left(t\right){\mu }_{0}\left(\mathrm{d}u\right)\right)}{{N}_{t}}.\hfill \end{align} \tag{ B.5 }$

Notice that the derivative of f_u is bounded by |g(u)Φ(u; y)| for any t ⩾ 0, since

$\begin{equation*}\left\vert \right.\frac{\mathrm{d}{f}_{u}}{\mathrm{d}t}\left(t\right)\left\vert \right.=\left\vert \right.-g\left(u\right){\Phi}\left(u;y\right)\mathrm{exp}\left(-t{\Phi}\left(u;y\right)\right)\left\vert \right.{\leqslant}\left\vert \right.g\left(u\right){\Phi}\left(u;y\right)\left\vert \right..\end{equation*}$

Since both g and Φ are square integrable, it follows from Cauchy–Schwarz inequality that g ⋅ Φ is absolutely integrable. Thus, according to the dominated convergence theorem, we have

$\begin{align}\hfill \frac{\mathrm{d}}{\mathrm{d}t}\left({\int }_{\mathcal{H}}{f}_{u}\left(t\right){\mu }_{0}\left(\mathrm{d}u\right)\right)& ={\int }_{\mathcal{H}}\frac{\mathrm{d}{f}_{u}}{\mathrm{d}t}\left(t\right){\mu }_{0}\left(\mathrm{d}u\right)\hfill \\ \hfill & =-{\int }_{\mathcal{H}}{\Phi}\left(u;y\right){f}_{u}\left(t\right){\mu }_{0}\left(\mathrm{d}u\right).\hfill \end{align} \tag{ B.6 }$

On the other hand, we note that the conditions of theorem 1 apply. Then, we substitute (B.6) and (B.4) in (B.5) to arrive at

$\begin{equation*}\frac{\mathrm{d}}{\mathrm{d}t}\mathbb{E}\left\{g\left({u}_{t}\right)\right\}=\frac{{\int }_{\mathcal{H}}{\Phi}\left(u;y\right){\mu }_{t}\left(\mathrm{d}u\right){\int }_{\mathcal{H}}{f}_{u}\left(t\right){\mu }_{0}\left(\mathrm{d}u\right)}{{N}_{t}}-\frac{{\int }_{\mathcal{H}}{\Phi}\left(u;y\right){f}_{u}\left(t\right){\mu }_{0}\left(\mathrm{d}u\right)}{{N}_{t}}.\end{equation*}$

According to lemma 1, μ_t and μ₀ are equivalent for any bounded t ⩾ 0. Hence, we change measure in the right-hand side of the equation above using formula (28). Thus,

$\begin{equation*}\frac{\mathrm{d}}{\mathrm{d}t}\mathbb{E}\left\{g\left({u}_{t}\right)\right\}={\int }_{\mathcal{H}}{\Phi}\left(u;y\right){\mu }_{t}\left(\mathrm{d}u\right){\int }_{\mathcal{H}}g\left(u\right){\mu }_{t}\left(\mathrm{d}u\right)-{\int }_{\mathcal{H}}{\Phi}\left(u;y\right)g\left(u\right){\mu }_{t}\left(\mathrm{d}u\right).\end{equation*}$

Since both Φ and g are square integrable, it is not difficult to see that the right-hand side in the previous expression gives the desired result in (34). □