Bayesian design of measurements for magnetorelaxometry imaging

The aim of magnetorelaxometry imaging is to determine the distribution of magnetic nanoparticles inside a subject by measuring the relaxation of the superposition magnetic field generated by the nanoparticles after they have first been aligned using an external activation magnetic field that has subsequently been switched off. This work applies techniques of Bayesian optimal experimental design to (sequentially) selecting the positions for the activation coil in order to increase the value of data and enable more accurate reconstructions in a simplified measurement setup. Both Gaussian and total variation (TV) prior models are considered for the distribution of the nanoparticles. The former allows simultaneous offline computation of optimized designs for multiple consecutive activations, while the latter introduces adaptability into the algorithm by using previously measured data in choosing the position of the next activation. The TV prior has a desirable edge-enhancing characteristic, but with the downside that the computationally attractive Gaussian form of the posterior density is lost. To overcome this challenge, the lagged diffusivity iteration is used to provide an approximate Gaussian posterior model and allow the use of the standard Bayesian A- and D-optimality criteria for the TV prior as well. Two-dimensional numerical experiments are performed on a few sample targets, with the conclusion that the optimized activation positions lead, in general, to better reconstructions than symmetric reference setups when the target distribution or region of interest are nonsymmetric in shape.

1. Introduction.This work considers Bayesian optimal experimental design (OED) for choosing positions and orientations of activation coils in a simplified model for magnetorelaxometry imaging (MRXI).We implement both an offline algorithm for simultaneous optimization of multiple consecutive activations assuming a Gaussian prior for the imaged magnetic nanoparticle (MNP) distribution and an adaptive algorithm based on a total variation (TV) prior and the lagged diffusivity iteration as introduced for sequential X-ray imaging in [18].
1.1.Magnetorelaxometry imaging.In MRXI, the goal is to determine the distribution of magnetic nanoparticles inside a physical body from measurements on the relaxation of a superposition magnetic field generated by an alignment of the magnetic moments of the MNPs.The MNPs are aligned using an external activation magnetic field, which is then switched off, and the change in the magnetic field due to the relaxation of the MNPs is finally measured at sensors outside the examined body.
MNPs have a diameter of a few nanometers.They consist of ferro-or ferrimagnetic material and can be manipulated with an external magnetic field [23].MNPs are employed in biomedicine [29]: they have applications in different types of cancer treatment, e.g., in magnetic drug targeting [37] and magnetic hyperthermia [27].In these applications, it is important to have an estimate on the MNP distribution inside the subject because it directly affects how the treatment transpires.
As in [14], we approximately model the dependence of the MRXI measurements of the relaxation magnetic field on the distribution of the MNPs via a linear forward model by working in the linear range of the Langevin function that is used for expressing the magnetization of the MNPs exposed to an external magnetic field.As further simplifications, we model the coil used for producing the external activation fields as a dipole and assume that each measurement sensor records the strength of the relaxation magnetic field (exactly) at a given point in a given direction.The consecutive positions and orientations of the activation dipole are the design parameters we aim to optimize.
After discretization, the forward model corresponds to a measurement matrix that depends nonlinearly on the design parameters, with a possibility to explicitly calculate the derivatives of this nonlinear dependence, which facilitates implementation of differentiation-based optimization algorithms.Due to a certain symmetry in the measurement model, the system matrix depends essentially in the same way on the measurement positions and directions as on the specifications of the activation dipoles, which means that the introduced techniques could as well be used for designing the configuration of the measurement sensors.
1.2.Bayesian experimental design.A Bayesian optimal design p * is defined as a maximizer over the design space P for the expectation of the utility function E u,y [U (p; u, y)] with respect to the data y ∈ Y and the model parameters u ∈ U [9].That is, where π(u | p, y) and π(y | p) are the posterior distribution of the parameter u and the marginal distribution of the data y, respectively, under the design p. Two of the most common choices for U are a negative quadratic loss function that measures the squared distance from u to a specific point estimator such as the posterior mean and the expected information gain for which U is related to the Kullback-Leibler distance between the posterior and prior distributions.
In our setting of MRXI, the design parameter vector p encodes the consecutive positions and orientations of the dipole-like activation coil, u corresponds to the discretized MNP distribution, and y is the vector of measurements on the magnetic field at the sensors.As the measurement model relating u and y is linear, assuming a Gaussian prior and an additive Gaussian noise model considerably simplifies the optimization targets corresponding to the aforementioned two utility functions: employing a quadratic loss function or maximizing the information gain lead to minimizing a weighted trace or the determinant of the posterior covariance, respectively.These correspond to so-called A-and D-optimal designs [2,9].What is more, the posterior covariance is independent of the measurements, meaning that the experimental design can be performed offline, i.e. prior to taking any measurements, and simultaneously for several consecutive activations.
If a smoothened TV prior is utilized for the MNP distribution, it is not possible to get rid of the double integral over the potentially high-dimensional spaces in (1.1) without further simplifications.However, if one proceeds sequentially, choosing the specifications for the next activation only after computing a maximum a posteriori (MAP) estimate for the MNP distribution via the lagged diffusivity iteration [38] based on the measurements from the previous activations, it is possible to interpret the MAP estimate as the mean of a Gaussian distribution whose covariance matrix is available as a side product of the iteration [5,8].Basing the selection of the specifications for the next dipole activation on this covariance structure, one can devise a sequential Bayesian OED algorithm that adapts to the already collected data and has potential to produce edge-promoting experimental designs; see [18] for an application of this idea to sequential X-ray imaging.Let us also note that the sequential optimization approach can be extended to a non-parametric setting with convex priors (such as the smoothened TV) for which the posterior has good approximation properties by Gaussian distributions in the neighborhood of non-parametric MAP estimators [17].
1.3.Our contribution.The main contribution of this work is the application of Bayesian OED to a simplified two-dimensional model of MRXI, where the activation coils and measurement sensors are modeled as point-like objects with orientations.OED has previously been studied in the framework of MRXI in [26,34,35], of which only the master's thesis [26] considers a Bayesian setting.
The presented numerical experiments tackle OED both with a Gaussian prior and simultaneous offline optimization of several activations and with a smoothened TV prior and sequential adaptive optimization of the activations.In both cases, the reconstruction accuracy for the optimal designs is compared to symmetric reference setups, demonstrating that the employment of Bayesian OED improves the performance of MRXI.However, as the the objective functions considered when deducing the 'optimal' designs suffer from multiple local optima, and a greedy approach is used when optimizing the activations sequentially, there is no certainty that globally optimal designs are actually found in all experiments.
For the simultaneous design of activations, the optimization is performed by Newton's method or gradient descent due to the high-dimensional design space, which makes the approach prone to finding local optima.In the sequential algorithm, where only the position and orientation of the next activation dipole need to be optimized at a time, an exhaustive search is also considered in order to evaluate the optimality of the designs produced by the differentiation-based methods.The numerical experiments focus mainly on A-optimality, but extending all presented considerations to the case of D-optimality would be conceptually straightforward.
The considered sequential approach to Bayesian OED has previously been investigated for choosing optimal projection geometries in X-ray imaging with a Gaussian prior in [7] and with a TV prior in [18].However, these papers do not tackle simultaneous optimization of many projection geometries with a Gaussian prior due to computational restrictions.Moreover, compared to X-ray tomography with a limited projection aperture, the measurement matrix for a single activation in MRXI is more ill-conditioned and provides information about the entire imaged object.Hence, the functionality of the sequential OED algorithm with the smoothened TV prior for MRXI is not obvious based solely on the material in [18].
Our method relies on a well-defined discretization of an underlying non-parametric Bayesian OED problem.Non-parametric approach for OED in Bayesian inverse problems has been formalized by Alexanderian (see [2] and the references therein), and it provides the theoretical underpinnings for many applications emerging, e.g., in inverse problems related to PDEs (see, e.g., [2,3,6,11,25,39]) or nonlinear systems [19].Stability properties of the expected utility concerning model approximations in Bayesian optimal experimental design were recently investigated in [13].This finding suggests that, in our framework, each optimization step in the sequential D-optimal approach remains robust when subjected to discretization and linearization of the Langevin function.However, rigorous study of such robustness is part of future work.
This text is organized as follows.The (simplified and discretized) measurement model of MRXI is described in Section 2. Section 3 introduces the Bayesian framework for inverse problems and recalls the probabilistic interpretation of the lagged diffusivity iteration from [5,18].In Section 4, the optimality targets for A-and D-optimal designs are introduced, and their minimization is considered.Section 5 presents the complete optimization/reconstruction algorithm for the sequential design process, and Section 6 is dedicated to numerical experiments.Finally, Section 7 lists the concluding remarks.
2. Measurement model and its discretization.MRXI consists of two main phases that are excitation and relaxation.In the excitation phase, an external magnetic field is generated by using electromagnetic coils, called activations in what follows.The activations realign the magnetic moments of the MNPs so that the superposition field that they generate can be measured with magnetic field sensors outside the imaged object.In the relaxation phase, the activations are switched off, allowing the magnetic moments of the MNPs to reorient through Néel and Brownian relaxations until equilibrium is reached [22].Néel relaxation describes the rotation of the magnetic moment within the core of an MNP and Brownian relaxation the rotation of the entire MNP.Note that magnetic field sensors can only measure a change in a magnetic field, meaning that the MRXI measurements actually correspond to the change in the field over some time period during the relaxation phase.
Ideally, a single MRXI measurement equals the strength of the superposition magnetic field created by the MNPs at a given location in a given direction.To achieve this, the external activation magnetic field should vanish immediately after it is switched off and the employed measurement sensor should be able to measure the change in the magnetic field precisely at a single point, both of which are impossible conditions to satisfy in practice.Be that as it may, in the following analysis we assume such idealized measurements and refer to [20,24] and references therein for information on the practical limitations of MRXI instrumentation.
The imaged object is represented by a domain Ω ⊂ R d , with d = 3, and the density of the MNPs in Ω is modeled by c ∈ L 2 + (Ω), where accounts for the presumed nonnegativity of the MNP density.The aim of MRXI measurements is to reconstruct c based on measurements on (the relaxation of) superposition magnetic fields generated by the MNPs under a series of activations.
2.1.Modeling the activations.Consider an external activation coil at a ∈ R d \ Ω, with its shape described by a smooth enough (closed) path Γ a ⊂ R d .Assume furthermore that gives an arclength parametrization for Γ a .According to the Biot-Savart law, the magnetic field induced when a constant net current J runs through the coil is given as where | • | denotes the Euclidean norm.A common approach to modeling electromagnetic coils is to approximate the coil path with linear segments.This approximation is explained in [15] and used for an example on MRXI in [24].The issue with such a numerical implementation is the computational complexity of the MRXI forward model, especially when the coils are approximated to a high precision.Moreover, the detailed modeling of the coils is case-dependent, and it is not obvious that such details have a considerable effect on the general conclusions on the applicability of Bayesian OED to MRXI.For these reasons, we model the activations as electromagnetic dipoles; cf.[14].
Our use of dipole activations can be motivated, e.g., by assuming that Γ a consists of a single circular loop of radius ρ in the (d − 1)-dimensional plane that contains the center point of the loop a and has the unit vector ν as its normal.If ρ is small compared to |x − a|, it is well known that B a (x) can be approximated by the field of a magnetic dipole at a.More precisely (see, e.g., [36]), where I ∈ R d×d is the identity matrix, α = πρ 2 Jν has an interpretation as a dipole moment, and the sign depends on the direction of the current in the loop and the chosen orientation for ν.In the following, we drop the latter term in (2.2) and also simplify/abuse the notation by writing where α ∈ R d is a dipole moment that includes all physical constants and will together with a act as a design parameter in our numerical experiments.
2.2.Modeling the measurements.Let M : Ω → R d describe the magnetization generated by the MNPs inside the imaged object due to the external magnetic field.The superposition field generated by the magnetization is expressed as that is, as a field induced by a density of magnetic dipoles over Ω described by the magnetization.According to the basic theory on idealized paramagnetic materials [31], the magnetization M generated during the excitation phase can be modeled as (cf.[14]) where q > 0 is a MNP-dependent physical constant, B a is the activation magnetic field and L : R → R is the Langevin function Since in MRXI the argument of L in (2.3) can be assumed to be in the linear range of the Langevin function [14], we drop the second term on the right-hand side of (2.4).Assuming further that a measurement sensor at s ∈ R d \Ω measures the (relaxation-induced change in the superposition) magnetic field in the direction of a vector σ ∈ R d that describes the sensor's orientation, the corresponding measurement is modeled as Observe that in (2.5) all constants have been included in the measurement direction vector σ.

Complete model and its discretization.
Combining our activation and measurement models and recalling the assumption to be able to measure the strength of the actual superposition magnetic field via measuring its relaxation, the complete measurement model for a single measurement due to a single activation reads where the integral kernel is The integral operator K σ,s,a : L 2 (Ω) → L 2 (D), implicitly defined by (2.6), is compact due to the boundedness (or smoothness) of its kernel, assuming the domain for the possible positions of the sensor locations D is bounded and satisfies D ∩ Ω = ∅ [32].This gives an explanation for the ill-posedness of the considered MRXI problem [14].Observe also that (2.6) represents a linear dependence between the MNP concentration c ∈ L 2 + (Ω) and the measurement y s,σ,a,α ∈ R, parametrized by the two vector pairs (a, α) and (s, σ) that define the activation dipole and the measurement sensor, respectively.Moreover, it is easy to check via transposition that κ σ,a,α (s, • ) = κ α,s,σ (a, • ), which means that changing the roles of the pairs (a, α) and (s, σ) does not change the measurement, i.e., y s,σ,a,α = y a,α,s,σ .In particular, our approach for designing the activation pattern introduced in what follows could as well be used for designing the positions of the measurement sensors.
The relation (2.6) can be discretized as where c 1 , . . ., c Nc are the degrees of freedom in the employed parametrization for the MNP concentration and we have abused the notation by denoting now with c a vector in R Nc .The coefficient vector k s,σ,a,α is defined by the function κ σ,a,α (s, • ) and the quadrature rule utilized for numerically evaluating the right-hand side of (2.6).In our numerical experiments, the two-dimensional1 domain Ω is divided into a homogeneous grid of N c pixels with centers x 1 , . . ., x Nc , the degrees of freedom for the MNP concentration are c j = c(x j ), j = 1, . . ., N c , and where the weight ω is the area of a single pixel.Let us assume that there are N s measurements for each activation; the corresponding parameter pairs (s j , σ j ), j = 1, . . ., N s , are not considered as design parameters, but they are predefined and the same for each activation.According to (2.7), a full set of measurements for a single activation can thus be modeled as where p = (a, α) denotes the design parameter pair, i.e., the position and moment for the activation dipole, and Analogously, after k sets of measurements corresponding to the activations defined by a sequence of design parameters p j = (a j , α j ), j = 1, . . ., k, the discretized noiseless measurement model reads where p k denotes a concatenation of all k design variables.This is the model that is used for (sequential) Bayesian OED in the following.In particular, we do not insist on the nonnegativity of the discretized MNP concentration in what follows.
3. Prior models and lagged diffusivity iteration.In this section, a Bayesian model for the measurements deterministically described by (2.9) is introduced and two different prior models for the discretized MNP concentration are considered.The first prior model is Gaussian, which enables optimization of the measurements in an offline mode and, in particular, simultaneous optimization of the specifications of several sequential activations.The second one is a smoothened TV prior, which makes sequential optimization of the activations truly adaptive, that is, the previously collected measurement data affect the subsequent activation designs.
Consider the probabilistic measurement model for a single activation, i.e., for a single patch of N s rows in (2.9), where C is the discretized and MNP concentration now modeled as a random vector, and the noise E j is assumed to follow a zero-mean Gaussian distribution N (0, Γ noise,j ), where is Γ noise,j ∈ R Ns×Ns is symmetric and positive definite.The maximum number of activations is denoted by N a , meaning that 1 ≤ k ≤ N a in all following considerations.The noise processes E 1 , . . ., E Na are assumed to be mutually independent.The prior for the MNP density C is assumed to be independent of the noise processes and to follow a probability density of the form where the role of γ > 0 is separately specified for the two considered priors.According to the Bayes' formula and assuming the measurement model (3.1), the posterior density for c after k activations is where Γ noise,k := diag(Γ noise,1 , . . ., Γ noise,k ) ∈ R kNs×kNs is a block diagonal matrix defined by the noise covariance matrices for the previous measurements.
3.1.Gaussian prior.Let us first assume that a priori C ∼ N ( c 0 , Γ 0 ), where c 0 ∈ R Nc is the prior mean and the symmetric positive-definite Γ 0 ∈ R Nc×Nc is the prior covariance.Hence, γ = 1/2 and 2).For a Gaussian prior, it is well known that the posterior π(c | y k ) is also a Gaussian (see, e.g., [21]), with the covariance and the mean where the latter recursive formulas follow by treating the posterior after the previous measurement as the prior for the newest one.
It is important to notice that the posterior covariance after k measurements depends on the previous experimental designs via K(p k ) (cf. (2.9)), but it does not depend on the corresponding measurement data.As the optimization targets considered in Section 4 are functions of the posterior covariance only, the optimal designs do not depend on measured data either.Hence, the optimization of the measurement setup can be performed offline, with no other reason to resort to greedy sequential optimization of the activations than the computational cost related to considering a high-dimensional decision variable.As a consequence, we aim to simultaneously optimize all activation designs p Na when considering a Gaussian prior for the MNP concentration.
3.2.Total variation prior and lagged diffusivity.For smoothened TV, the scaled negative log-prior Φ is defined as accompanied by the information that c vanishes at the pixels next to the boundary of Ω.The small parameter T > 0 ensures the differentiability of φ, which is required by the lagged diffusivity iteration, and in this case γ > 0 in (3.2) controls the strength of the prior.With this choice, the posterior (3.3) is obviously not Gaussian, which typically makes the evaluation of an optimality target of the form (1.1) computationally demanding.To circumvent this problem, we adopt the approach in [18] and iteratively approximate Φ(c) by quadratic terms in the spirit of the lagged diffusivity iteration [38]; see also [4,16].With a suitable stopping condition, this technique automatically produces a Gaussian approximation for the posterior of c after k activations; the covariance matrix for this approximate posterior can then be deployed when determining the experimental design for the next activation.In particular, this approach leads to a sequential OED algorithm that adapts to the measurement data in hand.
To set the stage for introducing a Bayesian version of the lagged diffusivity iteration, let us identify c with its interpolant in a piecewise linear finite element basis {ϕ j } Nc j=1 ⊂ H 1 0 (Ω).A straightforward differentiation reveals that where for any c ∈ R Nc interpreted as an element of H 1 0 (Ω) via the introduced finite element basis.In particular, Θ is positive definite and thus invertible since it corresponds to a finite element discretization of an elliptic partial differential operator accompanied by a homogeneous Dirichlet boundary condition; see [18] for more details.
Let us then assume that we have been able to deduce (an approximation of) the MAP estimate c k−1 for the posterior (3.3) after k − 1 activations, with Φ defined by (3.6).If k = 1, our initial guess for the MNP concentration plays the role of c k−1 .The aim is to use the lagged diffusivity iteration to compute the MAP estimate c k for the posterior (3.3) after k activations in such a way that we simultaneously form a Gaussian approximation for (3.3).The lagged diffusivity iteration is started by setting c The subsequent iterates are defined recursively via where Γ ) −1 can be interpreted as the covariance matrix for a zeromean Gaussian prior multiplied by γ.The iterate itself c (j) k has an interpretation as the corresponding posterior mean after performing measurements corresponding to K(p k ).Although not needed explicitly in the lagged diffusivity iteration itself, the posterior mean can be accompanied by a Gaussian density with the covariance .9) see once again [18] for more details.
This iterative process is continued until a suitable stopping criterion is satisfied, say, at j = J; the criterion employed in our numerical experiments is considered in Section 5.One then defines c k = c (J) k to be the reconstruction after k projection images, i.e., an approximation of the MAP estimate for (3.3).Moreover, the covariance matrix for the associated approximate Gaussian density Γ k = Γ (J) k is used for choosing the design parameter vector p k+1 defining the next activation.In particular, the covariance matrix (3.9)only needs to be evaluated once at j = J when the iteration is terminated.For more details on the lagged diffusivity iteration see [10,12,38].
4. Computing optimal designs.In this section, we recall the concepts of Aand D-optimal designs and consider numerically solving the associated optimization problems.The presentation is intentionally compact; we refer to [2,9] and [28], respectively, for more information on the optimality conditions of Bayesian OED and the tools of nonlinear optimization.

4.1.
A-and D-optimality.The A-and D-optimality criteria aim to minimize the (weighted) trace and the determinant of the (Gaussian) posterior, respectively, with respect to the design parameters.The design parameter vector is denoted by ξ and the considered posterior covariance by Σ(ξ).These entities can have two meanings corresponding to two different settings: 1.The design ξ parametrizes p Na , i.e., it defines the specifications of all activations, and Σ(ξ) = Γ Na (p Na (ξ)) is the final posterior for a Gaussian prior defined by (3.4) with k = N a .2. The measurements corresponding to the first k − 1 activations have already been performed, ξ parametrizes the kth activation, and Σ(ξ) is defined in accordance with (3.5) as where p = p(ξ) and Γ k−1 is the covariance of a Gaussian distribution that the MNP concentration is assumed to follow after k − 1 measurements.The main motivation for considering the latter case is the method for sequentially building approximate Gaussian posteriors under a smoothened TV prior reviewed in Section 3.2, but it can in principle also be used for deducing greedy sequential designs under a Gaussian prior (cf., e.g., [7]).
In our setting, an A-optimal design is defined as It minimizes the expected squared distance of the unknown from the posterior mean in the seminorm defined by the positive semidefinite matrix A ⊤ A for a given A ∈ R Nc×Nc ; see, e.g., [2,9] for more details.In our numerical tests, A is usually the identity matrix I, indicating that all degrees of freedom in the MNP concentration are considered equally important.However, certain diagonal elements of the identity matrix can also be set to zero in order to only account for the reconstruction error in the other degrees of freedom.Such a diagonal matrix is denoted by I ROI , where "ROI" indicates that the reconstruction error over some region of interest (ROI) is minimized.In our numerical tests, where Ω is divided into N c homogeneous pixels with constant concentration values, an estimate of the expected L 2 -error over the ROI for a design ξ can be computed as where A = I ROI and |Ω| denotes the area of Ω.A D-optimal design maximizes the information gain when the prior is replaced by the posterior [9], which in our setting can be expressed as ▷ Select the step size λ based on Algorithm 3 with an initial guess λ.

end while return ξ i
The minimization target Ψ D equals the negative of the information gain up to an additive constant and scaling, and the inclusion of a logarithm in Ψ D follows from information theory, but it also makes the evaluation of Ψ D more stable (cf.[7]).It would also be possible to only consider information gain over some ROI [7], but such a case is not considered in the numerical examples of this work.
In the rest of this section, the minimization target is denoted generically as Ψ : R N ξ → R, where N ξ is the number of parameters required for parametrizing the positions and moments of the considered activation dipoles.The precise parametrization employed in our numerical tests is introduced in Section 6; at this stage, it is enough to note that the parametrization is smooth and such that no constraints are needed for the decision variables, meaning that one can resort to differentiationbased methods of global optimization.In particular, the needed first and second order derivatives can be calculated explicitly using the employed parametrization and applying basic matrix differentiation formulas to (4.2) and (4.4); see [26] for more details.

Minimization of the target functions.
Our standard algorithms for minimizing Ψ are gradient descent and Newton's method accompanied by an inexact bisection line search that utilizes the Wolfe conditions.As the target function generally suffers from several local minima, the choice of the initial guess for the employed minimization algorithm plays a crucial role.The algorithm could, e.g., be restarted from multiple initial guesses, and among the resulting designs, the one producing the smallest value for the optimization target could be chosen as the final minimizer.However, we simply resort to randomization or some heuristic for choosing a single initial guess in our numerical experiments.
The complete algorithm for optimizing an activation design is a combination of either Algorithm 1 (gradient descent) or Algorithm 2 (Newton's method) and Algorithm 3, which is an inexact bisection line search that is terminated when the Wolfe conditions are satisfied with predefined parameters.H Ψ denotes the Hessian of the optimization target and inequalities between (symmetric) matrices are interpreted in the sense of the partial ordering induced by positive-definiteness.In particular, the if-clause of Algorithm 2 guarantees that H is strictly positive-definite, and d is thus a descent direction for Ψ at ξ i .
The parameter β 1 , which is used in the first Wolfe condition, describes a sufficient decrease in the value of the target function: the step size is accepted if it results in a decrease in the target function that is at least β 1 times the decrease predicted by the directional derivative at the base point of the line search.The parameter β 1 is often chosen to be quite small, which is already enough for guaranteeing a decrease in the Algorithm 2 (Newton's method) Choose a tolerance ϵ > 0, an initial guess ξ 0 , the maximum number of iterations N Newton , a positiveness constraint δ > 0, and a step size parameter λ > 0. Initialize Choose the maximum number of iterations N Wolfe and a pair of scaling parameters β 1 , β 2 ∈ (0, 1) satisfying β 1 < β 2 .Initialize a counter as l = 0 and the lower and upper bounds for the bisection method with γ 1 = 0 and γ 2 = ∞.The initial step size λ > 0, the considered base point ξ i , and the search direction d are given as inputs.
▷ Set l = l + 1. end while return λ = λ target function.The parameter β 2 controls the so-called curvature condition.If the slope at the point proposed by the step size is larger than β 2 times the initial slope, the curvature condition is satisfied.As the initial slope is guaranteed to be negative by Algorithms 1 and 2, this means that the rate of decrease at the proposed point can be at most β 2 times the rate of decrease at the base point of the line search.The parameter β 2 is often chosen to be quite close to one.
▷ Form the system matrix K(p k ) and 'measure' the data y k .
5. Sequential edge-promoting optimization of activations.If a Gaussian prior, i.e., case 1 in Section 4.1 is considered, all essential material for implementing our algorithm for optimizing the activation designs has already been included in Section 4.However, if case 2 of Section 4.1 is tackled, the optimization routine must at each iteration be combined with a lagged diffusivity iteration for computing an approximation of the MAP estimate for the posterior (3.3) with Φ defined by the smoothened TV regularizer (3.6).The purpose of this section is to summarize this combined procedure as a concise algorithm, i.e.Algorithm 4, which is essentially the same as the one presented in [18,Section 5].
In addition to the sequentially optimized activation p 1 , . . ., p Na , Algorithm 4 also returns the final reconstruction c Na and the associated spread estimator Γ Na , i.e., the mean and the covariance of the final approximative Gaussian posterior for the MNP concentration after N a activations.A motivation for the stopping criterion of the lagged diffusivity iteration in the inner loop can be found in [4,18].Note that we also employ an exhaustive algorithm for computing the optimal sequential designs in Algorithm 4 to be able to deduce upper limits for the performance of this adaptive edge-promoting approach to the activation design in MRXI.We refer to [7,18] for considerations on efficient implementation of such an exhaustive optimization routine in the framework X-ray imaging.
6. Numerical experiments.All numerical experiments are performed in a two-dimensional setting, with Ω ⊂ R 2 being a disk of radius ρ > 0 centered at the origin.In particular, all spatial variables, dipole moments and sensor orientations introduced in Section 2 are modeled as two-dimensional vectors.This can be considered an approximation for a (somewhat unrealistic) setting, where the MNPs are concentrated around a single cross-section of a three-dimensional body, modeled by Ω, and the sensors and activations lie in the two-dimensional plane defined by Ω, with their orientations and dipole moments being parallel to that same plane.
To be more precise, the sensors and activations are placed on a circle with radius 1.1ρ, and their orientations and activations are modeled by unit vectors.In all tests, there are N s = 36 sensors that are equiangularly spaced and oriented toward the origin.The position and moment of an activation are parametrized by two angular variables, with one of them defining the position on the measurement circle and the other giving the orientation of the activation.In particular, the decision variable ξ considered in Section 4 is a concatenation of two or several such angular variables, whose periodicity enables using gradient descent and Newton's method for estimating optimal designs without any additional constraints.However, observe that reversing the orientation of an activation only changes the signs of all associated (noiseless) measurements, which does not affect the A-or D-optimality of the design in question.Hence, the deduced optimal designs are post-processed for visualization purposes by reversing the orientations of those activations that originally point toward the interior of the measurement circle.
The integral kernel in (2.6) depends explicitly and smoothly on the parameter pair (a, α) defining the position and moment of an activation dipole, and thus the kernel also depends explicitly and smoothly on the two angular decision variables defining an activation in our parametrization.In particular, one can straightforwardly, yet tediously deduce the first and second derivatives of the elements of the measurement matrix K(p k ) in (2.9) with respect to the angular decision variables defining the components of p k .The gradient and Hessian of the optimization target, defined by (4.2) or (4.4), can thus also be given explicitly by resorting to basic rules of matrix differentiation, independently of whether one or several activations are optimized simultaneously.We do not write down the formulas for these derivatives here but refer to [26] for more details.
The measurement noise is assumed to have independent components with a common standard deviation η > 0. The disk Ω is discretized by initially constructing a uniform pixel grid for an origin-centered square with side length 2ρ and then removing the pixels that lie outside Ω.The MNP concentration is modeled as a piecewise constant function with respect to such a pixelification of Ω, with c carrying the corresponding constant concentration values.Three levels of discretization are used: the number of components in c is N c = 4293 for computing reconstructions, a coarser grid with N opt = 901 pixels is used for optimizing the designs, while measurement data is simulated with N data = 5140 degrees of freedom to avoid an inverse crime.We refer to [7,18] for technical details on performing the optimization of the measurement design on a sparser grid by resorting to interpolation between pixelifications.
The free parameters in Algorithms 1, 2, 3 and 4 are set to ϵ = 10 −5 , λ = 1, N GD = N Newton = 50, δ = 10 −5 , N Wolfe = 15, β 1 = 10 −10 , β 2 = 0.9, T = 10 −6 , γ = 10 and τ = 10 −3 .The effect of the last three parameters on the performance of Algorithm 4 is discussed in [18], where the same values are used in X-ray imaging.The low value for β 1 means that even a small decrease in the objective function is considered sufficient in the bisection line search.Due to the existence of several local minima for the A-and D-optimality objective functions (cf.[7,26]), the parameters controlling Algorithms 1, 2 and 3 may have a considerable effect on the proposed experimental designs since gradient descent and Newton's method can, e.g., converge to different local minima even when starting from the same initial guess.We do not claim that the aforelisted parameter choices are optimal.If one resorts to exhaustive search in Algorithm 4, the value of the optimization target is evaluated on a equidistant grid of 100 × 100 points over [0, 2π) × [0, 2π) to deduce the decision variable pair that minimizes the considered optimization target.
Remark 6.1.Although we have ignored many physical parameters in Section 2 and the orientations/moments of the sensors/activations have been scaled to unit length, there still remain two parameters that can be tuned in the experiments: the radius ρ of Ω and the noise level η.As the ignored physical parameters and the magnitudes of the activations only scale the measurements, controlling the domain size and the noise level still provide sufficient flexibility for modeling the physical properties of the measurement setup.
6.1.Gaussian prior.Our first two experiments consider a Gaussian prior and simultaneous optimization of N a = 10 activations.The prior covariance is given elementwise as where ℓ > 0 is the so-called correlation length, γ > 0 is the pixelwise standard deviation, and x i and x j are the coordinates of the pixels with indices i and j.We employ the same standard deviation γ = 1 in all experiments with a Gaussian prior, but the correlation length varies between individual tests.
6.1.1.Gaussian Test 1: a homogeneous disk.In the first experiment, we select the standard deviation of the noise, the radius of the imaged domain and the correlation length for the prior as η = 1, ρ = 0.5 and ℓ = 0.15, respectively.The initial positions and orientations of the activations are independently drawn from the uniform density over [0, 2π], and Algorithm 2, i.e., Newton's method, is used for finding both A-and D-optimal designs.For the A-optimality target function defined in (4.2), the weight A is chosen to be the identity matrix, meaning that the reconstruction quality is considered equally important everywhere in Ω.
The initial activation configuration is visualized in the left-hand image of Figure 6.1, and the middle and right images show the estimated A-and D-optimal activation designs, respectively.For both A-and D-optimality, the optimized activations lie approximately equidistantly around the object and point roughly away from its center (after a possible reorientation by 180 degrees), though in the D-optimal design the directions vary considerably more.This suggests that in the case of nonadaptivity and a radially symmetric target, placing the activations symmetrically is a reasonable approach, which cannot be considered very surprising.
The left-hand image in Figure 6.2 shows the expected L 2 (Ω) reconstruction error ΨA , defined in (4.3), as a function of the iteration number of Newton's method.Similarly, the right-hand image illustrates the evolution of the properly scaled information gain (cf.[7]) during the optimization procedure for a D-optimal design.The horizontal lines in Figure 6.1 depict the values of ΨA and ΨD for a precisely equidistant configuration of activations pointing exactly away from the center of Ω.For both A-and D-optimality, the optimized target function values are close to those for the reference design, with  the deduced A-optimal design even slightly surpassing the reference value, possibly thanks to the slight variations in the activation directions.The final D-optimal design corresponds to a lower information gain than the reference design, which suggests the optimization process got stuck in a local optimum.
6.1.2.Gaussian Test 2: a semidiscoidal ROI.Our second test with a Gaussian prior investigates a setting where the ROI is the left half of Ω, as illustrated in Figure 6.3.Only A-optimality is considered.The initial guess for the set of activations is the same random configuration as used in the first test, shown on left in Figure 6.1.Newton's method is employed for finding approximate A-optimal configurations for two sets of parameter values: η = 1, ρ = 0.5 and ℓ = 0.15 as in the first test, and η = 0.1, ρ = 5 and ℓ = 1.5.The resulting A-optimal activation configurations are visualized in the middle and right-hand images of Figure 6.3, respectively.In both cases, the activations are again oriented approximately away from the center of the domain, but their locations have clearly shifted toward the ROI on the left-hand side of Ω.This phenomenon is more pronounced for the latter parameter combination, presumably due to the larger size of the imaged object that makes activations on the side opposite to the ROI less useful.Figure 6.4 illustrates the evolution of the expected L 2 (Ω) reconstruction error ΨA during Newton's iterations for the two sets of parameter values.Again, the horizontal lines denote the expected errors for a reference design consisting of equally spaced ROI Fig. 6.3: Gaussian Test 2. Left: ROI is the left half of Ω. Center: approximate A-optimal design for η = 1, ρ = 0.5 and ℓ = 0.15.Right: approximate A-optimal design for η = 0.1, ρ = 5 and ℓ = 1.5.The black arrows depict the sensors and the red arrows the optimized activations.The images are not in scale, but the diameter of the right-hand version of Ω is ten times larger than of that at the middle.activations oriented away from the center of Ω.The optimized configurations show a greater advantage over the reference design when compared to the experiment with the ROI covering all of Ω.This is to be expected since the reference design has not been adapted to the current setup where only the left side of the object is of interest.
6.2.TV prior and sequential designs.Next, we apply the adaptive edgepromoting Algorithm 4 to the piecewise constant test phantom illustrated in the left-hand image of Figure 6.5, with two different choices for the noise-size parameter pair (η, ρ).Only A-optimality is considered in this adaptive sequential context.6.2.1.TV Test 1: P-shaped inclusion.The target is characterized by a homogeneous background contaminated by a P-shaped inhomogeneity located in the bottom-right quadrant of Ω and carrying unit MNP density.The size of the domain and the measurement noise level are defined by the parameter values and ρ = 0.5 and η = 0.1, respectively.The target is deliberately chosen to demonstrate advantages of Bayesian OED: if the P-shaped inclusion were positioned close to the center of Ω, the optimal design would most likely approximately correspond to activations placed equiangularly around the object, and the optimization would not provide a clear benefit.However, considering a target that exhibits interesting features only in a small subregion of Ω close to its boundary leads to optimal designs where the activations are concentrated close to the particular area of interest.Algorithm 4 is run for a total of N a = 15 iterations, considering all three options for sequentially deducing the specifications of the activations: gradient descent, Newton's method and an exhaustive search.For comparison, Algorithm 4 is also run so that the optimization step on the first line of the for-loop is skipped and the next activation is chosen deterministically from a reference set: The full reference set of activations consists of 15 equiangularly positioned dipoles pointing away from the center of Ω.The activations are introduced one by one into the lagged diffusivity-based reconstruction algorithm so that new activations are placed in turns onto each quadrant of the measurement circle.The images in the right-hand column of Figure 6.6 show the reference setups with five, ten and fifteen activations included.The reference activations are also used as the initial guesses when Newton's method and gradient descent are used for sequential optimization of the activations in Algorithm 4.
The right-hand image in Figure 6.5 shows the relative L 2 (Ω) reconstruction error as a function of the number of activations for the four options of sequentially choosing the activation designs in Algorithm 4. It is obvious that the deterministic reference designs perform the worst and the exhaustive search performs in the end the best in terms of the reconstruction error.A bit surprisingly, gradient descent is in this case the second best optimization routine, resulting in even smaller reconstruction errors than the exhaustive search between three and nine activations.This highlights two aspects: (i) The relative performance of gradient descent and Newton's method is highly case-dependent due to the existence of several local minima in the target function; the tendency of these optimization methods to jump between local minima could be tuned by the initial step size parameter λ in Algorithms 1 and 2. (ii) Due to the sequential nature of Algorithm 4, it is possible that not finding the global, but only local minima by a differentiation-based method when determining individual activations leads to a more optimal combination of activations than employing the exhaustive search.
According to our experience from other numerical tests not documented here, gradient descent produces on average approximately as optimal designs as Newton's method, but the run time of our nonoptimized MATLAB implementation of Algorithm 4 is typically slightly shorter with Newton's method.This advantage is explained by the possibility to explicitly calculate all needed second derivatives, the small size of the Hessian in Algorithm 2 and the better theoretical convergence rate of Newton's method close to a local minimum.In the presented example, Algorithm 4 was 32% faster with Newton's method than with gradient descent.Figure 6.6 shows the reconstructions and optimal designs produced by Algorithm 4 after five, ten and fifteen activations when the sequential designs are deduced by Newton's method (left), exhaustive search (middle) and the deterministic reference procedure (right).According to a visual investigation, the differences in the reconstruction quality are not huge.On the other hand, it is obvious that Newton's method and the exhaustive search find different (sequential) local minima in the optimization target: both methods concentrate the activations close to the location of the P-shaped inclusion, but already after five activations the fine details of the designs are considerably different.This is due to the existence of multiple local minima that makes the performance of differentiation-based optimization routines suboptimal without further modifications, such as using several different initializations.Figure 6.7 illustrates this problem by visualizing the A-optimality target function after four and nine activations when Newton's method is employed.On the left, the optimized fifth activation found by Newton's method is close to the global (sequential) minimizer.However, on the right in case of the tenth activation, we can see that a bad initial guess has resulted in Newton's method getting stuck in a local minimum far from the global optimum.6.2.2.TV Test 2: P-shaped inclusion a with larger domain.Our second test on sequential edge-promoting Bayesian ODE essentially repeats the previous experiment with the noise and size parameters reset to η = 0.01 and ρ = 5, i.e., the noise level is lower but the object is larger than previously.The right-hand image of 6.8 the relative 2 (Ω) reconstructions errors for four methods for determining the specifications of the activation dipoles in Algorithm 4. This time, Newton's method and gradient descent lead to roughly equally accurate reconstructions as the exhaustive search, with the deterministic procedure for choosing the activation designs again performing the worst.In fact, compared to the previous test with a smaller Ω, the deterministic reference procedure results now in much larger reconstruction errors compared to the other three options, presumably due to the larger size of the imaged object that makes activations, e.g., on the top-left side of Ω less informative.Figure 6.9, which is organized in the same way as Figure 6.6 for the previous test, demonstrates that the optimal designs produced by Algorithm 4 with both Newton's method and the exhaustive search have the activations more tightly packed around the interesting area in Ω than in the preceding test.Although these optimized designs seem qualitatively very similar, there are differences in their fine details due to the effect of local minima.As the difference between the reference and optimized designs is considerable, it is hardly surprising that there is also a larger difference between the corresponding reconstruction errors compared to the previous test with a smaller target domain.
7. Concluding remarks.This work investigated the application of Bayesian OED techniques to MRXI in a simplified two-dimensional simulated setting.The numerical experiments tested the general applicability of the studied approach to simultaneous optimization of several activations with a Gaussian prior and to sequentially choosing optimal activations with a smoothened TV prior following the ideas in [18].The presented results demonstrate that choosing the activations according  to the A-optimality criterion can improve reconstruction quality, especially when the imaged object or the ROI is nonsymmetric, which makes symmetric reference designs suboptimal.
The numerous local minima in the optimization targets (cf. Figure 6.7) constitute a major obstacle for efficiently applying Bayesian OED to realistic MRXI, a problem for which no solutions were proposed in this work.Moreover, no effort was made to model practical measurement settings of MRXI, that is, to account for the inherently three-dimensional nature of the measurements, the shape of the imaged object, accurate models for activations and measurements, and the effect of realistic values for the involved physical quantities.Designing optimization algorithms that are robust to local minima and working with more detailed and realistic measurement models are interesting topics for future research on Bayesian OED for MRXI.

Fig. 6 . 1 :
Fig. 6.1: Gaussian Test 1. Left: initial activation configuration.Center: approximate A-optimal design.Right: approximate D-optimal design.The black arrows depict the sensors and the red arrows the optimized activations.

Fig. 6 . 2 :
Fig.6.2:Gaussian Test 1. Left: the expected L 2 (Ω) reconstruction error ΨA as a function of the iteration number when deducing the A-optimal design.Right: the information gain ΨD as a function of the iteration number when deducing the Doptimal design.The horizontal lines depict the values of ΨA (left) and ΨD (right) for equidistant activations pointing away from the center of Ω.

Fig. 6 . 5 :
Fig. 6.5: TV Test 1. Left: target MNP concentration.Right: relative L 2 (Ω) reconstruction errors as functions of the number of activations in Algorithm 4 when the specifications of the activations are deduced by Newton's method, Gradient descent, exhaustive search and the deterministic reference procedure.

Fig. 6 . 7 :
Fig. 6.7: TV Test 1. Visualizations of the A-optimality target Ψ A as a function of the two angular decision variable on [0, 2π] × [0, 2π] during a run of Algorithm 4 with Newton's method for the phantom in the left-hand image of Figure 6.5.The vertical axis corresponds to the position of the activation on the measurement circle of radius 1.1ρ and the horizontal axis to the direction of the activation dipole.Left: Ψ A after four activations.Right: Ψ A after nine activations.The circle depicts the initial guess for deducing the next activation, the star shows the local minimum found by Newton's method, and the cross indicates the global minimizer.

Fig. 6 . 8 :
Fig. 6.8: TV Test 2. Left: target MNP concentration compared to the much smaller target from Figure 6.5.Right: relative L 2 (Ω) reconstruction errors as functions of the number of activations in Algorithm 4 when the specifications of the activations are deduced by Newton's method, Gradient descent, exhaustive search and the deterministic reference procedure.