A Bayesian formulation for perturbed current tomography in tokamaks

An initial investigation into the application of Bayesian inference to the reconstruction of the spatial distribution of current perturbations in tokamaks from diagnostic signals is presented. Previous work in Bayesian equilibrium current tomography is extended to the case of a complex phasor representation of harmonically time varying current distributions. A forward function to predict the response of magnetic diagnostics is constructed using only electrodynamics and not reduced models of ideal MHD. The extension of this forward function to incorporate a fully kinetic model of the plasma state is suggested. The response of soft x-ray diagnostics, and the motional Stark effect diagnostic to the current perturbations are also predicted and the integration of all diagnostics into a single estimate of the current perturbation is proposed. Simulations with synthetic diagnostics in simple geometry demonstrate that the perturbed current distribution can be reconstructed subject to prior assumptions regarding solution smoothness.


Introduction
Plasmas are fundamentally a very complicated electromagnetic system where currents, inertial effects, and electromagnetic waves all interact. Hence, they support a variety of electrodynamic phenomena. In toroidal magnetic confinement devices the existence of large standing wave modes in the plasma can destabilise it thereby reducing the confinement. Therefore, developing new methods to diagnose wave modes in real-time is a task of practical value for the future development of nuclear fusion research. In this paper we will propose a Bayesian approach for the tomographic imaging of oscillating plasma configurations. * Author to whom any correspondence should be addressed.
Original Content from this work may be used under the terms of the Creative Commons Attribution 4.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.
Consider any physical quantity Q which varies in both space x and time t. In toroidal geometry and adopting standard toroidal coordinates (r, θ, ϕ), a pure standing mode of Q at temporal frequency ω, is taken to be any configuration of the form where n, m ∈ Z are called the toroidal and poloidal mode numbers respectively. Technically we also consider linear combinations of these configurations as constituting standing modes. The quantisation of the mode numbers is enforced by the requirement of the continuity of the Q function under 2π rotations around both the toroidal and poloidal axes. This document is concerned with standing modes in the current density j(x, t) within the interior of a tokamak. More precisely, we are interested in current densities which describe time harmonic perturbations superimposed onto a static equilibrium current density. That is, of the form j(x, t) = j 0 (x) + δj(x, t), where δj is a standing mode in the sense of (1). We will study how we can infer the distribution of the harmonic current perturbation δj from measurements extracted from diagnostic systems at the edge of the device. We will see later that the advantage of describing plasma modes primarily as a current fluctuation, instead of a density or magnetic field fluctuation, is that several different diagnostic signals can be predicted from the same plasma state and then these several measurements can be integrated together to provide a single higher confidence estimate of the plasma state. The objective of this paper is to demonstrate that the spatial structure of a current perturbation can be inferred from magnetic diagnostics and that the quality of the reconstruction can then be improved via the integration of several diagnostic signals. We also want to show that we can perform the above reconstruction without any knowledge about the Magnetohydrodynamic (MHD) spectrum. The long-term objective of this research is to provide a mode tomography system which can provide secondary validation of the results from existing mode tomography systems which depend upon reduced MHD models.

Bayesian inference
Tomographic imaging belongs to a class of problems called inverse problems where we attempt to determine model parameters from observed data. This is in contrast to forward problems where we attempt to predict measurements given model parameters. Bayesian inference is the name given to a formalism for solving inverse problems using probability theory [1].
In earlier work in tokamak physics, standing modes have been reconstructed from observed data using the standard least-squares method [2]. Here, we instead adopt a Bayesian approach to model-parameter estimation. Suppose we have some vector of parameters I which parameterises the current distribution j throughout the tokamak, and a vector D of measured values which are related to the current distribution via some forward function D = f(I). Our aim is to infer a plausible set of parameters from the data. In the Bayesian approach we accept the notion that probability theory provides the tools for performing what is commonly called plausible reasoning. In this sense, probabilities do not necessarily represent the limiting frequency of an outcome of a random process but instead are a quantification of the belief, or more accurately confidence, that we have about the precise state of the system [1,3]. The benefit of this approach is that it will allow us to strongly incorporate knowledge we may have about the system a priori. It also requires us to make our assumptions about the system extremely clear, which is a desirable property.
We define P(I|D), called the posterior distribution, to be the function describing the probability density that the model parameters I are the true values given the observed data D.
To proceed further, we will observe that Bayes theorem will allow us to simplify the above.
Bayes theorem is a fundamental result of probability theory which in our context can be written as P(I|D) = P(D|I)P(I) where: P(I), called the prior, is a probability distribution representing our underlying knowledge about the state of the currents; P(D|I), called the likelihood function, describes the probability of a particular measurement given the state and our knowledge of measurement errors; P(D), called the marginal probability, quantifies our belief that we will observe a particular piece of data in the case where we know as little as possible about the state. Note that we will usually ignore the marginal likelihood in what follows because in our context it is just a normalising constant which can be restored when needed. While we will largely construct our formalism by directly manipulating the full probability distributions, as is Bayesian tradition, we aim to construct a single estimate of the state vector I. To do this we choose to adopt maximum a posteriori estimation. That is, we take that our specific estimateÎ is the state vector which maximises the posterior distribution P(I|D) [3]. For the case of multi-modal posterior distributions this estimation procedure is not always possible, but we will not be concerned with this complexity here.
In order to use Bayes rule to perform parameter estimation of the current distribution in a tokamak, that is to solve the inverse problem, we will need to first develop several related tools. We need to determine a representation of the current distribution which can be described in a parameter vector I. We must also determine a prior which encodes some of the expected behaviour of the system. Finally, we need to model the underlying physics of the system and the noise to make the likelihood function. That is actually solve the forward problem.

Notational conventions
For the purposes of this document, both bold-face and calligraphic letters are, unless otherwise specified, assumed to refer to vector valued symbols. For example, the parameter vector I is a calligraphic letter and so is assumed to be vector valued. The symbol P is reserved for only probabilities.

Problem statement
The formal problem statement with which this paper is concerned is as follows. We intend to demonstrate, as a proof of concept, a Bayesian approach to imaging current perturbations in tokamak devices from simple diagnostic signals which does not rely upon reduced models of ideal MHD. Our interest is in observing the spatial structure, often referred to as radial structure, of the current perturbation. This is tomography.
To solve this problem we separate it into several subproblems. We must construct a Bayesian framework for the estimation of harmonically varying parameters which allows for the integration of measurement from several different diagnostic systems. We do this in section 3. We also need to solve the forward problem, that is predict diagnostic signals we would expect from our current perturbations. We look at the cases of magnetic diagnostics, soft x-rays observations, and the motional Stark effect (MSE) diagnostic in sections 4, 9 and 10 respectively. We also investigate the effect of prior choice and physics model on the reproduction of test current distributions. This constitutes the majority of sections 7 and 8 of this paper. Finally, sections 11 and 12 provide some conclusions and comments on possible further work.

Prior work
In this subsection we review previous research into the fields of both current tomography and Bayesian inference as applied to tokamak plasmas.
Regarding current tomography, in [4] Svensson explored the use of Bayesian inference to reconstruct the static current distribution of an axisymmetric tokamak. The equilibrium current tomography methods developed by Svensson and collaborators have been applied to several different machines. In [5], Liu et al presented the application of Bayesian equilibrium current tomography to the EAST machine. Notably they adopt the same conditional auto-regressive (CAR) prior, as discussed below, as used in Svensson's earlier work. Recently, in [6], Liu et al demonstrated that they can improve on the CAR prior by replacing it with an Advanced Squared Exponential (ASE) kernel prior. Specifically, they show that the ASE prior resolved a minor problem where the CAR prior would underestimate the current in the core. We will demonstrate that a version of this problem also occurs in perturbed current tomography in section 8. Svensson's work forms the fundamental foundation of many of the ideas presented in this document, which can be viewed as a study in the associated time harmonic antenna problem. There has been little research into the construction of a solution for the time harmonic case. One possible approach to developing a forward function to perform time harmonic eigenmode tomography is presented in [7]. This approach is based on reduced MHD mode theory and therefore relies upon stronger physics assumptions than the models proposed in this document. An alternative more fundamental approach to mode tomography in presented in [2], in which the authors demonstrate that the radial structure of an eigenmode in the H-1NF stellarator can be inferred from fluctuations of the intensity of the light observed at detectors surrounding the machine. The authors perform the inversion step of their problem with the more traditional least squares method rather than the Bayesian approach which we will discuss. In [8] Li et al present the observation of MHD modes in the HL-2A tokamak from soft x-ray measurements using Bayesian inference and a non-stationary Gaussian Process (GP) prior. This method is capable of observing the time dependence of single MHD modes in the machine but it does not admit integration of the soft x-ray measurements with those of the Mirnov coil array, aside from to confirm the frequency of the observed mode. The perturbed current tomography approach we present here can integrate these different diagnostic measurements into a single estimate of the plasma current distribution because both the magnetic and soft x-ray diagnostic signals are predicted from the same description of the plasma state.
Previous research has investigated the application of Bayesian inference to plasma diagnostics outside of current or mode tomography. In [9] it is demonstrated that the plasma equilibrium can be inferred without the explicit requirement to infer the current distribution first. In [10] the electron density and temperature profiles are inferred from experimental data taken from the mega ampere spherical tokamak experiment. The implementation of an x-ray tomography method for the inference of the static emissivity profiles of a tokamak is presented in both [11,12]. Wang et al [11] adopts a Bayesian approach where Jardin et al [12] uses a traditional regularised fitting approach. The two methods are found to perform similarly in practice and therefore on existing evidence we are justified in adopting the Bayesian method. This is beneficial as it will allow us to utilise its improved capability to integrate several diagnostics. The methods for hyperparameter fitting presented later in this document were largely adapted from these two publications. It is also possible to use Bayesian methods to infer the static visible light emissivity profile instead of the x-ray emissivity profiles discussed above. This is demonstrated in [13] where the visible light emissivity of the HL-2A tokamak is inferred using a Bayesian GP. The primary contribution of this work is a method to enforce the constraint that the emissivity must be a positive distribution through truncation of the prior.

A pixel model for a tokamak cross section
To describe the current perturbations in the plasma we need to choose a description of the internal state of the plasma. In older work, a continuum decomposition into a set of orthonormal eigenfunctions was common [14,15]. In more modern work the preference has been to adopt a pixel model of the plasma as it is more natural for line integrated measurements and has been shown to produce empirically superior results [4]. We will adopt such a pixel model here where it will carry the physical interpretation of considering the plasma as made of a set of current filaments which behave like loop antennas.
We will assume that the current distribution in our tokamak is a pure toroidal mode with mode number n but we allow for arbitrary variation in the radial and poloidal directions. This choice admits us a reduction to a truly two-dimensional pixel model. Weakening this assumption would require upgrading from a pixel model to a fully three dimensional voxel model for the current distribution. While this is a simple extension in theory it would add substantial complexity in practice and so will not be attempted here.
We partition the cross section of the tokamak into a grid of square pixels and define a coordinates system which is aligned with this grid with axes x and y. The origin of our coordinates system is defined to be the centroid of the cross section. To handle the edges we define that a pixel is included in our grid only if it lies entirely within the boundary of the cross section. A diagram of this grid is shown as figure 1.
Later we will need total ordering for the pixels in the grid. We choose to index pixels by i where the integer i increases in the −y and +x direction. We can then associate with each pixel P i an ordered pair x i = (x i , y i ) describing the position of the centre of the pixel p i . To describe the perturbed current distribution in the plasma, with only finite information, we construct a local basis function expansion for j, where we associate a basis current perturbation δj i density to each pixel. By local here, we mean that each basis function j i is 'mostly contained' to a single pixel. The notion of the basis functions being 'mostly contained' does not describe a formal mathematical criterion and instead captures the qualitative concept that the basis functions impact the pixels upon which they are not centred negligibly. We could make the notion of 'contained' basis functions a formal one by requiring that the basis functions be nonsingular and compactly supported upon their pixel. That is, we require that j i (x) = 0 for x / ∈ p i . In practice though, if the basis functions die off sufficiently fast then there is no problem using global basis functions instead of compactly supported ones since we can design the basis functions so that an arbitrarily large proportion of their total measure is contained within the pixel. This is really what we mean by 'mostly' contained. We describe the full current distribution over the cross section of the tokamak by just summing the basis currents associated to each pixel For reference we define the local basis current distribution we use here. The current j i (x) in the pixel p i is taken to be a normal distribution with standard deviation 1/6th of the lengths of the sides of each pixel. We will also give a sinusoidal time dependence on the current so that they describe an oscillating current localised to the pixel at some driving frequency ω d . Defining the length of the side of each pixel to be 6σ then the current distribution can be taken to be where θ is the toroidal angle. Note that J i ∈ C 3 is a complex valued three vector representing the current polarisation. As a complex value it encodes both the amplitude and the phase of the oscillating currents in each of the possible polarisation directions.
It will also be convenient later to factorise the above into the where the scalar basis functions are The motivation for this choice is purely simplicity. Specifically, we intend to perform the majority of the analysis which follows in the Fourier domain and choosing a Gaussian distribution in space-time ensures that the current distribution will also be simple in frequency-wavevector space since the Fourier transform of a Gaussian is also a Gaussian. We will see later that another advantage to constructing our basis functions from Gaussians is that they admit first order derivatives in space.
As an aside, it will be convenient to define a new coordinate z = Lθ, where L describes the circumference of the magnetic axis of our tokamak [16]. This is because we will largely operate in a periodic quasi-cylindrical model in the remainder of this document. Adopting this choice and taking the Fourier transform of the above we have a wave-vector domain pixel current distribution It may seem that a more natural choice would be to assign a Dirac delta spike in the centre of each pixel. This could be done but we will see later that there are numerical advantages to choosing the pixel distribution so that its Fourier transform vanishes quickly at large wavevector. The delta function's constant Fourier transform does not vanish at large wavevector. The delta function model would also introduce a mathematical singularity which we would rather avoid if possible 1 .

Magnetic diagnostics
There are many diagnostics systems which are included in modern tokamaks designs, examples include: soft x-ray measurements, the MSE, poloidal flux loops, toroidal flux loops, and Mirnov coils [15]. We will begin our discussion with measurements at Mirnov coils.
A Mirnov coil array is a set of coils arranged around a poloidal ring just interior to the plasma chamber. The purpose of these coils is to measure perturbations in the macroscopic magnetic field of the tokamak. Here we treat then as though they are radio frequency antennas. Suppose that we have a coil located at x d . We take that the coil is capable of producing a measurement of the time dependent local magnetic field B(x d , t). In practice, we will choose to focus on fluctuations at a single frequency ω and therefore we can define the data we extract from each Mirnov coil, in an abuse of notation, as In practice we cannot measure the field fluctuations over all time to construct the true Fourier transform above, but we will assume that it is sufficient to instead restrict the integral to a short time, or adopt a windowing procedure. The details of this do not concern us here. In this paper, a major task is to predict Mirnov coil measurements for a given state of the pixel model of the current perturbation. In the Bayesian inference literature this is called developing a forward function modelling the diagnostic signals.

Some desirable properties
We will return to the specific problem of determining the full forward function for our problem later but first need to briefly discuss two simple properties which it is desirable for our forward function and full likelihood function to possess.
Then, the forward function D = f(I) is just given by (8). This is both a very common and very useful case. For example, the Biot-Savart law approach to static current distribution detection presented in [4], is of this form. We will be able to construct a variety of models for diagnostic signals of this form as long as we assume that first order approximations are sufficient, which is not an unreasonable assumption in this case. Note that it is tempting to suppose that we can invert, or at least pseudo-invert, the transfer matrix and therefore perform our inference by claiming I = T −1 (D − M). However, this is not usually possible as we normally have more parameters to fit in the state vector than we have pieces of independent information in the data-vector and so the T matrix is normally ill-conditioned. Therefore the inverse does not exist and the pseudo-inverse is often unreliable. We will discuss this further below.

Normality.
When we have a linear forward function, we can further simplify the analysis in the case when our probability distributions are Gaussian distributions. Suppose that the likelihood function P(D|I) and prior P(I) are both multivariate normal distributions when expressed as functions of I. Then, as the product of normal distributions is also a normal distribution, P(D|I)P(I) must also be a multivariate normal distribution. Since the marginal likelihood does not depend on I it follows from Bayes rule that the posterior must therefore be a multivariate normal distribution, as a function of I. This is computationally very useful because it ensures that we can define the entire posterior distribution with only a mean vector and a covariance matrix. This will allow us to analytically determine formulas for the best estimate of the parameters.
We model the likelihood by assuming that we have an underlying deterministic process, given by the forward function, upon which our measurement devices add zero-mean noise. Defining a measurement covariance matrix Σ m we can write the likelihood function as the multivariate normal distribution where we have dropped the normalisation constant, which is currently unnecessary. Similarly, by choosing a prior mean I P and a prior covariance Σ P we can write the prior distribution as Formulas for the posterior mean and posterior covariance which can be derived from the above are available [1,3,7]. However, they all deal with the case of purely real variables. Our state vector, which is formed from the current polarisation parameters J p , is complex. Similarly, our data vector is also formed from complex numbers, specifically the complex magnetic field coefficients. Therefore, the standard solutions are not directly applicable.

Extending to complex values
To resolve the problem explained above we will recalculate the formulas for the posterior mean and covariance for the case of multivariate complex normal distributions. In this case we are admitted a helpful simplification of our complex normal distributions. Specifically, we can assume they are circularly symmetric, that is, if I is a complex vector then P(I) = P(e iϕ I) in which case the normal distributions will follow the specific form presented below instead of the more general forms which can be found in other sources [18]. This assumption is associated to the fact that the global phase of our inferred current distribution is arbitrary, or more accurately corresponds to an irrelevant time shift, and therefore the likelihood function should not preference any particular phase. To replace our real distributions with probability densities that take complex inputs we follow a simple, and standard, prescription of directly replacing any transposes with Hermitian conjugates. Therefore we have that In the complex case the covariance matrices are not required to be real and symmetric, but must be Hermitian and positive semi-definite 2 . So, adopting a linear forward function and applying Bayes rule we can write the posterior distribution as We aim to factorise this into the form whereÎ represents the maximum of the posterior distribution, and is therefore our best estimate of the parameters, and Σ is the covariance matrix associated with the estimate. Performing some algebra and dropping any factors which do not depend on the state vector we can show that Here we encounter a problem which will need some further discussion. It is guaranteed that (15) allows us to calculate Σ −1 but we cannot be sure that Σ −1 will be well conditioned, or really even full rank, and therefore we cannot guarantee that Σ will always exist. In the cases when Σ does not exist we are unable to calculateÎ reliably and therefore cannot estimate the current parameters in the tokamak. This behaviour is associated with the underlying degeneracy of the problem and to solve it we will need to incorporate some prior information.

Impracticality of uniformity.
Consider assuming that we have no prior information on the state of the current distribution inside the tokamak. We can implement this in our model by assuming that the prior distribution is just a uniform distribution. Notably, we cannot define a true uniform distribution because the space of state vectors I is not compact and therefore our posterior distribution would not be normalisable. Traditionally, when performing Bayesian parameter estimator we can resolve this by assuming that there exists reasonable cutoffs at large values above which the prior P(I) vanishes [3]. You also usually then assume that the cutoff you pick is so large that it can be freely ignored as all of the other distributions in the problem will have already approximately vanished by the cutoff. For the sake of argument we will assume that we adopt this approach. In which case, the equations describing Σ −1 andÎ reduce to Suppose that we have N complex parameters in the state vector and that we perform M independent measurements. Then Σ m is an M × M matrix and T is an M × N matrix of rank N. Then T † Σ −1 m T will be a rank min(M, N) matrix. This poses a problem when we perform fewer measurements than we have parameters, when N > M, since T † Σ −1 m T will then be an N × N matrix of rank M < N and so its inverse will not exist. This is the specific problem that a lack of normalisabilty and the degeneracy of the solutions causes.
Since we are attempting to estimate the current in a large number of pixels using the measurements of relatively few Mirnov coils, we will necessarily be using less measurements to fit more parameters and so will always encounter matrix rank limitations when we choose a uniform prior. Therefore it is evident that the adoption of a uniform prior is untenable and we will need to choose a prior which is more selective in order to build a full estimator. There are several common priors which we could choose. Here we will discuss two in detail, the uninformative Gaussian prior, and the CAR prior.

3.4.2.
The uninformative Gaussian prior. The next simplest choice after a uniform prior is that of an uninformative Gaussian prior. By uninformative we mean that the covariances are large enough to not apply any real restriction on the parameters. We take that the distribution is zero-meaned so that our prior is The benefit of this choice is that, if Σ −1 P is full rank, then it is usually true that Σ −1 = T † Σ −1 m T + Σ −1 P will also be full rank and so we will be able to compute Σ reliably.
Technically, when we introduce hyperparameters to Bayesian models which we know nothing about, such as the Σ P matrix, we should integrate, really marginalise, over all possible values of the parameter. However, doing so would transform our posterior from a multivariate normal distribution into a multivariate student's-t distribution [3]. Our calculation procedure relies upon the posterior being normal and so we cannot integrate over the hyperparameters without changing the computation method. To avoid this we will therefore just sample the hyperparameters from a sufficiently large and wide uniform distribution when they are needed. We will demonstrate later that this is an effective, if imperfect method.

The CAR prior.
The uninformative Gaussian prior will allow us to perform Bayesian inference but it encodes none of the a priori information we have about the current distribution in the tokamak. We would prefer to find a different prior distribution, which still produces us a Gaussian posterior, but which includes some a priori information. The CAR model will allow us to do this.
CARs are a class of stochastic model which are designed for the study of random processes which vary in space. In our case they will allow us to include an implicit assumption that the current distribution should be as continuous and smooth as possible. A very comprehensive discussion of CAR models can be found in [19] but a simpler analysis which is more applicable to the problem at hand is presented in [4].
Fundamentally, what makes a CAR a conditional autoregressive model is that the probability assigned to the value in a pixel should be conditioned on the values of the pixel's around it. Suppose for now, that we have a pixel grid of pixels p i and we intend to estimate a univariate parameter I i within each pixel. The extension to the full multi-variate case is largely straightforward and we will discuss the one notable complexity at the end of this section. Let I(Î i ) refer to the full ordered list of I j values with the I i term removed. We can construct a CAR model, by supposing that the probability distribution P(I i |I(Î i )) can be modelled as where {τ 2 i } is an associated list of variances and {β ij } is a matrix of coupling coefficients. We assume without loss of generality that β ii = 0 for all i. We aim to determine the joint probability density function P(I). A detailed discussion of this calculation is beyond the scope of this document but a full discussion can be found in [19]. We will just present the section of the derivation which is relevant to our use case here.
. . x n ) and Y = (y 1 , y 2 , . . . y n ) are both valid elements of our space of parameter vectors. In [19] it is shown that for any CAR process, the following factorisation of ratios of the joint PDF in terms of the conditional PDF holds The benefit of this factorisation is that we can choose Y = (0, 0, . . . , 0), that is as the trivial parameter configuration. This allows us to write the joint PDF for X purely in terms of the conditional probabilities, and a few normalisation factors Then if we suppose that the conditional PDF is Gaussian, we can deduce that the joint PDF must be the product of Gaussians and so is itself a multivariate Gaussian distribution. This is a very helpful result because it ensures a CAR prior will have a Gaussian form and therefore will admit the analytic form for the posterior distribution calculated earlier.
Since we know P(I) is normal, we choose to suppose that we can write it in the zero-meaned form with Q a coupling matrix. Equating this distribution to the factorisation from earlier produces Noting β ii = 0, equating the exponents of the above, expanding the products, and swapping indices in the second sum we have In practice, we will choose for the coupling constants β ij to all be real. So, from the above we can immediately conclude that These conditions can be combined into the single rule Thus, we have shown that in the case of a CAR model, we can construct a joint probability distribution P(I) by only prescribing how the probability of the current parameters should be conditioned upon each other. To proceed further, we need to choose explicit β ij values. If we choose β ij = 1/4 when the pixel associated with I i is adjacent to the pixel associated with I j , and β ij = 0 otherwise. Then, we are free to construct an adjacency matrix W describing the adjacency of the pixels and can recognise that β ij = W ij /4 so that This particular choice of β ij encodes the prior assumption that the most likely current value in a particular pixel is the average of the current values surrounding it. Therefore we are placing higher prior probability upon smoother solutions. This does create special effects at the edge of the plasma, where there does not exist a full set of adjacent pixels, but this will be of little concern to us. The multivariate upgrade of the above is mostly direct. However, it is here where we must make one additional choice. Let p be a pixel, with a current polarisation J p . After flattening the elements of this current polarisation vector are combined into the I current state vector. Suppose that (l, m, n) are such that J p = (I l , I m , I n ). We can now ask the question, 'do we consider I l to be adjacent to I m ?'. We choose to assume that W im = W in = W nm = 0. This means that from the perspective of our prior distribution, we have three separate tokamak pixel grids, one for each physical polarisation direction, which have no fundamental smoothing assumption connecting them. This assumption could be weakened without modifying the formalism but we will not do so here. Notably, whether this is a good assumption in the toroidal geometry, and highly anisotropic conditions of a tokamak is a question which requires further investigation.

Predicting magnetic diagnostic signals using electrodynamics
We want to construct a forward function for the harmonic current perturbation inference problem in a tokamak. That is, we want to compute the measurements we would expect at the Mirnov coil array due to a current perturbation. Since our model for the perturbed current is formed from a set of tubes of oscillating charge we recognise that the field oscillations they produce should be analogous to radio antenna radiation in classical electromagnetism. However, we cannot directly apply the vacuum theory of electrodynamics because the plasma in which our radiating currents are embedded is a dispersive medium. Instead we will need include the response of the plasma to electrodynamic phenomena into our theory.

Maxwell's wave equation in dispersive media
Recall the Maxwell equations in vacuum, which for the case of source distributions which vary as e iωt , can be written as Taking the Fourier transform with respect to x, introducing the vector potential, adopting the temporal gauge, and simplification yields the Maxwell wave equation where A is the vector potential [20].
To describe the interaction of the EM field with a medium, at lowest order, we construct a generalised Ohm's law which describes the dispersive and anisotropic responses of the medium [21,22]. We suppose that there exists a conductivity tensor σ which relates an induced current to the electric field via the constitutive relation In practice, we can construct several different versions of this constitutive equation each of which highlight a different aspect of the relevant physics. We will focus on spatial dispersion, so that we can write the above relationship for an oscillating field in the space frequency domain as We can then separate out the source term into a free and induced part j = j f + j ind , so that we have This is the Maxwell wave equation in dispersive media and it will be the subject of much of the analysis which follows. Firstly, it will be helpful to make a few small modifications. We let k d = ω/c be the driving vacuum wavenumber and adopt both tensor notation and the Einstein summation convention, so that we can write (30) as Then if we define a wave operator Λ ij as we can write our wave equation in the simple form We now define a matrix function G ij such that where δ ij refers to the Kronecker delta. Note that we will usually call G the photon propagator. This nomenclature is taken from relativistic field theory, where an identical function serves the role of characterising the spacetime propagation of particles associated with the EM field, that is of photons. Determining the photon propagator for an arbitrary conductivity tensor is a non-trivial task. We will consider a few cases which are relevant for our discussion here and which can be handled analytically. Note however, that performing this inverse numerically is hypothetically a relatively simple task. The author has not investigated different numerical approaches and considers it to be a topic for future investigation.
The purpose of defining the propagator is that we can compute the field response in the Fourier domain, by multiplying (33) by the propagator on the left, so that we have Observe that the propagator contains an important piece of dynamical information. It describes how the polarisation of the emitted radiation, that is the EM field perturbation, depends upon the polarisation of the current source which generated it. This information allows for the reconstruction of the current polarisation as long as the full magnetic field polarisations are measured, as is standard for Mirnov coil arrays [23]. Note that this reconstruction will be imperfect but this due to the ill-posedness of the full tomography problem not any inherent inability to observe the current polarisation. If the propagator was not invertible there would not be sufficient information for the polarisation to be determined and further model assumptions would be required however.
Taking the inverse Fourier transform of the above, we are able to write the vector potential of the radiating fields in space as However, our Mirnov coil array cannot measure the vector potential directly and instead must measure the magnetic field B i . Fortunately, we can convert one into the other by noting that the definition of the vector potential B = ∇ × A becomes, upon Fourier transformation and adoption of a component form, where ε ijk refers to the Levi-Civita tensor. Which further implies that we can write the magnetic field at a point in the integral form Consider a set of Mirnov coils indexed by d, each located at x d . The oscillating signal observed at each coil is given by We flatten the set of all measurements into a data vector , . . .) T . Now recall the local basis function expansion of our oscillating current density defined earlier, where the sum is taken over all of the pixels in our tokamak grid. We flatten the polarisation three-vectors J p into a state vector If we take the perturbed current as the free current in (38) then upon substitution we have that the measurement at a Mirnov coil is given by This is an element of a matrix vector product of the form D = TI, where the transfer matrix elements are given by the integral expressions in the above. Therefore, we have arrived at a linear forward function of the type we were seeking.

The photon propagator
The first case worth considering is that of vacuum electromagnetic waves. That is, if we assume that our medium has no notable conductivity then the wave operator reduces to There are many different methods for determining the explicit form of the photon propagator given the wave operator: the brute force Kramer rule, recognising that it must live in the span of all of the other rank 2 tensors in the problem, and educated guessing. We will only need to guess here because in this case we can actually determine the propagator in full by only analogy. We observe that, (40) is identical in form to the wave operator for a massive vector field in 3 + 1 dimensional Minkowski spacetime [24]. The only differences are: that our metric is 3 dimensional Euclidean not Minkowski, and that k d takes the place normally occupied by a field mass. The inverse of said wave operator is presented in all standard field theory texts and so we can immediately read off the form of the photon propagator for our problem as If we multiply this by the free wave operator we immediately confirm that it is indeed the propagator associated with (40).

Some notes on field normalisation
To simplify the calculation which follow, it will be useful to make a small observation and a few definitions. Firstly, recall (4) defines the basis currents j i p (b) as Gaussians in x and y and as Fourier modes in the toroidal angle and time. This definition implicitly also defines the Fourier transform j i p (k) of the pixel current density. The existence of the Fourier mode ensures that j i p (k) carries a delta function 2πδ(k z − n/L) where n is the toroidal mode number and L describes the circumference of the tokamak. Therefore, we can freely set k z = n/L whenever we are operating in wavevector space with no loss in generality. As a consequence, we can choose to define a parameterisation of the wave-vector in polar coordinates. We will make use of this repeatedly later. Next we observe that (38) includes an integral over the k z component. This integral will be trivial because of the delta function the current distribution carries, and will contribute only a factor of 2π to cancel the normalisation factor included in the differential element. Therefore under this parameterisation, we can rewrite (38) as where it is understood that k z = n/L and that we take our observations in a single poloidal plane defined by z = 0. Physically, it is these magnetic field values which we observe at the detectors. But we are free to adjust the normalisation of the field as we see fit. This is advantageous, as it will allow us to non-dimensionalise the equation above by absorbing as many factors as possible into the field normalisation. To do this we define a dimensionless wavevector where η = n k d L . We also define a dimensionless position vector by Then, we are free to define a dimensionless propagator by so that the field can be written as Defining a field B i = −i 4π µ0k d B i and changing our integral into polar coordinates we have It is actually the calculation of this B field which was implemented as the forward function for the inference tests which will follow later.

A note on numerics
There are a few observations we need to make about the integral (48) which are of concern numerically. It will be useful to focus on a specific transfer matrix element. The comments made cleanly generalise to the full calculation. Consider a pixel p ν and a detector located at (x d , y d ). In the full current parameter vector I there are three elements corresponding to the three values in J ν = (J 1 ν , J 2 ν , J 3 ν ). Choose to focus on the J 2 ν element and call it I m . The transfer matrix element describing the z direction response of the detector to this current parameter element is Substituting in the definition of the photon propagator, and expanding the sums we can write the above as Hypothetically this is a computable integral, at least numerically computable. The Gaussian kernel e −σ 2 κ 2 /2 strongly regulates the integral at large κ so that it will converge to a finite value. This is actually one of the core motivations for the choice of Gaussian basis functions. However, it is highly oscillatory for large k d which means that a fine mesh is required to ensure stable convergence when adopting general numerical integration methods.
Notably, the integrand is also singular at κ = 1, which does lie in the integration region since κ runs from 0 to ∞. The origin of this singularity is actually the existence of infinitely propagating vacuum wave solutions to the underlying equations. That is, when the wave operator satisfies det Λ = 0, called the dispersion relation, its inverse G does not formally exist and there are singularities present in G(κ). In this case the singularity is a minor issue, as we know its location and therefore can freely compute a Cauchy principal value by approaching the singularity symmetrically from both above and below.
In the context of generalising to models with more complicated propagators, this will become a larger issue as it is a nontrivial numerical task to identify solutions to the dispersion relation. These solutions will be needed in general so that the singularities they create in the propagator can be identified and then 'darted' during the integration procedure.

The cold plasma propagator
The model described above will be a useful first test but it is fundamentally non-physical as it relies upon purely vacuum electromagnetics. A plasma necessarily has very different electrodynamic response to the vacuum and thus a more realistic physical model would be preferred. Here we will present the case of a cold plasma as a pedagogical example of a more realistic physical model. This will not actually be sufficient in practice, as tokamaks plasma are very hot and nonuniform, but it will be sufficient for the proof of concept we are examining in this document. In that, adopting the cold plasma propagator will add one piece of relevant physics, specifically plasma shielding, into our forward function.

The cold plasma conductivity tensor
Suppose we have a uniform cold plasma in the presence of a uniform background magnetic field B = (0, 0, B 0 ), in the toroidal direction. This plasma is composed of several species of particle which we index with µ. We will later choose to adopt a pure deuterium plasma but we retain generality for now.
Detailed derivations of the cold plasma conductivity tensor can be found in any standard text on the theory of plasma waves. Here we choose to adopt the formulation from Swanson [25], in which it is found that where: m µ , q µ , and N µ are the mass, charge, and number density of the particles of species µ; ω is the temporal frequency of the oscillating fields; and ω cµ = qµB0 mµ is the cyclotron frequency of the particle species µ.
Equipped with this conductivity tensor, we can construct the cold plasma wave operator Λ ij as in (32) and then take its explicit matrix inverse to construct a cold plasma propagator. In fact, the cold plasma wave operator is simple enough to admit analytic expressions for its matrix inverse, computable by Kramer's rule. Some consequences of this formulation are explored in [26]. However, computing the location of the singularities in the resulting propagator, so that they can be avoided in the numerical integration, is a non-trivial computational task and therefore we will instead consider a particular limiting case in which the computation is much simpler.
Also, observe that the cold plasma propagator depends upon equilibrium quantities, the toroidal magnetic field for example. This is a general feature of propagator based forward functions and it is this feature which would allow us to incorporate large changes in the equilibrium plasma state into our Mirnov coil measurement predictions. If the equilibrium plasma state changes substantially, and we observe that change, then we can recompute a new propagator in the new equilibrium. Performing this in practice would require a substantial diagnostics integration and so it will not be attempted here.

The high field strength limit
Since the toroidal field strength inside a tokamak reactor is very large it is reasonable to consider the limit in which the background field strength B 0 ≫ 1 [27]. Because the cyclotron frequency is linear in the field strength it will also be very large and as a result 1/ω cµ ≪ 1. Then, we can expand (51) to first order in 1/ω cµ . To do this observe that Substituting these expansions into (51) we have, to first order, that which we simplify by defining some useful symbols so that Then recalling (32) we can construct the cold plasma, high B 0 limit wave operator The propagator in this model is then given by the matrix inverse of the wave operator the explicit elements of which are included in the appendix. We can non-dimensionalise this propagator identically to the vacuum case and can compute the field response at each detector, due to our local basis functions, using (48), with G interpreted as the cold plasma propagator. For the purposes of the example Bayesian inference calculations which follow we will only deal with the case of an n = 0, or really η = 0, mode. This is because, at the time of writing, only the n = 0 case of the cold plasma propagator has been implemented in software.
This completes our demonstration of how the calculation of the response of a Mirnov coil array to the perturbed current density can be extended to include dispersive effects and the full geometry of electromagnetism. Further extensions which include other physics effects are direct. The same procedure can be applied with Λ replaced with the wave operator appropriate to the theory. Hypothetically, non-uniformity could be included by constructing either a perturbative series solution to the same problem or by returning to the space domain and solving the differential equation numerically.

Synthetic diagnostics and reproductions by Bayesian inference
We now proceed to demonstrating the Bayesian inference system discussed earlier in the context of the electrodynamic forward function presented above. Synthetic diagnostic signals will be calculated using the forward function. Noise can then be added to these synthetic measurements. Bayesian inference will then be applied to attempt to reconstruct a hidden current distribution using only the information included in the measurements.

The circular tokamak with only magnetic diagnostics
Consider the case of a tokamak with a circular cross section and a minor radius of 3 m which we partition into a grid of 816 pixels. For the purposes of what follows we will confine ourselves to the case of an array of 30 Mirnov coils which are arranged uniformly around the machine at a radius, relative to the centre of the machine, of 3.5 m. An image of this setup is presented as figure 2. Note that we experimented with minor perturbations away from this highly symmetric configuration and found that the inference system was insensitive to the shape. As a result there is no practical loss in generality by restricting to this simple case. In the following we will consider only modes with a driving vacuum wavenumber of k d = 0.02, corresponding to a frequency of approximately 1 MHz. This choice is motivated by our scientific interest in radio frequency modes, but it is formally arbitrary. We will also only present results from simulations with toroidal mode numbers of n = 0. Higher toroidal mode numbers were experimented with, but the behaviour was largely the same and so we will not explicitly present the results of those simulations. Also, note that, as a general rule, in what follows we take the measurement error to be Gaussian noise with a variance of 5% of the signal magnitude.

Sensitivity of the detectors across the tokamak
At this point it is interesting to briefly visualise the transfer matrices as this will explain some later behaviour. Consider a Mirnov coil d located at (x d , y d ). We intend to construct a greyscale image indicating how sensitive that coil is to each pixel in the grid. To do this we suppose that the current is purely  where the mapping from subplot to element of this matrix is obvious. Note that all of the images are normalised so that their brightest pixel has value 1.
in the z direction, that is J p = (0, 0, 1) for each pixel. If we consider the response of the detector to each pixel independently then for a current in a given pixel we can predict the full field response B d of the detector due to that current. Setting the brightness of each pixel to be proportional to the norm of the field strength observed at the detector, due to a current in that pixel, we can make an image representing the relative sensitivity of the detector to currents in each pixel. Figure 3 demonstrates this for the case of a detector located at (3.5, 0) with the medium modelled as a high field strength cold plasma. Six images are shown, which demonstrates how the sensitivity changes as the density of the plasma increases. The images are arranged by increasing density ordered from top-left down to bottom-right. The fact that the pixels are all brighter near to the right edge indicates that our magnetic diagnostics are more sensitive to currents near to the edge of the machine. This is expected because radiating solutions to the vacuum Maxwell equations in our geometry should fall off in magnitude as 1/d near to the machine, where d is the perpendicular distance from the source pixel to the detector. This suggests that using only magnetic diagnostics we will likely have high confidence in our estimate of the perturbed current at the edge of the machine but will be much less confident in our estimate at the centre of the machine.
Recall that we mentioned earlier that the piece of physics which our cold plasma propagator captures is plasma shielding. It is now that we can see this effect in action. From figure 3 we observe that, as the density of the plasma is increased, the rate at which the sensitivity falls with the distance between detector and source pixel is increased. This is a manifestation of the fact that in a cold plasma the waves excited by our tubular antennae are evanescent rather than infinitely propagating and as such they have the characteristic exponential fall off in amplitude of an evanescent wave [20].

Inference of test patterns
For the purpose of demonstrating the Bayesian inference of current perturbations in full we will consider three test distributions, two based upon the cylindrical harmonics and one unrealistic checkerboard test pattern, which is included to analyse the effect of a CAR prior on distributions which are discontinuous. The two cylindrical harmonic distributions are written, with their time dependence suppressed, as where J η (r) refers to a Bessel function of the first kind, in the standard notation [28]. The choice of these particular functions is technically arbitrary but we will see later that their comparison will allow us to demonstrate an interesting behaviour. The two poloidal mode numbers 3 and 4 in the j y direction are intended to model the multiple mode numbers which are standard in a toroidal Alfven eigenmode.

The case of an uninformative prior
As a first numerical experiment we will consider the uninformative Gaussian prior introduced previously. For our purposes here the covariance of this Gaussian will just be sampled from a uniform distribution. This will have the effect of making our problem formally well-posed at the cost of losing some physical meaning. Figure 4 presents the reproduction of j 1 distribution produced using this uninformative prior. As an aside, recall that we defined the estimateÎ to be the maximum of the posterior P(I|D). It is the image of the perturbed current associated with I which we refer to with the term reproduction. Also, formally the inferred state is a vector of complex numbers; rather than produce two separate images we will simply take the real part here. Promisingly, the reproduction does capture some of the features in the hidden distribution. The phase oscillation has been well reproduced and the bending structure visible in j y is partly captured. However there are three major problems: the images are very noisy, there is a ghost image which has appeared in the reproduction of j x despite the data being generated for the case of no current there, and finally the reproduced distributions are all very flat in the core, that is in the centre of the machine. The extensions which follow below will be able to solve the first two of these problems.
As an aside, it is instructive to note the origin of the j x ghost image. This image emerges because the forward function governed by (38) couples the effects of j x and j y together. This is because oscillating currents in x and y both produce field fluctuations whose direction, in the vacuum case with a toroidal mode number of zero, is given by the right hand rule. This implies that the radiation they produce will be polarised only in the toroidal z direction. In contrast, the z direction currents produce field oscillations which have polarisation components in both the x and y directions. As a result the j x and j y currents are strongly coupled together, that is the inference system sees linear combinations of them as equivalent, and are de-coupled from j z . As the toroidal mode number is increased and the conductivity of the medium is changed the j z current becomes coupled with the other two directions.
Before we move on to solving the problems mentioned above it is interesting to look at the reproduction of the checkerboard pattern. This is presented as figure 5. Again, some of the structure has been captured but it is far from a perfect reproduction. The phase oscillation at the edge can be seen but the straight line edges of the non-zero current regions have not been well captured. This is likely due to the low sensitivity of our magnetic diagnostics to current perturbations in the core.

The case of a CAR prior
We now turn to the problem of solving the problems in the perturbed current images produced using an uninformative Gaussian prior mentioned earlier. We will begin with the issue of image noise. Recall that earlier, we constructed the formalism required to extend the CAR prior to the context of complex valued state vectors. The advantage of the CAR prior was that it encoded an a priori assumption that the current density should be smooth in space. Our notion then is that by choosing such a smoothing prior, the image noise observed above will be reduced. To test this, we adopt a CAR prior, and attempt to infer the same modes as presented earlier. We will discuss the results from each mode in turn.
Consider figure 6 which presents the reproduction of the j 1 distribution. We observe that the inferred radial structure is similar to that of figure 4 but is, as expected, much smoother. However, the radial structure near the core is still somewhat poorly captured, and looking at the residual plot in j y we can see that the bent shape of the mode has not been reproduced. Also, the ghost j x image is still present. Regardless, this is a positive result since adopting a CAR prior has resolved one of the initial problems with our Bayesian inference system.
Note that, it is not surprising that our estimate of the state is very flat near to the core of the machine, for two reasons. Firstly, as mentioned earlier our detectors are much more sensitive to the edge of the plasma and can therefore only provide very limited information about the state of the core. Also, and more relevantly here, choosing a CAR prior will force our solutions to be flatter in the core than we may otherwise expect. This is because CAR priors tend to prefer flatness. Similar  behaviour was observed by Svensson when they studied the static equivalent of our problem [4]. Figure 7 presents the reproduction of the j 2 distribution when we assume a CAR prior. The difference between j 1 and j 2 is the addition of non-trivial modes in both j x and j y simultaneously. We see that these modes have failed to be observed, while the j z mode has been reproduced identically to the previous results. This is due to the x-y coupling which was noted earlier. We will see later that we can improve the ability of our system to resolve modes of this type but to do this we will need to integrate information from more diagnostic signals into the estimate of the state vector.
Finally, figure 8 presents the reproduction of the checkerboard distribution for the case of a CAR prior. Again, as expected we observe that the CAR prior has preferenced a smooth reproduction. In this particular case we see one of the major problems with such a smoothing prior; it biases towards smoothness. Since we are trying to observe a distribution which is not smooth, the effect of such a prior is to blur the true sharp geometric lines of the underlying test distribution into the blob which we observe.

Hyperparameter fitting via the L-curve and evidence maximisation methods
Recall that the CAR prior includes a parameter τ 2 which describes the variance of a single estimated parameter conditioned upon its neighbours. Varying τ 2 changes how much we preference smoother solutions during the inference procedure. Very high τ 2 corresponds to us allowing non-smooth solutions. This parameter is the single global parameter of the joint probability distribution P(I) and we need some method to specify it. As was mentioned earlier the proper Bayesian approach to handle a hyperparameter of this type is to add it to the state vector and estimate it by posterior maximisation, or to integrate over it [3]. However implementing either of these approaches would make the posterior not normal and they are therefore impractical. Instead, we will fit the parameter after the fact. We will discuss two methods here: one heuristic approach adapted from the literature of Tikhonov regularisation, and a simplified Bayesian approach. Note that in the following subsections it is understood that the test current density is always j 1 .

The L-curve method.
There exist methods for performing ill-posed fitting other than the Bayesian approach adopted here. One of the more common approaches is performing least mean square fitting with a regularisation term. This term can even be selected so that smooth solutions are preferred in a process known as Tikhonov regularisation. This is a well studied process and as such there exist methods for fitting the smoothing parameter, which is analogous to our CAR variance, used in Tikhonov regularisation from only the measured data. Here we adapt one such method, called the L-curve method [12].
The L-curve method is a graphical procedure which involves plotting the regularisation parameter against the mean squared error in the estimated measurements χ 2 m = ||D − TÎ|| 2 as the smoothing is varied. The curve that is formed is usually L shaped and describes a trade-off between over and under smoothing. It is accepted that choosing the smoothing parameter at the bend in the L, the point of maximum curvature, produces a 'ball-park' estimate of the best smoothing parameter [29].
In our case we will use the τ 2 quantity as our in-place replacement for the Tikhonov smoothing parameter and the norm of the mean state vectorÎ †Î as the regularisation parameter. We can then construct an L-curve and investigate both whether it has the L-shape characteristic of Tikhonov regularisation, and whether the point of maximum curvature gives a good estimate for the optimal τ 2 . To do this, the L-curve was computed for our vacuum electrodynamics forward function with a measurement error of 5% for an n = 0 mode. Since we are testing with purely synthetic diagnostics, we are able to check the real error in the reproduction. We do this by also computing χ 2 S = ||I h −Î|| 2 , where I h is the flattened vector of the hidden current parameter j 1 .
Both the L-curve and the χ 2 s metric are presented on figure 9. We can immediately observe a version of the expected L-shape, and very notably see a clear point of maximum curvature. From the actual χ 2 s graph we observe that there exists a well-shaped region in which τ 2 is fit acceptably well and the estimated current perturbations are very similar, this is the flat bottom of the well. Notably, the point of maximum curvature in the L-curve τ 2 ≈ 5.7 × 10 −1 lies within the well of the χ 2 s curve, or really on the rising edge. This is a good result and suggests that when using a CAR prior the L-curve method can be used to fit a reasonably good estimate to the variance hyperparameter.
To check this more completely, the same test as above was run over a range of measurement errors. The results of this are presented as figure 10. Note that the τ 2 value listed in the legend is the point which produces the black dot shown on each curve indicating a rough estimate of the point of maximum curvature. Again, we can observe, by pulling the values of τ 2 at the bend and comparing them to the associated values on the χ 2 s curve, that the bend lies within the well. For all of the cases tested this same result was observed.
The result above is promising because it does provide us a method to perform hyperparameter fitting from data only available externally. This is because the L-curve is computed from only the real measurements, that is, none of the hidden data is required. However, it does have a few problems. Most notably, the curves can be less L-shaped at times and therefore it can be difficult to determine the point of maximum curvature [12]. The other problem is that it lacks any formal theoretical  justification, in our Bayesian case, and while it is not surprising that it does work, given the similarities between our Bayesian procedure and Tikhonov regularisation, we would still prefer a more complete Bayesian method whose use can be motivated from the principles of probability theory.

Evidence maximisation.
In a complete Bayesian formulation you would determine the hyperparameters the same way as we did the physical model parameters. That is we would maximise the posterior distribution P(τ 2 |D) ∝ P(D|τ 2 )P(τ 2 ). Rather than completing the somewhat complex process of selecting a good prior here, we will instead just maximise the distribution P(D|τ 2 ), called the evidence. This approach is preferred in the literature for its relative simplicity and numerical ease [11]. Here, we will be able to numerically calculate the evidence and just read off its maximum value. Note that this procedure will not give us the true best choice of hyperparameter, but, as above, it will give us a very good 'ball-park' estimate, which is more than sufficient for our purposes.
To do this fitting then, we need to calculate the evidence P(D|τ 2 ). This is a relatively straightforward procedure, in which we just marginalise over the state vector I Since our distributions are circularly symmetric complex Gaussians, with complex covariance twice the original real covariance, we can quote their density functions as where we have collapsed any geometric factors into the coefficients out the front [18]. Performing some algebra, and taking the integral we can show that the evidence is of the form where: Σ andÎ are as defined before and C 3 is just another geometric factor which does not depend upon the data or the covariances. Now, we want to determine the maximum of this evidence function for a fixed sample of the data, as we vary over the hyperparameter τ 2 . In practice we actually maximise the logarithm of the evidence and also drop any factors which do not depend upon the hyperparameter. Therefore, we can write the objective function we intend to maximise as log P(D|τ 2 ) = − log det(2Σ P ) − log det(2Σ) To test the maximisation of this quantity as a method of hyperparameter fitting we plot this function against the hyperparameter for the case of a vacuum n = 0 mode, while also computing the mean square error in the reproduction. Figure 11 presents this for the case of 5% measurement error. Looking at the two curves we can see that the maximum of the evidence, while not aligning perfectly with the global minimum of the true reproduction error does lie in the well region within which all reproductions are qualitatively similar and  acceptably good. This is similar to what was observed with the L-curve method, but is a better result because the peak in the evidence is a much more pronounced feature than the point of maximum curvature in the L-curve. As before, we can then check how the behaviour changes as we vary the measurement error. Figure 12 presents the same plots as before but for a set of measurement errors. We see that at all measurement errors tested the maximum of the evidence does lie within the well in the χ 2 plots. The only difference is that as the measurement error increases the magnitude of the evidence falls and the peak becomes slightly less pronounced.
For the purposes of comparison, table 1 presents the τ 2 value fit by both the L-curve and evidence maximisation approaches for each of the measurement errors tested. Note that several values of τ 2 are repeated because the scans in questions were performed over a relatively coarse logarithmic  scale of 5 points per decade. Higher resolutions can be utilised when required. Notably the two methods produce values of τ 2 which differ by ≈3 orders of magnitude. However figure 12 indicates that the quality of the reproduction is insensitive to the precise value of τ 2 within an ≈4 order of magnitude range, within which the results from both methods lie, and so the two methods still perform similarly in practice. The numerical tests performed do indicate that evidence maximisation is an effective method for fitting the CAR variance hyperparameter. In fact, because it produces similar results to the L-curve method, but is more formally justified, and its peak is more pronounced we suggest that the evidence maximisation approach should be taken as the preferred method for hyperparameter fitting.

Testing the cold plasma model
Now, we discuss the effect of adopting the high field strength limit cold plasma propagator (86) for both the generation of our synthetic data and the inversion problem which follows. Figure 13 presents the reconstruction of the j 1 current density which is produced when the cold plasma model is adopted. We see that our reconstruction indicates that there is no perturbation to the current density except at the very edge of the machine. This is because the plasma shielding further reduces the amount of information that our Mirnov coil measurements contain about the core of the machine and as a result the inference system cannot detect the radial structure in the core. This is an amplified version of the flatness problem which was mentioned earlier. This is representative of a serious practical problem which restricts our ability to use Mirnov coil measurements alone to infer the radial structure of current perturbations. Fortunately, as discussed below, we are not restricted to incorporating information from only magnetic measurements.

Soft x-ray diagnostics
There are several different approaches we could possibly take to resolve some of the reconstruction problems noted above.
The simplest is to integrate information from several different diagnostic systems into the state estimate. This ensures that we have several different information sources, from which the estimate is derived. One of the chief advantages of the Bayesian inference framework for parameter estimation is that it is amenable to such integration. To demonstrate that this is possible we will briefly consider incorporating the measurements of soft x-ray detectors into the inference system.

Predicting fluctuations in measured bremsstrahlung radiation intensity
The fundamental physics concept behind soft x-ray diagnostics in plasma physics is that an ionised gas will emit xrays in the form of bremsstrahlung radiation [20]. Importantly, the intensity of emitted bremsstrahlung radiation, at a given xray photon frequency ε(f x ), can be shown to be proportional to the square of the particle number density N µ [15]. So, for the case of a pure deuterium plasma, we model the intensity of bremsstrahlung emission at photon frequency f x , at a given position and time with the equation where g(f x , T, N i ) is a function characterising the emission profile with respect to x-ray frequency, temperature T, and impurity concentration N i . We will assume that the T and N i are uniform across the plasma. This is a highly unrealistic assumption but it is sufficient for the demonstration we provide here. With knowledge about the equilibrium T and N i profiles, obtained from other diagnostic systems, this model could be easily extended by weakening the uniformity assumption. Note that in what follows we suppress the dependence on T and N i . The claim upon which the use of this diagnostic rests is that a perturbation in the current density will be associated with a perturbation in the number density in the plasma. That fluctuating density should then cause a fluctuation in the intensity of the observed x-ray radiation at the same frequency as the underlying current perturbation, the observation of which provides information about the perturbation which created it.
Recall that our model describes a current density of the form where j 0 (x) describes the equilibrium current density and δj(x) defines the perturbed current density which we aim to infer. We want to predict how the fluctuating current density can product a fluctuating rate of bremsstrahlung emission. In practice this is quite easy because the charge density and current density are related by the continuity equation Substituting in the current density and solving the resultant differential equation in time, noting that the equilibrium current density must satisfy ∇ · j 0 = 0, we can write the charge density as where again ρ 0 refers to the equilibrium charge density. In the case of a plasma the charge density is entirely constructed from the particles which constitute the plasma. So, again if we assume that we have a pure deuterium plasma 3 , then the particle density can be written as Substituting this form into (66), and assuming that the perturbation to the current is small enough that we only need to consider terms which are first order in δj, we find that we should expect a harmonic perturbation in the emissivity of x-rays with phasor magnitude Recall that in our model, we write the total perturbation to the current density as a linear combination of basis functions which are local to each pixel in our grid. Indexing the pixels by p we write Therefore, the x-ray emissivity perturbation δε and current polarisations J p , the quantities we are attempting to infer, are linearly related through (71) and (72). It is this fact which allows us to construct a linear forward function to predict fluctuations in the x-ray intensity observed at a detector.

Line integrating the local basis functions
The soft x-ray diagnostic we are attempting to model here is an example of a line integrated measurement. These are measurements usually associated with cameras and are a general feature of tomography systems. The measured x-ray intensities are line integrals of the emissivity because, if the camera can only see the machine through a pinhole, then the only photons which arrive at the detector are those which are emitted at points lying on the narrow wedge of the machine with a direct line of sight through the pinhole to the camera. Jardin et al [12] provide a detailed discussion of the relevant geometry. Here we will simply state the primary result.
Consider an x-ray detecting camera, positioned on the boundary of the tokamak, which can view the interior of the machine through a pinhole. The intensity of x-rays, at frequency f x , which is incident onto the sensor is given by where we are implicitly integrating over the ray extending from the detector, through the pinhole, and then out to infinity. Substituting in (71) we observe that we should expect a harmonic perturbation in the x-ray intensity observed at the detector of the form which can be written as a linear combination of the current polarisations As a technical note, we should also integrate over f x to determine the total x-ray intensity summed over all possible frequencies of the absorbed photons. We choose to only do this implicitly and just drop the f x dependence in the above. So, suppose we have a specific set of x-ray detectors indexed by d with pinhole positions (x d , y d ). Define the unit vector normal to the machine at There is a continuous set of rays through each pinhole which will impact upon the d detector and we can describe a specific one by the angle θ that it makes with the normal vectorn d . We can then choose to approximate this set by taking a finite subsample of the angles {θ i }. If the camera sensor is 'long' then the intensity associated with each of these rays can be separated and constitutes an independent measurement. An example set of rays is presented in figure 14. Define δI di as the perturbation in the intensity observed at detector d along the ray θ i . It is a straightforward analytic geometry exercise to demonstrate that (75) predicts an explicit representation for δI di of the form As a technical note, we should really cut off the above integral over a ray when that ray collides with the edge of the tokamak opposite to our detector. However, since the basis functions are almost perfectly localised to their pixel, regions outside of any explicit pixels contribute negligibly to the actual value of the integrals. Therefore taking the upper limits to be ∞ is an acceptable approximation, which we adopt for convenience. Finally then, we define vector valued geometry factors so that the fluctuation of x-ray intensity at a detector can be predicted with the linear forward function If we again flatten both the set of current polarisations into the tokamak current state vector I, as before, and also flatten the set of measured x-ray phasors {δI di } into a single x-ray measurement vector D x , then (78) defines a transfer matrix T x akin to that developed in section 5 for magnetic diagnostics. The perturbation in our soft x-ray detectors can then be predicted with an expression of the form The transfer matrix T x was calculated numerically for the case of an array of eight soft x-ray detectors spaced uniformly around a circle of radius 3.7 m. The field of view of each detector was segmented into 23 rays. It is this particular configuration of detectors which is presented in figure 14 and which was used in the reproduction tests which follow.

The Bayesian formulation for the integration of several diagnostics
One of the core advantages to choosing Bayesian inference as the tool we use to perform the inverse step of our problem is that it in a Bayesian framework it is simple to integrate the measurements from several different diagnostic systems into a single estimate of the state I. Suppose for example that we had two different diagnostic systems which produce measurements D and D ′ . We have constructed likelihood functions for both of these diagnostic systems and we also know that they are independent. Then it is straightforward for us to construct the probability density for I conditioned on our observation of both D and D ′ , that is P(I|D, D ′ ). All we need to do is apply Bayes rule twice. Observe that performing one Bayesian update we have Since the measurement D ′ must be entirely determined by the state vector I, we gain no information by conditioning the likelihood function upon the observation of D.
That is P(D ′ |I, D) = P(D ′ |I). Therefore, we recognise that we can perform a second Bayesian update to construct the posterior which is written entirely in terms of the two likelihood function and the initial prior P(I). To integrate many independent diagnostic signals we just iterate this procedure for each new diagnostic that we add.

Analysis of the effect of including the soft x-ray diagnostic information
We are now in a position to integrate the measurements from the soft x-ray diagnostic into our estimate of the perturbed current distribution. We do this assuming a CAR prior and will consider both the vacuum and cold plasma models which were discussed earlier. The analysis of the improvement to the reproduction performance which follows is qualitative. Numerical metrics can be defined which quantify the improvement but are not instructive in this context and so will not be explicitly discussed. Firstly, consider figure 15 which presents the reproduction of the j 1 test distribution for the vacuum model with the soft x-ray diagnostic integrated. The j x ghost image is still visible but has been heavily attenuated by the information provided by the x-ray measurements, which are not subject to the same xy coupling problems as the magnetic perturbations. Also, the offset and bent structure of the j y distribution is much more clearly visible. This is because the x-ray measurements contain much more information about the state at the core of the machine since they are not subject to the same 1/d sensitivity fall off as our magnetic measurements. Instead they are only sensitive to the lines of sight, but since those lines of sight extend across the whole machine they can observe the radial structure near the core more clearly than the magnetic diagnostics. Now, consider figure 16 which presents the reproduction of j 2 we obtain upon including x-ray diagnostics. We again observe that the quality of the reproduction of j x and j y when compared to figure 7 is much improved. The reduction in the  strength of the x-y coupling has allowed the inference system to detect modes in j x and j y simultaneously, a feat which was not possible using only magnetic diagnostics.
Finally consider figure 17 which presents the reproduction of j 1 when the medium is modelled as a cold plasma. Plasma screening still has the effect of ensuring we produce a noticeably poorer reproduction than we would expect in the vacuum, but more of the radial structure is visible with x-ray measurements incorporated than when we tested purely magnetic diagnostics. This suggests that Bayesian inference of the radial structure of current perturbations is possible in a plasma, despite plasma shielding limiting the effectiveness of magnetic diagnostics, but many diagnostics will be required to construct a good reproduction of the current.

A MSE diagnostic
We now proceed to discussing the integration of measurements of a third diagnostic into our perturbed current estimate. Specifically, we model a simple version of the MSE diagnostic here. This diagnostic analyses the polarisation of the light emitted by species injected into the plasma as a neutral beam. This polarisation depends upon the magnetic field at the point of the photon emission and so can provide an effective internal magnetic probe. A detailed study on the origin of this effect is beyond the scope of this document and will be omitted. Instead we just quote that, if the light is emitted at a point x along the neutral beam then the observed polarisation γ will be of the form where R, Z and, θ refer to the toroidal coordinates and the coefficients A i are depend on the geometry of both the neutral beam and the viewing optics [7]. The use of A ′ s as the notation for the coefficients in (82) is convention. Note that they are completely unrelated to the components of the vector potential despite the use of similar notation.
We computed the dependence of the magnetic field on the harmonic current perturbation above. We now extend this to predict a first order perturbation to tan γ. Recall that the field consists of an equilibrium field and a field perturbation where δB can be predicted by computing a transfer matrix T B using (43). So, we just need to determine a matrix T MSE which describes the first order variation of the perturbation to the polarisation to the field perturbation. That is we want to construct a first order relation We can do this by simply expanding tan γ in a multivariate Taylor series around B 0 . For convenience we define B). Then, expanding to first order and extracting the transfer matrix we find that Notably, (85) is a function of both the geometric coefficients but also of the equilibrium field. Thus, in order for the MSE diagnostic to be integrated into our perturbed current tomography estimate we need to also integrate the equilibrium field, a further model data fusion than is presented here. Such an equilibrium is routinely available through either equilibrium reconstruction, for example with EFIT, or Bayesian equilibrium tomography [4]. For our purposes here we will just use a Soloviev solution to the toroidal MHD equilibrium.
To perform a numerical test to incorporate the MSE diagnostic we need to define model geometry for the neutral beam and the viewing optics. We can then compute the geometry coefficients. Detailed discussions of their derivation can be found elsewhere [30]. We choose to consider both the neutral beam and viewing optics to be restricted to the mid-plane and we also adopt a radial neutral beam for convenience, that is a beam orthogonal to the toroidal field.
As before, we can then compute generate synthetic data to model the observed polarisation for a set of lines of sight. We arbitrarily include a set of 40 lines of sight. These synthetic measurements are then added as a third data vector D MSE and are incorporated into the perturbed current estimateÎ with a Bayesian update. As a test, we can then look at the reproduction of the j 1 current configuration. The inferred x(R) and y(Z) currents do not visually change from the inclusion of this information but there are minor changes in the Z(θ) current. Figure 18 presents the inferred, hidden, and residuals for the toroidal perturbed current for this test. We can see from the residual plot, that the reconstructed perturbed current is more accurate along a horizontal line running through the middle of the image. This line is coincident with the neutral beam in the model and so demonstrates that the inclusion of MSE is improving the reproduction.
While the estimate of the perturbed current was only slightly change by the addition of the MSE diagnostic, the confidence of the estimate is non-trivially modified. Figure 19 presents the diagonal components of the posterior's covariance matrix with MSE included. These components quantify the uncertainty in the estimate of each element of I, with higher values corresponding to higher uncertainty. The key feature is the large horizontal line along which the variance is lower. This indicates that we are more confident about the current values along said line, and it is again notably coincident with the neutral beam used for the MSE diagnostic. So, including  the MSE diagnostic increases our confidence in the estimated current along the beam.
Note that for the numerical experiment discussed above a measurement noise of 1% was used rather than the 5% adopted earlier. At higher measurement noise levels the variance images are more uniform because the measurement noise covariance becomes the dominant contribution to Σ. Hence, we chose a lower noise level for this simulation to improve the contrast in figure 19 and therefore make it easier to discern the effect of the integration of an MSE measurement into the posterior.

Conclusion
The development of practical toroidally confined fusion plasmas remains a problem of scientific interest as it has been for well over half a century. However there are many physics and engineering challenges which must still be overcome before a functioning fusion power plant can be constructed. The diagnosis of standing wave modes in real time is one such challenge. In this document we have demonstrated, at least in a preliminary fashion, that performing perturbed current tomography via Bayesian inference is a viable method to diagnose wave modes.
We have shown that it is possible to extend methods developed previously to describe static current distributions in tokamaks to the case of imaging harmonic perturbations to the equilibrium current density. Specifically we extended the local basis function expansions which are preferred for modern tomographic imaging systems to the realm of phasor valued current distributions. We also demonstrated that the CAR prior, which had been used previously to preference smooth reproductions of static currents, could be modified to encode similar smoothness assumptions in the oscillatory case.
We demonstrated that the signals of magnetic diagnostics could be predicted from the pixel model for the perturbed current density. Specifically, we showed that a propagator method could be constructed which allows us to write integral solutions for the oscillating field produced by harmonic current perturbations. This method is sufficiently general to solve for EM wave amplitudes in most dispersive and non-uniform media. The conductivity tensor of the medium just needs to be specified. We have demonstrated this for the vacuum and a cold plasma, but the extension to a magnetised plasma is straightforward if numerically complicated [31]. Further, for the case of linear media, the predicted measurements take the form of a linear transformation of the vector of coefficients defining the local basis function expansion of the perturbed current density. These linear models are preferred for real time Bayesian inference calculations because the posterior can be computed analytically.
We experimented numerically with the use of Bayesian inference for the reconstruction of the current density from synthetic data. It was found that after adopting a CAR prior, the perturbed current could be reconstructed using only magnetic diagnostics but imperfectly. The strong coupling between directions created by the geometry of electrodynamics, caused extra ghost current images to be inferred. We then demonstrated that this unfavourable behaviour could be removed by integrating measurements from multiple diagnostic signals into a single estimate of the state. Specifically, we showed that fluctuations in the intensity at soft x-ray detectors caused by perturbations to the current could be predicted. These soft x-ray measurements could then be incorporated as a synthetic diagnostic. Performing Bayesian inference of the perturbed current using measurements from both magnetic and x-ray diagnostics it was found that the ghost images were largely removed and the ability to resolve the spatial structure of modes nearer to the core of the machine was also improved. We also presented two different approaches for the fitting of the CAR hyperparameter, and adapted L-curve method, and an evidence maximisation approach and found that they provided comparable results when tested numerically.
Finally, we constructed a forward function for the MSE diagnostic and incorporated in into the estimate of the state. This was observed to have a minor effect on the state estimate but it did increase the confidence in the estimate along the line of the neutral beam. This tool will likely be required to construct detailed estimates of the current perturbation near to the core of the machine.

Further work
Perturbed current tomography is a field which is arguably still in its infancy. This is especially true when discussing the application of Bayesian inference to the problem. Therefore, there are numerous avenues which could be studied as novel extensions to this work.
Firstly, as we have seen, the reproductions of perturbed current distributions are substantially improved when several diagnostic signals are integrated. Therefore, it would be preferable to integrate as many different diagnostic signals as possible into the estimate of the perturbed current. We constructed forward functions to predict magnetic, soft x-ray, and MSE diagnostics here but simple extensions are possible to incorporate other diagnostics such as polarimetry [15].
While the formal integral solution to the Maxwell equations used here is a very general construction, it does have substantial drawbacks. The largest being that it is very difficult to integrate either non-uniformity or reflecting boundary conditions. For our constructions here we largely ignored these problems, but they could be incorporated into a more complete forward function designed to predict Mirnov coil measurement in the full geometry of a tokamak. For example, the Maxwell equations in dispersive media could be solved, for the case of harmonic currents, using pure numerical analysis and without any need for computing Fourier transforms. Another possibility is to adapt well studied methods to handle high frequency wave propagation in dispersive media via the WKB approximation and ray tracing [25].
Here we have demonstrated that we are capable of inferring perturbed current densities from synthetic data but this is no guarantee that our inference system will produce correct results when input real data. There will inevitably be several sources of extra error, such as un-modelled physics, and unpredictable noise covariance, which are likely to bias the inference system away from the true solution. To determine if this is the case the inference system needs to be applied to real data measured from a tokamak experiment.
In this document we discussed only the uninformative Gaussian and CAR priors. Alternative priors do exist such as the GP prior, which provides another method to enforce the smoothness of the current density [13,32]. However, the application of these alternative priors to perturbed current tomography has not yet been explored in sufficient detail to warrant their inclusion here and instead we consider this to be an avenue for further research.
Finally, we restricted ourselves to the case of linear and Gaussian likelihood functions here, but this is not really an acceptable restriction. To incorporate more interesting diagnostic signals and more subtle physics effects, that is those which depend upon a power of the current perturbation higher than one, non-linear and non-Gaussian probability distributions will be required. It is still possible to apply Bayesian inference to estimate the perturbed current densities using these more general tools, but it is no longer possible to analytically compute the posterior distribution. Instead numerical tools such as Markov Chain Monte Carlo (MCMC) sampling are needed [3]. Therefore, the implementation of the tools developed here on one of many readily available Bayesian inference packages, which can perform MCMC, is a necessary first step in incorporating information from non-linear non-Gaussian processes to our perturbed current tomography framework.

Data availability statement
All data that support the findings of this study are included within the article (and any supplementary files).
Then the g ij elements are given by