Robust sparse linear regression for tokamak plasma boundary estimation using variational Bayes

Precise control of the shape of plasma in a tokamak requires reliable reconstruction of the plasma boundary. The problem of boundary estimation can be reduced to a simple linear regression with a potentially infinite amount of regressors. This regression problem poses some difficulties for classical methods. The selection of regressors significantly influences the reconstructed boundary. Also, the underlying model may not be valid during certain phases of the plasma discharge. Formal model structure estimation technique based on the automatic relevance principle yields a version of sparse least squares estimator. In this contribution, we extend the previous method by relaxing the assumption of Gaussian noise and using Student’s t-distribution instead. Such a model is less sensitive to potential outliers in the measurement. We show on simulations and real data that the proposed modification improves estimation of the plasma boundary in some stages of a plasma discharge. Performance of the resulting algorithm is evaluated with respect to a more detailed and computationally costly model which is considered to be the “ground truth” The results are also compared to those of Lasso and Tikhonov regularization techniques.


Introduction
The tokamak is one of the most promising concepts for future thermonuclear fusion reactors. In a tokamak, hot plasma is confined by a combination of strong toroidal and poloidal magnetic fields. The almost negligible weight of the plasma dictates that magnetohydrodynamic (MHD) equilibrium, i.e. a force-free state, describes its macroscopic evolution. The MHD description also includes the plasma boundary (the so-called last closed flux surface), which must be controlled during a tokamak discharge because of performance and safety reasons. Realtime feedback control is necessary since the plasma behaviour cannot be reliably predicted, particularly because the equilibrium is unstable. Such a feedback control requires the plasma boundary as an input in real-time. See e.g. [1] or [2] for more details.
One of the possibilities for real-time plasma shape reconstruction is offered by magnetic sensors near the plasma boundary. However, full reconstruction of the equilibrium magnetic field (which includes the plasma shape) is a complex task due to non-linear behavior of the plasma. In [3], it was shown that the plasma boundary can be modeled by a linear combination of a specific kind of basis functions. Since the basis functions are known, the estimation problem can be formulated as a simple linear regression with a potentially infinite number of regressors. However, many of these regressors will be uninformative or extremely sensitive to noise. Thus, our task is the selection of a relevant subset of the basis functions of the model. This task is a well studied problem in statistics and optimization. Examples of classical methods are the Tikhonov [4] or the Lasso [5] regularizations. Crucial problem of these methods is selection of the tuning parameter. This problem is elegantly solved by the Bayesian approach which treats all tuning parameters as unknowns and estimates them from the data. Therefore, we focus on the Bayesian model known as automatic relevance determination (ARD) [6]. It is closely related to scaled l1 optimization [7]. In effect, the penalization forces weights of spurious regressors to zero, effectively removing them from the regression. This approach was shown to be a suitable model for estimation of the plasma boundary in [8].
In this paper, we extend the previous approach to recursive evaluation of the model in time during the full duration of a discharge. Moreover, we will assume that the model noise (or model residue) is not Gaussian distributed but that it may contain outliers or systematic errors. We will use the model of [9] to estimate the parameters. The resulting algorithm is essentially an iterated weighted least squares algorithm with adaptive estimation of the weight and regularization matrices.
The performance of the presented algorithm will be compared to traditional approaches using simulated and real measurements from a tokamak. We will attempt to provide physical interpretation of the introduced latent variables.

Plasma boundary model
Feedback control of plasma in a tokamak requires on-line reconstruction of the plasma boundary. The quality of the reconstruction is highly significant for tokamak operation safety and performance. The reconstruction of the boundary is done in two steps: i) the spatial distribution of scalar magnetic flux ψ(r, z) is determined from the measurements, ii) the boundary is identified as the largest closed flux isoline. In this paper, we will be primarily concerned with the first step, however the quality of the reconstruction will be judged by output of the second step.
The spatial distribution of the magnetic flux ψ(r, z) can be decomposed into an infinite series of basis functions f i (see [3] for details). Using the assumption of symmetry of the threedimensional plasma column in the toroidal direction, for each point in the vicinity of the plasma with coordinates (r, z) in the poloidal plane the flux distribution can be decomposed into an infinite series: where f i are functions based on Legendre functions with half integer degree [10], closely related to toroidal harmonics, and θ i ∈ R, i ∈ N are multipliers associated with each basis function. For increasing i, the basis functions f i represent higher frequencies of magnetic flux isolines. The expected shape of the boundary is relatively smooth, hence it is expected that only M lower order harmonics will be sufficient for its reconstruction. The magnetic field effects are measured by N magnetic probes around the tokamak chamber. These measurements are transformed by inverse operation into vector y ∈ R N such that where X ∈ R N ×M is a design matrix whose rows are composed of evaluations of basis functions f i (r j , z j ), i = 1, . . . , M, where (r j , z j ) are coordinates of the j-th probe, j = 1, . . . , N, θ = (θ 1 , . . . , θ M ) is a vector of unknown coefficients and e ∈ R N is the vector of noise/residue. When the θ vector is identified, we are able to evaluate ψ(r, z) in an arbitrary point, enabling us to identify the plasma boundary as the largest closed isoline. Illustration of the spatial distribution of the magnetic flux reconstructed from recorded data using least squares solution of (2) for different values of M are displayed in Fig. 1.
Note that the order of decomposition M significantly influences the spatial distribution of the magnetic flux making the task of plasma boundary identification sensitive to it. While the largest closed isoline in the right plot of Fig. 1 can be recognized around the yellow area inside the tokamak chamber, finding such isolines in the remaining two pictures is impossible. Therefore, we seek an optimal selection of the relevant elements of the model (2). For this purpose, we use a statistical model known as the automatic relevance determination (ARD) that identifies the parameters that are relevant and the remaining are set to zero. Using such an approach, we can then construct an ill-posed problem (2) even with M > N and the algorithm should automatically selects the necessary regressors. The selected number M is thus interpreted as an upper bound on the necessary coefficients and when selected sufficiently high, the results should not be sensitive to it.
The proposed algorithm is essentially an iterative solution of the least squares with Tikhonov regularization where the Tikhonov matrix diag(α) is adaptively updated in each iteration. This is in contrast with the classical Tikhonov regularization where the matrix diag(α) is assumed to be known. Although the proposed adaptation of α implies higher computation cost than fixed value of α, the increase is still acceptable and can be computed in real time in a closed control loop.
In this contribution, we extend the previous approach by considering unknown precision of the measurements. While the accuracy of the measuring probes is approximately similar, the error of the model may vary across the probes. Specifically, the inverse transformations required to compute vector y from the measurements is nonlinear and the variance of the model residue e may be time-and space-dependent. Since the exact model of the residue is unknown, we consider the variance of each residue to be unknown and we will estimate it from the data using a variational Bayes approach. We chose a prior model that implies Student's-t distribution of the residue. In effect, the resulting variational Bayes algorithm will solve the weighted least squares with regularization with adaptively estimated weight matrix T in tandem with the Tikhonov matrix diag(α).

Bayesian model selection
Hierarchical priors are a commonly used tool to achieve model structure selection. The conventional least squares formulation of (2) corresponds to maximum likelihood estimation of likelihood function given by multivariate Gaussian probability distribution where β ∈ R is a precision parameter of the model error. The likelihood model alone cannot decide which structure of the model is relevant. This can be achieved by Bayesian treatment with carefully chosen prior distribution on unknown parameters [11]. An extension of the model by the following prior leads to a maximum likelihood solution that is equivalent to Tikhonov regularization (3) where A is the Tikhonov matrix. The choice of diagonal matrix corresponds to the assumption that the elements of θ are apriori independent, each with its own prior precision α i . In our approach, the precision parameters are treated as unknown and are estimated jointly with the parameters θ. Prior distribution of the precisions α i and β is chosen in the form of gamma distribution: Here, values of parameters a 0 , b 0 , c 0 , d 0 are assumed to be known. This leads to a more complicated model but removes the burden of expert selection of the precision parameters. Now, the task is to obtain the posterior distributions of parameters θ, A, β. This is done using the variational Bayes method. It consists of factorizing the joint posterior and then marginalizing it over individual parameters. For details see [12]. The approximate posterior densities are identified to be of forms conjugate to the priors: with equations that govern the shaping parameters given in the appendix. The resulting estimation algorithm is essentially a least squares estimation with ARD adaptation of the regularization (LS-ARD) [8]. The LS-ARD algorithm is guaranteed to converge to a local minimum of the Kullback-Leibler divergence from the estimated to the true posterior. We assume that prior parameters a 0 , b 0 , c 0 and d 0 are chosen to be 10 −10 to yield non-informative prior.

Robust Least Squares using Student's t noise model
To improve the results of plasma reconstruction, we alter the original algorithm from Section 3 by allowing a richer error distribution which can model outliers and systematic errors, as explained in Section 2. Specifically, we replace the Gaussian noise distribution by a Student's tdistribution with unknown degrees of freedom d. For low values of d, this distribution has heavier tails than the Gaussian. The estimation algorithm can thus identify outliers and provide more robust estimate of the parameter of interest.
While prior distributions for θ and A are the same as in the previous model, i.e. (6), (7), the likelihood model (5) is changed to The values of u i (more exactly its square root) represent the weights in a diagonal matrix T of the weighted least square problem (4). The Student's t-distribution of the noise arises from the fact that Again, we use the VB framework to obtain equations for shaping parameters of the posteriors. Their forms arep with shaping parameters in the appendix. The final estimation algorithm -robust least squares with ARD adaptation (RLS-ARD) algorithm -is given in Algorithm 1. This more elaborate structure of the model may provide us with further information on the physical system we are describing as will be discussed in the next section.

Tokamak COMPASS and data from a validated numerical model
Both simulated data and measurements from operation of the tokamak COMPASS will be used in this section. Simulated data were obtained as a fit to real measurements with the EFIT++ [13] plasma reconstruction code. This code evaluates a complex numerical model of magnetohydrodynamic equilibrium which includes the full toroidal flow and anisotropy changes to the Grad-Shafranov equation. The result of the EFIT++ reconstruction is also the plasma boundary, however obtained at high computational cost. Therefore, when using the simulated measurements, we can use the boundary from the EFIT++ as "ground truth" to compare the performance of the proposed algorithm. Because we are mainly interested in the quality of the plasma boundary reconstruction, we use the Hausdorff distance [14] to compare the estimated 2D shape of the plasma boundary to the ground truth. This metric represents the maximum difference between the two boundaries evaluated on discretized angular grid. For the purpose of this paper, all simulated data were generated with the configuration which agrees with the real geometry of the tokamak and measuring devices. The EFIT++ reconstructs the plasma field from measurements at standard sampling period 1ms, which is sufficient for real time shape control. Thus the plasma boundary ground truth is available in a time series. The proposed algorithm is designed to reconstruct the plasma boundary at each sampling time independently. Therefore, we will show results of reconstruction at selected time samples. In each time sample, an iterative algorithm (such as Algorithm 4 for RLS-ARD) will be run until convergence. The relatively slow temporal evolution of the shape can be used to warm start the iterative algorithm by a solution obtained in the previous time instant. This will be important to achieve computational cost permitting real time evaluation.

Model structure selection
First, we demonstrate how the ARD property selects the optimal regularization coefficients for reconstruction of the magnetic flux at a single time instance. We have selected data from two time samples of a discharge and run the RLS-ARD algorithm (Algorithm 4) for n = 100 iterations with initialization α 0i = 100. Convergence of the estimated values of θ i and their corresponding precision parameters (penalization coefficients) α i are displayed in Fig. 2. With increasing precision α i grows prior probability that θ i is zero. This has a direct link to Tikhonov regularization as was hinted in section 2 where α i is the penalization parameter that governs how much the squared value of a regression coefficient is penalized.
For these particular reconstructions, one coefficient out of 20 (i.e. M = 20) in the decomposition (2) seems to have a major contribution. Clearly, its precision/penalization converges to a smaller number than the common starting value of 10 2 while the other coefficients have larger precisions and thus are more penalized. Next, we compare the performance of tested algorithms on full duration of a selected plasma discharge. To understand the results better, we will describe the individual phases of a tokamak plasma discharge, which are also distinguished by shading in Fig. 3. When the discharge starts, the 2D cross section of the plasma column is elliptic. Afterwards, it undergoes a transition which, in an ideal case, ends with the plasma being elongated with a pointed tip (x-point). This part is marked by gray background color at the left side of the discharge figures. When the x-point is established, the plasma rapidly reaches the desired equilibrium state (white background). After that, the plasma collapses back to an elliptic shape before completely vanishing (gray background color on the right side of each plot).

Reconstruction of a full discharge
Since the x-point is a saddle point in the ψ(r, z) contours and it is usually close to the tokamak chamber wall, the identification of the plasma shape is very sensitive to its predicted position. If a saddle point is predicted to be outside of the chamber, even just a few centimeters off its true position, the reconstructed plasma shape is elliptic and very different from the true shape.
In Fig. 3 the MSE of model residue of the linear equation (2) is plotted for the whole plasma discharge for different approaches to regularization and for both simulated and real measurements. For the Tikhonov regularization, the regularization constant α was set to be equal for all regressors. The MSE of the model residue is accompanied by values of the Hausdorff distance D H to the ground truth for each algorithm.
Although the fit using Tikhonov has the lowest value of the residue at the measurement points during the whole discharge, the final boundary is not very close to the ground truth. The remaining methods clearly gain advantage by using stricter regularization and thus achieving a better generalization beyond the measurement points, which is crucial for identifying the boundary. Also, the plasma computed with tested methods may be predicted to have an x-point when the ground truth does not have one or vice versa, which leads to high peaks in the D H metric. Lasso algorithm for l1 optimization yields results very similar to those of LS-ARD or RLS-ARD.  Interestingly, the LS-ARD algorithm does not perform well on simulated data in the transition phases. The value of D H is higher in the transition phases than for the other approaches. Higher deviation from the EFIT boundary in the transition phases means that assumptions for one of the models (EFIT or toroidal harmonics) may be violated. Also, the assumption of non-Gaussian residue of the RLS-ARD model may be more appropriate in the transition phases because of the nonlinear propagation of errors as is demonstrated by the results produced using synthetic data. Apart from the transition phases, the difference of results between LS and RLS-ARD is marginal. This similarity is also apparent from the estimated degrees of freedom of the Student's t-distribution (parameter d) from the RLS-ARD algorithm is displayed in Fig. 4. Note that it decreases during the first transition phases. Smaller values of d can be interpreted as a deviation of the measurement errors from the Gaussian distribution. This behaviour indicates the presence of outliers or a systematic error in the model. However, because we observe the deviation even on simulated data, the most likely explanation is insufficiency in the used model as discussed above. Also, we can track which measurements are suppressed from the fit since The effect of the magnetic flux reconstruction on the shape of the reconstructed plasma boundary is displayed in Fig. 5, where the estimated boundary shapes are compared to the ground truth results at different moments of the discharge. These moments were selected from different stages of the discharge, see dashed lines in Fig. 3. Again, we see that better results are obtained in the stationary phase of a discharge. The value of u i estimates at a given time is illustrated at the corresponding measurement points. Those measurement points with u i <1 are highlighted with a larger circle. This can provide experiment designers with additional information about the precision of measurements.
While the RLS-ARD algorithm appeared to improve results on the simulation data, its improvement on real data is not so significant. Moreover, due to its lower computational cost, the LS-ARD was selected for implementation. The computational cost of this algorithm can be lowered by initialization of the parameters from those obtained in previous time sample (warm starts). The effect of this strategy is displayed in Fig. 6, where the number of iterations needed for convergence of the LS-ARD algorithm are shown for two variants: offline, where the initial guesses of parameters are the same for each time of the discharge, and online, where the parameters estimated in the previous time are used as initialization values for the algorithm. This results in a significant speedup, where only a few iterations are needed.

Evaluation of multiple discharges
A quantitative study of quality of reconstruction was tested on a data set of ten plasma discharges. The relative performance of the tested algorithms remains the same for all discharges. At two discharges, the RLS-ARD algorithm was worse than the LS-ARD in the initial phase of the discharge. We conjecture that this is cause by incorrect suppression of the relevant measurements (i.e. incorrectly classifying them as outliers). Statistical evaluation of the distribution of the mean and median of the Hausdorf distance for all times of the ten discharges is displayed in Fig. 7. From these results, the LS-ARD method seems to provide the most consistent and reliable results.

Conclusion
In this paper, we have described an inverse problem that arises in the field of tokamak plasma reconstruction and the difficulties it presents for classical methods. We have proposed a Bayesian approach to the problem resulting in an algorithm that selects a sparse and robust solution. The proposed method has been shown to provide consistently better results compared to the conventional approach. It has been shown that a model with fewer automatically selected parameters avoids the overfitting problem and thus achieves better reconstruction of the overall shape of the plasma boundary. This has been achieved on both simulated and measured data. The complexity of the computation is intentionally kept low so that the algorithm can be run in real time on existing hardware. The robust algorithm was able to improve quality of reconstruction only at the non-stationary discharge phases in simulation. However, its improvement on real data was marginal and not very reliable. Since the distribution of the error was estimated to be almost Gaussian in the most important stationary regimes, we selected the sparse LS-ARD algorithm for final implementation.