
Accelerating the estimation of collisionless energetic particle confinement statistics in stellarators using multifidelity Monte Carlo

Frederick Law, Antoine Cerfon and Benjamin Peherstorfer

Published 4 May 2022 © 2022 IAEA, Vienna
Citation: Frederick Law et al 2022 Nucl. Fusion 62 076019. DOI: 10.1088/1741-4326/ac4777


Abstract

In the design of stellarators, energetic particle confinement is a critical point of concern which remains challenging to study from a numerical point of view. Standard Monte Carlo (MC) analyses are highly expensive because a large number of particle trajectories need to be integrated over long time scales, and small time steps must be taken to accurately capture the features of the wide variety of trajectories. Even when they are based on guiding center trajectories, as opposed to full-orbit trajectories, these standard MC studies are too expensive to be included in most stellarator optimization codes. We present the first multifidelity Monte Carlo (MFMC) scheme for accelerating the estimation of energetic particle confinement in stellarators. Our approach relies on a two-level hierarchy, in which a guiding center model serves as the high-fidelity model, and a data-driven linear interpolant is leveraged as the low-fidelity surrogate model. We apply MFMC to the study of energetic particle confinement in a four-period quasi-helically symmetric stellarator, assessing various metrics of confinement. Stemming from the very high computational efficiency of our surrogate model as well as its sufficient correlation to the high-fidelity model, we obtain speedups of up to 10 with MFMC compared to standard MC.


1. Introduction

In generic non-axisymmetric magnetic configurations, collisionless particle orbits are not guaranteed to be confined [26]. Energetic particle confinement therefore is a key point of concern for the design of tokamaks which have significant toroidal ripple and of stellarators [4, 6, 9, 18, 22, 27, 34, 37]. In order to design configurations which maximize energetic particle confinement, one must identify metrics which accurately capture the level and quality of confinement, and are at the same time as easy and as inexpensive to include in design optimization codes as possible.

One can distinguish two distinct approaches to estimating energetic particle confinement. The first approach may be viewed as a direct approach, or Monte Carlo (MC) approach. One launches an ensemble of energetic particles, follows their individual orbits—usually their drift orbits—and then directly calculates the fraction of particles which have been lost and the time of flight of particles before being lost [1, 3, 5, 22, 27, 31, 39, 49, 52, 58]. On the one hand, this method can be considered physically accurate, since it directly captures the effects on confinement of the wide variety of orbits energetic particles follow in nonaxisymmetric magnetic configurations. On the other hand, the convergence of the MC estimator as a function of the number of particles launched is slow, requiring a large ensemble of particle trajectories for accurate estimates. This means that for guiding center calculations this first method is more time-consuming than most other calculations involved in the design optimization of nonaxisymmetric equilibria, and prohibitively expensive for full-orbit trajectories. Furthermore, this method does not lend itself easily to the computation of derivatives of the metric with respect to magnetic field variations, which implies that it cannot be easily implemented in gradient-based optimization schemes.

The second approach used for estimating the quality of energetic particle confinement is deterministic and relies on proxy functions with closed analytic forms, which can be directly computed from the output of a nonaxisymmetric equilibrium solver, and which have empirically been found to often correlate well with energetic particle confinement. One may for example make the approximation that good energetic particle confinement will be a direct consequence of good low collisionality neoclassical transport properties, and thus characterize the confinement performance through the evaluation of the effective ripple $\epsilon_{\mathrm{eff}}$ first introduced by Nemov et al [40, 59]. In order to target energetic particle confinement more directly, one may instead rely on another metric recently proposed by Nemov, called the Γc metric [4, 41]. Γc is better suited for this purpose because it applies to fully collisionless orbits and, unlike $\epsilon_{\mathrm{eff}}$, it is also strongly influenced by particles close to the trapped/passing boundary, which often are the dominant contribution to energetic particle losses [4, 5]. The benefits of these proxy quantities for energetic particle confinement are threefold. First, they are significantly less expensive computationally than MC based solely on expensive, high-fidelity simulations. Second, their closed analytic forms, which were derived from physics principles, enable easier interpretation of the numerical results and identification of trends. Third, it may be easier to compute with high accuracy their derivatives with respect to optimization parameters, as was recently demonstrated [45].

Despite the advantages of the deterministic approach, its application may remain limited to preliminary optimization studies, because the currently used deterministic functions have known limitations in their predictive capabilities [1]. In the study of a vacuum quasi-helically symmetric configuration, it was for example recently found that energetic particle confinement could be anti-correlated with $\epsilon_{\mathrm{eff}}$ [4]: energetic particle confinement was improved while at the same time the $\epsilon_{\mathrm{eff}}$ metric increased. The Γc metric was found to be reliable in that same study [4], and a strong correlation between alpha particle energy loss and Γc in the outer half of the plasma was recently highlighted for a wide variety of stellarator configurations [3]. However, that correlation is not perfect. Stellarators with significantly different Γc values can have comparable levels of energy loss, and stellarators with comparable Γc values can have significantly different levels of energy loss. It therefore appears that for the goal of implementing reliable and efficient energetic particle confinement calculations in multi-objective optimization codes, it is necessary to include high-fidelity simulations with lower costs than a direct MC approach. This is precisely the purpose of this article: we apply the multifidelity Monte Carlo (MFMC) method [42, 47, 48] to reduce the variance of the MC estimator for energetic particle confinement, and thus reduce the computational cost of such an estimator.

As we will explain in more detail in the main text, the MFMC method is a strategy for variance reduction of MC estimators based on control variates; see, e.g. [23, 38]. In MFMC, the control variates are given by low-fidelity models, i.e. approximations of the high-fidelity physics model one is interested in, which are carefully designed to satisfy two key criteria: they have a high correlation with the high-fidelity model, and they are significantly less expensive to evaluate than the high-fidelity model. These criteria are required to obtain variance reduction as shown in [42, 47, 48]. The MFMC approach is well suited to situations in which scientists rely on the MC method to propagate the uncertainty associated with the initial conditions and with parameters defining the model, as is precisely the case for the confinement of energetic particles in nonaxisymmetric magnetic configurations. The method and some of its close variants have recently been applied for the first time to kinetic plasma models [15, 16, 29]. In [15, 16], the authors demonstrate that physics based reduced models can be effectively used for variance reduction of the MC estimator for the solution of the Boltzmann equation subject to uncertainty, in highly collisional plasmas not subject to a far-field force field. In contrast, in [29], the authors rely on data-driven low-fidelity models to reduce the variance of their MC estimators for quantifying the uncertainty in kinetic microturbulence. To the best of our knowledge, our article represents the first time the MFMC method is applied to the study of energetic particle confinement in stellarators, and stellarator performance in general. As in [29], we rely on a data-driven low-fidelity model for MFMC based variance reduction, since traditional sources of low-fidelity models are inappropriate for our high-fidelity model. For example, our low-fidelity model cannot just utilize simplified physics because our high-fidelity model already is a reduced physics model, i.e. collisionless guiding center trajectories, and we are not aware of simpler physics based models which maintain a reasonably high correlation with the high-fidelity model during the entire energetic particle trajectory. We report that despite the large variety of particle orbits depending on the initial conditions, and despite the chaotic nature of the dynamics, our relatively simple data-driven model—which can be trained with standard tools available in common scientific software packages in Python and Matlab—can lead to significant variance reduction in the MFMC framework, and corresponding speedup.

The structure of the article is as follows. In section 2, we present our guiding center model for collisionless energetic particle trajectories, and briefly discuss why numerical simulations based on this model are intrinsically expensive. In section 3 we give a brief review of the standard MC approach for estimating energetic particle confinement, which we contrast with our new MFMC approach in section 4. We describe the setup for our numerical tests and demonstrate the speedup obtained with our MFMC method in these tests in section 5. We summarize our work in section 6, where we also identify some of its limitations, and suggest corresponding directions for future work.

2. Energetic particle dynamics

The main purpose of our article is to demonstrate in a realistic setting the potential of the MFMC method to reduce the variance of the commonly used MC estimators for energetic particle confinement, and thus to speed up the evaluation of these estimators for optimization applications. For simplicity, we will consider MC estimators based on collisionless orbits, as is still often done in stellarator optimization studies [4, 5, 14, 17, 27]. Our MFMC approach will be directly applicable to MC estimators based on single-particle orbits including collisions as well [27, 31, 49]. In this section, we begin by detailing the equations of motion we solve to compute the trajectories of energetic particles. We then highlight certain features of the model which underscore why simulating particle trajectories is intrinsically expensive.

2.1. Guiding center trajectories

We consider the collisionless dynamics of 3.5 MeV alpha particles born as byproducts of deuterium–tritium fusion in a stellarator magnetic field. We assume the magnetic configuration has nested flux surfaces, which enables us to work in flux coordinates (s, θ, ζ), where s ∈ [0, 1] is the normalized flux surface label, θ is a poloidal angle, and ζ is a toroidal angle. In this coordinate system, we identify s = 0 with the magnetic axis and s = 1 with the last closed flux surface.

We model the dynamics of energetic particles using the guiding center equations [7, 11, 33] which arise from the asymptotic decoupling of the fast gyromotion from the full-orbit equations given by the Lorentz force. Our multi-fidelity MC approach is in principle also applicable to MC estimators based on full orbits of energetic particles. However, simulating such orbits on the desired long α-particle slowing down time [34, 57] remains prohibitively expensive for optimization studies on existing computing architectures. To highlight the immediate relevance of our work for tokamak and stellarator optimization studies, we thus chose to focus on drift orbits. The guiding center equations can be expressed in terms of a four-dimensional system of coupled ordinary differential equations (ODEs) for (r, v||), where r is the position of the guiding center and v|| is the parallel velocity of the energetic particle. In flux coordinates (s, θ, ζ), the guiding center equations take the form

Equation (1)

Equation (2)

Equation (3)

Equation (4)

in the domain $U\times \mathbb{R}$ where $U\subset {\mathbb{R}}^{3}$ is the stellarator volume. In flux coordinates, U is represented by [0, 1] × [0, 2π] × [0, 2π], where {s = 0} × [0, 2π] × [0, 2π] is the simple closed curve corresponding to the magnetic axis. The terms $({B}_{s},{B}_{\theta },{B}_{\zeta })$ and $({B}^{\theta },{B}^{\zeta })$ are the covariant and contravariant components of the magnetic field B(r), B = |B(r)|, and $\sqrt{g}=\mathrm{det}\left(\frac{\partial \mathbf{r}}{\partial (s,\theta ,\zeta )}\right)$ is the Jacobian of the transformation from flux coordinates to Cartesian coordinates. Additionally, $\mu ={v}_{\perp }^{2}/2B$ is the magnetic moment, which is conserved for individual particles. For each particle it is chosen so that the total initial kinetic energy matches that of a 3.5 MeV particle. Moreover, Ω = 2eB0/m is a characteristic gyrofrequency for the alpha particles within the stellarator. Here B0 represents a characteristic field strength, which we take to be the field strength at s = 0, θ = 0, ζ = 0. Given initial conditions, we solve equations (1)–(4) up to Tfinal, and consider a particle confined if s(t) < 1 for all t < Tfinal, and lost otherwise.
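For illustration, a minimal sketch of how a single trajectory might be integrated and classified follows; it assumes a user-supplied function implementing the right-hand side of (1)–(4), and all names are illustrative rather than those of any particular code.

```python
import numpy as np

def rk4_step(rhs, y, dt):
    # One classical fourth-order Runge-Kutta step for y' = rhs(y).
    k1 = rhs(y)
    k2 = rhs(y + 0.5 * dt * k1)
    k3 = rhs(y + 0.5 * dt * k2)
    k4 = rhs(y + dt * k3)
    return y + (dt / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

def trace_guiding_center(rhs, y0, dt, t_final):
    """Integrate the state y = (s, theta, zeta, v_par) up to t_final,
    stopping early if the guiding center crosses the last closed flux
    surface (s >= 1). Returns the loss time, or t_final if the particle
    remains confined. `rhs` must implement the right-hand side of (1)-(4)."""
    y, t = np.asarray(y0, dtype=float), 0.0
    while t < t_final:
        y = rk4_step(rhs, y, dt)
        t += dt
        if y[0] >= 1.0:    # the flux label s has reached the boundary
            return t        # particle is lost at (approximately) time t
    return t_final          # particle is confined up to t_final
```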

Note that for stellarators, the functions ${B}_{s},{B}_{\theta },{B}_{\zeta },{B}^{\theta },{B}^{\zeta },B,\;\text{and}\;\sqrt{g}$ appearing on the right-hand side of (1)–(4) are often only accessible as numerical outputs of a 3D field solver (vacuum or finite pressure magnetohydrodynamics (MHD) solve) [28, 35, 53]. They are typically represented by a double Fourier representation in (θ, ζ) with an interpolant or spline in s for the coefficients, and are accessed in the form:

$f(s,\theta ,\zeta )=\sum _{(m,n)}{f}_{mn}(s)\,\mathrm{cos}(m\theta -n\zeta )\quad \text{or}\quad f(s,\theta ,\zeta )=\sum _{(m,n)}{f}_{mn}(s)\,\mathrm{sin}(m\theta -n\zeta )\qquad (5)$
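As a sketch of how such a representation might be evaluated, assuming for concreteness the cosine series and the mθ − nζ angle convention (and using cubic interpolating splines as a stand-in for the radial representation of section 5.1):

```python
import numpy as np
from scipy.interpolate import CubicSpline

class FourierSplineField:
    """Evaluate f(s, theta, zeta) = sum_{m,n} f_mn(s) cos(m*theta - n*zeta),
    the cosine case of the representation (5). `s_grid` holds the radial
    knots, and `coeffs[(m, n)]` the Fourier coefficient f_mn sampled on
    that grid, as provided by a 3D equilibrium solver."""

    def __init__(self, s_grid, coeffs):
        self.splines = {mn: CubicSpline(s_grid, c) for mn, c in coeffs.items()}

    def __call__(self, s, theta, zeta):
        return sum(spl(s) * np.cos(m * theta - n * zeta)
                   for (m, n), spl in self.splines.items())
```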

2.2. High expense of simulating energetic particle dynamics

Considering equations (1)–(4), one may at first wonder why MC estimators need to be improved. Indeed, equations (1)–(4) are 'only' a four-dimensional system of coupled ODEs, and since the equations are not modified by the presence of the other energetic particles considered in the MC analysis, the calculations are embarrassingly parallel. It would be natural to conclude that the slow convergence of MC estimators can be easily offset by the relative ease of brute-force computations. We briefly review here why this is not the case, and why direct computations of energetic particle losses remain too expensive for many design optimization applications. The reasons are intrinsically tied to the nature of the motion of energetic particles in non-axisymmetric stellarator magnetic fields, and to the vastly different time scales of interest.

As we discussed previously, energetic particle confinement analyses require the simulation of their dynamics for a time at least as long as their thermalization time [4, 32], after which their orbits become comparable to those of the background thermal ions. We note that thermalization is due to collisions, which are absent in collisionless guiding center models for the study of alpha particle confinement in stellarators, such as the one we consider here. However, even for collisionless models, the thermalization time is still often used as a characteristic time scale of interest to evaluate the confinement properties of a given magnetic configuration. This energetic particle thermalization time, corresponding to tens of thousands of toroidal transits for passing particles [36], and thousands of bounces for trapped particles, is significantly longer than the characteristic time scales of their single-particle dynamics. Because of the fast cross-field drift of the energetic particles and the high sensitivity of the orbits to local variations of the magnetic field, as illustrated in figure 1, the motion on these short time scales needs to be resolved with high accuracy. One is therefore faced with an intrinsically multi-scale problem, which is made even more difficult by the chaotic nature of some of the particle trajectories [1]. Numerically, this means that short time steps must be chosen when integrating equations (1)–(4) to accurately resolve the dynamics on the short time scales. At the same time, high order integration schemes must be implemented, in order to prevent the numerical error from accumulating on the long thermalization time scale. The design of efficient and inexpensive numerical schemes satisfying these stringent criteria remains an open problem in computational science. Progress has been made for certain classes of multi-scale problems [19, 20, 61], but the potential of these methods to speed up guiding center calculations for energetic particles has not been proven. To this day, integrating equations (1)–(4) with high accuracy on the long thermalization time scale for a single energetic particle remains an intrinsically expensive problem.

Figure 1. 3D trajectories and flux label vs time for trapped (top) and passing (bottom) particles launched from the same location. The flux labels for trapped and passing particles exhibit significantly different temporal and spatial scales.

3. Estimating energetic particle losses with standard Monte Carlo

In this section, we first explain how we mathematically treat the uncertainty concerning the initial conditions required to compute the single particle motion according to equations (1)–(4), corresponding to the uncertainty in the location and velocity of the energetic particles at birth. We then present three different quantities we consider in this article to characterize energetic particle confinement. The first quantity is the energetic particle loss fraction, which is the standard figure of merit in optimization studies. The second and third quantities are the mean loss time and the mean flux surface position, which we motivate physically below. Finally, we briefly review the direct approach based on the standard MC method for estimating these quantities.

3.1. Uncertain initial conditions for guiding center dynamics

The energetic particles we consider are alpha particles, born as byproducts of deuterium–tritium fusion reactions. There is intrinsic uncertainty in where and in what direction the particles are emitted, as well as in their kinetic energy at birth. This manifests as uncertainty in the initial conditions needed to solve the guiding center equations (1)–(4).

In this work, we follow the standard practice of assuming that all alpha particles are born with the same kinetic energy of 3.5 MeV [1, 4, 27]. This is a simplification of the actual alpha particle birth process. For accurate alpha particle confinement results, one should in principle account for the energy spread in the alpha birth energy, due to the kinematics of the collision process [2, 8, 25, 62]. As we make this simplifying assumption, only two sources of uncertainty remain: the location of birth of alpha particles, and the velocity direction at birth. We model this uncertainty by imposing probability distributions on the initial position R0 and initial parallel velocity V0. Specifically, we model alpha particles as being born uniformly in the stellarator volume U:

${\mathbf{R}}_{0}\sim \mathrm{Unif}(U),\quad \text{i.e.}\quad {p}_{{\mathbf{R}}_{0}}(\mathbf{r})=\frac{1}{\mathrm{Vol}(U)}\quad \text{for}\ \mathbf{r}\in U\qquad (6)$

We choose this model for its simplicity. It is physically relevant for flat density and temperature profiles, with a steep pedestal at the edge. For standard density and temperature profiles peaked on axis, our model overestimates the production of alpha particles at large radii. Consequently, the fraction of lost alpha particles in our setup is likely higher than would be obtained in practice. The method based on MFMC that we present below can be adapted to any choice of probability distribution for R0, and we will discuss later in this article the extent to which the numerical results we present could be affected by the choice of distribution. For the emission direction, we use the convention that particles are born isotropically in velocity space [1, 4]:

${V}_{0}\sim \mathrm{Unif}\left(\left[-{V}_{\mathrm{max}},{V}_{\mathrm{max}}\right]\right)\qquad (7)$

where Vmax is the velocity of a particle with a kinetic energy of 3.5 MeV.
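For reference, the birth speed follows directly from the 3.5 MeV kinetic energy, and isotropic emission at a fixed speed makes the parallel velocity uniformly distributed on [−Vmax, Vmax]. A minimal sampling sketch (the constants are standard physical values; the function name is ours):

```python
import numpy as np

E_ALPHA = 3.5e6 * 1.602176634e-19          # 3.5 MeV in joules
M_ALPHA = 6.6446573e-27                    # alpha particle mass in kg
V_MAX = np.sqrt(2.0 * E_ALPHA / M_ALPHA)   # birth speed, about 1.3e7 m/s

def sample_parallel_velocity(rng, size):
    # Isotropic emission at fixed speed V_MAX implies a uniformly
    # distributed parallel velocity on [-V_MAX, V_MAX].
    return rng.uniform(-V_MAX, V_MAX, size=size)

rng = np.random.default_rng(0)
v_par_samples = sample_parallel_velocity(rng, size=10)
```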

3.2. Metrics of energetic particle confinement

In order to estimate energetic particle losses, we must propagate the initial uncertainty through the guiding center dynamics to classify which initial conditions lead to lost particles. Let $\mathbb{E}$ denote the expectation under R0 and V0. Particle loss is thus measured by $\mathbb{E}\left[{F}^{\text{lost}}\left({\mathbf{R}}_{0},{V}_{0}\right)\right]$ where

${F}^{\text{lost}}({\mathbf{R}}_{0},{V}_{0})=\begin{cases}1 & \text{if } s(t;{\mathbf{R}}_{0},{V}_{0})\geq 1 \text{ for some } t\leq {T}_{\text{final}},\\ 0 & \text{otherwise},\end{cases}\qquad (8)$

and $s(t;{\mathbf{R}}_{0},{V}_{0})$ comes from solving equations (1)–(4) using the initial conditions (R0, V0). Details on how R0 is converted to flux coordinates are discussed later in section 5. Since the dynamics are deterministic, we can interpret Flost(R0, V0) as a Bernoulli random variable which classifies whether or not a particle crosses the last closed flux surface before t = Tfinal, based on where and in what direction it is born.

Flost is often the key figure of merit considered in alpha particle confinement studies [1, 4, 27, 31, 34, 36, 37]. It corresponds to a discrete random variable. To demonstrate the applicability of our method to figures of merit corresponding to continuous random variables, we consider two other metrics which may be used for design optimization, and provide insights on a stellarator's efficacy in confining energetic particles. The first quantity of interest is the average time of loss of energetic particles. Its value for design optimization and performance analysis can be explained as follows. First, prompt losses, corresponding to alpha particles lost after a few transits or bounces, before they have significantly slowed down, lead to more damage to the first wall than slower losses. Second, alpha particles need to be confined for long enough that they heat the plasma as they slow down. Given two magnetic configurations with the same loss fraction after an alpha particle slowing down time, the more desirable one is the one with the longer mean time of loss. This figure of merit is also relevant for assessing the validity of the assumptions and models used for the analysis. If the order of magnitude of the mean time of loss is comparable to or only slightly smaller than the alpha particle slowing down time, a large number of alpha particles remain confined long enough to be thermalized. Results obtained by simulating mono-energetic collisionless guiding center trajectories, as often done [1, 4, 27] and as we do for illustrative purposes in this work, should then be confirmed with codes including more detailed physics, and collisions in particular [18, 49]. As we consider this figure of merit, however, we have to deal with a mathematical difficulty, associated with the fact that for fully confined alpha particles, the time of loss is infinite. We circumvent this difficulty by defining a modified loss time, such that all particles are considered 'lost' by Tfinal:

${F}^{\text{time}}({\mathbf{R}}_{0},{V}_{0})=\min\left\{{T}_{\text{final}},\ \inf\{t>0:s(t;{\mathbf{R}}_{0},{V}_{0})\geq 1\}\right\}\qquad (9)$

Our second physical quantity of interest associated with a continuous random variable is the mean flux surface location of energetic particles. This quantity is not directly related to alpha particle loss, but is a complementary measure of the quality of energetic particle confinement. It provides physical insights on why certain configurations have lower losses, as well as information on the location of alpha power deposition. We will thus consider the quantity:

${F}^{\text{flux}}(t,{\mathbf{R}}_{0},{V}_{0})=s(t;{\mathbf{R}}_{0},{V}_{0})\qquad (10)$

Therefore the mean modified loss time $\mathbb{E}\left[{F}^{\text{time}}\left({\mathbf{R}}_{0},{V}_{0}\right)\right]$ and mean normalized flux surface label $\mathbb{E}\left[{F}^{\text{flux}}\left(t,{\mathbf{R}}_{0},{V}_{0}\right)\right]$ serve as alternative statistical metrics for a stellarator's effectiveness in confining energetic particles in our study. For the remainder of this article, we will take F to represent either Flost, Ftime, or Fflux at some fixed time t.
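To make these quantities concrete, the following sketch evaluates all three from a single recorded flux-label trajectory; it assumes uniform sampling in time, and the names are illustrative rather than from any particular code.

```python
import numpy as np

def confinement_metrics(s_traj, dt, t_final):
    """Evaluate the three quantities of interest (8)-(10) from a flux
    label trajectory s_traj = [s(t_1), s(t_2), ...] recorded with a
    uniform spacing dt: the loss indicator, the modified loss time
    capped at t_final, and the flux label history itself."""
    s_traj = np.asarray(s_traj)
    lost_idx = np.flatnonzero(s_traj >= 1.0)
    f_lost = 1.0 if lost_idx.size > 0 else 0.0
    # The first recorded time at which s >= 1 approximates the loss time (9).
    f_time = (lost_idx[0] + 1) * dt if lost_idx.size > 0 else t_final
    f_flux = s_traj   # the flux label at every recorded time, as in (10)
    return f_lost, f_time, f_flux
```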

3.3. Uncertainty propagation using Monte Carlo

Now that we have identified energetic particle loss, along with two other metrics for energetic particle confinement, as an expectation of a function F(R0, V0), it remains to accurately estimate that expectation. A direct method for this is the MC estimator, whereby we draw p i.i.d. samples ${\left\{({\mathbf{R}}_{0}^{(i)},{V}_{0}^{(i)})\right\}}_{i=1}^{p}$ from (6) and (7) and then approximate $\mathbb{E}[F]$ by the unbiased estimator

${\bar{F}}_{p}=\frac{1}{p}\sum _{i=1}^{p}F\left({\mathbf{R}}_{0}^{(i)},{V}_{0}^{(i)}\right)\qquad (11)$

This standard MC approach is summarized in method 3.1. Recall that the variance of a random variable F is defined by $\mathrm{Var}(F)=\mathbb{E}[{\left(F-\mathbb{E}[F]\right)}^{2}]$. Since the MC estimator has variance Var(F)/p, it converges at the slow rate of $1/\sqrt{p}$ with respect to the root mean square error (RMSE). In order to compute the MC estimator, we need to solve the guiding center equations (1)–(4) a total of p times and then evaluate F using those trajectories. Due to the high computational cost of simulating guiding center trajectories of energetic particles over the appropriate time scale, as explained in section 2.2, using the MC estimator can rapidly become computationally intractable for large p.
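In code, the estimator (11) is a plain sample average over the propagated samples; a minimal sketch, with the usual standard-error proxy for the RMSE:

```python
import numpy as np

def monte_carlo_estimate(F, samples):
    """Standard MC estimator (11): average the quantity of interest F
    over i.i.d. initial conditions drawn from (6) and (7). `samples` is
    a list of (R0, V0) pairs; each F evaluation hides a full guiding
    center trajectory integration and dominates the cost."""
    values = np.array([F(r0, v0) for (r0, v0) in samples])
    standard_error = values.std(ddof=1) / np.sqrt(len(values))
    return values.mean(), standard_error
```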

The large cost required to produce an accurate MC estimator motivates us to aim for an estimator with lower variance via variance reduction. Such a procedure would lower the constant in front of $1/\sqrt{p}$, thereby yielding a more accurate estimator for the same computational effort.

4. Estimating energetic particle confinement with multifidelity Monte Carlo

Here, we first describe how one can leverage a surrogate model to construct a control variate based multifidelity estimator with smaller variance than the standard MC estimator. We then discuss an approach to construct highly correlated surrogate models for the analysis of energetic particle confinement.

4.1. Variance reduction with multifidelity Monte Carlo

Alongside the high-fidelity model F, which requires expensive numerical integration to evaluate, suppose we also have a low-fidelity, or surrogate, model G. In order to leverage G for variance reduction we utilize a control variate based MFMC method [48]. The MFMC estimator with computational budget p is the unbiased estimator

${\hat{F}}^{\text{MFMC}}=\frac{1}{n}\sum _{i=1}^{n}F\left({\mathbf{R}}_{0}^{(i)},{V}_{0}^{(i)}\right)+\alpha \left(\frac{1}{m}\sum _{i=1}^{m}G\left({\mathbf{R}}_{0}^{(i)},{V}_{0}^{(i)}\right)-\frac{1}{n}\sum _{i=1}^{n}G\left({\mathbf{R}}_{0}^{(i)},{V}_{0}^{(i)}\right)\right)\qquad (12)$

where α is a constant, n is the number of high-fidelity samples, and m is the number of low-fidelity samples. To minimize the variance of the MFMC estimator, α, n, m can be optimally chosen, and those choices satisfy

$\alpha =\rho (F,G)\sqrt{\frac{\mathrm{Var}(F)}{\mathrm{Var}(G)}},\qquad \frac{m}{n}=\sqrt{\frac{w\,{\rho }^{2}(F,G)}{1-{\rho }^{2}(F,G)}},\qquad n+\frac{m}{w}=p\qquad (13)$

where ρ(F, G) is the Pearson's correlation coefficient between the high-fidelity F and the surrogate G, which is defined by

$\rho (F,G)=\frac{\mathbb{E}\left[(F-\mathbb{E}[F])(G-\mathbb{E}[G])\right]}{\sqrt{\mathrm{Var}(F)\,\mathrm{Var}(G)}}\qquad (14)$

and w is the ratio of the cost of evaluating F to the cost of evaluating G. A practical implementation of this method is summarized in method 4.1. The variance of the resulting optimal MFMC estimator is given by

$\mathrm{Var}\left({\hat{F}}^{\text{MFMC}}\right)=\frac{\mathrm{Var}(F)}{p}{\left[\sqrt{1-{\rho }^{2}(F,G)}+\sqrt{\frac{{\rho }^{2}(F,G)}{w}}\right]}^{2}\qquad (15)$

In order to maximize the amount of variance reduction MFMC provides compared to MC, the bracketed term in (15) should be as small as possible. This means that we want the surrogate G to be both highly correlated with F and much cheaper to evaluate than F. Although the MFMC estimator still converges at the same slow rate of $1/\sqrt{p}$ as the standard MC estimator, it can be more accurate with the same computational budget provided a sufficiently correlated and inexpensive surrogate.
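As a concrete illustration, the sketch below (not the implementation used in this work) assembles the estimator (12) with the optimal choices (13); the correlation and variances entering (13) are estimated here from a small pilot sample whose cost is ignored, a simplification on our part consistent with the offline/online separation discussed next.

```python
import numpy as np

def mfmc_estimate(F, G, inputs, p, w, n_pilot=50):
    """Two-model MFMC estimator (12)-(13) in the spirit of [48].
    `p` is the budget in units of high-fidelity evaluations, `w` the
    ratio of the cost of F to the cost of G, and `inputs` a list of
    i.i.d. samples of the uncertain initial conditions."""
    # Pilot estimates of Var(F), Var(G), and rho(F, G).
    f_pilot = np.array([F(*x) for x in inputs[:n_pilot]])
    g_pilot = np.array([G(*x) for x in inputs[:n_pilot]])
    rho = np.corrcoef(f_pilot, g_pilot)[0, 1]
    alpha = rho * f_pilot.std() / g_pilot.std()   # optimal control-variate weight

    # Optimal sample allocation under the budget constraint n + m / w = p.
    r = np.sqrt(w * rho**2 / (1.0 - rho**2))      # optimal ratio m / n
    n = max(int(p / (1.0 + r / w)), 1)
    m = max(int(r * n), n)                        # MFMC requires m >= n

    xs = inputs[:m]                               # shared input samples
    f_vals = np.array([F(*x) for x in xs[:n]])
    g_vals = np.array([G(*x) for x in xs])
    return f_vals.mean() + alpha * (g_vals.mean() - g_vals[:n].mean())
```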

Note that once one is given a surrogate G, the amount of speedup provided by MFMC is constant. That is, the construction of G is done offline, separate from the online phase of estimation. No matter how expensive the construction of G is (which could require many samples of F for instance), once we have that surrogate, the speedup from MFMC is guaranteed no matter how large p is taken to be; see [46] for an extension of MFMC that takes offline costs into account.

In the MFMC literature, the cost of sampling the initial uncertainty is not taken into account, as it is typically assumed to be orders of magnitude cheaper than everything else being considered. In practice, if sampling the input uncertainty incurs a non-trivial cost, then a better definition of the cost ratio w would be required, as the current definition only accounts for the cost of propagating the uncertainty forward. While we do not account for this input sampling cost, so as to make a fair budget comparison in our MFMC experiments, we note that the manner in which the input uncertainty is sampled can impose practical limits on the implementation.

4.2. Choice of surrogate model

We now detail our choice of a data-fit surrogate model G designed to maximize correlation with F for our estimation of energetic particle confinement. We rely on data-fit surrogates because traditional sources of surrogates are not appropriate for energetic particle dynamics. For example, there is no natural hierarchy of simplified physics models to leverage [47], since the guiding center equations are already a simplification of the computationally intractable full-orbit dynamics, and a 'beads on the wire' model for the particle trajectories, which would omit all finite-gyroradius corrections, as in the kinetic MHD model [10, 12, 21, 24, 30, 50, 51, 55], would not lead to good correlation since it neglects the large cross-field drifts of energetic particles. Similarly, we have empirically found that another usually powerful source of surrogates, arising from successively coarsening the grid [13], is not satisfactory either for our application. That is because the time step of the numerical ODE integrator is tightly constrained by the requirement to resolve the shortest characteristic time scales of the motion of both trapped and passing particles. When this time step is significantly increased, accuracy is significantly reduced, and correlation rapidly decreases. Lastly, since the problem is transport dominated, popular surrogates from traditional model reduction, such as proper orthogonal decomposition or reduced basis methods, are a poor choice [44].

For a data-fit surrogate, we consider a linear basis function model of the form:

${G}^{n}(\mathbf{x};{\mathbf{P}}^{n})=\sum _{k=0}^{n}{P}_{k}^{n}\,{\phi }_{k}(\mathbf{x})\qquad (16)$

where ${\mathbf{P}}^{n}\in {\mathbb{R}}^{n+1}$ and ${\left\{{\phi }_{k}\right\}}_{k=0}^{n}$ is any set of basis functions. Using this family of surrogates, we want to find the set of parameters Pn which maximizes the correlation with our high-fidelity model. This amounts to solving

$\underset{{\mathbf{P}}^{n}\in {\mathbb{R}}^{n+1}}{\max }\ \rho \left(F,{G}^{n}(\cdot \,;{\mathbf{P}}^{n})\right)\qquad (17)$

However, a caveat of this formulation is that the optimization problem (17) has infinitely many maxima through scaling, since our model is linear: ρ(F, Gn (⋅; c Pn )) = ρ(F, cGn (⋅; Pn )) = ρ(F, Gn (⋅; Pn )) for any c > 0.

It is thus better to consider ${\mathbf{P}}_{\ast }^{n}$, which solves the following optimization problem (18):

${\mathbf{P}}_{\ast }^{n}=\underset{{\mathbf{P}}^{n}\in {\mathbb{R}}^{n+1}}{\mathrm{arg\,min}}\ \mathbb{E}\left[{\left(F-{G}^{n}(\cdot \,;{\mathbf{P}}^{n})\right)}^{2}\right]\qquad (18)$

Due to the linear nature of our family of surrogate models, the unique solution ${\mathbf{P}}_{\ast }^{n}$ of (18) is also one of the non-unique solutions for (17).

In practice, evaluating the objective function in (18) is as challenging as the original problem motivating our work, since it involves evaluating expectations of F. Finding the exact minimum ${\mathbf{P}}_{\ast }^{n}$ is therefore a computationally intractable problem, regardless of the choice of basis functions ${\left\{{\phi }_{k}\right\}}_{k=0}^{n}$. We now explain how we address this difficulty and construct a satisfactory approximation of ${\mathbf{P}}_{\ast }^{n}$ for the particular choice of basis functions ${\left\{{\phi }_{k}\right\}}_{k=0}^{n}$ we make for this study, namely piece-wise linear functions. The idea is to temporarily change the interpretation of F, and view it as a deterministic map from the four-dimensional space (s, θ, ζ, v||) to $\mathbb{R}$. First, in (s, θ, ζ, v||) coordinates, we construct a regular mesh on [0, 1] × [0, 2π] × [0, π/2] × [−Vmax, Vmax], and evaluate F on that mesh. Then we specifically choose our basis functions {ϕk } to be the hat functions on the mesh, so that our family of linear basis functions Gn is the family of piece-wise linear functions on the mesh. If F is a continuous map, and we determine ${\tilde{\mathbf{P}}}^{n}$ so that ${G}^{n}(\cdot ,{\tilde{\mathbf{P}}}^{n})$ is the corresponding interpolant of F on the mesh, then ${\tilde{\mathbf{P}}}^{n}$ will be a close approximation to ${\mathbf{P}}_{\ast }^{n}$. The reasoning behind this is that if F is continuous, as we refine the mesh then ${G}^{n}(\cdot \,;{\mathbf{P}}_{\ast }^{n})$, where ${\mathbf{P}}_{\ast }^{n}$ solves (18) on successively finer meshes, will converge to F. Since ${G}^{n}(\cdot \,;{\tilde{\mathbf{P}}}^{n})$ also converges to F as the mesh is refined, we expect ${\tilde{\mathbf{P}}}^{n}$ to be a strong approximation to ${\mathbf{P}}_{\ast }^{n}$, converging as the mesh refines. Although we cannot measure how close ${\tilde{\mathbf{P}}}^{n}$ is to ${\mathbf{P}}_{\ast }^{n}$, in practice we have observed that ${G}^{n}(\cdot ,{\tilde{\mathbf{P}}}^{n})$ is favorably correlated with F, even on a relatively coarse mesh.
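A sketch of this construction with off-the-shelf tools follows; scipy's RegularGridInterpolator provides multilinear interpolation on the tensor-product mesh, which we use here as a close stand-in for the hat-function basis described above, and the function name is illustrative.

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator

def build_surrogate(F_on_mesh, v_max, n_pts=25):
    """Piece-wise linear interpolant of F over a regular mesh in
    (s, theta, zeta, v_par). `F_on_mesh` is an (n_pts,)*4 array of
    high-fidelity evaluations on the tensor-product grid below; the
    zeta range [0, pi/2] reflects the four-fold symmetry exploited in
    section 5, and 25 points per dimension matches section 5.1."""
    grids = (np.linspace(0.0, 1.0, n_pts),            # s
             np.linspace(0.0, 2.0 * np.pi, n_pts),    # theta
             np.linspace(0.0, 0.5 * np.pi, n_pts),    # zeta
             np.linspace(-v_max, v_max, n_pts))       # v_par
    return RegularGridInterpolator(grids, F_on_mesh, method="linear")

# Usage: G = build_surrogate(F_values, V_MAX); G((s0, theta0, zeta0, v_par0))
```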

5. Numerical results

We now conduct numerical tests to determine the level of variance reduction and the speedup one can obtain in practice with our MFMC approach in a realistic stellarator configuration. We first provide details for the setup of our numerical experiments. Then we demonstrate the effectiveness of MFMC for confinement studies done at high resolution, with a time step equal to the characteristic gyroperiod of alpha particles in our magnetic equilibrium. Since the time step is small, we cannot track particle orbits beyond Tfinal = 0.1 ms with the computing resources available to us. This is well below the typical alpha particle slowing down time, which is approximately 200 ms. In terms of confinement, the results of this analysis are mostly dominated by prompt particle losses. In the final part of this section, we decrease the time resolution of our high-fidelity numerical integrator, in order to compute particle trajectories up to Tfinal = 20 ms. While this remains below the typical alpha particle slowing down time, it is long enough to include more varied particle loss channels, and thus provide realistic estimates of the effectiveness of MFMC for alpha particle confinement studies.

5.1. Numerical setup

For our high-fidelity solver, we use a Runge–Kutta 4 (RK4) time integrator for the guiding center equations. The terms ${B}_{s},{B}_{\theta },{B}_{\zeta },{B}^{\theta },{B}^{\zeta },\sqrt{g}$ are provided as numerical outputs of a VMEC solve [28]. These functions are given by the double Fourier representation (5) using 128 pairs (m, n) in the (θ, ζ) variables, where only cosine or sine series arise due to stellarator symmetry. For each (m, n) pair, fmn (s) is represented using a smoothing spline over 101 equispaced points in [0, 1]. We perform all tests on a Wistell-A vacuum magnetic configuration. Wistell-A is a quasihelically symmetric stellarator with four-fold stellarator symmetry [5]. For the time step size in our RK4 integrator, we choose Δτ = 2π/Ω, which is the characteristic gyroperiod of alpha particles in the Wistell-A magnetic field. This step size allows us to accurately resolve the guiding center dynamics, but limits us in how large Tfinal can be. We note that the gyroperiod is in principle not a characteristic time scale of guiding center motion. It may thus seem more natural to consider the transit time as the characteristic time scale, i.e. the time for an alpha particle to transit a field period. However, as our results below will show, for the Wistell-A configuration we study in this work, accurate results require the time step to be of the order of the gyroperiod.

We shall refer to our low-fidelity models as Glost, Gtime, and Gflux for the quantities of interest Flost, Ftime, and Fflux, respectively. To track the flux label in time, we specify Nflux time points tk = (time step size) × Nskip × k for k = 1, ..., Nflux. Note that Nflux and Nskip change depending on the time step size, and are adjusted in order to get to the desired Tfinal. The reasoning behind recording the flux label every Nskip steps is that for any practical value of Tfinal, tracking the trajectory at every step requires an unreasonably large number of specified time points. For different time step sizes, we choose Nskip in an attempt to make the specified time points similar to one another. For example, if the time step size is halved, then Nskip would be doubled. If the time step size is divided by three, then Nskip would be tripled, and Nflux adjusted as well. Thus, for an input (R0, V0), our high-fidelity flux label model outputs ${\left\{{F}^{\text{flux}}({t}_{k},{\mathbf{R}}_{0},{V}_{0})\right\}}_{k=1}^{{N}_{\text{flux}}}$. Our low-fidelity flux label model is then ${\left\{{G}_{k}^{\text{flux}}({\mathbf{R}}_{0},{V}_{0})\right\}}_{k=1}^{{N}_{\text{flux}}}$ where ${G}_{k}^{\text{flux}}$ is the piece-wise linear interpolant to Fflux(tk , ⋅) over a regular mesh in (s, θ, ζ, v||) as detailed in section 4.2. Similarly Gtime is the piece-wise linear interpolant to Ftime over the same mesh. For all interpolants we use 25 points in each dimension. For Glost we specify an optimal threshold using Gtime. Namely, we set

${G}^{\text{lost}}({\mathbf{R}}_{0},{V}_{0})=\begin{cases}1 & \text{if } {G}^{\text{time}}({\mathbf{R}}_{0},{V}_{0})\leq {T}_{\text{threshold}},\\ 0 & \text{otherwise},\end{cases}\qquad (19)$

where Tthreshold is chosen to minimize the expected misclassification $\mathbb{E}\left[{\left({F}^{\text{lost}}-{G}^{\text{lost}}\right)}^{2}\right]$. To compute Tthreshold, we use a sample average approximation of the expected misclassification using $10^4$ samples, and then choose Tthreshold to be the minimizer of that sample average approximation evaluated at $10^4$ equispaced points between 0 and Tfinal. Note that in our numerical experiments, the time Tthreshold is found to be comparable to Tfinal.
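A sketch of this threshold fit, using the sample average approximation of the misclassification error scanned over equispaced candidate thresholds (function and variable names are ours):

```python
import numpy as np

def fit_threshold(f_lost, g_time, t_final, n_grid=10_000):
    """Choose T_threshold in (19) to minimize the sample-average
    misclassification E[(F_lost - G_lost)^2]. `f_lost` holds the
    high-fidelity loss labels and `g_time` the surrogate loss times
    for the same input samples."""
    best_t, best_err = 0.0, np.inf
    for t in np.linspace(0.0, t_final, n_grid):
        # Classify a sample as lost when its surrogate loss time falls
        # below the candidate threshold, and count disagreements.
        err = np.mean(((g_time <= t).astype(float) - f_lost) ** 2)
        if err < best_err:
            best_t, best_err = t, err
    return best_t
```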

To sample ${\mathbf{R}}_{0}\in U$ in magnetic coordinates, we take advantage of the four-fold stellarator symmetry in Wistell-A and sample (s0, θ0, ζ0) from the probability density proportional to $\sqrt{g}$ over D = (0, 1) × (0, 2π) × (0, π/2). This is done by numerically integrating $\sqrt{g}$ over D to compute the normalization constant, and then performing rejection sampling with a uniform proposal. This step is relatively fast and requires on average only three rejections before a successful draw from the target distribution.
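A minimal sketch of this sampler is given below; rejection sampling with a uniform proposal requires an upper bound on $\sqrt{g}$ over D, denoted sqrt_g_max here, which we assume has been precomputed.

```python
import numpy as np

def sample_birth_position(sqrt_g, sqrt_g_max, rng):
    """Rejection sampling of (s, theta, zeta) from the density
    proportional to sqrt(g) over D = (0,1) x (0,2*pi) x (0,pi/2).
    Weighting flux coordinates by the Jacobian sqrt(g) yields points
    distributed uniformly in the physical volume."""
    while True:
        s = rng.uniform(0.0, 1.0)
        theta = rng.uniform(0.0, 2.0 * np.pi)
        zeta = rng.uniform(0.0, 0.5 * np.pi)
        # Accept with probability sqrt(g) / sqrt_g_max.
        if rng.uniform() < sqrt_g(s, theta, zeta) / sqrt_g_max:
            return s, theta, zeta
```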

The costs of evaluating Flost, Ftime, and Fflux are all the same: the time required to numerically integrate to Tfinal using a given time step size. The cost per time step (which is roughly 4× the cost per right-hand side evaluation of (1)–(4) in our implementation) is on the order of $10^{-3}$ s. For the values of Tfinal considered in this work, the total time is on the order of tens of minutes. For longer integration to Tfinal = 200 ms, this would be on the order of hours. The measured runtime is reported in table 1. In our experiments, the cost of constructing a surrogate G from given training data is on the order of milliseconds and thus negligible, because only a spline interpolant has to be built. If training data are available from previous simulations, then they can be directly used to construct a surrogate. If training data need to be generated first, then this can incur one-time high costs, but more sophisticated surrogate models than the interpolants used here can help to reduce these one-time costs. We discuss this important point in more detail in section 6. The costs of evaluating Glost and Gtime are the same, but the cost of evaluating Gflux is slightly larger since it requires evaluating Nflux interpolants. In table 1 we report the costs of evaluating the high-fidelity model, the cost ratios for Glost and Gtime, and the cost ratios for Gflux across multiple time step sizes and values of Tfinal. We note that for Tfinal = 20 ms the low-fidelity models are all several million times faster than the high-fidelity models.

Table 1. Costs for the high-fidelity model in seconds, cost ratios, and Nflux and Nskip used for various time step sizes and integration lengths. We see that for Tfinal = 20 ms our low-fidelity models are several million times faster than the high-fidelity evaluation. Only the values for (Δτ, 0.1) were measured; all other columns result from scaling.

(Time step size, Tfinal (ms))    (Δτ, 0.1)    (3Δτ, 20)      (5Δτ, 20)     (7Δτ, 20)      (10Δτ, 20)
High-fidelity cost (s)           20           1333           800           571            400
Cost ratio, Glost and Gtime      2 × 10^6     1.33 × 10^8    8 × 10^7      5.71 × 10^7    4 × 10^7
Cost ratio, Gflux                6 × 10^4     4 × 10^6       3.6 × 10^6    1.7 × 10^6     1.2 × 10^6
Nflux                            684          760            684           978            684
Nskip                            100          300            200           100            100

In table 2 we report the correlations between our high- and low-fidelity models for each of the quantities of interest across multiple time step sizes and values of Tfinal. We measure the correlation coefficient using $10^4$ samples.

Table 2. Correlations between the high-fidelity and low-fidelity models for each quantity of interest using different time step sizes and Tfinal. For the normalized flux label we report both the mean correlation over time points as well as the correlation at Tfinal. Correlations are measured using $10^4$ samples.

(Time step size, Tfinal (ms))         (Δτ, 0.1)    (3Δτ, 20)    (5Δτ, 20)    (7Δτ, 20)    (10Δτ, 20)
Classification of lost particles      0.8776       0.9076       0.8996       0.8762       0.8460
Modified loss time                    0.9444       0.9299       0.9426       0.9436       0.9247
Normalized flux label (average)       0.9846       0.9538       0.9517       0.9403       0.9111
Normalized flux label (final time)    0.9761       0.9486       0.9452       0.9270       0.8833

5.2. High resolution high-fidelity model and short time integration

We numerically compare the MC and MFMC estimators for the loss fraction, mean modified loss time, and mean flux label using our low-fidelity models for a situation in which we have a high resolution high-fidelity time integrator, with which we can only afford to compute trajectories over a relatively short time. Specifically, here we use time step size Δτ and choose Tfinal as 0.1 ms. For computational budgets p = 100, 250, 500, 1000, 2500, we measure the RMSE of MC and MFMC by using 100 replicates of each estimator and compare them to a reference solution generated using MC with $10^6$ samples.

For a given computational budget p, we would ideally want to use the optimal MFMC estimator with n high-fidelity evaluations and m low-fidelity evaluations satisfying (13). However, in practice the resulting n and m are not necessarily integers. Thus we simply round down, choosing $n=\lfloor {n}^{\ast }\rfloor$ and $m=\lfloor {m}^{\ast }\rfloor$ where $({n}^{\ast },{m}^{\ast })$ denote the optimal real-valued choices from (13), and compare this MFMC estimator to the MC estimator with p high-fidelity model evaluations. In the case of the normalized flux label, we average the RMSE over the Nflux specified time points.

In figure 2 we see the clear variance reduction using MFMC with our low-fidelity models compared to standard MC. We see that for the same computational budget, for each quantity of interest, MFMC is able to noticeably outperform MC. The most remarkable improvement is for the mean normalized flux label, where MFMC is nearly 30 times faster than MC. The improvement for the loss fraction is more modest, but MFMC is still four times faster than MC.

Figure 2. RMSE vs computational budget (seconds). Top left: loss fraction. Top right: mean modified loss time. Bottom left: mean normalized flux label. Bottom right: computational speedup of MFMC compared to MC for each of the quantities of interest. For the mean flux label, we report the median over speedups for each of the Nflux specified time points.

In the case of the normalized flux label, each of the Nflux specified time points has a different speedup, and in figure 2 we report the median over all such values. The different speedups at the specified time points tk are due to the fact that the correlation between Fflux and Gflux decreases over time, which can be seen in figure 3. As a result the measured speedup for computing $\mathbb{E}[{F}^{\text{flux}}({t}_{1},{\mathbf{R}}_{0},{V}_{0})]$ is much larger than the measured speedup of, e.g. $\mathbb{E}[{F}^{\text{flux}}({T}_{\text{final}},{\mathbf{R}}_{0},{V}_{0})]$. Consequently, in aggregating the speedups for different time points we must be careful of outliers. This is the primary reason we assess the speedup for the whole trajectory by the median of the speedups at each time point: the median is more robust to outliers than other measures of central tendency, such as the mean.

Figure 3. Correlation between Fflux and Gflux as a function of time, using different time step sizes for Fflux (which in turn leads to a different Gflux for each step size). We note that all the correlations stay practically the same until around $3\times {10}^{-4}$ s. After that, the experiment with the lowest resolution high-fidelity model (i.e. 10Δτ) has a rapid drop off in correlation. The decrease with respect to different time step sizes is expected, with the highest resolution experiment maintaining the highest correlation.

5.3. Lower resolution high-fidelity model and longer time integration

We now extend our analysis to a longer Tfinal of 20 ms. Since this is 200 times longer than the short time integration case, we must increase the step size for the high-fidelity model by a factor of 3, 5, 7, and 10 for computational tractability. Note that changing the step size changes the high-fidelity models Flost, Ftime, and Fflux, which implicitly changes the corresponding low-fidelity models Glost, Gtime, and Gflux.

The results in table 2 indicate that the accuracy of the high-fidelity model has a direct impact on the level of correlation between the high-fidelity and the low-fidelity models. The numerical experiments with 3Δτ and 5Δτ yield similar correlations by the final time, but we see that as the time step of the high-fidelity numerical integrator further increases, and the accuracy of the high-fidelity model thus further decreases, the correlation with the low-fidelity model decays quickly. A partial exception to that conclusion applies to the modified loss time, for which the correlation remains steady as we increase the time step of the ODE integrator. For the moment, we do not have an explanation for this surprising observation.

The importance of relying on an accurate high-fidelity model can be seen in figure 3 as well, where we plot the correlation between Fflux and Gflux as a function of time, for different choices of step size. For short times, all step sizes of the numerical ODE integrator provide the same correlation, but as time increases, we see that low resolution in the high-fidelity model leads to rapid loss of correlation. Moreover, we notice the general pattern that, uniformly in time, the more resolution one has for the high-fidelity model, the more correlation one obtains with the corresponding low-fidelity model.

Since the costs of our low-fidelity models remain the same, it is doubly beneficial for MFMC that correlation between the high-fidelity and low-fidelity models tends to increase as the resolution, and thus cost, of the high-fidelity solver increases. Indeed, improving the resolution of the high-fidelity model not only yields better correlation to the resulting low-fidelity model, but also a larger cost ratio, which in turn means more variance reduction with MFMC.

Just by looking at tables 1 and 2 and figure 3, we might expect some of the speedup we obtained in our high resolution, short time integration analysis to extend to the current analysis with lower resolution and longer time integration. Although the correlation goes down for longer time, the cost ratio grows rapidly, which in the MFMC setting can offset the effects of decreasing correlation. Furthermore, we may expect that the models with smaller step sizes (e.g. 3Δτ) will yield better speedup than models with larger step sizes (e.g. 10Δτ). This is precisely what we now test numerically.

For computational budgets p = 100, 250, 500, 1000, we measure the RMSE of MC and MFMC by using 90 replicates for each estimator and compare them to a reference solution generated using MC with $10^5$ samples. We rely on fewer replicates in estimating the RMSE and have less resolution in the reference solution because of the significantly increased computational cost for longer time integration of the particle orbits. We balance the costs in the same manner as in the short time integration case, and also assess the flux label speedup using the median, for the same reasons as before.

Our speedup results for the lower resolution, longer time integration experiments are displayed in figure 4. Broadly, we see that we lose some speedup for this larger Tfinal as compared to our short time integration experiment, but still achieve an order of magnitude speedup in estimating the mean flux label when the step size 3Δτ is used. For the other quantities of interest, the speedup is also quite large, the smallest being an almost fourfold speedup for estimating the loss fraction with the high-fidelity integrator with time step 10Δτ. For all time step sizes, the speedup for estimating the loss fraction is about 4, as we had obtained for the experiment with a short time integration.

Figure 4. Measured speedup for Tfinal = 0.02 s for different time step sizes of the high-fidelity ODE integrator, using Δτ = 2π/Ω. Top left: 3Δτ. Top right: 5Δτ. Bottom left: 7Δτ. Bottom right: 10Δτ.

Comparing across experiments with different step sizes, we note that we typically get the most speedup for the most accurate, and most expensive, high-fidelity model, with step size 3Δτ. As the step size increases, we see that the speedup in estimating the loss fraction decreases the slowest, whereas the speedup for estimating the mean flux label decreases the fastest. This is consistent with the results from figure 3, and it means that when we increase the step size, the larger cost ratio does not seem to fully make up for the decreased correlation. The speedup for the mean modified loss time does not appear to depend on the step size in a clear manner, and is between around 6.5 and 8 depending on the step size. These results are also consistent with the correlations reported in table 2, where we did not observe any obvious trend with the step size for this quantity of interest. Note that for the largest step size 10Δτ, there is more speedup in estimating the mean modified loss time than in estimating the mean flux label.

6. Summary and discussion

We have developed and implemented the first MFMC scheme for estimating energetic particle confinement in stellarator configurations. For the Wistell-A magnetic equilibrium we considered, this has allowed us to gain over an order of magnitude in speedup compared to standard MC for certain cases and certain physical quantities. For other cases and physical quantities for which MFMC acceleration is less efficient, the speedup was still at least a factor of four. This speedup was gained by leveraging efficient data-driven surrogate models built from interpolants.

Although we found that MFMC provided the largest speedup for our highest resolution high-fidelity model with particle trajectories which were integrated over a short time scale, we observed that the speedups for estimating the loss fraction and the mean modified loss time remained large and roughly constant as we significantly increased the final time for the integration of the particle trajectories and reduced the resolution of our high-fidelity models.

6.1. Benefits in early vs late design phases

As our results suggest, the acceleration obtained by replacing standard MC with MFMC will depend on the physical quantity of interest, on the equilibrium magnetic configuration, and on the manner in which the uncertainty in the initial conditions is introduced. For example, one can mathematically show that the speedup for the loss fraction deteriorates as the true loss fraction $\mathbb{E}[{F}^{\text{lost}}]$ goes to 0. If we had not assumed that alpha particles are born uniformly in the stellarator volume U, but instead had all of them born on a flux surface close to the magnetic axis, as is sometimes done in confinement studies, the true loss fraction would have been smaller, and the MFMC speedup for this quantity of interest may have been smaller as well. On the other hand, the Wistell-A configuration is already well optimized for alpha particle confinement. If we had considered a magnetic configuration with significantly worse alpha particle confinement, we may have obtained a much higher speedup through our MFMC implementation. In the context of stellarator optimization, the approach we presented in this article may therefore be most useful in the early design optimization phase, when the loss fraction is high. In the later design phases, if the loss fraction has been optimized to become very low, the variance reduction provided by MFMC via control variates becomes ineffective for the estimation of the loss fraction (but not necessarily for other confinement quantities, as we showed in this work). For such situations, variance reduction methods based on, e.g. multi-fidelity importance sampling are more suitable [47].

6.2. Surrogate modeling

There are several open questions regarding the construction of surrogate models for estimating energetic particle confinement in stellarator configurations. In this work, we focused on interpolants because they can be readily constructed with off-the-shelf software packages that are widely used. If training data are available, then constructing an interpolant as surrogate model has negligible computational costs compared to, e.g. a high-fidelity model evaluation. However, if training data are unavailable, then they need to be obtained first, which can incur one-time high costs. One direction of future research is deriving surrogate models with adaptive methods that judiciously explore the parameter space and request new training samples only where necessary based on error indicators, rather than uniformly sampling the parameter space as in this work. For example, surrogate models based on sparse grids have been shown to require little data [29]. Another line of future work is to develop physics based reduced models that require no training [47].

The optimization loop for finding a stellarator design with good confinement properties poses additional challenges but also opportunities. In the early stages of optimization, when the magnetic configuration changes significantly from one optimization iteration to the next, surrogate models built for the previous iterations can become obsolete quickly. However, less accurate but quick-to-build surrogate models are sufficient at this stage, because relatively crude estimators of the objective can provide the relevant information for finding descent directions. Physics based reduced models, which require no training, can play a critical role in these early stages of optimization. In later stages of the optimization, surrogate models may be re-used for multiple iterations because they remain valid as the magnetic equilibrium changes only slightly from one optimization iteration to the next. Thus, it can be worthwhile to invest a larger budget in training a surrogate model to reduce the variance with multi-fidelity methods and thus obtain accurate estimators of the optimization objective. The development of multi-fidelity methods that trade off surrogate construction costs to exploit the properties of early vs late optimization stages is a challenging but promising research topic. Note in that regard that multi-fidelity methods have been successfully used for design in various engineering disciplines [42, 43, 54], but have not yet been applied to problems involving sensitive chaotic dynamics, as we have studied in this article.

6.3. Scope

We applied our approach to a relatively straightforward setup, but our MFMC framework for estimating energetic particle confinement lends itself to extensions to more detailed physics and design studies. For example, it would be straightforward to replace our numerical ODE integrator with existing high performance solvers for guiding center trajectories, which may also include collisions [18, 49] and wave–particle interactions [56]. Another immediate extension would be to consider more sophisticated models for the initial conditions of the alpha particles, accounting for the energy spread at birth, and for the fact that for peaked pressure profiles, most alpha particles are born in the core of the device. Our success with MFMC using guiding center orbits also suggests that estimating statistics based on full-orbit trajectories may become computationally tractable. While full-orbit simulations are extremely expensive to perform, if we can find a proper surrogate to leverage, MFMC would enable high accuracy estimation of full-orbit statistics while requiring only a very small number of full-orbit simulations. These topics are all subjects of ongoing work. Finally, it is important to stress that for a reliable analysis of energetic particle confinement, single-particle codes and theories may not be sufficient. A more self-consistent drift kinetic theory has recently been proposed for tokamaks with magnetic field perturbations [60], and we do not yet know if MFMC is relevant for such a theory, and could lead to speedup.

Acknowledgments

We would like to thank Aaron Bader for providing the VMEC files for the Wistell-A magnetic equilibrium, and for many enlightening conversations on numerical methods for energetic particle confinement studies. We also would like to thank Alain Brizard and Matt Landreman for their comments on an early version of this manuscript, which helped us to improve it. Frederick Law was supported by the Department of Defense (DoD) through the National Defense Science & Engineering Graduate (NDSEG) Fellowship Program and was supported in part by the Research Training Group in Modeling and Simulation funded by the National Science Foundation via Grant RTG/DMS-1646339. Antoine Cerfon was partially supported for this work by the Simons Foundation under Grant No. 560651, and by the United States National Science Foundation under Grant No. PHY-1820852. Benjamin Peherstorfer was partially supported by AFOSR under award number FA9550-21-1-0222 (Dr Fariba Fahroo) and by NSF under award number CMMI-1761068 and award number DMS-2012250.
