Nonlinear gyrokinetic predictions of SPARC burning plasma profiles enabled by surrogate modeling

P. Rodriguez-Fernandez; N.T. Howard; J. Candy

doi:10.1088/1741-4326/ac64b2

1. Introduction

Fusion energy is poised to provide clean, unlimited, reliable and safe energy to the world. Upon the prospects of ever-increasing energy needs in a developing world, the limitation of raw fuels and the need for de-carbonization of the energy infrastructure, fusion has been considered to be humankind's surest bet toward a clean energy future. Although fusion is often stated to be thirty years away, progress on the construction of ITER [3], milestones from the inertial confinement fusion program [4], increased private investment [5, 6] and recent developments of high temperature superconducting magnets [7, 8] now place burning plasmas on the horizon.

Our ability to optimize the design point of fusion reactors relies upon our capability to reliably predict burning plasma behavior. In particular, how well we can simulate computationally (and develop reduced models of) the transport of energy and particles out of the hot plasma directly affects how much risk a given project has to undertake. Historically, for reactor design studies, energy and particle transport have been modeled through two quantities derived from empirical laws and decades of experiments: the energy confinement time, τ_E, and the density peaking, ν_n. Both quantities affect how much fusion energy a plasma is able to produce when a given amount of power is input into the system. This, of course, is directly connected to the fusion gain, Q, and eventually the engineering gain of the fusion facility, Q_E. However, in the last decade, integrated modeling of fusion plasmas with physics-based models has become common in the field. Thanks to the development of reduced models of transport such as TGLF [9] and QuaLiKiz [10], solutions of the steady-state fusion plasma are possible with a reasonable computational expense.

Despite tremendous advances with reduced, physics-based model development [11] and their success in explaining experimental tokamak phenomena [12–18], the ability of reduced models to predict plasma behavior in novel, burning-plasma regimes needs to be carefully studied. This precaution arises from the fact that, generally, the models used within integrated modeling frameworks must reduce the computational cost of transport calculations by a factor of, at least, 10⁶ (compared to first-principles simulations) to be computationally tractable. In the case of turbulent transport modeling, these assumptions often involve quasilinearity in the evolution of core micro-instabilities and a limited database of nonlinear results that are used to map such instabilities to meaningful transport quantities. A lot of work is devoted worldwide to validating these reduced models by direct comparison to experiment [19], but the differences between the parameter space of current fusion devices and that of future reactors cast some doubt on the validity of reduced models predictions. Simulations from first principles still remain our most powerful tool toward reliably predicting burning plasma behavior, but the extraordinary computational cost of such simulations has precluded their use to predict fusion performance of future devices, and have mostly been used as spot checks and 1D parameter scans.

2. Background

Part of the complexity to predict transport in plasma systems is due to the very nonlinear nature of plasma behavior and the extreme separation between turbulence time-scale, $\mathcal{O}(1{0}^{-6}\enspace \text{s})$ , and power balance characteristic times, $\mathcal{O}(1{0}^{0}\enspace \text{s})$ . Hence, solving for the steady state solution has mostly been exclusive for reduced transport models, and only a few examples are found in the literature where truly nonlinear turbulence simulations are used to find the steady-state plasma solution in experimentally relevant conditions. Transport solvers such as TGYRO [20] and TRINITY [21] presented numerical techniques based on Newton methods to handle the separation of spatial and temporal scales and were able to couple the turbulence dynamics of flux-tube simulations with the macroscopic plasma evolution. Such methods are predicated on the idea that each time step in the macroscopic transport evolution encompasses a separate evolution of turbulence until saturation. However, the need to calculate a numerical approximation to the transport Jacobian matrix and the implementation of relaxation and step-correction techniques often lead to high computational requirements. For non-linear turbulence simulations, it is often the case that 50–100 calls per radial point [20, 21] are required in order to achieve a sufficient level of convergence³ . This feature of scale-bridging transport solvers often imposes limitations to the physics fidelity of the turbulence simulations. Assuming a pure plasma, equal electron and ion temperatures, fixed density profile, adiabatic electrons, fixed heat sources and sinks, and neglecting magnetic fluctuations are common simplifications found in the literature.

In parallel, several efforts are on-going to develop and apply full-f transport solvers [23] that do not require scale or physics separation [24] and that evolve the total distribution function, such as XGC1 [25], ORB5 [26], GYSELA [27] or GT5D [28], to name a few. However, such direct approach to simulate turbulence dynamics in energy confinement time-scales presents great challenges and requires extreme computational resources, which has limited their applicability, validation and physics fidelity. Most full-f gyrokinetic simulations found in the literature evolve the plasma for short times (a few collision times), which is enough to study certain transport and turbulence saturation processes but not sufficient for performance and confinement prediction [29] of experimentally relevant plasma conditions. Furthermore, it has been stated that the full-f approach does not scale well to upcoming small −ρ_* tokamak experiments, such as SPARC [1] and ITER [30].

Flux-matching techniques suitable for global turbulence solvers are also being developed [31, 32], which do not require Jacobian calculations and employ Picard-like numerical schemes to solve the implicit time-step of the transport equation. Despite their great promise, the simulations carried out so far have been limited in physics fidelity, and further work is needed to deal with more than one transport channel and multiple gyrokinetic species.

This work presents a new multi-scale method to find steady-state solutions of temperature and density profiles in tokamak plasmas using a minimal number of expensive nonlinear simulations per radial location, leveraging techniques developed in Bayesian statistics, machine learning and black-box optimization fields.

3. Physics description

SPARC [1, 33], now under construction, is a compact, high-field, D–T fusing tokamak that aims to be a demonstration device for the high-field path to fusion energy. It uses high temperature superconducting magnets to produce on-axis fields of up to 12.2 T, and employs ion cyclotron range-of-frequencies (ICRF) as its auxiliary heating mechanism. All tokamak systems in SPARC are being designed to sustain the primary reference discharge (PRD). The PRD is a full-field (12.2 T), full-current (8.7 MA) scenario that has been predicted via empirical scalings [1] and physics-based integrated models [34] to produce a fusion gain of Q ∼ 9–11 in standard H-mode operation. This work presents high-fidelity core profile predictions for the PRD scenario, with the assumption of a peeling-limited pedestal [35], and with input ICRF power and density as scoped in its physics basis [1, 33, 34].

3.1. Macroscopic plasma evolution

As we are interested in the macroscopic behavior of the core plasma (temperature and density profiles to determine fusion performance), the transport equations to solve are written in the form of flux-surface average velocity moments of the electrons and ions distribution functions, and closed by providing a proper model for particle and energy fluxes. In this work, fluxes will be computed with high-fidelity collisional and turbulence models, as it will be described in detail in section 3.3. The separation between macroscopic (transport) and microscopic (turbulence) evolution is possible by virtue of δf gyrokinetic scale separation in devices with sufficiently small ρ_*, which is the case in many of current experiments and certainly in upcoming burning plasmas.

By integrating over the interior to the flux surface at radius r and assuming steady-state, the system of coupled equations to solve has the following form:

$\begin{align}\hfill \langle {\mathbf{q}}_{\text{e}}\cdot \nabla r\rangle & =\frac{1}{{V}^{\prime }}{\int }_{0}^{r}\left(\langle {P}_{\text{aux},\text{e}}\rangle +\langle {P}_{\alpha ,\text{e}}\rangle -\langle {P}_{\text{ei}}\rangle -\langle {P}_{\text{rad}}\rangle \right){V}^{\prime }\,\mathrm{d}r\hfill \\ \hfill \langle {\mathbf{q}}_{\text{i}}\cdot \nabla r\rangle & =\frac{1}{{V}^{\prime }}{\int }_{0}^{r}\left(\langle {P}_{\text{aux},\text{i}}\rangle +\langle {P}_{\alpha ,\text{i}}\rangle +\langle {P}_{\text{ei}}\rangle \right){V}^{\prime }\,\mathrm{d}r\hfill \\ \hfill \langle {\mathbf{\Gamma }}_{\text{e}}\cdot \nabla r\rangle & =\frac{1}{{V}^{\prime }}{\int }_{0}^{r}\langle {S}_{\text{e}}\rangle {V}^{\prime }\,\mathrm{d}r,\hfill \end{align} \tag{ 1 }$

where the r coordinate used here is the half-width of the mid-plane intercept, ⟨⋅⟩ indicates flux-surface averaging, and V' = ∂V/∂r where V is the volume interior to the flux surface. Sources and sinks are described in section 3.2, and the flux-surface averaged transport fluxes in section 3.3. The volume integrals in equation (1) result in the coupling of radial locations (i.e. a change in deposited power and particles in inner locations has a direct effect in outer locations), which is a consequence of the steady-state condition used in this work.

The elements in the left-hand side of equations (1) are descriptive of the transport fluxes, and will be referred to as ${Q}_{\text{e}}^{\text{tr}}$ and ${Q}_{\text{i}}^{\text{tr}}$ (in units of power per unit area), and ${{\Gamma}}_{\text{e}}^{\text{tr}}$ (in units of particles per second and unit area). The right-hand side includes target fluxes, and will be referred to as ${Q}_{\text{e}}^{\text{target}}$ , ${Q}_{\text{i}}^{\text{target}}$ and ${{\Gamma}}_{\text{e}}^{\text{target}}$ . A steady-state solution of the problem (also called flux-matched solution) is the set of electron temperature, ion temperature and electron density radial profiles that satisfy ${Q}_{\text{e}}^{\text{tr}}={Q}_{\text{e}}^{\text{target}}$ , ${Q}_{\text{i}}^{\text{tr}}={Q}_{\text{i}}^{\text{target}}$ and ${{\Gamma}}_{\text{e}}^{\text{tr}}={{\Gamma}}_{\text{e}}^{\text{target}}$ at all radial locations. Therefore, a converged solution will satisfy equations (1) (i.e. will ensure energy and particle balance). The ansatz used in this work and the calculation of both transport and target quantities are based on the implementation used by the TGYRO code [20].

3.2. Target fluxes calculation

Auxiliary electron heating ⟨P_aux,e⟩ and ion heating ⟨P_aux,i⟩ are assumed unchanged during the transport evolution, together with the particle sources ⟨S_e⟩. This assumption is based upon the fact that we do not expect strong variations of the deposited power and particle profiles from electromagnetic waves and wall sources and recycling when the core temperature and density profiles vary. Nevertheless, the flexibility of the Bayesian framework presented here allows for the self-consistent variation of auxiliary power profiles, but it is left for future work.

The classical energy exchange between electron and ion species is defined as:

$\begin{equation}\begin{aligned}\hfill \langle {P}_{\text{ei}}\rangle ={c}_{\text{ex}}\cdot \sum\limits _{\mathrm{i}}\frac{{Z}_{\text{i}}^{2}{n}_{\text{i}}{m}_{\text{i}}^{1/2}}{{\left({m}_{\text{e}}{T}_{\text{i}}+{m}_{\text{i}}{T}_{\text{e}}\right)}^{3/2}}\cdot \mathrm{ln}\,{\Lambda}\cdot {n}_{\text{e}}\cdot \left({T}_{\text{e}}-{T}_{\text{i}}\right)\end{aligned},\end{equation} \tag{ 2 }$

where we use $\mathrm{ln}\,{\Lambda}=24.0-\mathrm{ln}\left(\sqrt{{n}_{\text{e}}}/{T}_{\text{e}}\right)$ and c_ex is a constant of proportionality. Detailed definitions and proper units can be found in reference [36]. The energy exchange power, as it enters in the electron and ion energy balance equations (equations (1)) will be varied self-consistently during the convergence process.

The total alpha heating, ⟨P_α⟩ is calculated using the parameterization of the D–T fusion cross-sections proposed by Bosch and Hale [37]. The partition between electrons ⟨P_α,e⟩ and ions ⟨P_α,i⟩ is calculated as described by Stix [38]. In this model, the local power density that goes into the ions is given by:

$\begin{equation}\begin{aligned}\hfill \langle {P}_{\alpha ,\text{i}}\rangle & =\langle {P}_{\alpha }\rangle .{\hat{W}}_{\mathrm{c}\mathrm{r}\mathrm{i}\mathrm{t}}{\int }_{0}^{1/{\hat{W}}_{\mathrm{c}\mathrm{r}\mathrm{i}\mathrm{t}}}\frac{1}{1+{x}^{3/2}}\,\mathrm{d}x\hfill \\ \hfill {\hat{W}}_{\mathrm{c}\mathrm{r}\mathrm{i}\mathrm{t}}& ={c}_{\alpha }\cdot {T}_{\text{e}}{\left(\frac{1}{{n}_{\text{e}}}\sum\limits _{\mathrm{i}}\frac{{Z}_{\text{i}}^{2}{n}_{\text{i}}}{{m}_{\text{i}}}\right)}^{2/3},\hfill \end{aligned}\end{equation} \tag{ 3 }$

where c_α is a constant of proportionality [38].

Radiation ⟨P_rad⟩ as a sink for the electron energy channel will be calculated accounting for Bremsstrahlung, line and synchrotron radiations. The impurity assumptions used in this work are identical to those reported in reference [34], and each ion radiation contribution is computing using a Chebyshev polynomial representation of ADAS data [39, 40] for the average charge state of the impurity ion, as implemented in the TGYRO code [20]. Following this methodology, the cooling rate per impurity is given by:

$\begin{equation}\mathrm{l}\mathrm{n}\;{L}_{Z}=\sum\limits _{k=0}^{n}{c}_{k}\cdot \mathrm{c}\mathrm{o}\mathrm{s}\left(k\cdot \mathrm{a}\mathrm{r}\mathrm{c}\mathrm{c}\mathrm{o}\mathrm{s}\left(\frac{2\;\mathrm{l}\mathrm{n}({T}_{\text{e}}/{T}_{\mathrm{m}\mathrm{i}\mathrm{n}})}{\mathrm{l}\mathrm{n}({T}_{\mathrm{m}\mathrm{a}\mathrm{x}}/{T}_{\mathrm{m}\mathrm{i}\mathrm{n}})}-1\right)\right),\end{equation} \tag{ 4 }$

where c_k are the Chebyshev coefficients (n = 11) and T_max and T_min define the fit range. The Bremsstrahlung and line radiated power density contribution per impurity is then ⟨P_rad,i⟩ = n_e n_i L_Z,i. Synchrotron radiation is calculated following reference [41].

Figure 1 depicts a comparison between the high-fidelity TRANSP [42] simulation of the SPARC PRD [34] and the targets model described in this section. Differences of less than 10% are found, despite the several orders of magnitude differences in run time. In particular, the predictions from the Stix model (equations (3)) are in remarkable agreement with the expensive, Monte-Carlo simulations with the NUBEAM code [43] in TRANSP.

In summary, we have that each of the power densities can be written as nonlinear functions of the local kinetic profiles: ⟨P_ei⟩ = f(T_e, T_i, n_e), ⟨P_α,e/i⟩ = f(T_e, T_i, n_e) and ⟨P_rad⟩ = f(T_e, n_e). These local power densities enter in equation (1) in the form of volume integrals, which leads to radial and channel coupling (i.e. non block-diagonal Jacobian) when solving for the steady-state solution. Here, the normalized inverse gradient scale lengths are the free parameters to be used to find the flux-matching solution, and the local kinetic profiles are reconstructed using the trapezoidal rule, following the TGYRO methodology [20]. Likewise, volume integrals of the local power densities are performed using a realistic geometry parameterization [44].

3.3. Transport fluxes calculation

The uniqueness of the solver presented here is the use of very high fidelity transport simulations. Instead of reduced quasi-linear transport models typical of flux-matching solvers, here we employ direct non-linear gyro-kinetic plus kinetic neoclassical particle and energy fluxes.

Turbulent fluxes were calculated using the continuum gyrokinetic code CGYRO [2]. CGYRO is a widely used code that has been optimized for modern computing architectures, and has been specifically designed for enabling multi-scale and near-edge and pedestal gyrokinetic simulations. Here, nonlinear simulations spanning much of the plasma minor radius (ρ_N = 0.25–0.85, where ρ_N is the normalized square root of toroidal flux coordinate) are performed. Unlike previous work that utilized nonlinear gyrokinetics for the prediction of plasma profiles (described in section 1), the simulations performed here included high physics fidelity, allowing for the simultaneous determination of particle and heat transport. Each simulation was performed utilizing six gyrokinetic species: deuterium, tritium, fluorine (Z = 9, A = 18, representing a lumped low-Z ion species), tungsten (partially ionized with Z = 50, A = 184), and a fast ³He population at 5% of the local electron density, representing the minority species for the ion-cyclotron heating. A population of alpha particles was notably not included in this simulations as it was determined that they represented a small fraction of the overall ion density and thus could be ignored without significant loss of fidelity.

The simulations performed in this work built off of previously reported nonlinear gyrokinetic simulations [45] which outlined the model fidelity and numerical setup needed for accurate simulation of the SPARC PRD. All simulations performed for this work were nonlinear and simulated electromagnetic turbulent fluctuations (evolving both δϕ and δA_|| contributions). The inclusion of electromagnetic effects was reported as potentially playing a significant role in the stabilization of ion temperature gradient (ITG) modes in the inner regions of plasma simulated here, and due to the minor increase in computational cost, they were included. Additionally, recent results [46] demonstrated a non-negligible impact of electromagnetic contributions to density peaking in predictive TGLF simulations.

The target simulation box sizes were L_x, L_y ∼ 80 × 80ρ_s. However, the exact box sizes that are attainable are determined by local equilibrium values. As a result, the minimum box size utilized was $\sim 80\times 80{\rho }_{\text{s}}$ with box sizes at inner radii generally being larger $(\sim 110{\rho }_{\text{s}})$ than this target. We note that the chosen box sizes where found to be sufficient for capturing the turbulence in the conditions studied, as discussed in reference [45]. Each simulation utilized approximately 300 radial modes and 16 toroidal modes, capturing ion-scale turbulence and zonal flows in the range k_θ ρ_s = 0.0–1.2 while also using N_ξ = 24 (pitch angle grid points), N_θ = 24 (poloidal grid points), and N_energy = 8 (energy grid points). Here we note that k_θ is the poloidal wavenumber of the turbulence and ρ_s is the ion Larmor radius evaluated with the sound speed, c_s. Nonlinear convergence tests were reported in a previous paper (reference [45]) on the initial profiles used in this work. Increases in box size and the number of radial modes yielded no statistically meaningful changes in the fluxes. Linear checks in other resolutions, (energy, pitch-angle, radial modes, and poloidal grid points) were performed on both the initial and converged profiles to provide confidence in the convergence of the results. These tests yielded small changes in the linear growth rates (most changes < 2%) with a 50% increase in each numerical resolution. The chosen resolutions for these simulations were an attempt to achieve accurate results while also minimizing the computing time utilized in this work. Plasma shaping properties were modeled using a MXH equilibrium [44] and collisions were modeled using the Sugama collision operator [47]. The effects of rotation were not included in these simulations but future work will include an estimated rotation profile to assess the impact of rotation and E × B shearing effects on the predicted profiles. However, it should be noted that such effects are generally expected to improve confinement and increase performance.

The simulation times required to obtain converged heat and particle fluxes at each of the radii simulated varied substantially, consistent with the large variation in the linear growth rates observed from ρ_N = 0.25 compared with 0.85. At the innermost radial position, nonlinear simulations were typically run for $\sim 1200a/{c}_{\text{s}}$ with averaging periods of approximately 400 − 500a/c_s, while the larger growth rates and therefore quicker saturation of the turbulence, allowed for approximately 600a/c_s simulations at ρ_N = 0.85 with average windows still targeting approximately 400a/c_s. Similar averaging windows were typically used throughout this work, however, there was variation due to the large variations in profiles and gradients that were studied and conditions that resulted in stable turbulence were sometimes terminated earlier.

The large variation in the profiles studied in this work resulted in dramatically different required time steps. This work took advantage of the adaptive time-stepping that was recently implemented in CGYRO to ensure that the appropriate numerical time step was utilized regardless of the profiles studied. Each simulation reported here utilized 72 nodes on the NERSC Cori KNL partition. Due to the variations in the length of the simulations, the exact computing time usage of these nonlinear simulations varied by location and profile studied. However, due to the large number of species simulated and resolutions utilized, on average we estimate that approximately 100 k CPU-hours were required per radial location, per profile studied. A number of methods (such as early stopping and reduction of number of ion species) can be utilized to reduce the overall computational time which are speculated to further reduce the computational resources required by approximately a factor of 2. These methods will be investigated as part of future work.

The use of ion-scale simulations in this work is motivated both by practical and physical considerations. Practically, the computational resources required to simultaneously resolve ion and electron-scales in a tokamak plasma would make profile prediction intractable, with the only existing simulations typically requiring 10–20 M CPU-hours per radial location [48–50]. Despite significant contributions to electron heating from fusion alphas, the conditions in the SPARC scenario studied here have been found to exhibit a large ratio of ion to electron heat flux Q_i/Q_e in the range 3–5 [34]. This balance of heat fluxes has been determined to occur in large part due to the strong collisional coupling of electrons and ions in the high density regime used in SPARC and due to the significant radiative losses that occur at the high operational temperatures. Large Q_i/Q_e ratios are a hallmark of dominant ion-scale turbulence, as electron-scale contributions (high-wavenumber trapped electron and electron temperature gradient modes) are responsible for near exclusively driving electron heat transport [51]. Furthermore, linear gyrokinetic and nonlinear electron-scale simulations were reported for the SPARC PRD profiles (that represent the initial profiles used in this work). These investigations suggested that the ratio of the electron to ion-scale growth rates (γ_high-k/γ_low-k) [52] and the electron heat fluxes obtained by electron-scale simulations, generally indicate a negligible role of high-k turbulence in the SPARC PRD [45]. As a result, the use of ion-scale simulations for prediction of SPARC appears robust and should be expected to lead to an accurate description of profiles and performance.

Even though turbulence dominates the energy and particle losses in the cases presented here, the predictions also include second-order neoclassical transport fluxes as calculated with the multi-species kinetic NEO [53] solver.

4. Flux-matching technique

From equations (1) it is straight forward to see how coupled the problem at hand is. We aim to solve for the flux-matching condition at all radial locations for three transport channels, that are interconnected through sources and sinks with all of the kinetic profiles that are the design variables of the minimization problem.

This work employs a Bayesian optimization (BO) framework to find the flux-matching solution of transport equations with minimal number of high-fidelity simulations. Given the extreme computational requirements of turbulence simulations, the evaluation of transport fluxes during the convergence process requires a careful treatment.

4.1. GP model

Under the modeling framework of Gaussian process (GP) regression, energy and particle transport fluxes as a function of input parameters (i.e. kinetic profile gradients and relevant plasma quantities) are considered to follow independent joint Gaussian distribution functions. The GP that models such distribution is considered to have a linear mean, μ(x), and a covariance described by a Matern-5/2 kernel, $\mathcal{K}(\mathbf{x},{\mathbf{x}}^{\prime })$ :

$\begin{equation}{F}^{\text{tr}}(\mathbf{x})\sim \mathcal{G}\mathcal{P}(\mu (\mathbf{x}),\mathcal{K}(\mathbf{x},{\mathbf{x}}^{\prime })),\end{equation} \tag{ 5 }$

where F^tr represents each of the local transport fluxes (i.e. electron energy flux, ${Q}_{\text{e}}^{\text{tr}}$ , ion energy flux, ${Q}_{\text{i}}^{\text{tr}}$ , and electron particle flux, ${{\Gamma}}_{\text{e}}^{\text{tr}}$ ), and x is the set of local gradients.

The solution of the transport flux evaluation with CGYRO is assumed to be polluted by heteroscedastic Gaussian noise, to account for the uncertainties in each of the evaluations of the true gyrokinetic transport fluxes (feature of initial value solvers such as the one used here):

$\begin{equation}{y}^{\text{tr}}(\mathbf{x})={F}^{\text{tr}}(\mathbf{x})+\mathcal{N}(0,{\sigma }_{y}^{2}(\mathbf{x})).\end{equation} \tag{ 6 }$

In this context, we assume that the standard deviation of the flux, σ_y(x), is known for each evaluation, and is calculated individually per profile and radial location simulated in CGYRO. Here we employ an approach that uses the auto-correlation time of the time series to determine the standard deviation of the fluxes, similar to previous work [54]. Each simulation time series was decomposed into windows that were three times wider than the calculated auto-correlation time and assumed to represent an independent sample. The standard deviation of the mean fluxes to be used in equation (6) was then reported as the standard deviation of independent samples divided by the square root of the number of samples.

This implementation of the likelihood function will be convenient when, in future work, a multi-fidelity approach to flux-matching is used. Similar to that proposed in reference [31], early stopping of the initial value simulations (before turbulence reaches full saturation) in those cases that are estimated to be far from targets could be used to save computational resources and can be assigned a larger noise to account for the evaluation uncertainty.

4.1.1. Physics-informed modeling

As described in section 3, three transport channels are solved for using n_r uniformly distributed radial locations in the ρ_N coordinate. This results in 3n_r design variables of the problem: electron temperature (∇T_e), ion temperature (∇T_i) and electron density (∇n_e) radial gradients at each of the simulated locations. This 3n_r–D optimization problem can, however, be simplified by taking into account that the flux evaluations in the context of this work are local, i.e. the flux at a given location can be described solely from the solution of transport equations at such location, and that long-range interactions only appear as a result of the imposition of the steady-state condition (i.e. in time-scales longer than turbulence decorrelation times). This means that the flux-matching technique takes care of radial coupling of the transport flux evaluations by ensuring a converged solution at the end, but each transport flux evaluation can be taken to be independent from each other at each iteration, and fully defined locally. By taking this into account, each local transport flux is assumed to be a function of six input quantities:

$\begin{equation}{F}^{\text{tr}}(\mathbf{x})=f(\nabla {T}_{\text{e}},\nabla {T}_{\text{i}},\nabla {n}_{\text{e}},{T}_{\text{e}},{T}_{\text{i}},{n}_{\text{e}}),\end{equation} \tag{ 7 }$

where the set of input parameters of the models are fully defined locally. At the end of this paper, a discussion on the locality assumption is provided.

To further simplify the problem and ease model training, we make the physics-informed assumption that turbulence within the nonlinear simulations is described in terms of the following normalized quantities:

$\begin{equation}{F}^{\text{tr}}(\mathbf{x})={C}_{\text{GB}}\cdot \hat{{F}^{\text{tr}}}(a/{L}_{{T}_{\text{e}}},a/{L}_{{T}_{\text{i}}},a/{L}_{{n}_{\text{e}}},{T}_{\text{i}}/{T}_{\text{e}},{\hat{\nu }}_{\text{ei}}),\end{equation} \tag{ 8 }$

where $a/{L}_{{T}_{\text{e}}}$ , $a/{L}_{{T}_{\text{i}}}$ and $a/{L}_{{n}_{\text{e}}}$ are the normalized inverse gradient scale lengths. C_GB represents the gyro-Bohm normalization, either energy flux ${Q}_{\text{GB}}\propto {n}_{\text{e}}{T}_{\text{e}}^{5/2}$ or particle flux ${{\Gamma}}_{\text{GB}}\propto {n}_{\text{e}}{T}_{\text{e}}^{3/2}$ . And ${\hat{\nu }}_{\text{ei}}$ is the electron–ion collision frequency normalized to c_s/a, so that ${\hat{\nu }}_{\text{ei}}\propto \mathrm{ln}\,{\Lambda}\cdot {n}_{\text{e}}{T}_{\text{e}}^{-2}$ . The functions that map normalized inputs to normalized outputs, $\hat{{F}^{\text{tr}}}$ , will be the ones to be modeled with the GPs.

Figure 2 shows an example of fitting independent GP models with Matern kernel and linear mean to transport fluxes at several core radial locations in 1D scans of ion temperature gradient from nonlinear CGYRO simulations. GPs provide a good way to capture different behaviors (including critical gradient) without the need to provide functional forms and associated hyper-parameters. The non-parametric nature and the ability to include uncertainties in the observed data points make GP models suitable for the problem at hand. Even though the burning plasmas of interest in this work are often dominated by ITG-type turbulence [34, 45, 56], the non-parametric surrogates employed here do not preclude other types of regimes, or even transitions in between dominant turbulence modes (likely at the expense of more iterations in such complex situations), as, in principle, the GP models could include any arbitrary non-linear behavior with respect to the input parameters. While this work utilizes standard kernels and mean functions used widely in the BO literature, it is possible that functions specific to fusion applications would lead to even better performance, following the experience of past work with GP regression in the field [57], but this and the investigation of whether GPs are suitable to reproduce transport in any arbitrary turbulence regime is left for future work.

**Figure 2.** Scans of inverse normalized ion gradient scale length, $a/{L}_{{T}_{\text{i}}}$ , in ion-scale non-linear `CGYRO` simulations from [55]. GPs models have been trained with only three points per radius, demonstrating the powerful predictive capabilities of non-parametric models to capture the different behavior of (from left to right) ion heat flux, electron heat flux and electron particle flux. GP models and priors are identical. Hyper-parameters are fitted to minimize the negative marginal log likelihood in each case.
Download figure:
Standard image High-resolution image

Equation (8) indicates that the surrogate modeling problem translates into 3n_r independent 5D GP models. As it will be shown next, this choice works well for the context of this problem, but it could be generalized by the addition of further normalized quantities, at the expense of having to perform more observations to properly fit the GP model⁴ . Figure 3 depicts the schematic of the surrogate model that results from implementing the physics-informed GP model as described earlier. For GP training and evaluation, the GPyTorch package [58] is used. The entire model described by equation (8) and every operation depicted in figure 3 are implemented with PyTorch [59] tensors, which allows to leverage automatic differentiation [60] for both training of the GP models and the evaluation of the full Jacobian of the system for optimization and root finding.

**Figure 3.** Physics-informed model to optimize through the BO framework (only one radial location, r₁, is shown). Inputs to the model are the set of inverse normalized gradient scale lengths for the three channels at all radial locations (yellow boxes on the left). Output to the model is the mean of the residual in all channels (red box on the right). Implementation of the model fully in `PyTorch` allows to retrieve the analytical Jacobian of the entire system, as the framework is built so that every operation in the figure has an exact Jacobian representation. Regression with GP models is performed only at the level of the blue rectangles.
Download figure:
Standard image High-resolution image

Note that the physics-informed surrogates as presented in equation (8) are only a function of the number of channels to simulate, n_p, and not a function of the number of radial locations, n_r. With a finer radial grid, the model would require more input gradients to find the solution, but the accuracy of the local surrogates would not be compromised with the same number of complete profiles simulated (i.e. the surrogate models remain 5D). This suggests that the framework presented here has excellent scalability and parallelization properties that will be exploited in future work.

Despite leveraging the use of normalized gyro-Bohm units to fit the turbulent transport surrogates, the minimization of the transport residual is done in real units. This approach is advantageous when extending the predictive core region to the far edge. The gyro-Bohm energy flux is very sensitive to the temperature, ${Q}_{\text{GB}}\propto {T}_{\text{e}}^{5/2}$ , which results in extreme differences between edge and core (in particular when solving for L-mode-type plasma conditions). This is typically very challenging for flux-matching solvers, as the edge transport residual dominates the flux-matching process, often requiring extremely small step sizes in, e.g., Newton solvers. Using real units to minimize the residuals thus provides a clear advantage, as the fluxes to match are typically of the same order of magnitude between radial locations.

4.2. Iterative framework

To find the predicted plasma profiles in steady-state, the transport fluxes must match the target fluxes at each radial location. This condition can be written as:

$\begin{equation}{\mathbf{x}}^{\ast }=\underset{\mathbf{x}{\epsilon}[{\mathbf{x}}_{\text{L}},{\mathbf{x}}_{\text{U}}]}{\mathrm{arg min}}\vert {\mathbf{F}}^{\text{tr}}(\mathbf{x})-{\mathbf{F}}^{\text{target}}(\mathbf{x})\vert ,\end{equation} \tag{ 9 }$

where F^tr and F^target represent the set of electron energy, ion energy and electron particle transport and target fluxes at all radial locations, x is the set of all gradients and [x_L, x_U] are the allowed bounds. This results in a multi-objective minimization problem of the 3n_r transport residuals. We remind the reader that the target fluxes F^target are not fixed. The expected exchange power, radiation, and alpha heating are affected by the changes in kinetic profiles and are updated at each iteration. Sometimes this is referred to as a 'moving targets' calibration problem, and contributes to the high cost of finding flux-matching conditions with standard optimization methods and finite-differences-based Jacobian calculations.

The framework built here to solve for the flux-matching solution is based on standard BO techniques. A small set of kinetic profiles (typically 5 to 10) is used to train the initial set of GP models to reproduce each of the transport fluxes. This training set is produced by means of Latin hypercube sampling to ensure efficient filling of the parameter space even when a low number of samples are used. Even though 5 to 10 initial training samples may seem very low to fit 5D surrogate models, they are typically sufficient to drive the system near marginal stability in stiff plasma conditions. Further refinement to the surrogate models occurs later on to understand where the critical gradient sits at exactly. At present, the method requires that the final converged gradients are within the original training bounds, so that the optimization method only searches in regions of the multi-dimensional parameter space where the model has been trained. Future work will focus on devising methods to find solutions outside of the original bounds.

With the GP models in hand, the flux-matching solution of the GP representation can be obtained by leveraging standard optimization techniques. This work performed optimization based on the mean of the posterior distribution. A genetic algorithm (GA) based on the DEAP library [61] is employed to generate a set of initial solutions by minimizing the mean of the posterior distribution of the average residual of all channels and all radial locations. A multi-start local minimizer based on the L-BFGS-B algorithm [62] as implemented in the BoTorch toolset [63] is used to refine the GA solutions. Finally, a multi-variate root-finder based on the modified Powell method, as implemented in scipy [64] is used to further refine the solutions. The use of the exact Jacobian without the need of finite difference approximations greatly reduces the computational cost of the optimization process, and eliminates the need to specify step sizes for finite differences and prevents oscillatory behavior.

Once a new flux-matching solution is found for the surrogate GP models, the best set of kinetic profiles is evaluated with the high-fidelity transport simulations. If convergence (i.e. actual flux-matching) has not been attained, the profile is added to the database of training samples for the next iteration. Because of this, the surrogate models become more accurate around the region of the parameter space that is expected to contain the final converged solution (around critical gradients). Further details on similar surrogate-based optimization methods for physics and engineering applications can be found in references [65–68].

5. Results for the SPARC tokamak

The transport solver presented here has been applied to simulations of the PRD scenario in SPARC. This discharge has been thoroughly examined with both empirical techniques [1] and integrated modeling with reduced transport models [34]. Furthermore, gyrokinetic analysis has been performed [45] around the PRD profiles predicted by TGLF. Standalone simulations and 1D parameter scans using CGYRO indicated dominance of ion-scale turbulence, which makes this case suitable for the analysis performed here. This work also showed the importance of accounting for multiple transport channels in a self-consistent manner, as, e.g., density gradients and collision rates had strong effects on the energy fluxes.

Four uniformly distributed radial locations in toroidal flux coordinate (ρ_N = 0.25, 0.45, 0.65, 0.85) are chosen for flux-matching. Temperature and density at ρ_N = 0.85 are taken as boundary conditions, although the normalized gradient scale lengths at that location are allowed to vary. This choice is motivated by the smooth gradient profile expected in this standard H-mode plasma, with no internal transport barriers or off-axis heating.

Figure 4 presents the convergence process followed to reach the flux-matching condition in SPARC's PRD. Initial temperature and density profiles (red) are those from reference [34], i.e. obtained using the TGLF-SAT1 model in TRANSP. This set of profiles, along with four randomly-selected profiles (black), are used for the first training of the surrogate models. The first iteration of the BO framework reduced the average residual from 0.601 to 0.153. Such strong reduction was followed by a plateau that lasted for about six iterations. During this plateau, the solver was being trained with profiles below the ion temperature critical gradient. From iterations 12 through 14, oscillatory behavior was observed, as the solver moved to the unstable region of the flux-gradient curves. Due to the very stiff nature of turbulence in the core of SPARC, small variations in the gradients led to very large variations in fluxes. From this point on, the fusion gain remained roughly unchanged, as the variations in gradients to reduce residual are very small (stiff plasma condition).

We note that the transport fluxes from the original baseline profiles (red curves in figures 4(g)–(i)) and final profiles (green curves) differ substantially, as the original profiles had very unstable turbulence at the edge and roughly stable at mid-radius. This observation suggests that the original profiles do not need to be close in flux-space to the final solution, but further work on this topic is needed to draw definite conclusions. Likewise, investigation of the optimal number of initial training samples for faster convergence is currently under investigation. This work employed five CGYRO simulations for the initial training set. However, convergence tests using the TGLF model suggest that choosing between 7 to 12 initial training samples may result in an even lower number of total calls to the code, but more work is needed to draw definite conclusions and understand whether this observation applies to CGYRO as well.

5.1. Convergence in stiff plasmas

The critical gradient and stiff transport nature of plasmas often complicates convergence of flux-matching solvers. A very small variation of gradients can lead to large variations in predicted fluxes, and thus reducing the residual to very small levels often requires unnecessary long simulations if the purpose is predicting performance. As shown in figure 4, fusion gain (strong function of density and temperature) oscillated only around $\sim 5\%$ of the final value for the last five iterations, even though the residual changed by over 100%.

For this reason, as a stopping criterion for our simulations, we not only accounted for a low enough average residual, but we also developed a metric based on the stiffness of transport:

$\begin{equation}{\Delta}a/{L}_{X}^{\ast }\equiv \vert {\mathbf{F}}^{\text{tr}}({\mathbf{x}}_{\text{i}})-{\mathbf{F}}^{\text{target}}({\mathbf{x}}_{\text{i}})\vert \cdot {\left[\frac{\partial {\mathbf{F}}^{\text{tr}}}{\partial a/{L}_{X}}\right]}_{({\mathbf{x}}_{\text{i}})}^{-1},\end{equation} \tag{ 10 }$

evaluated at each iteration optimum point, x_i, after model training. Here, a/L_X refers to each of the driving gradients ( $a/{L}_{{T}_{\text{e}}}$ , $a/{L}_{{T}_{\text{i}}}$ and $a/{L}_{{n}_{\text{e}}}$ ) for each channel.

In short, equation (10) refers to the average relative change in normalized gradients that is predicted to provide flux matching by a linearized diagonal version of the transport model. This metric is valid only if the change in gradient is small enough, which is the case toward the end of the flux-matching process. By the end, as shown in figure 4(l), the average of this metric for all channels and radii is 3.4% and it was reached at profile #17. An extra profile (#18) was evaluated to demonstrate that the fusion gain remains unchanged. Unfortunately, this extra evaluation did not reduce the residual nor the convergence metric. For pragmatic reasons, we decided to stop the flux-matching process, as we did not consider that further reduction of the residual would lead to any meaningful change to the profiles, as demonstrated in figure 5, and even less to the fusion performance prediction.

**Figure 5.** Kinetic profiles, normalized gradients and predicted fluxes for the last four iterations, underlining the dramatic differences in transport fluxes with minimal changes (almost negligible to the naked eye) to the kinetic profiles, feature of stiff and near-marginal turbulence simulations.
Download figure:
Standard image High-resolution image

Although in our current research, we strive to have near-exact flux matches, if one wants to use this technique to project fusion power, confinement time and plasma performance in conditions of stiff transport such as the one considered here, the convergence criteria can be relaxed. Figure 5 clearly highlights the stiff conditions in SPARC PRD plasma. Minimal changes in the gradients for the last few iterations of the present framework led to significant changes in the fluxes that oscillated around the targets, even reaching stable conditions at some radii (e.g. profile #15). The profiles, however, barely changed and therefore the fusion gain remained the same.

5.2. Comparison with empirical and reduced-model predictions

Figure 6 compares the prediction from CGYRO to that of different TGLF saturation rules. Although the prediction of ion temperature with SAT2 is in remarkably good agreement with CGYRO, the electron energy transport is overpredicted for the inner half of the plasma minor radius. All rest of saturation rules predicted lower levels of ion energy transport in the near edge. However, the strongest and most important difference between CGYRO and TGLF predictions is in the electron density profile. All saturation rules resulted in predictions of density profiles that are less peaked in the plasma core than with CGYRO, consequence of the overall lower gradients at around mid-radius. As a result of the very different density peakings, the fusion gain widely varies from one prediction to another, despite having similar levels of the global energy confinement factor, H_98,y2. However, it is encouraging that the first-principles prediction with CGYRO yields a fusion gain of Q ≈ 8, greatly surpassing SPARC's core mission (Q > 2), even ignoring the favorable effect of intrinsic toroidal rotation in SPARC [45, 72], possible stabilization effects of fast alpha populations [73], and stabilization of ITG potentially provided by simulation of δB_|| perturbations [74]. The evaluation of the effect of rotation, fast ions, and the stabilization of ITG due to compressional magnetic perturbations (δB_||) in the prediction of SPARC scenarios is left for future work.

Even though POPCON, TGLF and CGYRO profiles all feature similar quality of energy confinement (H_98,y2 = 0.95–1.05), the predictions of fusion gain can differ by a factor of 2. We expect that such differences will be even larger when simulating reactor plasmas that operate somewhat near ignition conditions. This is due to two important elements: (1) density peaking, and (2) temperature gradient scale length profile shape. This work suggests that predicting or extrapolating energy confinement (or pedestal height) alone is not enough for reliable predictions of fusion power, and calls for the need to use first-principles core transport simulations or to develop reduced models around the region of the parameter space of interest for a reactor study, as well as extensive experimental validation studies of profile predictions in present devices.

The new prediction of density peaking factor (evaluated at ρ_ψ = 0.2, where ρ_ψ is the square root of the normalized poloidal flux, following the nomenclature of references [75, 77]) with CGYRO is closer to the experimental observations in present devices than any of the TGLF predictions, as depicted and described in figure 7. A database of JET and ASDEX Upgrade experimental profiles was used to derive an empirical scaling for the density peaking based on the effective collisionality, normalized pressure and a core particle source parameter [75]. Inclusion of Alcator C-Mod data, which had no neutral beam injections, was demonstrated to also be captured reasonably well by the same empirical scaling [76]. Evaluation of SPARC parameters yields ν_ne ≡ n_e(ρ_ψ = 0.2)/⟨n_e⟩ ≈ 1.48. CGYRO predictions produced a density profile with ν_ne ≈ 1.45, which is in remarkable agreement with the empirical scaling. Although not conclusive, the improved agreement with empirical density peaking results provides additional confidence that the SPARC PRD profiles predicted in this work are representative of those expected during SPARC operation.

**Figure 7.** (left) Density peaking factor, ν_ne = n_e(ρ_ψ = 0.2)/⟨n_e⟩, as a function of effective collisionality and neutral beam injection source parameter, for a database of plasma discharges in JET, ASDEX Upgrade and Alcator C-Mod. Experimental data from [75, 76]. Also plotted are the peaking factor predicted from the empirical scaling [75] for SPARC parameters, the peaking obtained from `CGYRO`, and the range of peakings from different saturation rules in `TGLF`. (Middle) Subset of data corresponding to source-free plasma cores, i.e. without neutral beam particle sources. (Right) Experiment versus prediction using the scaling proposed in reference [75].
Download figure:
Standard image High-resolution image

6. Discussion

This paper has presented a new scale-bridging technique to find steady-state solutions to the plasma transport equations with very high-fidelity non-linear gyrokinetic fluxes. Standard local, gradient-driven, δf transport codes can be used to predict temperature and density profiles without modifications to the code and with a reduced number of simulations. The use of machine learning and optimization techniques have demonstrated great advantages in aiding the convergence process, and the quality of the final root is not compromised as the surrogates are only used during convergence. Even though this study focused entirely on core transport, it represents a first step toward routine performance predictions of fusion devices, including burning plasma experiments and pilot plants. Extension of this work could include edge and pedestal modeling, self-consistent with the core solutions, which is a critical step toward whole-device integrated simulations [18, 78].

Convergence was achieved in three transport channels and evolving energy sources and sinks (alpha heating, radiation and energy exchange) with only 72 total calls (18 per radial location) to the nonlinear CGYRO code. This corresponds to approximately a 3× speed-up compared to standard Newton methods found in the literature⁵ . We note, for the TGYRO-savvy reader, that this convergence rate corresponds to less than five iterations in TGYRO, given that the calculation of the finite-differences Jacobian requires (1 + n_p) flux evaluations per radius, where n_p is the number of profiles evolved.

The total computational cost of the benchmarking exercise explored here was approximately 8 M CPU-hours performed on the NERSC Cori KNL partition. This high cost is due to the first-of-a-kind nature of the problem presented here and the extreme high fidelity of the gyrokinetic simulations. We expect that the computational cost could be reduced by a factor of 2 or 3 when more efficient techniques are implemented, such as early stopping, physics-informed non-stationary co-variance kernels and the use of accurate gyrokinetic assumptions (e.g. grouping impurity species).

It is important to note that the turbulent transport model used to find the flux-matching solution is treated entirely as a black-box system, which allows to extend this framework to any transport model or first-principles simulation (e.g. TGLF [9], QuaLiKiz [10], GENE [79]) as long as a set of independent input parameters can be identified to fit the surrogate models. The treatment of the gyrokinetic fluxes as being drawn from a random process and modeled properly within the GP framework prevents convergence problems that may arise from the numerical computation of derivatives in Newton solvers and in direct numerical simulations due to noisy realizations of the fluxes [31, 32, 80] or irregularities in the turbulence behavior [22]. The non-parametric surrogate modeling employed here also presents advantages in dealing with critical gradient behavior and stiff transport regimes.

Gyrokinetic turbulence has been modeled here by means of a series of local flux-tube simulations, in which the gradients of the background profiles are assumed constant throughout the local simulation domain until saturation at each macroscopic time-step is achieved. This choice is appropriate for the core of tokamak plasmas that do not exhibit internal transport barriers, with a monotonic q-profile and where nonlocal effects (e.g. turbulence spreading) play a negligible role in determining the transport state of the plasma. In circumstances other than this, global turbulence simulations may be required, such as those performed in reference [32] with the GENE [79] code. The framework presented here is adaptable to a global simulation setting, although the variables used to fit the surrogate models would be more numerous, as the fluxes at one location could be directly affected by parameters at a distant location. Furthermore, the choice of four flux tubes to simulate turbulence in SPARC was also motivated by the expected smoothness in the gradient profile and the use of distributed heat sources (vs possible localized sources in other scenarios). Scenarios with similar features in Alcator C-Mod have been studied, and the authors estimate that the effect of the coarse radial grid used here is minimal to reproduce performance. However, more work is needed to determine the higher level of accuracy introduced by the higher number of flux tubes, at the expense of more computational cost. This will be a focus of future work.

The workflow presented here is capable of providing steady-state solutions of transport equations with arbitrary fidelity. The use of extreme-accuracy core transport simulations with fully nonlinear gyrokinetic codes (as the one presented here) can become a routine exercise in experimental transport research and can greatly contribute to the validation of reduced models of turbulence and transport. However, even with the accelerated method presented in this paper, the very high computational cost of each standalone gyrokinetic simulation still remains a bottleneck for their direct use in full-device, time-dependent or control-oriented applications. In such situations, reduced models such as TGLF [9] and QuaLiKiz [10], and their surrogate versions [22, 81–83], are still needed, and the method presented here can contribute to their validation and development.

This work has demonstrated, through extreme high-fidelity, first-principles core-plasma modeling, that the SPARC tokamak will be capable of attaining Q ≈ 8 in its PRD scenario, which gives further confidence in its design and the potential of high-field devices to achieve high-performance fusion cores at a compact size.

Acknowledgments

The authors thank M. Greenwald particularly for the experimental data used in figure 7, and generally for the invaluable mentoring support and leadership role in defining robust physics basis for the SPARC tokamak. We also thank E. Belli and G. Staebler for the development of CGYRO and TGLF, respectively. This research used resources of the National Energy Research Scientific Computing Center (NERSC), a US Department of Energy Office of Science User Facility located at Lawrence Berkeley National Laboratory, operated under Contract No. DE-AC02–05CH11231. This work was funded by Commonwealth Fusion Systems under RPP005. Work by J. Candy was supported by the US Department of Energy, Office of Science, through the AToM project under Grant DE-SC0017992.

Nonlinear gyrokinetic predictions of SPARC burning plasma profiles enabled by surrogate modeling

Article metrics

Author e-mails

Author affiliations

Author notes

ORCID iDs

Dates

Abstract

1. Introduction

2. Background