Enhancing predictive capabilities in fusion burning plasmas through surrogate-based optimization in core transport solvers

This work presents the PORTALS framework, which leverages surrogate modeling and optimization techniques to enable the prediction of core plasma profiles and performance with nonlinear gyrokinetic simulations at significantly reduced cost, with no loss of accuracy. The efficiency of PORTALS is benchmarked against standard methods, and its full potential is demonstrated on a unique, simultaneous 5-channel (electron temperature, ion temperature, electron density, impurity density and angular rotation) prediction of steady-state profiles in a DIII-D ITER Similar Shape plasma with GPU-accelerated, nonlinear CGYRO. This paper also provides general guidelines for accurate performance predictions in burning plasmas and the impact of transport modeling in fusion pilot plants studies.


Introduction
As we approach the operation of magnetic confinement burning-plasma experiments [3,4] and as reactor design studies are becoming widespread in the fusion energy community [5][6][7][8], the need for reliable, fast and accurate physics models of the confined plasma becomes essential to realize economically-attractive fusion energy.Particularly, accurate physics models for the transport of energy, particles and momentum in the confined plasma are exceptionally important.Fusion power production and reactor efficiency strongly depend on the core pressure gradients that are attained within the operational space of the fusion device.These gradients in kinetic profiles are determined by a balance of the energy, particle and torque input and turbulent and collisional transport processes.
The nonlinear gyrokinetic framework for turbulent transport [9,10] is the gold standard for a rigorous description of micro-turbulence in the plasma core.Gyrokinetic theory uses the ordering parameter δ .= ρ i /a (where ρ i is the thermal ion gyroradius and a is the plasma minor radius) to average over the ion gyromotion thereby reducing the kinetic description from 6D to 5D.However, nonlinear gyrokinetic simulations that attempt to evolve the full distribution function, without further approximation, are computationally impractical for predictive modeling applications.By a further time and space-scale ordering in powers of δ between turbulence and macroscopic evolution, one can separate the local (δf ) gyrokinetic equations from the global transport equations [11].Importantly, it is only in this limit that gyrokinetic and neoclassical fluxes are themselves separable [12].The community has developed sophisticated tools based on this spatio-temporal ordering.The gyrokinetic fluxes can be calculated with nonlinear time-dependent codes such as CGYRO [2], GENE [13], GKW [14] or GX [15], or with quasilinear models like TGLF [16] or QuaLiKiz [17].Similarly, the neoclassical fluxes can be computed with kinetic codes like NEO [18], or moment-based models like NCLASS [19].
The local neoclassical and gyrokinetic calculations are then embedded into global transport solvers that compute self-consistent particle (fueling), momentum (torque) and energy (heating) sources to be balanced against the neoclassical and gyrokinetic losses, iterating until steady-state conditions are found.Even with the many simplifications provided by this embedded approach, the stiff behavior of turbulence results in the requirement of running many transport iterations to achieve steady-state (flux-matching) conditions.While it is hard to determine a priori how many flux-tube δf simulations are required to simulate a given plasma condition, it is often the case that hundreds or even thousands of evaluations per radial location are required.This number varies from one case to another, but generally it increases with the number of channels simulated (e.g.simultaneous prediction of temperatures and densities is more expensive than only temperatures prediction), the evolution of targets (e.g.self-consistent alpha heating introduces a nonlinearity) and the stiffness, discontinuities and non-monotonicity of the transport model.In burning plasmas where heating is dominated by the profiles themselves via alpha heating, where density peaking comes mostly from transport processes and not sources, and with large gyro-Bohm units, the convergence of transport solvers becomes exceptionally difficult.Beginning with TGYRO-GYRO [20] and TRINITY-GS2 [21] whose pioneering studies set the foundation for work in this area, efforts in the community are ongoing to develop advanced transport solvers (including TANGO-GENE [22] and TRINITY-GX [15]) with computationally efficient transport models that capture the physics of gyrokinetic turbulence brought to macroscopic steady-state conditions.The work presented here is complementary to these efforts, as it focuses on the acceleration of the convergence of transport solvers, rather than the development of the transport models themselves that are efficient and applicable to a wide range of plasmas (e.g. for global gyrokinetics and 3D geometry).The tokamak transport modeling community has also recently begun to employ machine learning-accelerated and data-driven models for transport predictions [23][24][25][26][27].These efforts are primarily aimed at developing high-dimensional surrogate models to simulate transport fluxes and integrating them into standard transport solvers.
This paper presents the paradigm of re-defining the flux-matching problem encountered in transport solvers as a surrogate-based optimization problem, and presents, to our knowledge, the most comprehensive prediction of core kinetic profiles in a tokamak: a 5-channel, nonlinear flux-matched gyrokinetic prediction with self-consistent energy exchange and radiation.Section 2 introduces the flux-matching formulation for steady-state prediction of core profiles and Section 3 presents the paradigm of re-writing the system as a physicsguided surrogate-based optimization problem (PORTALS framework).Section 4 discusses guidelines for accurate profile predictions and Section 5 presents a 5channel nonlinear gyrokinetic prediction enabled by the surrogate formulation at reduced cost, with PORTALS-CGYRO.In Sections 6 and 7, general discussion and conclusions are presented.

Background
Core transport solvers in tokamak geometry aim at solving the fluid-like flux-surface averaged (FSA) 1D transport equations [28], where kinetic or gyrokinetic effects are only included externally to the code, usually in the form of black-box modules for theory-based or first-principles models of turbulence and neoclassical transport.
When employing high-fidelity physics models for the nonlinear transport flux calculations, one encounters a non-analytic dependence on background profiles and, particularly, on their gradients.This is due to the drift-wave nature of electromagnetic tokamak turbulence, which is driven unstable by pressure gradients with complex sensitivity to collisions and background plasma conditions.This results in the need to develop robust numerical techniques that reduce oscillations, ensure convergence and minimize simulation errors [29,30], otherwise standard numerical schemes to solve parabolic partial differential equations would require extremely small time steps of integration and would be exceptionally time consuming.Even though the time step required to ensure convergence of transport solvers has been reduced by orders of magnitude thanks to the implementation of numerical diffusivities [30] (e.g. in ASTRA [31]) and internal Newton iterations [29] (e.g. in TRANSP [32]), calls to the transport model are still in the order of a hundred per radial location, making the use of first-principles turbulence simulations unfeasible, at least for routine analysis and study of turbulence physics.Furthermore, the requirement to include grid-scale numerical diffusion in these solvers requires a careful analysis of convergence, robustness and stability for each application.
When the time dynamics is not important or when the goal of the simulation exercise is to produce a steady-state solution, time independent transport solvers are capable of predicting core kinetic profiles with lower computational expense.This is achieved by re-defining the system of equations as an inverse problem [20], where the kinetic profiles are the free parameters and the difference between transport fluxes and target fluxes is minimized below some convergence criterion.The system of equations that is often solved in time independent solvers is obtained by integrating over the interior of each flux surface (see Fig where r is the half-width of the mid-plane intercept, ⟨•⟩ indicates flux-surface averaging, and V ′ = ∂V /∂r where V is the flux surface volume.S e represents particle sources, S ω represents momentum torque input and the P X represent the different components (sources and sinks) of energy in the plasma.Eq. 1 clearly shows how coupled the system of FSA equations is.Every transport flux component (Γ e , q e , q i , Π, Γ Z ), depends on the state of background turbulence, which is in turn determined by each of the kinetic profiles under consideration.Furthermore, target fluxes such as energy exchange (⟨P ei ⟩), radiation (⟨P rad ⟩) and alpha power (⟨P α,e ⟩, ⟨P α,i ⟩) strongly depend on the profiles themselves.Here it is assumed that the momentum flux, ⟨Π • ∇r⟩, includes the residual stress, and that there are not intrinsic sources of the impurity of interest (hence the null-flux condition).In the reminder of this paper, auxiliary heating power delivered to each species is assumed to be fixed throughout the simulations and therefore must be obtained by external means, such as interpretive simulations with TRANSP or ASTRA, and assumed constant with the variations of the kinetic profiles from initial to steady-state conditions.This is a current limitation of the framework, but it is expected to be addressed in future work.The rest of target flux components are calculated self-consistently with the kinetic profiles.Details on these calculations are available in Ref. [1].
The process to bring an initial set of kinetic profiles to steady-state conditions when external heating, fueling and torque sources are imposed, requires a number of local, δf turbulence simulations to provide the transport fluxes at each iteration.Both in time dependent (with a criterion that accounts for the stationarity of the predicted quantities) and time independent (with a criterion based on the closeness between transport and target fluxes) solvers, the information of previous simulations is often not used, including the local Jacobian at each iteration.The paradigm presented here proposes that a global surrogate model for each of the transport fluxes is constructed and refined during the convergence process.All the simulations, including the ones far from convergence, are utilized to train the surrogate model and build the dependencies of transport fluxes with respect to the free parameters of the problem.

Flux-matching as a surrogate-based optimization problem
The set of Equations 1 can be generalized and written as a multi-residual minimization problem: R j,c = F tr j,c (y j,∀c , ∇y j,∀c ) − (y r≤r j ,∀c ,∇y r≤r j ,∀c ) (y j,∀c , ∇y j,∀c )V ′ (r)dr where F tr j,c is representative of the transport flux of channel c at radial location r j , which is a function of the local values of all the channels, y j,∀c = {n e (r j ), T e (r j ), T i (r j ), ω 0 (r j ), n Z (r j )}, and their local gradients, ∇y j,c ≡ ∂yc ∂r | rj .Similarly, f target j,c represents the target flux density for channel c at radial location r j , which is a function of the local values of all the channels.For generalization purposes, we also assume that the target density can be a function of local gradients, for example in the case of turbulent energy exchange [33].It is important to note that the target fluxes F target j,c at flux surface r j are non-local due to volume integration, but the transport fluxes F tr j,c are fully defined locally, a consequence of the local approximation of turbulence.The number of radial points chosen to represent the logarithmic gradient profiles (piece-wise linear functions as illustrated in Fig. 1) is N ρ , the number of channels to simulate simultaneously and self-consistently is N c , and the number of model evaluations is N m .
Equations 2 represent a minimization problem of the scalarized, multichannel residual, ξ, with each channel profile, y j,c , free to evolve.The set of radial profiles that minimize the residual, y * c (r), defines the steady-state plasma with self-consistent transport and targets and represents the solution to the problem.
With a fixed boundary condition for each channel, y b,c = y c (r = r b ), the gradients and the local values of the profiles are linked via radial integration.Under the framework of tokamak FSA transport equations, it is convenient to define the normalized logarithmic gradients, z c = a/L yc = − a yc ∂yc ∂r , as the free parameters of the problem.This way, the logarithmic gradients (for each channel c and radial location j) uniquely define the profiles [20] and the use of a normalized flux surface label, ρ = r/a, can be introduced readily: (z r≤r j ,∀c ,y r≤r j ,∀c ) As it will be described in Section 3.2, standard numerical techniques such as Newton methods to solve this problem do not scale well for high N c (number of channels), as the local Jacobian must be estimated from finite differences methods and require several transport model evaluations per iteration, even in the case of assuming a block-diagonal Jacobian.Leveraging advances in machine learning and statistics, here we adapt a standard Bayesian optimization workflow to solve for the flux-matching of multi-channel, steady-state transport as formulated in Equations 3. The evaluation of the residual, ξ, requires the evaluation of turbulence and neoclassical transport fluxes and the target flux densities at each radial location.When first-principles, nonlinear gyrokinetics is used, each residual evaluation (Eq.3a) requires N ρ simulations brought to saturation, which can quickly become computationally expensive if many  [34], with magnetic axis in dashed black line, 5 selected flux surfaces in blue (cross section in purple) and last closed flux surface in green.Note that the lower x-point region is smoothed out due to Fourier-moments decomposition typical of transport solvers such as TRANSP [32].Subplots b), d) and f) depict the piece-wise linear function representations of the normalized logarithmic gradient of electron temperature, ion temperature and electron density with the value at 5 selected flux surfaces (values to be predicted by the transport solver).Subplots c), e) and g) depict the corresponding kinetic profiles of each channel, produced by radial integration of logarithmic gradient from an edge boundary condition (dark blue).
evaluations N m are required to solve Eq. 3d.This is when Bayesian optimization techniques can help reduce the computational cost of the flux-matching problem.

Bayesian Optimization
Instead of starting from a single initial condition, in Bayesian optimization, an initial set of training data (model evaluations) is used to fit a probabilistic surrogate model1 for the metric to minimize, usually a Gaussian process (GP) model.In the context of flux-matching solvers, this initial training phase consists of running a set of local, δf turbulence simulations with variations in the free parameters (logarithmic gradients, z c , but with consistent changes in the profile values, y c ).The information from these local simulations is used to fit GP models to reproduce the transport fluxes v.s.free parameters behavior, and to reconstruct a model for the residual, ξ.
The GP model, trained on all thus-far observed data points, provides a prediction for the posterior distribution of the residual, and is used to find the optimal points to evaluate next with the expensive turbulence simulations.This phase requires the definition and optimization of an acquisition function, i.e. a function that quantifies the value of evaluating the costly model at any given point in the domain for optimizing the outcome (in this context the residual).The work presented here makes use of the GPyTorch [35] and BoTorch [36] packages for GP modeling and Bayesian optimization, respectively.The reader interested in the details on Bayesian optimization or GP theory and implementation is referred to Refs.[35][36][37][38][39].
The implementation of the flux-matching problem of δf transport models as a surrogate-based optimization problem to leverage Bayesian optimization methods is done within the PORTALS2 framework [1] and the rest of this paper describes implementation details that were needed to achieve a significant speed-up with respect to standard methods, which enable some of the highest fidelity predictions of core kinetic profiles performed to date.
For clarification, we must note that the results obtained with PORTALS are not results of surrogate modeling of turbulence and transport, but of direct turbulence modeling (either with quasilinear gyrofluid TGLF or nonlinear gyrokinetic CGYRO for the cases shown in this paper).Surrogates are simply used to accelerate multi-channel convergence, but the steady-state predictions of profiles are a full-physics result and must be interpreted as such.

Implementation of domain-information
Instead of using the full residual, ξ, as both the objective function (Eq.3d) and the surrogate output, here we choose the divide ξ in each component (channel and radius) as defined in Eq. 3a to construct a composite surrogate model.This ensures that each transport surrogate is only fit to the corresponding local input parameters, significantly reducing the dimensionality of the problem.While ξ is a nonlinear function of N c • N ρ variables, each transport flux F tr j,c (z j,∀c , y j,∀c ) is only a function of 2 • N c variables (local gradients z j,∀c and local profile values y j,∀c ).Therefore, constructing individual surrogates for each flux component is very advantageous to minimize the number of iterations to achieve accurate enough surrogate models within the Bayesian optimization workflow.This is essentially an application of composite Bayesian Optimization [40].

Input space transformation
Each of the transport fluxes includes neoclassical and turbulent fluxes, F tr j,c = F neocl.j,c + F turb. j,c , which are evaluated in separated simulations, and also fitted by separate surrogate models.By using index m to represent the turbulent (F turb. ) or neoclassical (F neocl.) fluxes, the aim of the surrogates is to reproduce the 2 • N c dimensional functions (local in space, j, but with multi-channel dependencies, ∀c): While the normalized logarithmic gradients z j,c are suitable as direct input parameters to reproduce transport fluxes, the profile values y j,∀c = {n e (r j ), T e (r j ), T i (r j ), ω 0 (r j ), n Z (r j )} can be transformed into variables that have a more distinct effect on turbulent and neoclassical transport.Defining the problem with y j,∀c = { ν ei (r j ), T i /T e (r j ), β e (r j ), ω 0 /c s (r j ), n Z /n e (r j )}, while still retaining a complete set, allows the surrogates to be directly fitted to quantities of interest and of direct effect to the turbulence dynamics, which aids in the training of surrogate models.Here, c s = f (T e ) is the ion sound speed, ν ei = f (T e , n e ) represents the electron-ion collision frequency normalized to the ion sound speed, and β e = f (T e , n e ) is the magnetic-field normalized electron pressure.
We note that in the current implementation, separate surrogate models for turbulent and neoclassical fluxes are used, but the framework is flexible enough to include a single model for both, if the user is interested in a more general approach.In cases where the neoclassical fluxes are of high fidelity (e.g. with the NEO code [41]), even if much cheaper than nonlinear gyrokinetic calculations, it is important to build surrogate models to their flux response.This is because their evaluation (unless analytical) will be much more expensive than a GP evaluation, and automatic differentiation would be, generally, not available; making the acquisition function optimization (the flux-matching process per se) significantly more expensive.
The rotation, which can go through zero throughout the radial profile, requires a special treatment, as its gradient cannot properly be defined as a normalized logarithmic gradient.Here, we use the radial gradient normalized to the ion sound speed over minor radius: z ω0,j = − a cs dω0 dr .This factor is proportional to the E×B shearing rate, γ E×B , and will affect all other channels via E × B shear stabilization of the background turbulence.
We note that the varying input parameters to the transport model are not limited to those in the basis, but any other variable can be derived from some combination of the fixed problem parameters and the basis parameters, such as the total pressure gradient.However, some complex dependencies of fluxes with respect to input parameters can appear if one is not careful about synergies with fixed parameter and gyrokinetic model assumptions.For instance, in a case where a/L ne is a free parameter in a plasma with two ions, if a/L ni is not varied self-consistently, each local gyrokinetic simulation will not be quasineutral.Similarly, if quasineutrality is achieved by compensating with one of the ions and not the other, the differing ion density gradients will affect background turbulence in a non-trivial way, making it challenging for the surrogates to describe the behavior with few training points.In this work, we make the following choices: 1) the thermal ion species always have the same temperature as the main ions and the same logarithmic density gradient for non-trace ions, and 2) variations of the electron density induce an equal change in the densities of the non-trace ions by keeping the same concentrations3 .

Outcome space transformation
Similarly to the input transformations, the outputs of the transport simulations are more properly fitted if transformed to gyro-Bohm units: G(T e,j , n e,j ) where G(T e,j , n e,j ) is the gyro-Bohm normalization factor for each transport channel: where ρ s is the ion gyroradius evaluated with the ion sound speed, c s .We note that this outcome transformation depends on the input parameters and therefore it is not fixed for all evaluations.In this framework, the residual will be defined in real units and not gyro-Bohm normalized, which eases the convergence process when core and edge have wide variations in temperature or density (e.g. in the case of L-mode predictions from the edge to the core).Therefore, the input and outcome transformations are both applied prior to model training and during the evaluation of the surrogate posterior, by leveraging BoTorch's transformation methods, botorch.models.transforms.After the transformation to the proper physics quantities for training (different set per surrogate model), we apply normalization (to the unit cube) and standardization (zero mean, unit variance) of inputs and outputs respectively, which eases the model hyperparameter training with BoTorch standard priors.
Acknowledging the different units for each transport channel, for the definition of the scalar residual ξ from Eq. 3c we make a units transformation for the particle transport residuals.The electron particle flux residual enters in Eq. 3c as a convective flux component, Γ e = 5  2 T e Γ e , where the factor 5 2 is chosen following common approaches in transport modeling.Impurity transport flux is also converted into a convective flux, but it is additionally divided by the original impurity density concentration to avoid dramatically different channel residuals.These choices are ad-hoc, but provide residual definitions such that each local, channel components have similar magnitudes.
Note that since the total residual ξ in (3c) is a nonlinear function of the individual residuals R j,c , its posterior distribution under the surrogate model is non-Gaussian.We therefore employ a Monte-Carlo based objective function to transform the posterior distributions of each local channel residual into the scalarized ξ total residual.By leveraging BoTorch's sample-average approximation approach and PyTorch's auto-differentiation capabilities, we are able to back-propagate gradients through the acquisition function, objective, GP model, and transforms all the way back to the inputs, which allows us to effectively optimize the acquisition function to determine the next simulation parameters for the expensive model.

Simple relaxation as initialization method
Typically, Bayesian optimization workflows start with a random set of points.Here, we leverage the fact that turbulence (dominant over neoclassical transport) is strongly affected by the normalized logarithmic gradients when written in normalized units (see Sec. 3.1.2),and therefore it makes sense to ensure that gradients are properly sampled during training and that the profiles to evaluate are realistic.
Instead of random training (employed in the first PORTALS publications [1,34]), we utilize the derivative-free simple relaxation (SR) method also implemented in TGYRO [20].The SR technique consists of neglecting crosschannel, cross-radius interactions and assuming that the local transport matrix is diagonal, with positive diffusion coefficients built from the relative difference between target and transport fluxes at each iteration.Formally, the SR iteration scheme can be written as: where η j,c is an ad-hoc parameter that determines the relative step in normalized logarithmic gradients, z j,c , to move the system towards reducing the difference between transport, F tr j,c , and target, F target j,c , fluxes.When implementing Eq. 7 in a code such as TGYRO, it can also be helpful to impose a maximum value of δz j,c = z j,c on a given iteration, which aids in smoothing the trajectory to convergence, particularly for early evaluations where there can be large mismatches in the target and model fluxes.
Performance tests were performed with the quasilinear Trapped Gyro-Landau Fluid (TGLF) model [16].Our tests found that generating initial training samples with the SR method (with a constant value of η j,c = 0.2) was slightly more efficient in building accurate surrogates than random initialization (e.g. with Latin Hypercube sampling).Two examples of SR vs random initialization performance are shown in Fig. 2. SR is particularly efficient at reducing the residual for the first 15 evaluations, saving a few extra evaluations compared to random initialization.Generally, the benefits of SR vanish as more samples are added to the training database, as expected.
While the effect is not major in the number of total evaluations required, the SR method ensures realistic profiles, where big radial variations in logarithmic gradients or unrealistic de-coupling of the profiles (e.g., electrons and ions de-coupled in a way not consistent with collisional equilibration) are not allowed from a macroscopic transport point of view.This eases the initial-value δf simulations, which can otherwise be difficult and expensive to saturate if random choices of gradients are requested to be evaluated by PORTALS, as some variations inevitably lead to unrealistically over-driven or completely stabilized turbulence.
Apart from using SR for initialization of the first 5 profiles, we also make an adjustment to the dimensionality of the first surrogates fitted.Although the transport models, F m j,c (z j,∀c , y j,∀c ), require both logarithmic gradients, z j,∀c , and profile values, y j,∀c , to be fully defined, for the first 5 Bayesian optimization iterations we only fit models to the logarithmic gradients, F m j,c (z j,∀c ).Although it only has a small effect, this technique has also been observed to help convergence.

Physics-informed GP and acquisition optimization
In the current implementation of PORTALS, we utilize a Radial Basis Function (RBF) covariance kernel for the Gaussian processes of turbulent and neoclassical fluxes.We found that the use of a linear mean for gradient quantities, z j,∀c , and constant mean for profile quantities, y j,∀c , works well for a few test cases performed so far.However, a proper characterization of the benefit of this approach v.s. the same polynomial degree for all input parameters has not been performed yet in a general manner and is subject of future work.This choice is motivated by the physics understanding that logarithmic gradients most strongly affect the background turbulence and profile quantities provide smaller corrections.This is expected to be particularly true as the plasma reaches steady-state, a situation where profile quantities will barely change and turbulence will be above critical-gradient, thus a higher-order polynomial makes sense for z j,∀c .Future work will further utilize physics information to build better surrogates, to include constraints such as F m j,c (z j,c = 0) = 0 (zero flux at zero gradient) or ∂zj,c > 0 (positive diagonal of transport matrix).The addition of such constraints could help build better surrogate models that can accurately capture the system's behavior with lower number of evaluations.
In this work, we leverage the posterior mean of the predicted distribution of the residual ξ to decide the next points to evaluate, which can be readily implemented with multi-dimensional root-finding methods (described below).In practice, other acquisition functions such as Expected Improvement (EI) [42,43], Upper Confidence Bound (UCB), or information-based variants such as entropy search [44] are often used in the Bayesian optimization literature, as they balance exploration (sampling uncertain regions) and exploitation (sampling promising regions) more effectively.We will investigate the effect of different acquisition function choices in PORTALS, including any generalizations of the existing acquisition functions to the multi-residual structure of Eq. ( 3), as part of our future work.However, as noted in Section 3.1.2,as the total residual ξ is a non-linear function of the individually modeled residuals, the full posterior distribution of these residuals is used in computing our target objective and thus the probabilistic nature of our surrogate models is crucial to our approach.
To optimize the acquisition function (i.e.solving for flux-matching in the mean of the surrogate posterior distribution), we employ a sequential combination of three numerical methods.Firstly, we take advantage of the decomposition of the total residual, ξ, into (N c • N ρ ) individual local channel residuals.For this, we use the multi-dimensional root-finding methods available in SciPy [45].Specifically, the Levenberg-Marquardt (LM) method [46,47] has shown superior performance for this particular problem.The exact Jacobian of the surrogate models F m j,c is provided to the LM method, as facilitated by automatic differentiation in PyTorch [48], encompassing all transformations outlined in Sections 3.1.1and 3.1.2as well as the residual reconstruction in Eqs. 3. We must note that this first optimization method operates directly using the mean of the prediction of each surrogate flux evaluation, and therefore does not take advantage of the Monte-Carlo based objective transformation that would properly capture non-Gaussian properties of the nonlinear Equation (3c).This would be required when uncertainty estimations from the Gaussian processes are taken into consideration for more advanced acquisition functions.This method was still kept in PORTALS as part of the historical familiarity of steady-state, flux-matching frameworks as multi-variable root-finding problems.Future work will be dedicated to understanding to what degree the decomposition of the total residual into each channel, radial flux components can be leveraged for acquisition optimization when moving away from the simplistic posterior-mean acquisition.
Secondly, in cases where the multi-variate LM method fails to achieve a satisfactory level of residual reduction, we resort to leveraging multi-start, scalarized-residual optimization in BoTorch.A heuristic generation of initialization points is used from which to start the L-BFGS-B local optimization algorithm [49].Lastly, we utilize a genetic algorithm from the DEAP package [50], to check for additional, possibly better optima that might have been missed by the local search algorithms of the previous steps.

Impurity particle transport
As written in Eq. 1, the impurity density (n Z ) evolution is assumed to be source free, and therefore the workflow aims at solving for the null-flux condition, ⟨Γ Z • ∇r⟩ = 0. Impurity density peaking in such situations happens by means of a convective pinch that arises from neoclassical and turbulent transport and that results in a detrimental inward flux of impurities to the plasma core.Formulating the impurity density flux with a diffusion-convection ansatz, summing over charge states and ignoring the effect of charge on core transport 4 : Under the assumption of trace impurity, one can divide by n Z and arrive to an expression for a "modified" impurity density flux that is only a function of the logarithmic gradient of the impurity density and not directly of its concentration: As n Z is always non-zero and positive, the null flux condition is equivalent for the "modified" impurity density flux: As a consequence of Eq. 10, we can reduce the dimensionality of the surrogate models for the impurity particle transport fluxes, as n Z is not required to fully describe Γ m Z in null-flux conditions.It is also important to note that under the trace impurity assumption, neither n Z nor a/L n Z should affect the background turbulence state, and therefore the surrogate models for the rest of fluxes (Q m e , Q m i , Γ m e , Π m ) do not need to include those extra variables for training and prediction.
The previous approximation for trace impurities does not mean that the impurity density channel is decoupled from the rest.Even in trace quantities, high-Z impurities can radiate substantial amounts of power that will affect the evolution of electron energy, thus affecting the background turbulence in an indirect way.However, the separation of δf and background distribution functions allows for the coupling to only occur on a "macroscopic transport" scale, and therefore handled by PORTALS directly and not by each individual turbulence simulation, a feature that will be especially exploited in Section 3.3.

Decoupling of radial grids
As will be discussed in detail in Section 4.1, the number of local transport model evaluations can be greatly reduced without loss of accuracy due to lack of normalized gradient profile structure in the plasmas of interest.However, the calculation of target fluxes, F target , requires a more careful treatment because it is constructed as the inner volume integral of the local target flux density, f target .Using a small set of radial points to perform the volume integrals can result in non-negligible errors, particularly in the calculation of the selfconsistent energy exchange power, r 0 ⟨P ei ⟩V ′ dr, as shown in Fig. 3.In the current implementation of PORTALS, the target flux calculations are performed on a finer radial grid of 20 points (usually 4-5x more points than the transport calculations), which is sufficient for an accurate representation of volume integrals while at the same time does not introduce much overhead during acquisition optimization.

Turbulent exchange
The framework of FSA transport equations described in Eq. 1 also allows the generalization of the energy exchange power, ⟨P ei ⟩, to include contributions from both classical and turbulent exchange.While the classical exchange can be calculated from standard net-exchange collisions formulas (see original PORTALS [1]), the turbulent exchange or anomalous heating is provided by the turbulence δf simulations [33].This means that the turbulent exchange becomes an additional quantity to include during the residual calculation and requires its own surrogate model.At flux-matching, however, the turbulent exchange enters as a target flux density in Eqs. 3 and needs to be volumeintegrated.This is the only component in the target calculations that cannot be extended to the finer radial grid described previously, and hence its effect on the macroscopic profiles will be subject to volume integration errors.We must clarify that the turbulent exchange or any other target flux that requires surrogate modeling (when non-analytic) is incorporated into the surrogate modeling framework via the target flux density, f target j,c , and not via the integrated flux.The volume integration is performed as part of the transformation from flux (surrogate) quantities to residual quantities during optimization.
Even if small, the inclusion of the turbulent exchange in PORTALS allows to assess its effect on profile predictions directly with nonlinear gyrokinetic simulations instead of relying on quasilinear modeling.

Benchmarking examples with the TGLF model
The previously described workflow is general for any local, δf transport model, and therefore the physics fidelity (and associated computational cost) of the kinetic profiles prediction depends on the chosen model fidelity.Benchmarking with nonlinear gyrokinetic models would require significant computing resources for a proper study.Therefore, here we approach this by using TGLF [16] -a fast, quasilinear transport model-to compare the performance of PORTALS and standard Newton methods, from the perspective of how many profile evaluations are required to achieve flux-matching.
Figure 4 shows the evolution of the residual for the prediction of three kinetic profiles (T e , T i and n e ) at 10 radial locations uniformly distributed between r/a = 0.35 − 0.9 with evolving targets and with the TGLF SAT0 model [16].PORTALS has been run with 16 different seeds (with random initialization) Fig. 4 Comparison of residual evolution in PORTALS with random initial training, a standard Newton method (NM) and simple relaxation (SR) schemes.For this exercise, we employed the quasilinear Trapped Gyro-Landau Fluid (TGLF) SAT0 model [16].Three example plasmas were used: (a) DIII-D ISS [34], (b) ITER Baseline [34] and (c) JET Ohmic L-mode [51].PORTALS was initialized with different random seeds, which affect the initial training (5 profiles, vertical dashed line) and the subsequent training of Gaussian processes and acquisition optimization techniques.NM corresponds to TGYRO method-1 and each of the three numerical parameters (relaxation parameter η, maximum step size bmax and finite differences step ∆x) were varied within reasonable ranges.SR corresponds to TGYRO method-6 with variations of the relaxation parameter η SR .We note that, even though TGYRO works with gyro-Bohm normalized residuals, here we plot them in real units, M W/m 2 , for a direct comparison of performance.
and it is compared to a standard Newton method with finite differences Jacobian (approximated to be block-diagonal) [20] and a simple relaxation model.Numerical input parameters to the Newton method such as the relaxation parameter η, maximum step size b max and finite differences step ∆ x are varied.We note that these 1D scans of numerical parameters are not representative of the potentially "best" performance of the Newton method, but reflects the often large variability of the results.
While only three specific cases are shown in Figure 4, further numerical experiments reflect the same trend: the surrogate-based optimization in PORTALS usually achieves a significant speedup when compared to the blockdiagonal, finite-differences Newton method.The simple relaxation method in some situations can get similar performance to PORTALS, but it is very sensitive to the choice of relaxation parameter and can get stuck in local optima.The increased cost of Newton methods for flux matching (i.e. more transport model evaluations) is mostly a consequence of the need to calculate the local Jacobian numerically at each iteration, which results in (1 + N c ) profile evaluations per Newton step.
It is important to note, however, that the methods as currently implemented in PORTALS introduce a significant overhead in wall-time cost.Standard numerical methods (as those implemented in TGYRO) require close-to-negligible extra cost and the total computing time required for a full profile prediction can be approximated as the sum of each individual transport model evaluation: , where t m is the cost of each local δf simulation, including any internal parallelization technique 5 .When the flux-matching problem is formulated as a surrogate-based optimization one, the cost of fitting the surrogate models and of acquisition function optimization need to be taken into consideration: , where t f it and t acq represent the time spent in fitting the surrogates and in optimizing the acquisition function at each iteration.In fact, if the transport model evaluation is cheap (as in the case of the TGLF model), the cost of the flux-matching exercise can be dominated by the surrogate fitting and optimization overhead.However, the improved convergence properties of the surrogate-based techniques combined with the comparatively (to nonlinear gyrokinetics) fast execution of quasilinear models may still make the adoption of these techniques more desirable than traditional Newton methods, but more work to characterize the numerical robustness of PORTALS with quasilinear models is required to draw definite conclusions.
As it has been described throughout Sec.3.1, the high performance of PORTALS techniques is achieved via the decomposition of the residual ξ into smaller, self-contained components.This also results in the need to fit and evaluate multiple surrogate models, usually N ρ • (3 • N c + 1), where it was assumed that each channel requires neoclassical, turbulent and target flux surrogate models, and that each radius also requires a model for the turbulent exchange.The surrogate overhead (t f it + t acq ) takes on average ∼ 10 wall-time minutes utilizing an AMD EPYC 7543 32-Core Processor for a problem whose residual requires the evaluation of 90 surrogate models (10 radial locations, 3 channels with zero target particle flux) with 30 free parameters.Under the conservative assumption of 5× speed up in PORTALS, we arrive at the rule of thumb that PORTALS methods are appropriate if each transport model requires more than 3 minutes to evaluate.This condition is not satisfied with most quasilinear transport models like TGLF [16] and QuaLiKiz [52], that take a few tens of seconds per evaluation.On the other hand, nonlinear gyrokinetic modeling does benefit dramatically from the use of PORTALS, as simulations of realistic plasma conditions range from a few tens of thousands to millions of CPU-hours (or in recent GPU architectures hundreds of GPU-hours) per evaluation.Future work will devise methods to reduce the wall-time cost of each individual PORTALS iteration (such as parallelizing the evaluation of the GP models and leveraging modern GPU hardware) to bring the benefits of surrogate modeling to the realm of quasilinear profile predictions.
Fig. 5 shows characteristic phases for the decrease of multi-channel residual of PORTALS-CGYRO simulations, for three examples.Phase I consists of the initial SR training, which often decreases residual if channels are not coupled or strongly dominated by one type of turbulence mode, although the simplicity of this scheme (as addressed in Sec.3.1.3)can also lead to oscillations or higher residual (as in the middle plot in Fig. 5).However, the goal of this phase is not to observe a steady decrease of the residual, but of producing proper training points for subsequent phases of the PORTALS workflow.Phase II is when surrogates are fitted and acquisition function optimization informs next points to evaluate.During Phase II, oscillations are usually observed for 2-5 evaluations, a consequence of the surrogates exploring cases near the logarithmic gradients bounds, or combinations that were not explored during Phase I.In Phase III, the residual drops dramatically and flux-matching occurs.Sometimes, following Phase III, we observe oscillations due to critical gradient and stiff behaviors of turbulence.Profiles and gradients are converged, but fluxes can jump up and down the targets, as was discussed in Ref. [1] and will be described in detail in Sec. 5. Generally, as depicted in the bottom subplots in Fig. 5, the residual evolution is not uniform across radial locations, but no specific trends are found.In some situations, the core reaches low residuals faster than the edge, and vice versa.
Due to the need for expert knowledge to interpret if sufficient convergence has been achieved, together with the need for a careful nonlinear gyrokinetic setup (to avoid wasting computational resources), the current implementation of PORTALS requires significant "human in the loop" component.Future work will focus on the development of a more automated approach that can make high-fidelity profile predictiosn available to a wider user base.

Restarting simulations for parameter scans
During the prediction of profiles with the PORTALS framework, it is currently assumed that the rest of plasma and geometry parameters are fixed.While this assumption can be relaxed, the accuracy of the surrogate models would be compromised if such parameters are not included as input variables during training.The inclusion of more variables, while possible, would require a higher number of local simulations with the transport model for surrogate training, which can quickly become intractable if expensive turbulence codes are used.
An advantage of using surrogate-based techniques with fixed geometry and background electromagnetics is that the surrogates trained during a profile prediction can be re-utilized to produce new scenarios with variations in certain input parameters.Input parameters that do not affect the local δf turbulence simulations but that affect the macroscopic evolution (e.g.auxiliary input power, fueling or torque) are suitable for surrogate-reutilization. Input parameters whose effect is already captured by the set of profile quantities, y j,c , that were used to train the surrogates (e.g.edge density or pressure boundary condition) are also suitable for this technique.Fig. 6 shows an example of how surrogate models look like for characteristic channels.Leveraging this aspect of surrogate modeling results in a much reduced number of new evaluations required to reach new converged states, and allows for extensive study of reactor scenarios.
This technique was exploited for the study of the parameter space of the SPARC tokamak [4,56] when operating in L-mode-like conditions [57].We utilized nonlinear CGYRO [2] to evaluate core transport at 6 radial locations (r/a = 0.35, 0.55, 0.75, 0.875, 0.9, 0.943) and scanned auxiliary input power, edge density boundary condition and edge temperature boundary condition.As shown in Fig. 7, first convergence took 12 profile evaluations, but subsequent scenarios required an average of 5 extra evaluations for convergence in the three predicted channels (T e , T i , n e ).This unique study enabled the investigation Fig. 7 Summary of SPARC L-mode investigation using PORTALS and nonlinear CGYRO (6 radial locations, 3 channels).On the left, the L 1 residual normalized to the radial-average of the target flux is plotted for all iterations.Initial training had 5 profile evaluations and first converged profile was achieved with a total of 12 evaluations.The re-utilization of surrogates enabled the prediction of kinetic profiles (Te, T i , ne) and fusion performance of 12 scenarios (variation of P input , n edge , T edge ) with only a total of 71 profile evaluations (426 local δf CGYRO simulations).On the right, ion heat flux matching between transport and targets for each of the scenarios.Similar flux-matching quality was achieved for electron energy and particle fluxes.
of 12 scenarios at a total cost of 71 profile evaluations (426 local δf CGYRO simulations) [57].Physics results from this and other applications of PORTALS will be part of upcoming publications and conference presentations.
It is important to note, however, that the re-utilization of surrogate models for parameter scans has the limitation of exploring scenarios with all other parameters fixed, such as the plasma geometry, safety factor profile, or effective charge, Z ef f .If additional parameters change, and there is a non-negligible effect on turbulent or neoclassical transport, the re-utilization of results from previous optimization runs in a naïve fashion can potentially delay convergence, as the parameters that are used to train the surrogates would not capture all the changes in transport fluxes.In such a case, it becomes more efficient to start the PORTALS simulations from scratch.
In future work, we intend to explore leveraging multi-task GP modeling approaches [58] that do not assume identical dependence of the outcomes on input parameters from different simulation settings and instead learn correlations between the outcomes.Such "transfer learning" has the potential to substantially reduce the number of evaluations required even in cases where simulated behaviors are not identical.This also opens the door for "multi-fidelity" optimization, where different simulation setups with different cost/accuracy trade-offs can be leveraged together in a principled fashion via a joint surrogate model.

Guidelines for accurate profile predictions
At the time of publication, we have utilized this workflow for the nonlinear gyrokinetic profile prediction of over 30 different plasma conditions, many of which were for burning plasmas.PORTALS has been employed to perform direct nonlinear gyrokinetic simulations with CGYRO [2] to predict steady-state plasmas in SPARC PRD [1], ITER Baseline Scenario [55], DIII-D ITER Similar Shape [34], ARC [54], SPARC L-modes [57] and JET H, D and T Ohmic plasmas [51].These studies are part of a broader effort to both validate nonlinear gyrokinetics in current experiments and to predict the performance of future devices.This section is intended to outline "best practices" that have been determined from these efforts.

Selection of radial grid
Thanks to the de-coupling of transport flux training and flux-matching solver, it is found that the cost of profile predictions with PORTALS scales linearly with the number of local, flux-tube simulations used to represent the full profiles.This is because the surrogate training occurs locally, so the nonlinearities introduced by the evolution of sources (e.g.energy exchange and alpha heating) and sinks (e.g.radiation) and the coupling of distant radial locations via gradient integration are handled directly by the trained surrogate models.The use of automatic differentiation in PyTorch presents clear advantages as access to the exact, local Jacobian matrix enables a more robust convergence process.
Therefore, the cost of the profile predictions directly depends on the choice of a radial grid that accurately represent the profiles.The coarser the radial grid the cheaper the profile prediction, but misrepresentations of profile shape and fusion performance may occur if the profiles are too constrained by the chosen grid.In this work, we focus on the prediction of inductive, on-axis heating plasma discharges, where the presence of internal transport barriers and a fine gradient structure are not expected.
To study the minimum requirements for the prediction of such scenarios we focus our attention to a database of 1084 discharges in the Alcator C-Mod tokamak [59], spanning various operational scenarios (including L-, Iand H-modes).Discharges were required to have at least 150ms (∼ 5 energy confinement times) period of steady plasma current, magnetic field, total input power and line averaged density.Electron density and temperature data from Thomson Scattering [60] and Electron Cyclotron Emission [61] were used to produce a database of kinetic profiles that were fitted using sparse Gaussian processes techniques [62].Filtering of bad fits (e.g.due to outliers) was used to ensure realistic profile shapes.The full database of fitted profiles is depicted in Figure 8. Core electron densities ranged from ∼ 0.5 − 4• 10 20 m −3 , and temperatures from ∼ 0.5 − 5keV .
To assess the radial requirements for profile predictions, we parameterize the normalized inverse gradient scale lengths a/L T,n with piece-wise linear functions, with a finite number of knots (equal for both a/L T and a/L n ) that Fig. 8 Electron temperature (Te) and density (ne) fits and their corresponding inverse gradient scale lengths (1/L Te , 1/Ln e ) for all 1084 shots included in the database.In this work, we only explore gradients inside of r/a < 0.9.
represent the locations that one would use to run δf turbulence simulations.Several potential radial knot locations were evaluated to determine the optimal number and location (in normalized minor radius, r/a) of such knots.Potential knot positions from r/a = 0.2 − 0.9 with 0.05 spacing were considered, and 3 − 7 radial knots were evaluated, using r/a = 0.9 always as the anchor point and linearly interpolating to zero from the point that is nearest to the magnetic axis.Note that the choice of r/a = 0.9 as the anchor point ensures that pedestal-like structures are not covered by the prediction.For any chosen number and location of knots, profiles of T e and n e can be obtained by radial integration of the gradient scale lengths.For visualization purposes, Figure 9 depicts 5 possible knot grids (evenly-spaced points) and the changes in the integrated temperatures and densities.
To assess the goodness of any given knot grid, we determine a "modified" fusion power ratio, P f / P f,f it .We assume equilibrated and pure plasmas (T i = T e , n i = n e ), and the temperatures are multiplied by a factor of 10 to bring the average on-axis temperature of ∼ 2keV in Alcator C-Mod to ∼ 20keV expected in burning plasma experiments such as in SPARC [53,56].This, along with the assumption of a 50-50 D-T mixture, transforms the database to burning-plasma-like conditions and the deviations in fusion power due to the coarse radial grid are directly representative for what would be expected when predicting burning plasmas and reactor concepts.
For any given number of knots, the optimal knot locations to minimize the error of the database is found by performing a grid search.Figure 10 (left) Fig. 9 Example of profile parameterization using piece-wise linear 1/L T,n interpolation for an Alcator C-Mod plasma in the database.3 to 7 radial knots evenly-spaced out between r/a = 0.2 and 0.9 are used to parameterize the gradient scale lengths, which are plotted in the bottom (1/Ln e and 1/L Te ) along with their integrated profiles on top (ne and Te).Experimental data (red points) was assumed to have a ∼ 10% error.
shows the histogram of the ratio of the modified fusion power between the parameterized profiles and the actual profile fit for the optimized locations for each number of radial knots.Figure 10(right) displays the very different parameterization performances between evenly-spaced and optimized knot locations.
First, it is found that, for any number of chosen points, the optimization of the knot locations (non-uniform grid) produces significantly reduced parameterization error than if only evenly-spaced points are chosen.Particularly, the edge region is found to require larger number of points, while inside midradius, it is found that one knot is sufficient to minimize the fusion power error.This occurs because the edge has always more structure than the core of the plasma, likely a consequence of potentially less-stiff transport in that region.Furthermore, due to the consideration of realistic plasma geometry, most of the contribution to fusion power comes from r/a ∼ 0.3 (as shown for example in SPARC in Figure 11c), requiring a more accurate representation of the edge and mid-radius gradients than those in the inner core.
This investigation also reveals that 5 radial knots-with radial locations at r/a = [0.35,0.55, 0.75, 0.875, 0.9]-are sufficient for an accurate representation of the fusion power, with a mean of the database centered at P f / P f,f it = 0.9955 and standard deviation of 6.8%.In the case of the stored energy, W , the representation with 5 knots resulted in a distribution with mean centered Fig. 10 (left) Superposition of histograms of the modified fusion power ratio P f / P f,f it for all shots in the database using the optimal set of parameters for each of the gradient reconstructions with piecewise linear functions.(right) Set of box-plots comparing P f / P f,f it of using optimal parameters vs. evenly-spaced parameters.
at W/W f it = 0.9990 and standard deviation of 2.6%.Note that the use of different fitting techniques (GPs to fit profiles and piecewise-linear-a/L T,n for finding optimal knots) ensures that we are not introducing bias in finding the best radial grid.
Therefore, with this investigation we recommend to perform profile predictions by using not more than 5 locations, as long as the plasmas to be predicted are not expected to exhibit features of advanced scenarios, with large off-axis current drive, internal transport barriers, reversed shear, or other phenomena that may introduce structure in the gradient profiles.These results are reminiscent of the concept of profile resilience and consistency, an attribute of burning plasma profiles (usually with small gyro-Bohm normalized heat and particle fluxes) that results in the near invariability of a/L T profiles throughout the plasma core as a consequence of stiff turbulent transport driven by ion temperature gradient (ITG) mode turbulence [63].The edge, however, via the formation of a pedestal or via non-stiff transport [64] in colder, L-mode plasmas can significantly vary and is more sensitive to plasma parameters and input power.

On the importance of near-axis simulations
Section 4.1 has presented evidence that, for inductive tokamak scenarios, accurate enough (6.8% standard deviation) predictions of fusion power in burning-plasma regimes are attained even if it is assumed that the inverse gradient scale length is linearly interpolated from the value at r/a = 0.35 to a/L T,n = 0.0 on axis.In this section we further explore the reasons behind this perhaps surprising result.This finding can have important implications as running turbulence transport models near the magnetic axis is difficult due to the low magnetic shear and high gyro-Bohm units, and other physics (such as fast ions or sawtooth) may further complicate such simulations and interpretation of the results.
Setting aside the fact that there is experimental evidence that suggest that a linear interpolation to a/L T,n = 0.0 on axis is sufficient (as shown in Figure 8 for Alcator C-Mod plasmas), even if such feature was not observed, the small plasma volume near the plasma core results in only small variations in fusion power when the linear interpolation assumption is relaxed.Figure 11 shows the small effect that the inner gradients (inside of r/a < 0.35) have on the total fusion power, using the SPARC Primary Reference Discharge (PRD) scenario [1] as the base case.An additional point at r/a = 0.175 was added to the a/L T,n parameterized profile and the values of the gradients were sampled 1000 times from uniform distributions (0 − 2 in a/L T and 0 − 0.5 in a/L n ).Even with such large deviations from the original "interpolationto-zero" profile, the standard deviation of the distribution of fusion powers was 3.2% (Figure 11d).Although the differences in the fusion power density (Figure 11c) were significant, the small volume of that region results in only small differences to the integrated fusion power.
Consequently, from the perspective of predicting fusion performance and fusion power output from a burning plasma, the small gain in accuracy does not seem to be compensated by the additional cost of the turbulence simulations inside of r/a < 0.35.

Ion-scale simulations
Although the PORTALS techniques are insensitive to the choice of the δf transport model -allowing high-fidelity simulations to be brought to stationary conditions at reduced number of evaluations-, the choice of model setup affects directly the cost of the simulation exercise.In the case of gyrokinetic modeling, the choice of the simulated normalized poloidal wavenumber range, k y ρ s , is an important parameter that affects strongly the simulation cost and the type of turbulence instabilities and interactions that are captured.Although not a limitation, all the simulations performed so far with PORTALS rely on the assumption that ion-scale turbulence is the dominant turbulent transport mechanism, and that high-wavenumber effects or interactions with long wavelength turbulence are sufficiently small.
Validation studies in current experiments, with widely varying heating schemes, collisionality regimes and plasma parameters, require additional work to understand the validity of ion-scale gyrokinetics to reproduce core transport.Performing linear stability and numerical resolution scans around experimentally-measured conditions is an approach that is often taken in the community (e.g.Ref. [34] for predictions of DIII-D plasmas).
However, in the case of reactor and burning plasmas predictions, there is no access to experimental conditions to evaluate the turbulence spectrum.A careful look at the requirements for fusion power production in conventional, inductive tokamaks can yield some insights to motivate the assumption of ionscale turbulence modeling.Ref. [65] showed that despite the low collisionality of reactor-relevant plasmas, their energy confinement times are sufficiently large that they remain well-coupled.Thus, although such plasmas are inherently electron heating dominated (due to the inherent dominance of alpha heating in a reactor-relevant plasma), the combination of strong coupling and radiation losses results in Q i > Q e for all the reactor-relevant conditions studied with PORTALS to date.Due to the dominance of the turbulent ion heat flux in the core plasma (where neoclassical transport is found to be negligible due to the low collisionality), it appears that gyrokinetic simulations which only resolve the long-wavelength (k y ρ s ≲ 1) turbulence that drives the ion heat flux are sufficient for these conditions.This conclusion is consistent with the "fingerprint paradigm" of Kotschenreuther et.al [66] who note that only long-wavelength instabilities such as ITG, trapped-electron modes (TEMs), or kinetic ballooning modes (KBMs) are (at least linearly) capable of producing Q i /Q e > 1 ratios consistent with the observed reactor characteristics.Whether multi-scale turbulence dynamics become important in other reactor scenarios (such as steady-state advanced tokamak regimes), or lead to significant nonlinear enhancements of the ion scale fluctuations remains to be resolved in future work.

Ion bundling
Similarly to the transport vs target radial grid decoupling presented in Sec.3.1.5,another technique that is possible in PORTALS and desirable to reduce the computational cost of the predictive exercise is the de-coupling of the ion species.While calculation of targets (particularly radiation) requires a realistic mix of impurities, the effect of the impurity mix in turbulence modeling is often captured by its effect on the effective charge Z ef f and main ion dilution f main .Therefore, nonlinear gyrokinetic simulations may benefit from the reduction of the number of ion species, with the lumping of impurity species being a common approach to reduce the computational expense.We must note, however, that when the density of impurities become additional channels to simulate and bring to steady-state, this approach is not possible.
Similarly to the impurity lumping, it is observed [34] that the effect of separating hydrogenic ions into deuterium and tritium is marginal on core turbulence modeling of burning plasmas and therefore it becomes advantageous to simulate a lumped A = 2.5 hydrogenic main ion.Studies of isotope effect in current machines [67] show that using an effective main ion mass can reproduce many aspects of the turbulence.However, the effect on alpha heating and fusion production is, of course, retained during the calculation of targets.

5-channel nonlinear gyrokinetic prediction of DIII-D ITER Similar Shape Plasma
To demonstrate the efficiency of PORTALS, here we aim at simultaneously bringing 5 channels (electron temperature, ion temperature, electron density, lithium impurity density and angular rotation) to steady-state of the DIII-D ITER Similar Shape (ISS) H-mode plasma from Ref. [34], with ionscale, nonlinear CGYRO [2] and NEO [18] simulations providing the turbulent and neoclassical components of cross-field transport.We allow radiation (Bremsstrahlung, line and synchrotron) and energy exchange (classical and turbulent) to evolve.We employ the 5 radial locations described in Sec.4.1 (r/a = [0.35,0.55, 0.75, 0.875, 0.9], as illustrated in Fig. 1 for this specific DIII-D ISS case) but calculate targets in a higher resolution grid of 20 points from r/a = 0.0 to 0.9.Nonlinear gyrokinetic simulations using the CGYRO code were performed at each of the radial locations.The general simulation setup is similar to that reported in Ref. [34] with some modest changes that are meant to lead to improve the turbulence statistics and therefore lead to more reliable time averaged heat and particle fluxes.Although the exact box sizes varied slightly across the radius, the nominal simulation setup was targeting box sizes in the radial and bi-normal direction of ∼ 120 × 120ρ s for L x and L y .This simulation domain was represented with 24 toroidal modes (n n ) spanning from 0.053 to 1.219 in k y ρ s .Nominally 512 radial modes (n r ), 24 toroidal modes (n n ), 24 theta points (n θ ), 24 pitch angles (n ξ ) and 8-12 energies (n energy ) were used, with the exact number of radial modes varying somewhat depending on the radial location simulated.Simulations utilized high accuracy collisions implemented with the Sugama collision operator [68], realistic geometry implemented with the Miller Extended Harmonic formulation [69], electromagnetic turbulence (δϕ, δA ∥ , δB ∥ ), and included 5 gyrokinetic species (D, T, Lumped impurity species, trace Li).The addition of the trace Li species (n Li /n e = 1 • 10 −6 ) is to allow for the prediction of the lithium density profiles in the simulation.
Each individual local δf CGYRO simulation was performed on the GPU partition of the NERSC Perlmutter supercomputer with each simulation utilizing 12 nodes, with each node comprised of 4 NVIDIA A100 (40GB) GPUs and requiring approximately 3-4 hours to achieve saturated conditions.Therefore each "black-box" evaluation (i.e.profile evaluation with 5 radial points) results in 720 GPU-hours (180 node-hours), making the use of surrogate-modeling techniques in PORTALS more efficient than standard numerical methods, as described in Sec.3.2.As will be discussed in the following, 30 profile evaluations were required to achieve steady state, amounting to 21,600 GPU-hours for a 5-channel, self-consistent nonlinear CGYRO prediction of the core plasma in the DIII-D ISS.
Fig. 12 shows the evolution of the 5 kinetic profiles from the original parameterized profiles (red) [34] to the multi-channel flux-matched ones (green).Starting transport fluxes (red lines in bottom subplots) are very far from targets, but after 26 transport evaluations per radius (130 nonlinear CGYRO simulations), the plasma is brought to steady-state in all coupled channels simultaneously.For visualization purposes, transport vs target fluxes are plotted in both Fig. 12 and Fig. 13 with different y-axis range.
From a naive inspection of the flux-matching profiles of transport and target fluxes for the best residual plasma, one could argue that the plasma is not truly in steady-state, as target fluxes are not within confidence bounds of the turbulent fluxes evaluations.However, this is a consequence of stiff transport behavior and extreme sensitivity with respect to input parameters, as it was also reported for the SPARC PRD in Ref. [1].Fig. 13 plots two cases towards the end of the simulation: the best residual case and the last one evaluated.The gradient profiles are nearly unchanged, yet the fluxes jump in some locations from below to above targets.This somewhat oscillatory behavior could be resolved by running extremely long δf simulations, where any error introduced by the limited time averaging is suppressed.However, such simulations will not give any more accuracy to the profile predictions, which are already converged and no significant changes to gradients are expected to happen.Fig. 14 shows the time traces of the CGYRO δf simulations at r/a = 0.55 as representative location.Simulations are restarted from a baseline case (blue shaded region) that was run for ∼ 360a/c s and each of them is let to evolve to their corresponding saturated state.At this location it was assumed that running for an additional ∼ 400a/c s was sufficient to evaluate the saturated state, which was determined by visual inspection, but metrics for automation of PORTALS are under investigation.We must note that this approach -given the limited time averaging caused by the otherwise prohibitely cost of running for longer time periods-would not capture changes in the transport fluxes from turbulence and zonal flow patterns that may developed on long time scales, as observed in some gyrokinetic studies [70,71].This is generally a limitation of flux-matching frameworks of δf simulations that aim to achieve steady-state with high-fidelity gyrokinetic simulations, under the constraint of computational cost.However, most long-time patterns, such as those in the aforementioned papers, typically show early signs (e.g., intermittency or bursts) that might trigger a collapse.These are features that are actively looked for in the simulations used to bring the plasma to steady-state in our workflow, and have not been observed in the specific regimes that PORTALS has been run on so far, including the predictions showed in this paper.
The mean flux in the saturated state is determined by the mean of the time trace in the saturated region, and the confidence bounds are 2σ S , where σ S = σ √ N .Here, σ is the standard deviation of the time trace and N is the number of independent samples, found by decomposing the full time window into periods of 3τ C , where τ C is the autocorrelation time.This assumes that time points separated by more than 3 autocorrelation periods can be considered to be independent from each other and contribute to the reduction of the total error of the time signal.Here we also make the assumption that target and neoclassical fluxes have no error associated to their calculation, which could be included in future work when constructing better metrics to determine convergence in PORTALS simulations.This utilization of the estimation of uncertainties in observed data to build more robust surrogate models enabled by the built-in capabilities of the Gaussian processes is one of the key advantages of PORTALS, as discussed in Section 6.The details of the implementation of fixed Gaussian noise (coming from this estimation of the error in the time Fig. 13 Investigation of stiff behavior.The gradient profiles of each kinetic profile are plotted for three iterations (#0, #26, #29), along with the flux profiles (zoomed-in version of Fig. 12).Right most column depicts the L 1 residual (best residual achieved in iteration #26, approximately 2 orders of magnitude lower than original) and the convergence metric ∥∆y∥∞ that represents the L∞ norm of the difference between the input parameters of the current iteration and the closest point evaluated.This metric reaches the stopping criterion of 1%, which indicates that the maximum change in any of the logarithmic gradient is below 1%, demonstrating sufficient stationarity.
evolution of the saturated state in the initial value simulations) is further discussed in Ref. [1].

Discussion
This paper has presented the PORTALS framework that formulates the steadystate multi-channel transport equations in the plasma core as a surrogatebased optimization problem.This formulation has allowed arguably the highest fidelity predictions of kinetic profiles and performance in tokamaks, by the direct use of the CGYRO code for steady-state flux matching.A unique, 5channel nonlinear gyrokinetic prediction of core profiles was presented, which only required 26 evaluations to attain enough convergence of the profiles.The difficulties of dealing with stiff transport and its effect on the development of robust convergence metrics were discussed.In this section we share some thoughts on why the field of magnetic-confinement fusion, as it enters the era of burning plasma experiments and power plant design, may need high-fidelity transport physics and simulations.

On the need for self-consistent, multi-channel predictions
For prediction of burning plasmas and reactor scoping At a time when the fusion community is focusing on designing potential fusion power plants and when the planning of the experimental operational campaigns of ITER [3] and SPARC [56] is requiring careful, predict-first simulations, the development of techniques that can accurately describe plasma profiles and performance is critical.Historically, tokamak devices have been designed using simplified models of transport and performance.In fact, the use of global energy confinement scaling laws, along with empirical formulas for peaking factors and functional forms for profile shapes, is widespread in the community.
While such approaches are useful to have a quick view of potential operational scenarios in new machines and as a check for simulation results, the uncertainties in the predictions are too high to trust fusion endeavors solely to such simplified formulas.Generally speaking, there are two types of uncertainties introduced by the use of empirical scaling laws to project burning-plasma performance.First, capturing the complex, high-dimensional dynamics of tokamak plasmas with just a few global metrics is, by nature, an approach subject to inaccuracies.Some of these errors are captured by the fit error bars of scaling laws, but higher errors may be present if one accounts for measurement uncertainties as well.The extrapolation of current scaling laws to burning-plasma regimes, with their high degree of non-linearity magnifies this error.Predictions made with a physics-based understanding of the underlying dynamics [72,73] are a powerful tool for reducing this uncertainty.Extra attention must be taken if power plant designs are aimed at exploiting unexplored regimes that were not fully accounted for when building the scalings laws or saturation rules for quasilinear models, such as negative triangularity plasmas [5] or non-conventional aspect ratios [74].Operational regimes with high plasma rotation, high positive triangularity and high core radiation fractions are also examples where standard empirical scalings laws are more prone to inaccuracies, although most recent updates to the ITPA scaling laws are promising [75].Secondly, even if confinement and peaking were predicted exactly, the choice of parameterized functional forms for temperature and density profiles introduces a non-negligible uncertainty.This explains why the performance of the SPARC PRD plasma varied from Q ≈ 11.0 [56] (empirical) to Q ≈ 9.2 [53] (physicsbased) as published in its physics basis, even though the predicted peaking and confinement with physics models were the same as those given by the empirical formulas.
To better visualize the effect of both of these uncertainties, Figure 15 shows the variation of fusion gain (Q = P f us /P in ) for a potential SPARC PRD operational point when transport-related input parameters to the Plasma OPeration CONtours (POPCON) analysis [76] are drawn from Gaussian distributions.Chosen distributions (mean and standard deviation) for this analysis are specified in the figure footnote.We note that the 5% standard deviation assumed here, although ad-hoc, is on the conservative end.POPCON assumptions were similar to those in Refs [4,56].In the same analysis, we also included the uncertainty propagation analysis if a physics-informed profile shape is used instead of parabolic.For this physics-informed profile shape, it is assumed that the core has constant gradient scale lengths (a/L T and a/L n ) and an edge pedestal is varied to match the choice of H 98,y2 and volume average density.
Fig. 15 unequivocally demonstrates that even assuming rather small uncertainties in transport assumptions in empirical modeling can lead to drastically different fusion gain results.As changing design parameters in burning-plasma experiments is costly and time consuming, it becomes clear that incorporating high physics fidelity simulations early in the design of new devices is key to the success of fusion as an energy source.We note that even though we took SPARC PRD as an example, the same results apply to any burning-plasma and reactor-relevant scenario.It is expected that reactors near ignition conditions or with high fusion gains may potentially be more sensitive to transport assumptions.Furthermore, in the example of Fig. 15 it was assumed that the plasma remains in H-mode (i.e., energy confinement scaling τ 98y2 E is applicable) throughout the parameter scans.L-H power threshold considerations and its associated uncertainties will certainly further widen the probably density of fusion gain and fusion power.In the case of the SPARC PRD plasma, the risk from relying on empirical scalings and POPCON analysis has been mitigated by comprehensive transport modeling via quasilinear simulations in TRANSP-TGLF [53], standalone gyrokinetics with CGYRO [78] and self-consistent PORTALS-CGYRO simulations [1].

For validation of gyrokinetics in current experiments
Prior to predicting new devices, validating our understanding and simulation codes of core turbulence is key to build confidence in the results.Validation in core turbulence research consists of comparing transport simulations to experimental results [79, and references therein].Due to the high computational cost of gyrokinetic simulations, validation studies in this area have historically focused on using experimental gradients as inputs to simulation codes or, at most, 1-or 2-dimensional scans of driving gradients (e.g.a/L Ti and a/L Te ).The concept of "Q i -matched" simulation has been used in the literature to refer to simulations that recover the experimentally-inferred ion heat flux by varying experimental inputs within error bars.Given the stiff-transport nature of turbulence and the often large uncertainties in experimental gradients, it is common that a Q i -matched simulation exists within error bars [80].However, such simulation may not necessarily reproduce transport in other channels, such as electron energy and particle transport.Additionally, flux-matching in this context often refers only to local results, and propagation of gradient variations (with the associated changes in plasma parameters and power flows) from edge to core are rarely accounted for.
To illustrate this, Fig. 16 displays the differences of the calculated impurity particle transport coefficients with CGYRO between using the self-consistent, 3-channel predicted profiles or only using the local, standalone, Q i -matched gradients.This example correspond to the DIII-D ISS plasma [34].Differences are such that can lead incorrect validation conclusions if not self-consistent results are compared to experimentally-inferred transport coefficients.
This work, particularly the capability of PORTALS to bring to steady-state multi-channel profiles, including rotation, opens avenues for more comprehensive validation studies in current experiments.This can be important to, for example, complement validation studies of turbulent transport models to capture intrinsic rotation and momentum transport dynamics [81][82][83].

Advantages and challenges of PORTALS methods
This paper has presented a newly developed framework to enable flux matching and has presented benchmarks and comparisons with standard numerical methods.The PORTALS framework has generally delivered convergence at lower number of profile evaluations than standard Newton and simple relaxation methods.With the risk of oversimplification, this can be explained from the perspective of PORTALS-or any Bayesian optimization framework with only-exploitation acquisition functions 6 -being a generalization of local Jacobian methods.In the latter, first order derivatives are constructed usually by assuming a linear model in the vicinity of the current point.This derivative information is then used to inform the next point to evaluate, that is predicted by the linear model to move towards convergence (e.g., reducing residual).In such methods, however, the information of previous iterations is lost, and new linear models are constructed locally at each iteration.When formulating the problem with surrogate-based optimization, a global model is constructed by utilizing all evaluations that were previously performed, maximizing the information leveraged for the optimization.In the context of purely-exploitation acquisition functions (as the one used here, mean of the posterior distribution), the next point is chosen such that it is predicted to optimize the underlying function the most (in expectation).In core transport modeling particularly with first-principles simulations, there is certain smoothness to be expected (under the assumption of infinite simulation time) between transport fluxes and input gradients that drive turbulence.Critical gradient behavior is usually well defined within δf codes and therefore the use of global surrogates provide important advantages.
The seemingly high dimensionality of the problem (e.g., 6 input variables in a 3-channel prediction) for the low number of samples (< 20 in most cases studied, usually 12-16 iterations) can be explained from different perspectives.First, Gaussian processes are well known to be highly sample-efficient surrogate models.Second, the effective dimensionality of turbulence during the fluxmatching problem is usually lower due to the higher influence of logarithmic gradients on the outcome and somewhat "residual effect" of additional terms such as T i /T e and ν ei .This is quickly picked up by the surrogates, mostly as a consequence of the specialized initial training that scans gradients and quickly drives the system towards the region around marginal stability.While the critical gradient may be in turn affected by these "residual" terms (such as the effect of T i /T e ), most of the parameter sensitivity is captured by the logarithmic gradients.Once the plasma searches around the region of marginal stability (with large variation in fluxes), the "residual" parameters constructed from y j,∀c = {n e , T e , T i , ω 0 , n Z } do not vary significantly and their effect in the background turbulence is small, an effect that is achieved by the outcome transformation described in Sec.3.1.2.The formulation of the transport fluxes in gyro-Bohm units ensures that most turbulent transport sensitivities are captured by the logarithmic gradients.Future work will explore the possibility to encompass the "residual" parameters into a correction factor to the GPs, which can further aid convergence.
The use of Gaussian process regression models as surrogates works well with limited time-averaging when turbulence saturation is defined, as estimated error bars can be used for model training.This provides important advantages to model pressure gradient driven turbulent transport, as it is often the case that in near-marginal stability conditions large oscillations, predatorprey behavior and the need for long time averages are required.By the proper definition of evaluation error and its use to train the surrogate models, we prevent the solver from getting stuck below or at critical gradients.Furthermore, it is generally the case that PORTALS does not "waste time" in stable regions, with otherwise small local derivatives with respect to gradients, as the global nature of the surrogates can quickly bring the plasma to turbulent conditions, by making larger steps than would otherwise be requested by local Jacobian methods with fixed step size.
An additional important advantage of PORTALS is that by further decoupling the macroscopic flux-matching problem from the local δf simulations we ensure robustness against nonlinear transport-target instabilities.In other words, because the macroscopic flux-matching is performed with the surrogate models, large number of evaluations can be made, and heuristic, global methods such as genetic algorithms, can be leveraged to find flux-matching conditions (e.g., as in Ref [84]).Cases with targets that are a strong function of profiles (e.g., burning plasmas via alpha heating, Ohmic plasmas via energy exchange), do not require more iterations than plasmas with de-coupled sources.
On the other hand, the formulation in PORTALS presents also challenges compared to standard methods.As discussed in Sec.3.2, PORTALS is a tool that is more efficient with high-fidelity modeling (e.g., nonlinear gyrokinetics), as the overhead introduced by both surrogate fitting and acquisition optimization can be substantial.This makes the use of these methods with low (e.g., analytical) and medium fidelity (e.g., quasilinear) transport models more expensive.While this is generally a limitation of Bayesian optimization -developed explicitly for expensive black-box function optimizationfuture work will be devoted to make the surrogate training and acquisition optimization more efficient.
As with most probabilistic-modeling optimization methods, the techniques developed here are subject to some vulnerabilities: 1) Sensitivity to bounds: allowable values for the choice of free logarithmic gradients may limit the search space.This challenge could be mitigated by allowing extrapolations of the GP outside of its training bounds, or to perform a pre-optimization step that characterizes the magnitude of the fluxes vs reasonable targets and selects the appropriate bounds.In the profile predictions performed so far, defining bounds ±50% or ±100% the experimental (or predicted by quasilinear models) gradients was enough to ensure the existence of a steady-state solution for the high fidelity nonlinear gyrokinetic system, but this remains a choice of the expert user, as a trade-off with surrogate accuracy may be considered.If bounds are too large, very stiff behavior or transitions between unstable micro-instabilities can result in the requirement to observe more data points for accurate surrogate representation of transport fluxes.2) Overfitting: small training sets could limit PORTALS' optimization potential.This challenge could be mitigated by the introduction of some exploration in the acquisition function, either through the acquisition function definition (e.g., expected improvement or others) or via the inclusion of random samples recurrently.
3) Scale disparity: difficulties arise when training GP models that aim to capture outputs of very dissimilar magnitude (e.g., order of magnitude differences between near-stable and unstable turbulence), with heteroskedastic noise estimations (e.g., differences in turbulence autocorrelation periods depending on gradient value), and with covariance length-scales strongly non-stationary (e.g., pre and post critical gradient).These issues can be approached by the development of proper pre-training transformation, better priors and kernel definitions, which will be subject of future work.4) Oscillatory convergence: defining proper convergence metrics becomes challenging.As discussed in Sec. 5, the stiff behavior of tokamak core transport results in large variations in fluxes that may oscillate around targets with minimal change in free parameters.When using fast transport models, convergence metrics can be developed that account for how much variation in input space has happened for a number of iterations, but the application of such techniques with expensive gyrokinetic codes remains challenging.

Potential for future work
While the PORTALS techniques have shown promising results and great performance in the predictions using nonlinear CGYRO, there are potential avenues for improvement.As part of future work, further development of the surrogatebased optimization techniques will involve a more careful selection and investigation of Bayesian optimization aspects to further increase the efficiency of the formulation.These may include physics-informed GP surrogate models (e.g. that capture critical gradient behavior and that considers the expected smoothness of the flux responses), other acquisition functions (accounting for the exploration / exploitation tradeoff) and better metrics to determine if converged-gradients solutions have been achieved.
In the context of optimization and reactor scenario design, the techniques developed here are also suitable for the high-fidelity optimization with direct numerical simulations.During the construction of surrogate models and as the simulation approaches converged solutions, it is possible to leverage the current information for decision making and modification of input parameters that were not part of the original free parameters.For example, as kinetic profiles are predicted with fixed geometry, if the plasma is predicted to not reach desirable performance goals (e.g.fusion power or L-H thresholds), it is possible to modify the geometry and drive the system to optimal solutions before converging the profiles in the early stages.This sort of "early stopping" or "sequential decision making" can be vital to maximize the high-fidelity physics information during the design of new fusion devices and will be explored as part of future work.
An interesting potential avenue for future work is the exploitation of the uncertainty quantification properties of the GP surrogate models for sensitivity studies of modeling assumptions.The propagation of uncertainties in the assumed models or boundary conditions could provide valuable insights into the robustness of the predictions and their validation against experimental data.Once trained, the GP models can be used to perform sensitivity studies of the input parameters, by finding flux-matched conditions not only for the mean of the posterior distribution, but also for the upper and lower bounds of the confidence intervals.This capability has not been explored yet but will be subject of future investigation.

Conclusions and Prospects
This paper has presented the approach of formulating the multi-channel steady-state transport equations in tokamak geometry as a surrogate-based optimization problem.The implementation of this approach in PORTALS is demonstrated to reduce the number of δf transport simulations required to bring the plasma to flux-matching conditions, as compared to standard methods.Advantages of the framework were discussed, particularly those relevant to power plant design and burning plasma simulations.This paper has not delved into any particular transport model validation study nor has provided practical burning plasma predictions -which are done as part of separate publications (e.g.[1,34,51,54,55,57])-.In describing this unique formulation and solution to the steady-state transport problem, we aimed to offer a comprehensive perspective on PORTALS' potential and its implications for future fusion research.
After outlining the general guidelines for the accurate prediction of burning plasmas and demonstrating the efficiency of PORTALS, it is essential to look towards the future landscape of fusion research.As the field progresses within the burning plasma era, with new experiments coming online soon, the dynamics of turbulence and core plasma transport remain central to designing efficient reactors that can adapt to energy market needs while remaining economically attractive.The behaviors and intricacies of turbulence and transport are pivotal for optimizing the operational point and performance of fusion devices, as discussed and demonstrated in previous sections.
While the foundational theories of nonlinear gyrokinetics and new workflows that leverage advances in computer and data science have provided robust frameworks to simulate and predict core performance (such as PORTALS-CGYRO presented here but also TANGO-GENE [22,85] and TRINITY-GX [15] targeted for high-fidelity transport modeling), there remains an ongoing endeavor to refine and accelerate plasma models.Furthermore, this era, marked by novel experiments coming online and advancements in diagnostics in current facilities, offers a unique vantage point.Validation of transport models, physics assumptions and numerical techniques will be crucial going forward.Realworld experimental data from D-D and D-T facilities will serve as benchmarks against which turbulence models and simulation techniques can be tested, ensuring accurate representation of the underlying physics and reliable prediction of fusion energy systems behavior.This is particularly important for alpha heating physics and the study of power plant relevant conditions at the plasma edge of D-T plasmas.It will also be an exciting frontier to consider using surrogate-based models to accelerate multi-machine and multi-physics validation.
As new experimental findings help us shape the understanding of burningplasma dynamics, the importance of collaboration between experimentalists and theorists and between private and public efforts will be paramount and the only way to unlock the full potential of fusion.This synergy will drive improvements in prediction accuracy and computational efficiency, ultimately enabling more accurate control and optimization of core performance.It is, in fact, interesting to look at the history of the fission industry and analyze the potential parallels for anticipating the evolution of core transport in fusion research.With the establishment and maturation of commercial fission reactors, the emphasis in fundamental nuclear reactor physics and neutron transport research shifted towards optimization, reducing reactor cost, increasing reliability and addressing emerging challenges.We expect, and already observe, a similar trend in our community, with shifted focus on optimizing operational points, accelerating transport models and starting to use simulations during the design of fusion power plants.
Building on our understanding and new approaches to plasma dynamics, our work with accelerated core transport solvers and GPU-accelerated transport models opens doors to faster advancements in fusion reactor design and operation, along with the development of better models with more physics included.Our strategy, enriched by real-world experiments and innovative computational tools, is geared up to tackle the challenges and unlock new potentials in core research, which was only possible thanks to decades of pioneering work by theorists, modelers and experimentalists in many scientific fields.This collaborative and integrative effort signifies a progression from foundational theories to practical application, aimed at optimizing the potential of fusion energy in meeting contemporary needs.

Fig. 1
Fig.1Visualization of problem geometry and free parameters of a three-channel prediction.a) Illustration of nested flux surfaces for a DIII-D ITER Similar Shape plasma[34], with magnetic axis in dashed black line, 5 selected flux surfaces in blue (cross section in purple) and last closed flux surface in green.Note that the lower x-point region is smoothed out due to Fourier-moments decomposition typical of transport solvers such as TRANSP[32].Subplots b), d) and f) depict the piece-wise linear function representations of the normalized logarithmic gradient of electron temperature, ion temperature and electron density with the value at 5 selected flux surfaces (values to be predicted by the transport solver).Subplots c), e) and g) depict the corresponding kinetic profiles of each channel, produced by radial integration of logarithmic gradient from an edge boundary condition (dark blue).

ρ s /a) 2 energy
flux → Q GB = n e T e c s (ρ s /a) 2 momentum flux → Π GB = an e T e (ρ s /a) 2 energy exchange → S GB = n e T e c s /a (ρ s /a) 2

Fig. 2
Fig. 2 Performance of PORTALS at achieving 10× and 100× residual reduction with TGLF in two example H-mode plasmas: DIII-D ITER Similar Shape (ISS) and ITER Baseline [34].Random and SR initialization methods are compared, starting from the same initial condition.(top) Residual vs evaluations, with vertical line delimiting initialization phase and purple horizontal lines indicating the 10× and 100× residual reductions.(bottom) Violin plots representing distribution of number of evaluations that were required to achieve residual goals.Both methods were initialized with a random seed, and distributions were obtained for 16 seeds.As expected, SR is not strongly affected by the seed, particularly at the beginning of the flux matching process.

Fig. 3
Fig. 3 Summary of effect of radial grid on integrated exchange power calculation on an example L-mode plasma.Electron temperature, ion temperature and electron density are parameterized with piecewise linear logarithmic gradients.The effect of three grids on the calculation of targets are shown ([blue] full profile, [green] 20 grid points and [red] 5 grid points).While kinetic profiles and local power density overlap at grid points, as expected, the integrated exchange power can have significant deviations due to the volume integration.

Fig. 5
Fig.5Characteristic PORTALS phases during the prediction of three burning plasmas ([left] SPARC Primary Reference Discharge (PRD)[53], [center] ARC[54] and [right] ITER[55]) with nonlinear CGYRO.Phase I is the initial surrogate training phase, Phase II represents the first PORTALS iterations (often with oscillatory behavior as it learns key parametric dependencies) and Phase III is the final convergence phase.The transition from II to III is smooth and not characterized by a change in the PORTALS iteration scheme but it represents a transition to a situation where the surrogates are capturing the turbulence dynamics with enough accuracy to drive the system towards steady state.Bottom subplots show the evolution of the residual per radial location separately, shaded from red (core) to blue (edge).

Fig. 6
Fig. 6 Example of GP model predictions (left: Qe, center: Q i and right: Γe) around the last point evaluated after first converged profiles from Fig. 7 (point #12) at three representative locations.The example Qe at the leftmost plot is only fitted to logarithmic gradients because the boundary condition for this profile prediction (r/a = 0.943) is not allowed to vary, but the logarithmic gradients can.

Fig. 11
Fig. 11 Evaluation of fusion power in SPARC PRD plasma using random combinations of gradient profiles in temperature and density inside of r/a < 0.35.a) Ion temperature and inverse normalized gradient scale length, b) electron density and inverse normalized gradient scale length, c) fusion power density profile and radial derivative of its volume integral, and d) ratio of volume integrated fusion power between random profiles and original (linear interpolation from r/a = 0.35 to 0.0).Histogram of total fusion power is added on the right for visualization purposes.

Fig. 14
Fig. 14 Time traces of turbulent transport fluxes from CGYRO simulations at r/a = 0.55.Six moments (electron energy, ion energy, electron particle, lithium particle, momentum and energy exchange fluxes) are plotted for the same three iterations in Fig.13.Each plot contains the value of the target turbulent flux (targets with neoclassical transport subtracted) as dashed horizontal lines and the turbulent transport flux as solid horizontal line, represented as the mean of the time traces once they saturate (∆t ∼ 350a/cs) with estimated confidence bounds around them.Each plot contains also a zoomed-in subplot around targets for the near-convergence cases.

Fig. 16
Fig.16 Example of predicted lithium particle transport coefficients with CGYRO for the DIII-D simulation presented in Ref.[34].Coefficients calculated with background profiles from (black) Q i -matched, standalone simulations and (red) 3-channel profile predictions.