Neural networks for turbulent transport prediction in a simplified model of tokamak plasmas

The use of neural networks (NNs) for turbulent transport prediction in a simplified model of tokamak plasmas is explored. The NNs are trained on a database obtained via test-particle simulations of a transport model in the slab-geometrical approximation. The database consists of a five-dimensional input of transport model parameters, and the radial diffusion coefficient as output. The NNs display fast and efficient convergence, a validation error below 2%, and predictions in excellent agreement with the real data, obtained orders of magnitude faster than test-particle simulations. The NNs also outperform a spline interpolation, exhibiting better prediction and extrapolation capabilities. We demonstrate the precision and efficiency of this method as a proof-of-concept, establishing a promising approach for future, more comprehensive research on the use of NNs for transport predictions in tokamak plasmas.


Introduction
Nuclear fusion is one of the most promising solutions for a clean and reliable energy source to meet the world's current energy demands. Consequently, there is a significant and ongoing interest in developing a functional nuclear fusion reactor. For almost 70 years, the scientific community has struggled with the challenges of nuclear fusion, with confinement being the central issue.
Turbulence, while present in many physical systems such as the atmosphere, the oceans, and astrophysical media, gives rise to one of the main difficulties regarding the confinement of plasma within a nuclear reactor, namely, turbulent transport. In such environments, turbulence is mainly represented by the turbulent electric potential, and it drives one of the primary transport mechanisms of energy and particles. The large-scale, drift-type instabilities have the most significant impact on the dynamics of charged particles, particularly the ion-temperature-gradient (ITG) mode and the trapped-electron mode (TEM).
The simplest way to characterize the transport of matter or heat within the paradigm of local transport is through the macroscopic transport coefficients, i.e. the diffusion and the average velocity.
The regimes discussed in this work, while anomalous, are purely diffusive and do not exhibit sub- or super-diffusive behavior, i.e. non-local transport.
Considering its complexity and the inherently stochastic nature of turbulence, the approaches to studying turbulent transport are numerical in nature. The main method for studying plasma dynamics, turbulence evolution, and transport is through gyrokinetic simulations [1-3]. This technique describes the collective behavior of the plasma in the approximation of the particle gyromotion, by solving an associated kinetic equation coupled to a Poisson-like equation. Although precise, gyrokinetic simulations are computationally demanding and require significant computational resources, in part due to the matter-field self-consistency of the problem.
Another approach to the issue of turbulent transport is through test-particle simulations, or direct numerical simulations (DNS) [4, 5]. The working principle is to follow individual particles in an ensemble of given electromagnetic configurations, without taking into consideration the collective interaction of the plasma with the electromagnetic field; using these trajectories, inferences on the transport coefficients can be made. Test-particle simulations are significantly more time-efficient and less computationally demanding than gyrokinetic simulations due to the removal of self-consistency, and are convenient for studying the confinement of particles in various configurations and regimes. For these reasons, in the present work we use test-particle simulations to evaluate the turbulent transport in tokamak plasmas.
Nonetheless, the model for test-particle simulations depends on turbulence parameters, which are largely unknown, and plasma parameters, which must be varied in order to accommodate different tokamak devices and turbulence states. Therefore, although this approach is faster than gyrokinetic simulations, it is far too slow for applications where the transport coefficients could be useful, such as integrated modeling and real-time control [6]. One promising solution to this problem lies in the rapidly evolving field of neural networks.
In recent years, there have been substantial advancements in machine learning, of which neural networks (NNs) are a particular branch. NNs are collections of interconnected neurons organized in layers, which are able to "learn" through extensive training and comprehensive processing of existing data and, afterwards, make predictions in order to perform tasks such as pattern recognition, classification, or nonlinear regression. As a brief summary of the working principle of NNs: to each neuron-neuron connection there corresponds a weight, and to each individual neuron, a bias; the preexisting data fed to the NN is structured into "inputs" and "outputs", and the purpose of the trained NN is to predict the "output", given the "input"; during the training, the weights and biases of the NN are iteratively adjusted in order to replicate the real data as closely as possible.
In this work, we aim to use NNs as a tool for predicting turbulent transport in a simplified model of tokamak fusion plasmas. In order to train the NN, we construct a database with inputs consisting of plasma and tokamak parameters, and outputs consisting of asymptotic diffusion coefficients. A schematic preview of the NN building components approached in this paper is shown in Figure 1. It must be noted that the scope of this paper is a proof-of-principle, focusing solely on constructing the database and training the NN; it is meant to underline the precision and viability of future uses of this technique. Hence, we wish to lay the building blocks for future works in which we intend to use a more robust model of tokamak plasma dynamics and construct a more versatile training database. The rest of the manuscript is structured as follows: in Section 2, we describe the transport model (2.1), the numerical methods through which we study the system (2.2), and the NN setup (2.3); Section 3 outlines the numerical details; in Section 4, we present the results, and Section 5 addresses the conclusions and outlook.

The transport model
We consider a tokamak plasma configuration in the slab-geometrical approximation; this allows us to simplify the equations of motion (EOMs) of charged particles, while capturing the most important details of the dynamics. The EOMs are described in field-aligned coordinates $(x, y, z)$ [7], related to the radial, poloidal, and toroidal coordinates $(r, \theta, \varphi)$ through the safety factor $q(0)$ evaluated at the reference surface, and $R_0$, $a_0$, the major and minor radii of the tokamak. The coordinates $(x, y, z)$ are obtained in the large-aspect-ratio limit ($a_0/R_0 \ll 1$) of the natural field-aligned coordinate system. The plasma is immersed in a strong, constant magnetic field, oriented along the $z$-direction (also denoted "parallel"): $\mathbf{B} = B_0\,\hat{\mathbf{e}}_z$, with $\hat{\mathbf{e}}_z$ the contravariant versor along $z$. In this slab-geometrical limit, we can write the EOMs (2)-(5) for the dynamics of ions of mass $m$ and charge $q$, in the presence of a turbulent electric potential $\phi(\mathbf{x}, t)$, in the guiding-center approximation, with the particles' trajectories described by $(\mathbf{x}_\perp, x_\parallel, v_\parallel, \mu)$. The "$\perp$" in equation (2) denotes the perpendicular, $(x, y)$, directions, and we recognize the first two terms as the $\mathbf{E}\times\mathbf{B}$ and the polarization drifts, respectively. The last term of the equation is a simplified version of the curvature and grad-B drifts in the slab approximation, evaluated at the low-field side of the plasma, in the equatorial plane [8], with $\mu \propto v_\perp^2/2$ the magnetic moment and $v_\perp$ the particle velocity in the perpendicular plane.
The turbulence spectrum $S(\mathbf{k})$ represents the Fourier transform of the Eulerian autocorrelation function of the turbulent potential, $E = \langle \phi(\mathbf{x}, t)\,\phi(\mathbf{x}', t') \rangle$. To simplify the model, we take into account only an ITG-driven turbulence spectrum (considering the TEM components negligible). Experimental evidence and gyrokinetic simulations [13-16] have shown that the ITG turbulence spectrum exhibits a single-peaked structure along the radial and parallel directions, and a double-peaked structure along the $y$-direction; therefore, we use an analytical form of $S(\mathbf{k})$ [12, 17] that is in accordance with these observations (eq. (7)). The parameters $\lambda_x$, $\lambda_y$, and $\lambda_z$ represent the correlation lengths along their respective directions, and $k_0$ sets the positions of the two symmetrical maxima of the $k_y$ spectrum. We assume that the frequencies follow a linear dispersion relation, $\omega(\mathbf{k}) = k_y V_*$ (eq. (8)), the latter of which was obtained by setting the electron temperature equal to the ion temperature. We consider a single species of particles with mass number $A$ and charge number $Z$, in thermal and collisional ionization equilibrium with the bulk plasma; consequently, the particles' kinetic energies follow the equilibrium statistics of the bulk plasma. We are interested in the radial diffusion coefficient; for the scope of this paper, we limit ourselves to diffusive regimes of transport, for which we compute the asymptotic radial diffusion coefficient, $D_\infty = \lim_{t \to \infty} D_x(t)$.

The statistical approach and numerical implementation
In order to investigate the turbulent transport, we implement the transport model described in Section 2.1 using a test-particle method, or Direct Numerical Simulations (DNS) [4, 18]. This exact-in-principle method mimics real trajectories $\mathbf{x}(t)$ resulting from the EOMs (2)-(5) in different turbulent realizations; this is achieved by computing the trajectories using an explicit representation of the turbulent fields.
The turbulent electric potentials are constructed as an ensemble of dimension $N_r$ of stochastic, zero-averaged, homogeneous random fields $\{\phi(\mathbf{x}, t)\}$. The effects of intermittency on the distribution of the turbulent potential have already been studied in a previous work [19] and have been found to be minimal; thus, we can assume the Gaussianity of the fields. The ensemble of potentials $\{\phi(\mathbf{x}, t)\}$ drives an associated ensemble of trajectories $\{\mathbf{x}(t)\}$ according to the EOMs (2)-(5); the transport coefficients are then computed as Lagrangian statistical averages over the resulting trajectories. We use a discrete Fourier representation of the potential (eq. (10)) [4], with $N_w$ the number of partial waves; $N_r$ the ensemble dimension; $\mathbf{k}_j$ the wavevectors, computed as independent random variables with PDFs corresponding to the spectrum $S(\mathbf{k})$ (7); $\omega(\mathbf{k}_j)$ the frequencies (8); and $\alpha(\mathbf{k}_j) \in [-\pi, \pi]$ uniformly distributed random phases. The representation (10) ensures that, in the limit of $N_w \to \infty$, the Gaussianity of the resulting fields is guaranteed through the Central Limit Theorem, and that, in the limit of $N_r \to \infty$, the resulting fields converge to the desired statistical properties described above, such as the appropriate autocorrelation function associated with the turbulence spectrum.
From a numerical point of view, we generate $N_w = 10^2$ wavevectors $\mathbf{k}$ for each of the $N_r = 1.5 \times 10^5$ realizations of the potential, using the Acceptance-Rejection Method; similarly, the random phases $\alpha(\mathbf{k})$ and the associated frequencies $\omega(\mathbf{k})$ are constructed. For each of the $N_r$ realizations of the field, the EOMs (2)-(5) are directly solved with a 4th-order Runge-Kutta method, obtaining $N_r$ trajectories which are then used to compute the asymptotic radial diffusion coefficient. The simulation time is fixed, $t_{max} = 50$, while the time-step $\delta t$ is chosen a priori in accordance with the parameters of each simulation, assuring that it captures the particle motion. The time-step is fixed for each simulation, and, on average, $\langle \delta t \rangle \approx 0.06$.
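The construction of one turbulent-field realization can be sketched in a few lines of Python. This is a minimal illustration, not the actual implementation: the 1D Gaussian proposal spectrum, the box size, the assumed drift velocity `v_star`, and all amplitudes are simplifying assumptions standing in for the spectrum (7) and dispersion (8).

```python
import math
import random

def sample_k(lam=1.0, kmax=4.0):
    """Acceptance-Rejection sampling of a wavenumber from a Gaussian
    spectrum S(k) ~ exp(-(k*lam)^2 / 2) (illustrative, not eq. (7))."""
    while True:
        k = random.uniform(-kmax, kmax)              # uniform proposal
        if random.random() < math.exp(-0.5 * (k * lam) ** 2):
            return k

def make_potential(n_waves=100, amplitude=0.05, v_star=0.5, seed=0):
    """One realization of a zero-averaged random potential phi(x, y, t),
    built as a sum of n_waves partial waves with random phases."""
    random.seed(seed)
    waves = []
    for _ in range(n_waves):
        kx, ky = sample_k(), sample_k()
        alpha = random.uniform(-math.pi, math.pi)    # random phase in [-pi, pi]
        omega = ky * v_star                          # linear dispersion w = ky * V*
        waves.append((kx, ky, omega, alpha))
    norm = amplitude * math.sqrt(2.0 / n_waves)      # fixes <phi^2> ~ amplitude^2
    def phi(x, y, t):
        return norm * sum(math.sin(kx * x + ky * y - w * t + a)
                          for kx, ky, w, a in waves)
    return phi

phi = make_potential()
```

The trajectories are then obtained by integrating the EOMs in the explicit field `phi`; by construction the sum of many independent partial waves tends to a Gaussian field.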
Figure 2 shows some typical time-dependent radial diffusion profiles, for three distinct values of the turbulence amplitude $\Phi$; the remaining free parameters are fixed, with $V_* = 1$, $\lambda_x = 5$, $\lambda_y = 5$, $\lambda_z = 1.5$. For small times, $t \ll 1$, the Lagrangian radial velocities of the particles are $v_x(\mathbf{x}(t), t) \approx v_x(\mathbf{x}(0), 0)$, and the resulting radial diffusion is $D_x(t) \approx \langle v_x^2(0) \rangle\, t$. The running diffusion reaches its peak around the time-of-flight, $t = \tau_{fl}$, which is a measure of the time in which the space correlation is lost, and can be approximated as the ratio between the characteristic space-scale of the turbulence and the average amplitude of the velocity field, $\tau_{fl} = \lambda_x / \sqrt{\langle v_x^2(0) \rangle}$ [20]. This results in a peak diffusion $D_x(\tau_{fl}) \propto \Phi \lambda_x \lambda_y^{-1}$. The diffusive features of the process can be seen in the behavior of the asymptotic diffusion, which saturates at a constant value for $t \gg \tau_{fl}$.
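The computation of the running radial diffusion from an ensemble of trajectories can be illustrated with the following sketch; for brevity, the RK4-integrated trajectories of the EOMs are replaced here by unbiased random walks, which are diffusive by construction, and all step sizes and ensemble counts are arbitrary.

```python
import random

def running_diffusion(trajectories, dt):
    """Running diffusion D(t) = <x(t)^2> / (2 t), averaged over the ensemble.
    trajectories: list of lists of radial displacements x(t_i), with x(0) = 0."""
    n_steps = len(trajectories[0])
    profile = []
    for i in range(1, n_steps):
        t = i * dt
        msd = sum(traj[i] ** 2 for traj in trajectories) / len(trajectories)
        profile.append(msd / (2.0 * t))
    return profile

# toy ensemble: unbiased random walks standing in for the real trajectories
random.seed(42)
dt, n_steps, step = 0.1, 500, 0.05
ens = []
for _ in range(300):
    x, traj = 0.0, [0.0]
    for _ in range(n_steps - 1):
        x += random.choice((-step, step))
        traj.append(x)
    ens.append(traj)
D = running_diffusion(ens, dt)
# for a random walk, D(t) should saturate near step^2 / (2 dt)
```

For the real trajectories, the same Lagrangian average produces the ballistic rise, the peak around $\tau_{fl}$, and the diffusive saturation visible in Figure 2.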

Architecture of Neural Networks
Neural networks are machine-learning algorithms with a specific architecture that are able to model the relation between input-output variables based on pre-existing data [21].They are adaptable to a vast number of frameworks, systems and tasks, such as pattern recognition/classification [22], clustering/categorization [23], function approximation [24,25], prediction [26], and even dynamic control [27], working with both discrete and continuous inputs and outputs, and can be applied in a variety of domains.NNs can be classified according to a multitude of characteristics, such as the direction of signal propagation (feed-forward/feed-backward), the number of hidden layers (NNs/deep NNs), the optimization and back-propagation algorithms etc.
The primary structure of a NN consists of layers, which are of three types: one input layer ($l_0 \equiv l_{in}$), one output layer ($l_{n+1} \equiv l_{out}$), and, between them, $n$ hidden layers ($l_i$, $i = \overline{1, n}$).
The building blocks of the layers are called neurons (based on the slight resemblance between NNs and the biological brain), and each layer can have a different number of neurons (denoted $m_i$). The architecture of a generic NN is schematically represented in Figure 3. The inputs and outputs of the NN must be data that can be expressed numerically, such that each neuron of the network represents a single numerical value. In this study, the input is the 5-dimensional set of free parameters of the transport model, $\{V_*, \lambda_x, \lambda_y, \lambda_z, \Phi\}$, and the output is the single value of the asymptotic radial diffusion, $D_\infty$. Equation (12) describes how $a_j^{(i)}$, the value of the $j$-th neuron of the $i$-th layer $l_i$, is computed: $$a_j^{(i)} = f\Big(\textstyle\sum_k w_{jk}^{(i)}\, a_k^{(i-1)} + b_j^{(i)}\Big), \qquad (12)$$ where: • $w_{jk}^{(i)}$ corresponds to the weight attributed to the value of the $k$-th neuron $a_k^{(i-1)}$ of the previous layer $l_{i-1}$; • the function $f$ denotes the activation function, which can be of many types, but is usually chosen as either the Sigmoid or the tanh function; • $b_j^{(i)}$ corresponds to the bias attributed to the $j$-th neuron of layer $l_i$, and it shifts the interval of the activation function's input.
Using a matrix notation, the compact form of the equation for all the neurons in a given layer $l_i$ is: $$\mathbf{a}^{(i)} = f\big(W^{(i)} \mathbf{a}^{(i-1)} + \mathbf{b}^{(i)}\big), \qquad i = \overline{1, n+1},$$ where $n$ is the total number of hidden layers.
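A neuron-level update of the form (12), and the resulting feed-forward pass, can be sketched in a few lines of Python; the tanh activation matches the one used below, but the tiny layer sizes and the weight values are arbitrary illustrations, not the trained networks of this work.

```python
import math

def layer_forward(a_prev, weights, biases):
    """One layer of eq.-(12)-type propagation: a_j = tanh(sum_k w_jk * a_k + b_j)."""
    return [math.tanh(sum(w * a for w, a in zip(row, a_prev)) + b)
            for row, b in zip(weights, biases)]

def nn_forward(x, layers):
    """Feed-forward pass through all hidden layers, with a linear output neuron."""
    a = x
    for W, b in layers[:-1]:
        a = layer_forward(a, W, b)
    W_out, b_out = layers[-1]
    return sum(w * ai for w, ai in zip(W_out[0], a)) + b_out[0]

# tiny 5-input network with one hidden layer of 3 neurons (weights arbitrary)
hidden = ([[0.1] * 5, [0.2] * 5, [-0.1] * 5], [0.0, 0.1, -0.1])
output = ([[0.5, 0.5, 0.5]], [0.0])
y = nn_forward([1.0, 0.5, 0.25, 0.1, 0.05], [hidden, output])
```

In practice the output neuron is kept linear (as here) so that the rescaled diffusion values are not artificially clipped by the activation.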
The NN is trained on preexisting data with the goal of finding the optimal values of the weights and biases in order to minimize the error between the real outputs and its predictions.
$$\mathcal{E}_1 = \big\langle\, |\tilde{D}_\infty - D_\infty| \,/\, D_\infty \,\big\rangle,$$ where we take the average $\langle \cdot \rangle$ over all the diffusion values of the database, with $D_\infty$ the real value of the training data (TD) or the validation data (VD), and $\tilde{D}_\infty$ the value predicted by the network.
• Optimization. The most common optimization method used is Stochastic Gradient Descent (SGD) [29]; in each iteration, a batch is randomly selected, and the gradient of the loss function with respect to the weights is computed; the weights are then updated in the opposite direction of the gradient, to reduce the loss. In this work, we employ a widely used variant of SGD, the Adaptive Moment Estimation (ADAM) method [30], which is an adaptive-learning-rate method.
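The ADAM update rule can be sketched as follows; this minimal implementation acts on a toy quadratic loss rather than the NN loss above, and the hyperparameters are the commonly used defaults, not tuned values from this work.

```python
import math

def adam_step(theta, grad, state, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One ADAM update: a per-parameter adaptive step built from exponential
    moving averages of the gradient (m) and of its square (v)."""
    m, v, t = state
    t += 1
    m = [b1 * mi + (1 - b1) * g for mi, g in zip(m, grad)]
    v = [b2 * vi + (1 - b2) * g * g for vi, g in zip(v, grad)]
    mhat = [mi / (1 - b1 ** t) for mi in m]        # bias correction
    vhat = [vi / (1 - b2 ** t) for vi in v]
    theta = [p - lr * mh / (math.sqrt(vh) + eps)
             for p, mh, vh in zip(theta, mhat, vhat)]
    return theta, (m, v, t)

# minimize a toy quadratic loss L = sum(p^2), whose gradient is 2p
theta, state = [1.0, -2.0], ([0.0, 0.0], [0.0, 0.0], 0)
for _ in range(5000):
    grad = [2 * p for p in theta]
    theta, state = adam_step(theta, grad, state, lr=0.01)
```

The adaptive denominator `sqrt(vhat)` normalizes the step per parameter, which is what makes ADAM robust to the differing scales of weights across layers.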

Numerical details of the NN and the database
The free parameters of the model described in Section 2 can be divided into three categories: tokamak parameters, plasma equilibrium profile parameters, and turbulence parameters. Starting with the latter, these variables characterize the turbulence profile of the electric potential: the correlation lengths $\lambda_x$, $\lambda_y$, and $\lambda_z$, the positions of the maxima of the $k_y$ spectrum, $\pm k_0$, and the turbulence strength $\Phi$.
The scaled diamagnetic drift velocity, $V_*$, the ion temperature, the particle density, and the pressure gradients are examples of plasma equilibrium profile parameters. The tokamak parameters characterize the specific device to be studied; these include the major and minor radii, $R_0$ and $a_0$, the intensity of the magnetic field, $B_0$, and, implicitly, the magnitude of the grad-B and curvature drifts. Due to the large number of variables, we choose to restrict our research to only one tokamak device, the ASDEX Upgrade (AUG); this sets fixed values for all the tokamak parameters. We can reproduce the same characteristics of the $k_y$ spectrum (eq. (7)), as well as the corresponding asymptotic diffusion, by fixing $k_0$ and only varying the correlation length $\lambda_y$. Hence, the remaining free parameters of the model (which will be input parameters for our NN) are: the diamagnetic velocity $V_*$, the three correlation lengths $\lambda_x$, $\lambda_y$, and $\lambda_z$, and the turbulence strength $\Phi$.

Table I:
Values of the fixed and free parameters of the model, in accordance with [31,32].
Due to the random initialization of the weights and biases, no two trained nets result in the exact same configuration. While a point as close as possible to the global minimum should be reached in all the training processes, the paths taken to achieve it differ. Moreover, we can never be certain that the global minimum is reached. Therefore, each trained net yields slightly different predictions, and while the error of any single one is minimal, it is not negligible. A viable workaround for this issue is training multiple nets and constructing the predictions as averages; in the literature, this is known as an ensemble neural network model [33]. The primary reason this average of nets yields better results than any of the constituent NNs is the Central Limit Theorem: the errors and biases of the individual networks tend to cancel out and, for large enough ensembles, the average approaches the true value of the asymptotic diffusion. In this work, we trained multiple NNs with architectures corresponding to combinations of 2, 3, and 4 layers, and 15, 30, 45, and 60 neurons (not including the 5 inputs and 1 output); this leads to an ensemble of 12 NNs with varied architectures, and the results presented below are averages of their predictions.
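The ensemble prediction is simply the mean of the individual networks' outputs; the sketch below uses hypothetical stand-in "nets" carrying small, frozen residual biases in place of the 12 trained NNs.

```python
import random

def ensemble_predict(nets, x):
    """Average the predictions of independently trained nets; the individual
    residual errors tend to cancel in the mean (Central Limit Theorem)."""
    preds = [net(x) for net in nets]
    return sum(preds) / len(preds)

# hypothetical stand-ins for 12 trained NNs: a true value of 1.0 plus a
# net-specific residual bias of ~2%, frozen at "training" time
random.seed(0)
nets = [(lambda x, e=random.gauss(0.0, 0.02): 1.0 + e) for _ in range(12)]
y = ensemble_predict(nets, None)
```

The ensemble mean has a residual error roughly $\sqrt{12}$ times smaller than that of a typical individual net, provided the individual errors are uncorrelated.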
In order to improve the accuracy of the network and to speed up the training process, we normalize the input and output data prior to feeding it to the NN. This pre-processing step is crucial if the activation function used is the Sigmoid, tanh, or a similar function, as their outputs are bounded (between 0 and 1 for the Sigmoid, and between $\pm 1$ for tanh). The most common way to standardize the data is rescaling it to a mean of 0 and a variance of 1. However, for the dataset at hand, this leaves many output values outside the range of the tanh activation function. Therefore, we rescale the inputs/outputs to be bounded between $\pm 1$ by applying a linear transformation, with $x$ representing either the training inputs or the training outputs of the NN. This normalization assures that the activation function operates in its nonlinear range for the given parameter intervals, and that the output values of the NN are, indeed, bounded between $\pm 1$.
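The linear rescaling onto $[-1, 1]$, together with its inverse (needed to map the NN output back to physical diffusion values), can be sketched as:

```python
def rescale(values):
    """Linear map of a data column onto [-1, 1]: u = 2 (x - min)/(max - min) - 1."""
    lo, hi = min(values), max(values)
    return [2.0 * (x - lo) / (hi - lo) - 1.0 for x in values], (lo, hi)

def unscale(u, bounds):
    """Inverse map, used to recover physical values from the NN output."""
    lo, hi = bounds
    return 0.5 * (u + 1.0) * (hi - lo) + lo

# illustrative diffusion values; the real bounds come from the training data
scaled, bounds = rescale([0.0, 0.05, 0.1, 0.2])
```

The `bounds` computed on the training data must be stored and reused at prediction time, so that new inputs are mapped with the same transformation.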
Regarding the dimension of the TD, we conclude that a training set consisting of $N = 10^5$ datapoints should suffice for the purposes of the present work, as the accuracy of the NN tends to converge for $N \gtrsim 10^4$ datapoints. The dependence of the final percentage loss error $\mathcal{E}_1$ on the dimension of the dataset can be seen in Figure 4. We chose to distribute the parameters uniformly inside a 5-dimensional hypercube, assigning 10 values to each of the five variables. The decision to use a uniform grid for the parameters (rather than sampling random values within the intervals of interest) was motivated by the intention to compare the performance of the NN with that of a simple interpolation, which works best when the input parameters are distributed on an equidistant grid.
In addition to the $10^5$ values obtained by test-particle simulations, we complete the TD with $10^4$ analytic values for $\Phi = 0$, for which the asymptotic diffusion is $D_\infty = 0$. For the validation and testing of the NN, we construct a validation dataset consisting of $2 \times 10^4$ randomly generated points inside the 5-dimensional hypercube of parameters. We complete this testing database with: points for which we vary each parameter individually (keeping the other 4 parameters constant); points for which we vary the parameters in pairs (keeping 3 parameters constant), and so on. The structural details of the whole database are summarized in Table II.
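The uniform 5-dimensional grid of training inputs can be generated with a Cartesian product; the interval endpoints below are placeholders, the actual ranges being those of Table I.

```python
import itertools

# illustrative parameter values: 10 equidistant points per free parameter
grid = {
    "V_star": [0.5 + 0.5 * i for i in range(10)],
    "lam_x":  [1.0 + 1.0 * i for i in range(10)],
    "lam_y":  [1.0 + 1.0 * i for i in range(10)],
    "lam_z":  [0.5 + 0.5 * i for i in range(10)],
    "Phi":    [0.01 * (i + 1) for i in range(10)],
}

# uniform 5-D hypercube of inputs: 10^5 parameter combinations
inputs = list(itertools.product(*grid.values()))
```

Each tuple in `inputs` corresponds to one test-particle simulation producing one $D_\infty$ value, which together form the input/output pairs of the TD.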

Training process and prediction accuracy of the NN
Let us take a closer look at the training process and the convergence properties of the aforementioned NN ensemble. The training stops when a criterion is met, such as the relative error between the losses of two consecutive rounds being less than a set value, or obtaining the same error for an arbitrary number of rounds, both of which indicate saturation. In this work, we used both criteria, demanding that the error function is constant and below the threshold for $10^3$ training rounds. On average, the NNs need around $2.5 \times 10^3$ training rounds before convergence is achieved, which is equivalent to ~2 hours of elapsed time on a personal CPU. A typical evolution of the loss function $\mathcal{E}_1$ during the training is presented in Figure 5; the net trained in this example is structured in 2 hidden layers, with 20 and 10 neurons, respectively. We see that the NN saturates at a constant value around the 2000th training round; the constant decrease and the saturation indicate that the net has reached a point close to the global minimum, and is not overfitting the data.
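The double stopping criterion (loss flat and below a threshold for $10^3$ consecutive rounds) can be sketched as follows; the threshold and tolerance values are illustrative, not the exact values used here.

```python
def should_stop(loss_history, patience=1000, tol=1e-6, threshold=0.02):
    """Stop when the loss has been below `threshold` AND essentially flat
    (relative change < tol) for the last `patience` training rounds."""
    if len(loss_history) < patience + 1:
        return False
    window = loss_history[-(patience + 1):]
    if max(window) >= threshold:
        return False                      # not yet below the error threshold
    return all(abs(a - b) / max(abs(a), 1e-30) < tol
               for a, b in zip(window, window[1:]))

# a decaying loss that plateaus at 0.1% after ~2000 rounds
history = [1.0 / (n + 1) for n in range(2000)] + [0.001] * 1001
```

Requiring both conditions avoids stopping during temporary plateaus that still sit above the target accuracy.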
Figure 6a shows the relative error distribution between the actual values of the asymptotic diffusion ($D_\infty$) and the values predicted by the NN ($\tilde{D}_\infty$): $\delta = (\tilde{D}_\infty - D_\infty)/D_\infty$. We see that the error is centered around 0%, with a standard deviation of ~2% for the TD, and ~1.5% for the VD. In Figure 6b, we plot the dependence between the real and the predicted values of the diffusion. We see that the error is evenly distributed on the studied intervals and is not clustered in any specific regions; this, together with the symmetry of the error distribution of Figure 6a, indicates that the output of the NN is not biased (i.e. systematically under- or over-estimating the diffusion), nor does it fail for specific values or intervals of the parameters.
Further on, we probe the predictive capabilities of the trained network ensemble by looking at the testing datasets $T_{1-4}$ (see Table II). When one or more parameters are not varied, they are fixed at the following base values, around the middle of the intervals used: $V_* = 1$, $\lambda_x = 5$, $\lambda_y = 5$, $\lambda_z = 1.5$, $\Phi = 0.05$. In Figures 7a to 7e, we compare the real values of the asymptotic radial diffusion (orange points) with the ones predicted by the NN (blue lines), varying each of the 5 parameters individually; in these figures, the green lines correspond to the predictions of an interpolation, which are discussed further in Section 4.2. In Figures 8a to 8c, we plot the real values of the diffusion (red points) over the surfaces of the NN predictions, with two parameters varied at a time, and the rest fixed.
We note that although the dependencies of the asymptotic diffusion are fairly simple, the predictions made by the NN ensemble are unexpectedly close to the real values; more details on the accuracy of the predictions are presented in Table III.

Predicting turbulent transport using NNs vs interpolation
There is a question of whether the predictive tasks assigned to the NN could be more easily achieved through a simple interpolation of the TD. The latter has been constructed using the built-in Interpolation function of Wolfram Mathematica v11.3 [34], with Method → "Spline". After constructing the InterpolatingFunction, we look at its predictions on the testing dataset $T_1$ (the green squares of Figures 7a to 7e); the corresponding errors are summarized in Table IV.
One noteworthy advantage of NNs is that their training doesn't require the input parameters to be distributed on an equidistant grid, which makes it easier to add further values to the initial TD.
Therefore, we conclude that the NN ensemble is slightly more accurate and convenient within the training data range, and significantly so outside it.

A word on the physical mechanisms
The main goal of this work is to give a detailed description of the methodology for the development of NNs for predicting turbulent transport. Evidently, this goes hand in hand with the development of an associated database; the latter was constructed with the aid of test-particle numerical simulations, in the framework of a statistical approach to a simplified transport model.
Although not directly related to the NN, it is useful to have an understanding of the physical processes involved in the transport, and of how various plasma and turbulence properties impact it.
This might be significant because, while not employed in this study, prior knowledge of the gross dependence between inputs and outputs could enable us to develop more reliable databases (perhaps non-uniform grids), or even use specific activation functions in the NNs. The mathematical and physical factors behind the dependence of the output, $D_\infty$, on each of the input parameters, $V_*$, $\lambda_x$, $\lambda_y$, $\lambda_z$, $\Phi$, will be individually examined.
First, let us take a look at the transport model of eqs. (2)-(5). Perhaps the simplest limit of that model is when the particles considered are very cold ($T = 0$, which implies $v_\parallel = 0$, $\mu = 0$), the turbulent field is frozen ($\omega = 0$, or flat pressure profiles), and the scale of the parallel fluctuations of the turbulence is very large ($\lambda_z \to \infty$). In this limit, it is straightforward to see that the only term remaining is the $\mathbf{E}\times\mathbf{B}$ drift in a time-independent potential, which is equivalent to a 2D Hamiltonian system. Consequently, all the trajectories are closed (i.e. trapped particles), and the running diffusion coefficient has an algebraic decay to 0 with time [20]; this is due to the fact that the particles remain correlated at any later time.
The next level of complexity is considering non-zero temperatures and a finite parallel correlation length. This implies that the particles experience parallel velocities (even in the absence of a parallel acceleration) which, in the wave decomposition of the turbulent field, turn each static mode into a travelling one. Thus, a frequency-like term ($k_z v_\parallel$) is acquired by the field, which makes it time-dependent. The Hamiltonian character of the dynamics is broken and the trajectories are no longer closed; they are able to explore various regions in the perpendicular plane, be scattered by the potential landscape, and exhibit finite (non-zero) diffusion. In essence, this is a decorrelation mechanism induced through the motion in the landscape of parallel fluctuations of the turbulence; an associated decorrelation time would be $\tau_z = \lambda_z / v_\parallel$. It is expected that the parallel acceleration will contribute to this decorrelation by introducing more stochasticity in the parallel velocities, yet it is difficult to infer how much more.
Another decorrelation mechanism that affects the transport is the time-dependence of the turbulence. If the pressure profile (or temperature profile, in the case of ITG) is non-flat, a diamagnetic drift velocity is present and the frequencies are non-zero (as in eq. (8)). These frequencies are proportional to the wavenumber $k_y$, which has a direct effect on the $\mathbf{E}\times\mathbf{B}$ component of the motion in the radial direction. In fact, it can be shown that a linear dispersion relation $\omega(\mathbf{k}) = k_y V_*$ is equivalent to a turbulent potential drifting along the $y$-direction with velocity $V_*$, exactly as desired.
The effect of this drift characteristic of the turbulence is that some of the particles remain trapped, but exhibit elongated trajectories, while others are trapped, but are carried away along the $y$-axis; the latter is equivalent to open trajectories. This breaking of the topology induces a strong suppression of the radial transport, which can make the running diffusion fall towards zero much faster than in the purely frozen case. On the other hand, our frequencies do not follow an exact dispersion relation; this makes them similar to a superposition between a drift motion and a random evolution of the field. The latter is a decorrelation mechanism which tends to increase the transport.
The main component of the radial transport in eqs. (2)-(5) is the $\mathbf{E}\times\mathbf{B}$ drift, which amounts to $v_x \sim \partial_y \phi$. This implies that a gross measure of the radial velocities experienced by the particles is $V_x \sim \Phi/\lambda_y$. At the level of a quasilinear analysis, the diffusion coefficient can be evaluated as $V_x^2 \tau_c$. Depending on which of the decorrelation mechanisms is faster, $\tau_c$ can be estimated as $\lambda_x/V_x$, $\lambda_z/v_\parallel$, or even something more complex derived from the frequency dispersion relation. Now we can understand the behavior of the radial diffusion with each of the five parameters (Figures 7a to 7e). If the perpendicular decorrelation is the one that dominates, the relevant Kubo number [20] is $K_\star = V_x \tau_c / \lambda_x$, and it is known that the turbulent transport is usually anomalous, with $D_\infty$ scaling algebraically with $K_\star$. This explains why the asymptotic radial diffusion drops roughly as $\lambda_y^{-1}$ (Fig. 7c) and grows almost linearly with $\Phi$ and $\lambda_x$ (Figures 7a and 7e).
Things are more complicated for the two remaining dependencies, $\lambda_z$ and $V_*$. The decorrelation time is always a complicated mix between all the physical processes involved. Although it seems that $\tau_c = \lambda_x/V_x$ gives a good gross estimation of this time, it is not the only contribution. As already discussed, the parallel motion through the turbulent field also decorrelates the trajectories, with a decorrelation time of $\sim \lambda_z / \langle v_\parallel \rangle$. In principle, $D_\infty(\lambda_z) \approx D_{frozen}(\tau_z)$; this explains the influence of $\lambda_z$ on the diffusion (Fig. 7d), as it seems to follow the running diffusion profile.
Regarding the dependence on the diamagnetic drift velocity $V_*$, the competition between the two mechanisms is obvious: when $V_*$ is small, its main effect is to decorrelate the trajectories and thus to lower the decorrelation time and increase the diffusion; when $V_*$ is too large, the drift of the particles along with the field becomes stronger and the diffusion is lowered (Fig. 7a).

Conclusions and outlook
In the present work, we have investigated NNs as a tool for predicting turbulent transport in a simplified model of tokamak plasmas. The training and validation sets have been obtained through high-resolution test-particle simulations of ion dynamics in turbulent electric fields. The input of the NN is the 5-dimensional set of free parameters of the transport model, $\{V_*, \lambda_x, \lambda_y, \lambda_z, \Phi\}$, and the output is the single value of the asymptotic radial diffusion, $D_\infty$.
The final NN is constructed as an ensemble of 12 individual NNs with different architectures, all optimized with the ADAM method and using a normal Xavier initialization of the weights and biases.
The tanh activation function is employed in all nets since, prior to training, the data is rescaled to be constrained between ±1.
During the learning phase, the convergence of the NN proved to be fast, and the validation predictions accurate, both indicators of efficient learning. In the testing phase, the NN predictions were in good agreement with the real data, with an average error below 2%. Moreover, the NN was able to accurately reproduce dependencies with lower degrees of freedom, with one through four of the five parameters fixed, with an overall error below 2% as well.
Comparing the NN with a spline interpolation of the TD has shown that the former makes better predictions for both the validation set and the testing sets with lower degrees of freedom, $T_{1-4}$.
Moreover, it is able to accurately extrapolate, with average errors of ∼5%.Another key advantage is that NNs can use unstructured data for training, unlike interpolation, which works well when the data is distributed on an equidistant grid; this facilitates adding new data to the initial TD, after it has been created.
Based on these results, we consider that training NNs to predict turbulent transport in tokamak plasmas is a precise and viable method, both accuracy- and time-wise. This approach shows advantages over other methods, such as interpolation, and is up to 10^6 times faster than a single test-particle simulation. While the transport model is simplified and the database is limited, the purpose of this article as a proof-of-concept has been reached, establishing the framework for subsequent research in which a more robust model of tokamak plasma dynamics will be employed and a more adaptable and extensive training database will be built.

Figure 1: Schematic preview of the NN building components approached in this paper.

Figure 2: Running radial diffusion profiles for three distinct values of the turbulence amplitude.

Figure 3: General architecture of a NN; the gray level of each connecting line symbolically represents the contribution (weight) of each neuron.

The training process has a number of components:
• Initialization. Before the training begins, the weights and biases are initialized with random values according to a chosen initialization method. If the parameters all start from the same value or, in the case of the tanh or Sigmoid activation functions, from absolute values much greater than zero, the learning process is regularly slowed down or hindered, leaving the NN stuck in a local minimum without reaching the global minimum of the loss function. While the values can be initialized according to various distributions, the most appropriate for our dataset and activation function is the normal Xavier method [28].
• Loss function & training rounds. The training process consists of multiple rounds; during a training round, the dataset is split into several batches of a chosen size and fed to the NN. A training round is complete once the whole dataset has been passed through, after which backpropagation is performed and another round begins. During each round, the weights and biases of the NN are adjusted according to an optimization algorithm in order to decrease the error loss function; in this study, we use the percentage error E_1 as the loss function.
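The initialization step above can be sketched as follows; this is an illustrative pure-Python version of normal Xavier (Glorot) initialization, with the layer sizes (a hypothetical 5 → 20 layer) chosen only as an example:

```python
import math
import random

def xavier_normal(fan_in, fan_out, rng):
    """Draw one weight from N(0, 2/(fan_in + fan_out)): normal Xavier init."""
    std = math.sqrt(2.0 / (fan_in + fan_out))
    return rng.gauss(0.0, std)

rng = random.Random(0)  # fixed seed for reproducibility
# Initialize a 5 -> 20 layer: 20 neurons, each with 5 input weights and a bias
weights = [[xavier_normal(5, 20, rng) for _ in range(5)] for _ in range(20)]
biases = [xavier_normal(5, 20, rng) for _ in range(20)]

# The draws are centered near zero, as required for tanh layers
flat = [w for row in weights for w in row]
mean = sum(flat) / len(flat)
```

Keeping the initial weights small and zero-centered avoids saturating the tanh units, which is the failure mode described above.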

Figure 4: Dependence of the training (orange circles) and validation (blue triangles) error loss function E_1 on the dimension of the TD.

Figure 5: Typical evolution of the percentage error loss function E_1 during the training of one of the 12 NNs, consisting of 2 hidden layers with 20 and 10 neurons, respectively.

Figure 6: (Left) Relative error E_2 between the actual (D_∞) and the predicted values of the asymptotic diffusion, for the training data (orange, back) and for the VD (blue, front). (Right) Dependence between the actual and the predicted values of the asymptotic diffusion, for the TD (orange) and for the VD (blue).

Figure 7: Dependence of the asymptotic radial diffusion on the free parameters of the model, each one varied individually; all panels feature the actual data (orange points), the NN predictions (blue lines), and the interpolation predictions (green lines).
While there are no notable differences between the predictions of the NN ensemble and those of the InterpolatingFunction for the monotonic dependencies of the asymptotic diffusion, the InterpolatingFunction shows some fluctuations for the more complex, non-monotonic dependencies. Overall, the InterpolatingFunction predictions are, by construction, exact on the TD, and sufficiently close to the real values on the VD and on the testing sets. We also note that constructing the InterpolatingFunction takes mere seconds, whereas training the NN ensemble requires close to 24 hours.

Table II: Dimensions and structure of the training, validation, and testing databases; the testing sets each have one, two, three, or four free parameters varied, with the rest of the variables fixed, and the subsets represent combinations of the five input parameters.

Table III: Statistics of the percentage error E_2 distribution, for the predictions of the NN trained on the TD.

Table IV: Statistics of the percentage error E_2 distribution, for the predictions of the NN in comparison to the InterpolatingFunction, both trained on the truncated dataset.