MONKES: a fast neoclassical code for the evaluation of monoenergetic transport coefficients in stellarator plasmas

MONKES is a new neoclassical code for the evaluation of monoenergetic transport coefficients in stellarators. By means of a convergence study and benchmarks with other codes, it is shown that MONKES is accurate and efficient. The combination of spectral discretization in spatial and velocity coordinates with block sparsity allows MONKES to compute monoenergetic coefficients at low collisionality, in a single core, in approximately one minute. MONKES is sufficiently fast to be integrated into stellarator optimization codes for direct optimization of the bootstrap current and to be included in predictive transport suites. The code and data from this paper are available at https://github.com/JavierEscoto/MONKES/.


Introduction
Stellarators are an attractive alternative to tokamaks as future fusion reactors.While tokamaks require a large toroidal current to generate part of the magnetic field, in stellarators the field is produced entirely by external magnets.As a consequence, stellarators avoid current-induced instabilities and facilitate steady-state operation.These advantages come at the expense of making the magnetic field three-dimensional.In tokamaks, axisymmetry guarantees that the radial displacement that charged particles experience along their collisionless orbits averages to zero.Therefore, in the absence of collisions, all charged particles are confined.However, in a generic stellarator the orbitaveraged radial drift velocity does not vanish for trapped particles and they quickly drift out of the device.The combination of a non zero orbit-averaged radial drift and a small collision frequency (reactorrelevant fusion plasmas are weakly collisional in the core) produces, for a generic stellarator, intolerably large levels of neoclassical transport.
Hence, stellarator magnetic fields must be carefully designed in order to display good confinement properties.This process of tailoring of the magnetic field is called stellarator optimization.The objective of neoclassical optimization is to obtain a stellarator with levels of neoclassical losses equivalent or lower to those in an axisymmetric device.Stellarator magnetic fields in which the orbit-averaged radial magnetic drift is zero for all particles are called omnigenous [1].Thus, the goal of neoclassical optimization is to obtain magnetic fields which are close to omnigeneity.However, addressing only radial transport in the optimization process is not sufficient.In toroidal plasmas, the parallel flow of electrons and the rest of species is not, in general, balanced.This mismatch produces a net parallel current at each flux surface which, through Ampère's law, modifies the magnetic field B. When the current is generated by a combination of neoclassical mechanisms and non-zero plasma profile gradients, we speak of bootstrap current.The bootstrap current and its effect on the magnetic configuration must be taken into account in the design of optimized stellarator magnetic fields.
Two different subclasses of omnigenous stellarators have drawn particular attention: quasi-isodynamic (QI) and quasi-symmetric (QS) stellarators.Quasiisodynamic configurations are omnigeneous configurations in which the curves of constant magnetic field strength B := |B| on a flux surface close poloidally.This additional property has an important implication: exactly QI stellarators produce zero bootstrap current at low collisionality [2,3].Thanks to this feature, QI stellarators can control plasma-wall interaction by means of a divertor relying on a specific structure of islands, which could not be realized in the presence of large toroidal currents.The Wendelstein 7-X (W7-X) experiment was designed to be close to QI and demonstrates that theoretically based stellarator optimization can be applied to construct a device with much better confinement properties than any previous stellarator [4].Moreover, the bootstrap current produced in W7-X plasmas is smaller than in non-optimized machines [5].However, despite its success, there is still room for improvement.The two main configurations of W7-X, the KJM (or so-called "high mirror") and the EIM (also known as "standard") are not optimized for simultaneously having low levels of radial and parallel neoclassical transport [6,4]: While W7-X EIM has small radial transport, it has intolerably large bootstrap current.Conversely, W7-X KJM displays small bootstrap current but larger levels of radial transport.Consequently, optimization of QI stellarators is a very active branch of research and, recently, much effort has been put in pushing forward the design and construction of quasi-isodynamic stellarators [7,8,9,10,11].
The QS subclass of omnigenous configurations is attractive as the neoclassical properties of such magnetic fields are isomorphic to those in a tokamak [12,13].Recently, it has been shown that it is possible to design QS magnetic fields with extremely low neoclassical losses [14].
In contrast to QI configurations, QS stellarators are expected to have a substantial bootstrap current ‡ and its effect must be taken into account [16].Examples of this subclass are the Helically Symmetric eXperiment (HSX) [17] or the design of the National Compact Stellarator Experiment (NCSX) [18].
Typically, at each iteration of the optimization process a large number (∼10 2 ) of magnetic configurations are generated.Therefore, in order to neoclassically optimize magnetic fields, it is required to be able to evaluate fast the neoclassical properties of each configuration.Due to this requirement, neoclassical properties are typically addressed indirectly.For instance, one can tailor the variation of the magnetic field strength B on the flux-surface so that it nearly fulfils quasi-isodinamicity: the isolines of B can be forced to close poloidally and the variance of the extrema of B along field lines can be minimized.
A different approach relies on figures of merit, which are easy to calculate, for specific collisional regimes.For the 1/ν regime, the code NEO [19] computes the effective ripple ϵ eff , which encapsulates the dependence of radial neoclassical transport on the magnetic configuration.For transport within the flux surface, there exist long mean free path formulae for parallel flow and bootstrap current [20,21,22].Although they can be computed very fast and capture some qualitative behaviour, these formulae are plagued with noise due to resonances in rational surfaces and, even with smoothing ad-hoc techniques, they are not accurate [16].This lack of accuracy limits their application for optimization purposes.During the optimization process, an accurate calculation of the bootstrap current is required to account for its effect (e.g. for optimizing QS stellarators) or to keep it sufficiently small (when optimizing for quasiisodinamicity).
Recent developments allow direct optimization of radial neoclassical transport.Based on previous derivations [23,24], the code KNOSOS [25,26] solves very fast an orbit-averaged drift-kinetic equation that is accurate for low collisionality regimes.KNOSOS is included in the stellarator optimization suite STELLOPT [27] and in the predictive transport frameworks TANGO [28] and TRINITY [29].However, the orbit-averaged equations solved by KNOSOS only describe radial transport at low collisionalities.
In this work we present MONKES (MONoenergetic Kinetic Equation Solver), a new neoclassical code conceived to satisfy the necessity of fast and accurate calculations of the bootstrap current for stellarator optimization.Specifically, MONKES makes it possible to compute the monoenergetic coefficients D ij where i, j ∈ {1, 2, 3} (their precise definition is given in section 2).These nine coefficients encapsulate neoclassical transport across and within flux surfaces.The parallel flow of each species can be calculated in terms of the coefficients D 3j [30,31,32,33].In the absence of externally applied loop voltage, the bootstrap current is driven by the radial electric field and gradients of density and temperature.The socalled bootstrap current coefficient D 31 is the one that relates the parallel flow to these gradients.The six remaining coefficients D ij for i ∈ {1, 2} allow to compute the flux of particles and heat across the flux surface.
MONKES also computes fast the radial transport coefficients.Although at low collisionality it is not as fast as the orbit-averaged code KNOSOS, MONKES can compute the transition from the 1/ν and √ ν-ν regimes to the plateau regime or the banana regime.
The plateau regime may be relevant close to the edge, while the banana regime may be necessary for stellarators very close to perfect omnigeneity.Apart from optimization, MONKES can find many other applications.For instance, it can be used for the analysis of experimental discharges or also be included in predictive transport frameworks.This paper is organized as follows: in section 2, we introduce the drift-kinetic equation solved by MONKES and the transport coefficients that it computes.In section 3, we explain the algorithm used to solve the drift-kinetic equation and its implementation.In section 4, by means of a convergence study, we demonstrate that MONKES can be used to compute accurate monoenergetic coefficients at low collisionality very fast for the 1/ν and √ ν-ν regimes [24].In order to show this, MONKES results are also benchmarked against DKES [34,35] and, when necessary, against SFINCS [36].Finally, in section 5 we summarize the results and discuss future lines of work.

Drift-kinetic equation and transport coefficients
MONKES solves the drift-kinetic equation where b := B/B is the unit vector tangent to magnetic field lines and we have employed as velocity coordinates the cosine of the pitch-angle ξ := v • b/|v| and the magnitude of the velocity v := |v|.We assume that the magnetic configuration has nested flux-surfaces.We denote by ψ ∈ [0, ψ lcfs ] a radial coordinate that labels flux-surfaces, where ψ lcfs denotes the label of the last closed flux-surface.In equation ( 1), h a is the non-adiabatic component of the deviation of the distribution function from a local Maxwellian for a plasma species a Here, n a is the density of species a, v ta := 2T a /m a is its thermal velocity, T a its temperature (in energy units) and m a its mass.
For the convective term in equation ( 1) denotes the incompressible E × B drift approximation [24] and E 0 = E ψ (ψ)∇ψ is the electrostatic piece of the electric field E perpendicular to the flux-surface.The symbol ⟨...⟩ stands for the flux-surface average operation.Denoting by V (ψ) the volume enclosed by the flux-surface labelled by ψ, the flux-surface average of a function f can be defined as the limit where d 3 r is the spatial volume form.
We denote the Lorentz pitch-angle scattering operator by L, which in coordinates (ξ, v) takes the form In the collision operator, ν a (v) = b ν ab (v) and stands for the pitch-angle collision frequency between species a and b.We denote the respective charges of each species by e a and e b , the Chandrasekhar function by is the error function and log Λ is the Coulomb logarithm [37].
On the right-hand-side of equation ( 1) is the source term, is the expression of the radial magnetic drift assuming ideal magnetohydrodynamical equilibrium, Ω a = e a B/m a is the gyrofrecuency of species a and the fluxfunctions are the so-called thermodynamical forces.Mathematically speaking, there are still two additional conditions to completely determine the solution to equation (1).First, equation (1) must be solved imposing regularity conditions at ξ = ±1 Second, as the differential operator on the left-handside of equation ( 1) has a non trivial kernel, the solution to equation ( 1) is determined up to an additive function g(ψ, v).This function is unimportant as it does not contribute to the neoclassical transport quantities of interest.Nevertheless, in order to have a unique solution to the drift-kinetic equation, it must be fixed by imposing an appropriate additional constraint.
We will select this free function (for fixed (ψ, v)) by imposing for some C ∈ R. We will discuss this further in section 3.
The drift-kinetic equation ( 1) is the one solved by the standard neoclassical code DKES [34,35] using a variational principle.Although the main feature of the code SFINCS [36] is to solve a more complete neoclassical drift-kinetic equation, it can also solve equation (1).
Taking the moments {v ma • ∇ψ, (v 2 /v 2 ta )v ma • ∇ψ, vξB/B 0 } of h a and then the flux-surface average yields, respectively, the radial particle flux, the radial heat flux and the parallel flow where B 0 (ψ) is a reference value for the magnetic field strength on the flux-surface (its explicit definition is given in section 3).It is a common practice for linear drift-kinetic equations (e.g.[34,6,36]) to apply superposition and split h a into several additive terms.As in the driftkinetic equation (1) there are no derivatives or integrals along ψ nor v, it is convenient to use the splitting The splitting is chosen so that the functions {f j } 3 j=1 are solutions to for j = 1, 2, 3, where ν := ν(v)/v and E ψ := E ψ /v.The source terms are defined as Note that each source s j corresponds to one of the three thermodynamic forces on the right-hand side of definition (7).
The relation between h a and f j given by equation ( 17) is such that the transport quantities ( 14), (15) and ( 16) can be written in terms of four transport coefficients which, for fixed (ν, E ψ ), depend only on the magnetic configuration.As dν/dv never vanishes, the dependence of f j on the velocity v can be parametrized by its dependence on ν.Thus, for fixed (ν, E ψ ), equation ( 18) is completely determined by the magnetic configuration.Hence, its unique solutions f j that satisfy conditions (12) and ( 13) are also completely determined by the magnetic configuration.The assumptions that lead to ψ and v appearing as parameters in the drift-kinetic equation (1) comprise the so-called local monoenergetic approximation to neoclassical transport (see e.g.[38]).
Using splitting (17) we can write the transport quantities ( 14), ( 15) and ( 16) in terms of the Onsager matrix Here, we have defined the thermal transport coefficients as where w 1 = w 3 = 1, w 2 = v 2 /v 2 ta and we have used that g d 3 v = 2π ∞ 0 1 −1 gv 2 dξ dv for any integrable function g(ξ, v).The quantities D ija are defined as where are the monoenergetic geometric coefficients.Note that (unlike D ija ) the monoenergetic geometric coefficients D ij do not depend on the species for fixed ν (however the correspondent value of v associated to each ν varies between species) and depend only on the magnetic geometry.In general, four independent monoenergetic geometric coefficients can be obtained by solving (18): D 11 , D 13 , D 31 and D 33 .However, when the magnetic field possesses stellarator symmetry [39] or there is no radial electric field, Onsager symmetry implies D 13 = − D 31 [35] making only three of them independent (for further details see Appendix A).Hence, obtaining the transport coefficients for all species requires to solve (18) for two different source terms s 1 and s 3 .The algorithm for solving equation ( 18) is described in section 3.
Finally, we briefly comment on the validity of the coefficients provided by equation (18) for the calculation of the bootstrap current.The pitchangle scattering collision operator used in equation (1) lacks parallel momentum conservation.Besides, the pitch-angle scattering operator is not adequate for calculating parallel flow of electrons, which is a quantity required to compute the bootstrap current.Hence, in principle, the parallel transport directly predicted by equation ( 1) is not correct.Fortunately, there exist techniques [30,31,32,33] to calculate the radial and parallel transport associated to more accurate momentum conserving collision operators by just solving the simplified drift-kinetic equation (18).This has been done successfully in the past by the code PENTA [31,40], using the results of DKES.Nevertheless, the momentum restoring technique is not needed for minimizing the bootstrap current.In the method presented in section V of [33], when there is no net parallel inductive electric field (i.e.A 3a = 0), the parallel flow with the correct collision operator for any species vanishes when two integrals in v of D 31 vanish.Thus, minimizing D 31 translates in a minimization of the parallel flows of all species involved in the bootstrap current calculation, and therefore of this current.

Numerical method
In this section we describe the algorithm to numerically solve the drift-kinetic equation (18) and its implementation.The algorithm, based on the tridiagonal representation of the drift-kinetic equation, emerges naturally when the velocity coordinate ξ is discretized using a Legendre spectral method.
First, in subsection 3.1 we will present the algorithm in a formal way.We will use (right-handed) Boozer coordinates § (ψ, θ, ζ) ∈ [0, ψ lcfs ] × [0, 2π) × [0, 2π/N p ).The integer N p ≥ 1 denotes the number of toroidal periods of the device.The radial coordinate is selected so that 2πψ is the toroidal flux of the magnetic field and θ, ζ are respectively the poloidal and toroidal (in a single period) angles.In these coordinates, the magnetic field can be written as (27) § Even though we use Boozer coordinates, we want to stress out that the algorithm presented in subsection 3.1 is valid for any set of spatial coordinates in which ψ labels flux-surfaces and the two remaining coordinates parametrize the flux-surface.
and the Jacobian of the transformation reads where ι := B • ∇θ/B • ∇ζ is the rotational transform.The flux-surface average operation (4) is written in Boozer angles as We define the reference value for the magnetic field strength B 0 introduced in definition (16) as the (0, 0) Fourier mode of the magnetic field strength.Namely, Using ( 27) and ( 28), the spatial differential operators present in the drift-kinetic equation ( 18) can be expressed in these coordinates as After the explanation of the algorithm, in subsection 3.2 its implementation in MONKES is described.In order to ease the notation, in subsections 3.1 and 3.2 we drop when possible the subscript j that labels every different source term.Also, as ψ and v act as mere parameters, we will omit their dependence and functions of these two variables will be referred to as constants.

Legendre polynomial expansion
The algorithm is based on the approximate representation of the distribution function f by a truncated Legendre series.We will search for approximate solutions to equation (18) of the form where and N ξ is an integer greater or equal to 1.As mentioned in Appendix B, the expansion in Legendre polynomials (33) ensures that the regularity conditions (12) are satisfied.Of course, in general, the exact solution to equation (18) does not have a finite Legendre spectrum, but taking N ξ sufficiently high in expansion (33) yields an approximate solution to the desired degree of accuracy (in infinite precision arithmetic).
In Appendix B we derive explicitly the projection of each term of the drift-kinetic equation (18) onto the Legendre basis when the representation (33) is used.When doing so, we obtain that the Legendre modes of the drift-kinetic equation have the tridiagonal representation for k = 0, 1, . . ., N ξ , where we have defined for convenience f (−1) := 0 and from expansion (33) it is clear that f (N ξ +1) = 0. Analogously to (33) the source term is expanded as s = N ξ k=0 s (k) P k .For the sources given by (19) this expansion is exact when N ξ ≥ 2 as s (k) j = 0 for k ≥ 3. The spatial differential operators read Thanks to its tridiagonal structure, the system of equations ( 34) can be inverted using the standard Gaussian elimination algorithm for block tridiagonal matrices.
Before introducing the algorithm we will explain how to fix the free constant of the solution to equation (34) so that it can be inverted.Note that the aforementioned kernel of the drift-kinetic equation translates in the fact that f (0) is not completely determined from equation (34).To prove this, we inspect the modes k = 0 and k = 1 of equation (34), which are the ones that involve f (0) .From expression (32) we can deduce that the term D 0 f (0) + U 0 f (1) is invariant if we add to f (0) any function of For ergodic flux-surfaces, the only continuous functions on the torus that belong to the kernel of L 1 are constants.Thus, equation ( 34) is unaltered when we add to f (0) any constant (a function that belongs simultaneously to the kernels of B × ∇ψ • ∇ and b • ∇).A constraint equivalent to condition ( 13) is to fix the value of the 0−th Legendre mode of the distribution function at a single point of the flux-surface.For example, which implicitly fixes the value of the constant C in (13).With this condition, equation ( 34) has a unique solution and its left-hand-side can be inverted to solve for f (k) in two scenarios: when the fluxsurface is ergodic and in rational surfaces when E ψ ̸ = 0 (further details on its invertibility are given in Appendix C).Note that, as expansion ( 33) is finite and representation ( 34) is non diagonal, the functions f (k) obtained from inverting (34) constrained by (38) are approximations to the first N ξ + 1 Legendre modes of the exact solution to ( 18) satisfying ( 13) (further details at the end of Appendix B).
The algorithm for solving the truncated driftkinetic equation ( 34) consists of two steps.
(i) Forward elimination and the sources for k = N ξ − 1, N ξ − 2, . . ., 0 (in this order).Equations ( 39) and ( 40) define the forward elimination.With this procedure we can transform equation ( 34) to the equivalent system for k = 0, 1, . . ., N ξ .Note that this process corresponds to perform formal Gaussian elimination over to eliminate U k in the first row.
We can apply this algorithm to solve equation ( 18) for f 1 , f 2 and f 3 in order to compute approximations to the transport coefficients.In terms of the Legendre modes of f 1 , f 2 and f 3 , the monoenergetic geometric coefficients from definition (26) read where 3s 1 = B × ∇ψ • ∇B/B 3 .Note from expressions (44), ( 45), ( 46) and (47) that, in order to compute the monoenergetic geometric coefficients D ij , we only need to calculate the Legendre modes k = 0, 1, 2 of the solution and we can stop the backward substitution ( 43) at k = 2.In the next subsection we will explain how MONKES solves equation ( 34) using this algorithm.

Spatial discretization and algorithm implementation
The algorithm described above allows, in principle, to compute the exact solution to the truncated driftkinetic equation (34) which is an approximate solution to (18).However, to our knowledge, it is not possible to give an exact expression for the operator ∆ −1 k except for k = N ξ ≥ 1.Instead, we are forced to compute an approximate solution to (34).In order to obtain an approximate solution of equation (34) we assume that each f (k) has a finite Fourier spectrum so that it can be expressed as where the Fourier interpolant row vector map I(θ, ζ) is defined at Appendix D and the column vector f (k) ∈ R N fs contains f (k) evaluated at the equispaced grid points Here, N fs := N θ N ζ is the number of points in which we discretize the flux-surface being N θ and N ζ respectively the number of points in which we divide the domains of θ and ζ.In general, the solution to equation ( 34) has an infinite Fourier spectrum and cannot exactly be written as (48) but, taking sufficiently large values of N θ and N ζ , we can approximate the solution to equation (34) to arbitrary degree of accuracy (in infinite precision arithmetic).As explained in Appendix D, introducing the Fourier interpolant (48) in equation ( 34) and then evaluating the result at the grid points provides a system of N fs × (N ξ + 1) equations which can be solved This system of equations is obtained by substituting the operators L k , D k , U k in equation ( 34) by the N fs × N fs matrices L k , D k , U k , defined in Appendix D. Thus, we discretize (34) as for k = 0, 1 . . ., N ξ where s (k) ∈ R N fs contains s (k) evaluated at the equispaced grid points.This system has a block tridiagonal structure and the algorithm presented in subsection 3.1 can be applied.We just have to replace in equations ( 39), ( 40) and ( 43) the operators and functions by their respective matrix and vector analogues, which we denote by boldface letters.
The matrix approximation to the forward elimination procedure given by equations ( 39) and ( 40) reads for k = N ξ − 1, N ξ − 2, . . ., 0 (in this order).Thus, starting from all the matrices ∆ k and the vectors σ (k) are defined from equations ( 52) and ( 53).Obtaining the matrix ∆ k directly from equation ( 52) requires to invert ∆ k+1 , perform two matrix multiplications and a subtraction of matrices.The inversion using LU factorization and each matrix multiplication require O(N 3 fs ) operations so it is desirable to reduce the number of matrix multiplications as much as possible.We can reduce the number of matrix multiplications in determining ∆ k to one if instead of computing ∆ −1 k+1 we solve the matrix system of equations for X k+1 and then obtain for k = N ξ − 1, N ξ − 2, . . ., 0. Thus, obtaining ∆ k requires O(N 3 fs ) operations for solving equation (54) (using LU factorization) and also O(N 3 fs ) operations for applying (55).For computing the monoenergetic coefficients, the backward substitution step requires solving equation ( 41) for k = 0, 1 and 2. Therefore, for k ≤ 1, it is convenient to store ∆ k+1 in the factorized LU form obtained when equation (54) was solved for X k+1 .The matrix ∆ 0 will be factorized later, during the backward substitution step.
Similarly to what is done to obtain ∆ k , to compute σ (k) we first solve for y and then compute for k ≥ 0. Recall that none of the source terms s 1 , s 2 and s 3 defined by (19) have Legendre modes greater than 2. Specifically, equation (53) implies σ = 0 for k ≥ 3 and also σ 3 .Thus, we only have to solve equation (56) and apply (57) to obtain {σ k=0 are already LU factorized, solving equation (56) and then applying (57) requires O(N 2 fs ) operations and its contribution to the arithmetic complexity of the algorithm is subdominant with respect to the O(N 3 fs ) operations required to compute ∆ k .
For the backward substitution, we first note that solving the matrix version of equation ( 41) to obtain f (0) requires O(N 3 fs ) operations, as ∆ 0 has not been LU factorized during the forward elimination.On the other hand, obtaining the remaining modes {f (k) } 2 k=1 , requires O(N 2 fs ) operations.As the resolution of the matrix system of equations ( 54) and the matrix multiplication in (55) must be done N ξ times, solving equation ( 51) by this method requires O(N ξ N 3 fs ) operations.
In what concerns to memory resources, as we are only interested in the Legendre modes 0, 1 and 2, it is not necessary to store in memory all the matrices L k , D k , U k and ∆ k .Instead, we store solely L k , U k and ∆ k (in LU form) for k = 0, 1, 2. For the intermediate steps we just need to use some auxiliary matrices L, D, U , ∆ and X of size N fs .This makes the amount of memory required by MONKES independent of N ξ , being of order N 2 fs .To summarize, the pseudocode of the implementation of the algorithm in MONKES is given in Algorithm 1.In the first loop from k = N ξ − 1 to k = 0 we construct and save only the matrices {L k , U k , ∆ k } 2 k=0 .At this point the matrices {∆ k } 2 k=1 are factorized in LU form.In the second loop, the sources {σ 3 are computed and saved for the backward substitution.Finally, the backward substitution step is applied.For solving ∆ 0 f (0) = σ (0) we have to perform the LU factorization of ∆ 0 (just for one of the two source terms) and then solve for f (0) .For the remaining modes, the LU factorizations of {∆ k } 2 k=1 are reused to solve for {f (k) } 2 k=1 .Once we have solved equation (51) for f (0) , f (1)  and f (2) , the integrals of the flux-surface average operation involved in the monoenergetic coefficients (44), (45), ( 46) and (47), are conveniently computed using the trapezoidal rule, which for periodic analytic functions has geometric convergence [41].In section 4 we will see that despite the cubic scaling in N fs of the arithmetical complexity of the algorithm, it is possible to obtain fast and accurate calculations of the monoenergetic geometric coefficients at low collisionality (and in particular D 31 ) in a single core.
Algorithm 1 Block tridiagonal solution algorithm implemented in MONKES.1. Forward elimination: The reason behind this is that in the asymptotic relation O(N 3 fs ) ∼ C alg N 3 fs , the constant C alg is small enough to allow N fs to take a sufficiently high value to capture accurately the spatial dependence of the distribution function without increasing much the wallclock time.
The algorithm is implemented in the new code MONKES, written in Fortran language.The matrix inversions and multiplications are computed using the linear algebra library LAPACK [42].

Code performance and benchmark
In this section we will demonstrate that MONKES provides fast and accurate calculations of the monoenergetic coefficients from low (ν = 10 −5 m −1 ) to high collisionality∥ (ν = 3 • 10 2 m −1 ).
In subsection 4.1 we will see that for a correct ∥ In this context "accurate at high collisionality" means that the drift-kinetic equation ( 18) is solved accurately.
calculation of the monoenergetic coefficients for ν ≥ 10 −5 m, N fs ≲ 2000 and N ξ ≲ 200 are required.In subsection 4.2 it is shown that for these resolutions MONKES produces fast calculations in a single processor.Finally, in subsection 4.3 the coefficients computed with MONKES will be benchmarked with DKES and SFINCS.As a result of the benchmarking, we will conclude that MONKES calculations are accurate.

Convergence of monoenergetic coefficients at low collisionality
In low collisionality regimes, convection is dominant with respect to diffusion.As equation ( 18) is singularly perturbed with respect to ν, its solution possesses internal boundary layers in ξ.These boundary layers appear at the interfaces between different classes of trapped particles.At these regions of phase space, collisions are no longer subdominant with respect to advection.Besides, at these regions, the poloidal E×B precession from equation ( 18) can produce the chaotic transition of collisionless particles from one class to another due to separatrix crossing mechanisms [43,24].The existence of these localized regions with large ξ gradients demands a high number of Legendre modes N ξ , explaining the difficulty to obtain fast and accurate solutions to equation ( 18) at low collisionality.
In this subsection we will select resolutions N θ , N ζ and N ξ for which MONKES provides accurate calculations of the monoenergetic coefficients in a wide range of collisionalities.For this, we will study how the monoenergetic coefficients computed by MONKES converge with N θ , N ζ and N ξ at low collisionality.From the point of view of numerical analysis, the need for large values of N ξ is due to the lack of diffusion along ξ in equation (18).Hence, if MONKES is capable of producing fast and accurate calculations at low collisionality, it will also produce fast and accurate calculations at higher collisionalities.
For the convergence study, we select three different magnetic configurations at a single flux surface.Two of them correspond to configurations of W7-X: EIM and KJM.The third one corresponds to the new QI "flat mirror" [8] configuration CIEMAT-QI [7].The calculations are done for the 1/ν (cases with E r = 0) and √ ν-ν regimes [24] (cases with E r ̸ = 0) at the low collisionality value ν = 10 −5 m.In table 1 the cases considered are listed, including their correspondent values of E r := E ψ dψ/dr .We have denoted r = a ψ/ψ lcfs and, in this context, a is the minor radius of the device ¶.
In order to select the triplets (N θ , N ζ , N ξ ) for ¶ DKES uses r as radial coordinate instead of ψ.The quantities ν and Er are denoted respectively CMUL and EFIELD in the code DKES.sufficiently accurate calculations of D 31 , we need to specify when we will consider that a computation has converged.For each case of table 1 we will proceed in the same manner.First, we plot the coefficients D ij as functions of the number of Legendre modes in a sufficiently wide interval.For each value of N ξ , the selected spatial resolutions N θ and N ζ are large enough so that increasing them varies the monoenergetic coefficients in less than a 1%.We will say that these calculations are "spatially converged".Since, typically, the most difficult coefficient to calculate is the bootstrap current coefficient, we will select the resolutions so that D 31 is accurately computed.From the curve of (spatially converged) D 31 as a function of N ξ we define our converged reference value, which we denote by D r 31 , as the converged calculation to three significant digits.From this converged reference value we will define two regions.A first region for calculations that deviate less than or equal to an ϵ% with respect to D r 31 .This interval will be used for selecting the resolutions through the following convergence criteria.
We say that, for fixed (N θ , N ζ , N ξ ) and ϵ, a calculation D 31 ∈ R ϵ is sufficiently converged if two conditions are satisfied Additionally, we define a second interval to distinguish which calculations are at a distance smaller than or equal to ϵ from D r 31 .The reason to have two different regions is that for stellarators close to QI, the relative convergence criteria can become too demanding (the smaller D r 31 is, the narrower R ϵ becomes).Nevertheless, for optimizing QI configurations, it is sufficient to ensure that | D 31 | is sufficiently small.If the absolute error is much smaller than a value of | D 31 | that can be considered sufficiently small, the calculation is converged for optimization purposes.We will use this interval for two purposes: first to give a visual idea of how narrow R ϵ becomes.Second, to show that if R ϵ is very small, it is easier to satisfy an absolute criteria than a relative one.
Figure 1 shows the convergence of monoenergetic coefficients with the number of Legendre modes for W7-X EIM when E r = 0. From figures 1(a) and 1(b) we see that the radial transport and parallel conductivity coefficients converge monotonically with N ξ .
On the other hand, the bootstrap current coefficient is more difficult to converge as can be seen on figure 1(c).As a sanity check, the fulfilment of the Onsager symmetry relation D 31 = − D 13 is included.The converged reference value D r 31 is the spatially converged calculation for N ξ = 380.Defining a region of relative convergence of ϵ = 5%, allows to select a resolution of N ξ = 140 Legendre modes to satisfy condition (i).The selection is indicated with a five-pointed green star.Note that for this case, an absolute deviation of 0.005 m from D r 31 is slightly more demanding than the relative condition.This absolute deviation is selected as the 5% of D 31 ∼ 0.1 m, which can be considered a small value of D 31 .From figure 1(d) we choose the resolutions (N θ , N ζ ) = (23, 55) to satisfy convergence condition (ii).
The case of W7-X EIM with E r ̸ = 0 is shown in figure 2. We note from figure 2(c) that obtaining sufficiently converged results for the region R 5 is more difficult than in the case without radial electric field.For this case, the sizes of the intervals A 0.005 and R 5 are almost the same.This is in part due to the fact that the D 31 coefficient is smaller in absolute value and thus, the region R 5 is narrower.We select N ξ = 160 to satisfy condition (i).The selection (N θ , N ζ ) = (27, 55) satisfies condition (ii) as shown in figure 2(d).
The convergence curves for the case of W7-X KJM when E r = 0 are shown in figure 3. Due to the smallness of D r 31 , the amplitude of the region R 5 is much narrower than in the EIM case, being of order 10 −3 .It is so narrow that the absolute value region A 0.005 contains the relative convergence region.It is shown in figure 3(c) that taking N ξ = 140 is sufficient to satisfy condition (i).According to the convergence curves plotted in figure 3 The convergence of monoenergetic coefficients for CIEMAT-QI without E r is shown in figure 5. Note that as in the W7-X KJM case at this regime, the region of absolute error A 0.005 is bigger than the relative one.As the monoenergetic coefficients are smaller, we relax the relative convergence parameter to ϵ = 7%.In figure 5(c) we see that the region of 7% of deviation R 7 is quite narrow and that selecting N ξ = 180 satisfies condition (i).To satisfy condition (ii), we choose the resolutions (N θ , N ζ ) = (15, 119) as shown in figure 5(d).
Finally, the case of CIEMAT-QI with E r ̸ = 0 is shown in figure 6.Looking at figure 6(c) we can check that taking N ξ = 180 satisfies condition (i) for the region R 7 of 7% of deviation.In this case, the region of absolute error A 0.001 is five times smaller than in the rest of cases and is still bigger than the relative error region.As shown in figure 6(d), the selection (N θ , N ζ ) = (15, 119) satisfies condition (ii).

Code performance
In this subsection we will compare MONKES and DKES performance in terms of the wall-clock time and describe MONKES scaling properties.For the wall-clock time comparison, a convergence study (similar to the one explained in subsection 4.1) has been carried out with DKES on Appendix E. This convergence study is done to compare the wall-clock times between MONKES and DKES for the same level of relative convergence with respect to D r 31 .The comparison is displayed in table 2 along with the minimum number of Legendre modes for which DKES results satisfy convergence condition (i).In all six cases, MONKES is faster than DKES despite using more Legendre modes.Even for W7-X EIM, in which we have taken N ξ = 40 for DKES calculations with finite E r , MONKES is ∼ 4 times faster using almost four times the number of Legendre modes.For the W7-X EIM case without radial electric field, the speed-up is also of 4. For the high mirror configuration, MONKES is ∼ 20 times faster than DKES without E r and ∼ 10 times faster than DKES when E r ̸ = 0.In the case of CIEMAT-QI, MONKES is more than ∼ 13 times faster than DKES without radial electric field.In the case with finite E r , MONKES calculations are around 64 times faster than DKES ones.One calculation of MONKES takes less than a minute and a half and the same calculation with DKES requires waiting for almost an hour and a half.We point out that the wall-clock times for all the calculations shown are those from one of the partitions of CIEMAT's cluster XULA.Specifically, partition number 2 has been used, whose nodes run with Intel Xeon Gold 6254 cores at 3.10 GHz.
We next check that the arithmetic complexity of the algorithm described in section 3 holds in practice.The scaling of MONKES with the number of Legendre modes N ξ and the number of points in which the flux-surface is discretized is shown in figure 7. To demonstrate the linear scaling, the wall-clock time as a function of N ξ for N fs = 2025 points is represented in figure 7(a) and compared with the line of slope 0.61 seconds per Legendre mode.As can be seen in figure 7(b), the wall-clock time (per Legendre mode) scales cubicly with the number of points in which the fluxsurface is discretized N fs .As it was mentioned at the end of section 3, the constant C alg in a single core is sufficiently small to give accurate calculations up to ν ∼ 10 −5 m −1 .We have plotted in figure 7(b) the cubic fit C alg N 3 fs , where C alg = 0.61(1/2025) 3 ∼ 7 • 10 −11 s.As the LAPACK library is multithreaded and allows to parallelize the linear algebra operations through several cores, the scaling of MONKES when running in parallel is represented.Additionally, for the resolutions selected in subsection 4.1, we display in table 3 the wall-clock time when running MONKES using several cores in parallel.Note that for the W7-X cases, which require a smaller value of N fs , the speed-up stalls at 8 cores.For CIEMAT-QI, that requires discretizing the flux-surface on a finer mesh, this does not happen in the range of cores considered.2.0 1.4

Benchmark of the monoenergetic coefficients
Once we have chosen the resolutions (N θ , N ζ , N ξ ) for each case, we need to verify that these selections indeed provide sufficiently accurate calculations of all the monoenergetic coefficients in the interval ν ∈ [10 −5 , 300] m −1 .In all cases, MONKES calculations of the D 11 and D 31 coefficients will be benchmarked against converged calculations from DKES (see Appendix E) and from SFINCS + .The parallel conductivity coefficient will be benchmarked only against DKES.
The benchmarking of the coefficient D 11 for the six different cases is shown in figure 8.The result of the benchmark of the bootstrap current coefficient D 31 is shown in figure 9.
Finally, the parallel conductivity coefficient D 33 is benchmarked in figure 10.Due to the weak effect of the radial electric field in the D 33 coefficient, the symbols for this plot have been changed.In all cases, the agreement between MONKES, DKES and SFINCS is almost perfect.Thus, we conclude that MONKES calculations of the monoenergetic coefficients are not only fast, but also accurate.Additionally, we can evaluate the level of + SFINCS calculations are converged up to 3% in the three independent variables.optimization of the three configurations considered by inspecting these plots.
In figures 8(a) and 8(b) is shown that the W7X-EIM configuration has smaller radial transport coefficient than the W7X-KJM configuration.Figures 9(a) and 9(b) show that the smaller radial transport of the W7X-EIM configuration comes at the expense of having larger bootstrap current coefficient.As shown in figures 8(c) and 9(c), the optimized stellarator CIEMAT-QI manages to achieve levels of radial transport similar or smaller than the W7X-EIM configuration and a bootstrap current coefficient as low as the W7X-KJM configuration.

Conclusions and future work
In this paper we have presented the new code MONKES, which can provide fast and accurate calculations of the monoenergetic transport coefficients at low collisionality in a single core.By means of a thorough convergence study we have shown that it is possible to evaluate the monoenergetic coefficients in the 1/ν and √ ν-ν regimes in approximately 1 minute.Besides, when there are sufficient computational resources available, the code can run even faster using several cores in parallel.A natural application is the inclusion of MONKES in a stellarator optimization suite.MONKES rapid calculations will allow direct optimization of the bootstrap current and radial transport from low collisionalities (typical of the 1/ν and √ ν-ν) to moderate collisionalities (typical of the plateau regime).The low collisionality regimes are important in reactor relevant scenarios while the plateau regime can be important close to the edge, where the plasma is cooler.Massive evaluation of configurations to study the parametric dependence of D 31 or other coefficients on specific quantities of the magnetic configuration can also be done.Another application is its inclusion in predictive transport frameworks, which require neoclassical calculations to determine the evolution of plasma profiles.The neoclassical quantities required for these simulations can be calculated using MONKES.Equation ( 1), solved by MONKES, includes a collision operator which does not preserve momentum.An important continuation of this work would be the implementation of momentum-correction techniques, such as the ones presented in [30,31,32,33].As each calculation from MONKES can be executed in a single core, the scan in v (i.e. in ν) required to perform the integrals of the monoenergetic coefficients is parallelizable.
Therefore, it seems possible for the near future to obtain fast calculations of neoclassical transport with a model collision operator that preserves momentum.With this minor extension, MONKES could also be used for self-consistent optimization of magnetic fields in a similar manner to [16] for general geometry.• For fixed i and j, if s i and s j are both even,

Acknowledgements
The proof is as follows As in the last equality, due to (A.2), the roles of i and j are interchangeable, we have that D ij = D ji .• For fixed i and j, if s i and s j are both odd, D ij = D ji .The proof is as follows As in the last equality, due to (A.2), the roles of i and j are interchangeable, we have that D ij = D ji .• For fixed i and j, if s i is even and s j is odd, The proof is as follows As in the last equality, due to (A.3), interchanging the roles of i and j switches signs, we have that The equation ( 18) can be written in the form of (A.1) by setting the operators to be With these definitions, properties (A.2) and (A.3)♯ are satisfied and D ij = D ij .It is interesting to remark that the antisymmetry property (A.3) of V implies that the diagonal monoenergetic coefficients D ii are always positive.Note first that (A.3) implies ⟨f, Vf ⟩ S = 0 for any f ∈ F S .This implies that D ii = −⟨f i , νLf i ⟩ S and, as L is a negative operator (its eigenvalues are all negative or zero, see Appendix B), D ii ≥ 0. Also note that properties (A.2) and (A.2) imply that ⟨νLf j , 1⟩ S = 0 and ⟨Vf j , 1⟩ S = 0. Thus, the image of the drift-kinetic equation ( 18) is constrained by ⟨s j , 1⟩ S = 0. Now we distinguish the two cases for which the monoenergetic coefficients D ij satisfy Onsager symmetry relations.
(i) If E ψ = 0, the property is defined as It is straightforward to check that for this property, conditions (A.6), (A.7) and (A.8) are satisfied.Also, (ii) When E ψ is not necessarily zero, we define the property P as the one that defines stellarator symmetry [39] Pf (θ, and we have assumed without loss of generality that the planes of symmetry are θ = 0 and ζ = 0. Thus, when the magnetic field is stellaratorsymmetric B = B + .In this case, using ( 29), (31) and (32) it is straightforward to check † † that conditions (A.6), (A.7) and (A.8) are satisfied.Besides, ♯ As ∇ • v E = 0, the operator V can be written in divergence form.For the symmetry of L see Appendix B. † † Note that derivatives along θ and ζ switch parities with respect to the stellarator symmetry property, i.e. ∂f ± ∂θ = ( ∂f ± ∂θ ) ∓ and ∂f ± ∂ζ = ( ∂f ± ∂ζ ) ∓ .Also, as for stellarator-symmetric fields, √ g = √ g + the flux-surface average satisfies f − = 0. and regularity boundary conditions at ξ = ±1 where k ≥ 0 is an integer.
As L has a discrete spectrum and is self-adjoint with respect to the inner product in the space of functions that satisfy the regularity condition, {P k } ∞ k=0 is an orthogonal basis satisfying ⟨P j , P k ⟩ L = 2δ jk /(2k + 1).Hence, these polynomials satisfy the three-term recurrence formula obtained by Gram-Schmidt orthogonalization.Starting from the initial values P 0 = 1 and P 1 = ξ, the recurrence defines the rest of the Legendre polynomials.Additionally, they satisfy the differential identity Identities (B.4) and (B.5) are useful to represent tridiagonally the left-hand side of equation ( 18) when we use the expansion (33).The k−th Legendre mode of the term ξb • ∇f is expressed in terms of the modes f (k−1) and f (k+1) using (B.4) Combining both (B.4) and (B.5) allows to express the k−th Legendre mode of the mirror term ∇ • b((1 − ξ 2 )/2) ∂f /∂ξ in terms of the modes f (k−1) and f (k+1) as where we have also used For the collision operator used in equation ( 18), as Legendre polynomials are eigenfunctions of the pitchangle scattering operator, using (B.1) we obtain the diagonal representation Finally, we briefly comment on why the truncation error from (33) implies that the solution to ( 34) and ( 38) is an approximation of the Legendre spectrum of the exact solution to (18) satisfying (13).For this, we will assume that the solution to ( 18) and ( 13) is unique (which it is, see Appendix C).We denote this exact solution by f ex and its Legendre modes by f ex satisfy (34) for all values of k, including k > N ξ and, in general, f Denoting the error of the solution f (k) to ( 34) and ( 38) by is easy to prove that for k = 0, 1, . . ., N ξ − 1 and Note that the system of equations constituted by (B.11) and (B.12) for the error is identical to (34) substituting . Hence, by assumption, the solution to (B.11) and (B.12) satisfying ( 38) is unique, implying that

Appendix C. Invertibility of the spatial differential operators
In this Appendix we will study the invertibility of the left-hand-side of (34).We are only concerned in elucidating under which conditions the algorithm given in section 3 can be applied to solve (34).For instance, we will consider the possibility of the flux-surface being rational despite of the fact that (among other things) it may be inconsistent with the assumption that thermodynamical forces are a flux-function.We will conclude that the solution to (34) submitted to (38) is unique in ergodic flux-surfaces and also on rational flux-surfaces with E ψ ̸ = 0 and can be obtained with the aforementioned algorithm.In order to do this, we view L k , D k and U k as operators that act on F, where F is the space of smooth functions on the flux-surface equipped with the inner product where z denotes the complex conjugate of z and the inner product induces a norm In this setting L k , D k and U k are operators from F to F as all of their coefficients are smooth on the fluxsurface.However, the operators L k and U k given by ( 35) and ( 37) do not have a uniquely defined inverse.This is a consequence of the fact that the parallel streaming operator ξb where B • ∇W k = ω k and is integrated imposing W k p = 0. Note that this implies that Φ ̸ = 0 and that When Φ ∈ F, the left-hand side of (C.3) has a non trivial kernel (as an operator from F to F).In order to proceed further, we employ coordinates (α, l) where α := θ −ιζ is a poloidal angle that labels field lines and l is the length along magnetic field lines.Depending on the type of flux-surface there are two possible situations Hence, if ⟨ω k Φ⟩ ̸ = 0, equation (C.10) fixes the value of f 0 so that f is continuous on the torus.Note that if ⟨ω k Φ⟩ ̸ = 0, by virtue of (C.6), Φ is not univaluated and does not belong to F. On the contrary, if f 0 is free, then Φ is a continuous function on the torus.Then, (C.10) implies that KΦ is continuous on the torus when Φ is.The function K is also continuous as long as sB belongs to the image of B • ∇ + ω k .Note that using (C.9) we can derive from ⟨Eq.
Similarly to what happened to Φ when studying the invertibility of L k and U k , continuity of the solution implies that Ψ cannot be univaluated.We can write Ψ as where B × ∇ψ • ∇A k = νk and is integrated along with condition A k p = 0. Using (C.18), we can write Note that A k is monotonically crescent with α, which means that Ψ cannot be univaluated.Besides, (C.21) implies Ψ > 0, which means that ⟨Ψ⟩ ̸ = 0 and Lα 0 Ψ dα /B 2 ̸ = 0. Thus, there is a unique value of the constant g 0 which compensates the jumps in Ψ and IΨ so that g = g 0 Ψ + IΨ is continuous on the flux-surface.Hence, D k is an invertible operator from F to itself.
The inverse of D k for k ≥ 1 and E ψ ̸ = 0 is defined by where G 0 [s] and I[s] denote the linear operators which define, respectively, the constant of integration and the solution to (C.16) with I p = 0 for a given source term.Specifically, Finally, we will study the invertibility of the operator ∆ assuming that ∆ k+1 is an invertible operator from F to F. For this, first, we note that in the space of functions of interest (smooth periodic functions on the torus), using a Fourier basis {e i(mθ+nNpζ) } m,n∈Z , we can approximate any function truncating the modes with mode number greater than some positive integer N where fmn = f, e i(mθ+nNpζ) are the Fourier modes of f .Thus, we approximate F using a finite dimensional subspace F N ⊂ F consisting on all the functions of the form given by equation (C.27).
Hence, we can approximate D k , U k , ∆ k+1 and L k+1 restricted to F N (and therefore ∆ k ) in equation (C.26) by operators D N k , U N k , ∆ N k+1 and L N k+1 that map any f ∈ F N to the projections of D k f , U k f , ∆ k+1 f and L k+1 f onto F N .The operators D N k , U N k , ∆ N k+1 and L N k+1 can be exactly represented (in a Fourier basis) by square matrices of size dim F N .When the operators are invertible, these matrices are invertible aswell.Doing so, we can interpret the matrix representation of ∆ k as the Schur complement of the matrix It is well known from linear algebra that the determinant of When both D k and ∆ k+1 are invertible, the matrix M N k is invertible.Hence, note from (C.30) that, for k ≥ 1, the matrix ∆ N k can be inverted for any N , and therefore ∆ k (as an operator from F to F) is invertible.
The case k = 0 requires special care.In this case D 0 is not invertible and the previous argument cannot be applied.In order to make the solution unique, we need to impose an additional constraint to f (0) .On ergodic flux-surfaces, condition (38) is sufficient to fix the value of f (0) .However, this is not always the case when ι is rational.Condition (38) fixes the value of f (0) solely when the only functions that lie simultaneously at the kernels of D 0 = − E ψ B 2 −1 B × ∇ψ • ∇ and L 1 = b • ∇ are constants (flux-functions).If E ψ ̸ = 0, this occurs for any δ ̸ = −1/ι.However, the case δ = −1/ι is unphysical as it would imply √ g = 0.
Hence, in practice, when E ψ ̸ = 0 condition (38) is sufficient to fix the value of f (0) even if the surface is not ergodic.For rational flux-surfaces and E ψ = 0, condition (38) is insufficient to fix f (0) .In such case, we would need to fix the value of f (0) at a point of each field line as any function g(α) lies in the kernel of b • ∇.
In order to clarify this assertion, let's try to obtain f (0) assuming that f (1) is known.Integrating the Legendre mode k = 1 of equation ( 41) along a field line gives 0 does not depend on α.In this case, equation (C.31) and condition (38) fix f (0) for each σ (1) , f (1) .When ι is rational we need to distinguish between the case with and without radial electric field.
(ii) For E ψ ̸ = 0, inserting (C.31) in the Legendre mode k = 0 of equation (34) gives Integrating Lc 0 Eq. (C.32) dl gives a differential equation in α from which we can obtain f (0) 0 up to a constant.Thus, (C.31), condition (38) and (C.32) fix f (0) .Hence, in ergodic flux-surfaces or rational fluxsurfaces with finite radial electric field, M N 0 has a onedimensional kernel.Thus, for k = 0, it is necessary to substitute one of the rows of [D N 0 U N 0 ] by the condition (38) so that M N 0 is invertible for any N and as ∆ N 1 can be inverted, also ∆ N 0 constructed in this manner for any N , which implies that ∆ 0 (as the limit lim N →∞ ∆ N 0 ) is invertible.

Appendix D. Fourier collocation method
In this appendix we describe the Fourier collocation (also called pseudospectral) method for discretizing the angles θ and ζ.This discretization will be used to obtain the matrices L k , D k and U k .For convenience, we will use the complex version of the discretization method but for the discretization matrices we will just take their real part as the solutions to (18) are all real.
We search for approximate solutions to equation (34) of the form is the discrete inner product associated to the equispaced grid points (49), (50), ∥f ∥ N θ N ζ := ⟨f, f ⟩ N θ N ζ its induced norm and z denotes the complex conjugate of z.We denote by F N θ N ζ to the finite dimensional vector space (of dimension N θ N ζ ) comprising all the functions that can be written in the form of expansion (D.1).
The set of functions {e i(mθ+nNpζ) } ⊂ F N θ N ζ forms an orthogonal basis for F N θ N ζ equipped with the discrete inner product (D.3).Namely, e i(mθ+nNpζ) , e i(m ′ θ+n ′ Npζ) for −N θ1 /2 ≤ m ≤ N θ2 /2 and −N ζ1 /2 ≤ n ≤ N ζ2 /2.Thus, for functions lying in F N θ N ζ , discrete expansions such as (D.1) coincide with their (finite) Fourier series.The discrete Fourier modes (D.2) are chosen so that the expansion (D.1) interpolates f (k) at grid points.Hence, there is a vector space isomorphism between the space of discrete Fourier modes and f (k) evaluated at the equispaced grid.
Combining equations (D.1), (D.2) and (D.3) we can write our Fourier interpolant as where f (k) ∈ R N fs is the state vector containing f (k) (θ i ′ , ζ j ′ ).The entries of the vector I(θ, ζ) are the functions I i ′ j ′ (θ, ζ) given by, Note that the interpolant is the only function in F N θ N ζ which interpolates the data at the grid points, as Of course, our approximation (D.5) cannot (in general) be a solution to (34) at all points (θ, ζ) ∈ [0, 2π) × [0, 2π/N p ). Instead, we will force that the interpolant (D.5) solves equation (34) exactly at the equispaced grid points.Thanks to the vector space isomorphism (D.2) between f (k) and the discrete modes f (k) mn this is equivalent to matching the discrete Fourier modes of the left and right-hand-sides of equation (34).
Inserting the interpolant (D.5) in the left-hand side of equation (34) and evaluating the result at grid points gives Here, L k I(θ i , ζ j ), D k I(θ i , ζ j ) and U k I(θ i , ζ j ) are respectively the rows of L k , D k and U k associated to the grid point (θ i , ζ j ).We can relate them to the actual positions they will occupy in the matrices choosing an ordenation of rows and columns.We use the ordenation that relates respectively the row i r and column i c to the grid points (θ i , ζ j ) and (θ i ′ , ζ j ′ ) as i r = 1 + i + jN θ , (D.10)

R
ϵ := (1 − ϵ/100) D r 31 , (1 + ϵ/100) D r 31 (58) (i) Spatially converged calculations with N ′ ξ ≥ N ξ belong to R ϵ .(ii) Increasing N θ and N ζ while keeping N ξ constant produces calculations which belong to R ϵ .Condition (i) is used to select the number of Legendre modes N ξ and condition (ii) is used to select the values of N θ and N ζ once N ξ is fixed.

Figure 7 :
Figure 7: Scaling of MONKES wall-clock time.(a) Linear scaling with the number of Legendre modes for N fs = 27 × 75 = 2025 discretization points.(b) Cubic with N fs for different number of cores used.

s − 2 and s 3 = s + 3 .
Hence, we have D 12 = D 21 , D 13 = − D 31 and D 23 = − D 32 .Note that for equation (18), the Onsager symmetry relation D 12 = D 21 is trivial as s 1 = s 2 , which implies f 1 = f 2 and thus D 12 = D 21 = D 11 = D 22 , D 31 = D 32 and D 13 = D 23 .Nevertheless, if the definition of s 1 and s 2 was different, as long as their parity is the same, the relation D 12 = D 21 would still hold.
Appendix B. Legendre modes of the drift-kinetic equation Legendre polynomials are the eigenfunctions of the Sturm-Liouville problem in the interval ξ ∈ [−1, 1] defined by the differential equation 2LP k (ξ) = −k(k + 1)P k (ξ), (B.1) ∂ξ has a non trivial kernel comprised of functions g((1−ξ 2 )/B).On the other hand, the operator D k has a unique inverse for k ≥ 1.For k = 0, the operator D 0 is not invertible as it has a kernel comprised of functionsg(B θ θ + B ζ ζ).Whether L k and U k are or not invertible can be determined studying the uniqueness of continuous solutions (on the flux-surface) toB • ∇f + ω k f = sB, (C.3) for some s, ω k ∈ F. Note that equations L k f = ks/(2k −1) and U k f = (k +1)s/(2k +3) can be written in the form of equation (C.3) setting, respectively, ω k = (k−1)B•∇ ln B/2 and ω k = −(k+2)B•∇ ln B/2.We will determine a condition for ω k which, if satisfied, equation (C.3) has a unique solution f ∈ F. The solution to equation (C.3) can be written as f = (f 0 + K)Φ, .6)and (C.7) are integrated (along a field line) imposing Φ p = 1 and K p = 0 at a point p of the field line.Note that f 0 = f p is an integration constant.Depending on the form of ω k , f 0 can or cannot be determined imposing continuity on the fluxsurface.The solution to equation (C.6) can be written as Φ = exp(−W k ), (C.8)

( i )
For ergodic flux-surfaces, ι ∈ R\Q and satisfying (C.5) implies that f 0 is a flux-function.The solution f to (C.3) is a differentiable function on the torus if ⟨B • ∇f ⟩ = 0. Applying ⟨Eq.(C.3)⟩ combined with splitting (C.4) yields f 0 ⟨ω k Φ⟩ = ⟨Bs⟩ − ⟨Kω k Φ⟩ = ⟨B • ∇(KΦ)⟩.(C.10) (C.3)/Φ⟩ the solvability condition ⟨sB/Φ⟩ = 0. (ii) For rational flux-surfaces, ι ∈ Q and satisfying (C.5) implies that f 0 (α) depends on the field line chosen.At these surfaces, the field line labelled by α closes on itself after a length L c (α).If the solution f is continuous on the flux-surface, then Lc 0 B • ∇f dl /B = 0 for each field line.Applying Lc 0 Eq.(C.3) dl /B combined with splitting (C.Φdl/B ̸ = 0, condition (C.11) fixes a unique value of f 0 (α) (for each field line) for which f is continuous on the torus.As for ergodic surfaces, if (C.11) does not fix f 0 , then Φ and KΦ are continuous along field lines.Again, K is also continuous as long as sB belongs to the image of B • ∇ + ω k .Using (C.9) we can derive from Lc 0 Eq.(C.3)/Φdl/B the solvability condition Lc 0 sB/Φdl/B = 0. Thus, we have seen that when ⟨ω k Φ⟩ = 0 or Lc 0 ω k Φdl/B = 0, the operator B • ∇ + ω k from F to itself is not one-to-one (it has a non trivial kernel comprised of multiples of Φ).Moreover, we have the solvability conditions ⟨sB/Φ⟩ = 0 for ergodic surfaces and Lc 0 sB/Φ dl /B = 0 for rational surfaces.The existence of a solvability condition implies that B • ∇ + ω k is not onto.We can derive a simpler and equivalent condition for ω k from (C.8).Note that Φ is continuous on the torus only when W k is.As B • ∇W k = ω k , continuity of W k along field lines imposes ⟨ω k ⟩ = 0 on ergodic flux-surfaces and Lc 0 ω k dl/B = 0 on rational ones.Hence, the operator B • ∇ + ω k is invertible if ⟨ω k ⟩ ̸ = 0 or Lc 0 ω k dl/B ̸ = 0.This result can be applied to determine that L k and U k are not invertible.For both L k and U k , ω k ∝ B • ∇ ln B γ for some rational exponent γ.As B is continuous on the flux-surface we have for L k and U k that Lc 0 ω k dl/B = 0 or ⟨ω k ⟩ = 0, which means that neither L k nor U k are invertible.Now we turn our attention to the invertibility of D k for k ≥ 1.For E ψ = 0, D k is just a multiplicative operator and is clearly invertible when ν, k ̸ = 0.For E ψ ̸ = 0, the invertibility of D k can be proven by studying the uniqueness of solutions to B × ∇ψ • ∇g − νk g = − B 2 νk(k +1) B 2 /2 E ψ .The procedure is very similar to the one carried out for L k and U k .First, we write the solution to equation (C.12) as g = (g 0 + I)Ψ, 15) and (C.16) are integrated along a integral curve of B × ∇ψ imposing Ψ p = 1 and I p = 0 at the initial point p of integration.The integral curves of B × ∇ψ are, in Boozer coordinates, straight lines B θ θ + B ζ ζ = constant.In order to proceed further, we change from Boozer angles (θ, ζ) to a different set of magnetic coordinates (α, φ) using the linear transformation θ ζ = (1 + ιδ) −1 ι −δ(1 + ιδ) −1 1 α φ (C.17)where δ = B θ /B ζ .In these coordinates B = ∇ψ ×∇α, B α = 0 and B × ∇ψ • ∇ = B 2 ∂ ∂α .(C.18) Depending on the rationality or irrationality of δ we can distinguish two options (i) If δ ∈ R\Q, satisfying (C.14) implies that g 0 is a flux-function (the integral curves trace out the whole flux-surface).Note that if g is a differentiable function on the torus ⟨B × ∇ψ • ∇g⟩ = ⟨∇ × (gB) • ∇ψ⟩ = 0, where we have used ∇ × B • ∇ψ = 0. Taking ⟨Eq.(C.12)⟩ assuming that f is continuous on the flux-surface, combined with (C.13) gives

Table 1 :
Cases considered in the convergence study of monoenergetic coefficients and values of (ν, E r ).

Table 2 :
Comparison between the wall-clock time of DKES and MONKES.

Table 3 :
Wall-clock time of MONKES in seconds for the selected triplets (N θ , N ζ , N ξ ) when running in several cores.