Tokamak divertor plasma emulation with machine learning

Future tokamak devices that aim to create conditions relevant to power plant operations must consider strategies for mitigating damage to plasma facing components in the divertor. One of the goals of MAST-U tokamak operations is to inform these considerations by researching advanced divertor configurations that aid stable plasma detachment. Machine design, scenario planning and detachment control would all greatly benefit from tools that enable rapid calculation of scenario-relevant quantities given some input parameters. This paper presents a method for generating large, simulated scrape-off layer data sets, which was applied to generate a data set of steady-state Hermes-3 simulations of the MAST-U tokamak. A machine learning model was constructed using a Bayesian approach to hyperparameter optimisation to predict diagnosable output quantities given control-relevant input features. The resulting best-performing model, which is based on a feedforward neural network, achieves high accuracy when predicting electron temperature at the divertor target and carbon impurity radiation front position and runs in around 1 ms in inference mode. Techniques for interpreting the predictions made by the model were applied, and a high-resolution parameter scan of upstream conditions was performed to demonstrate the utility of rapidly generating accurate predictions using the emulator. This work represents a step forward in the design of machine learning-driven emulators of tokamak exhaust simulation codes in operational modes relevant to divertor detachment control and plasma scenario design.


Introduction
In a diverted tokamak configuration, the main body of confined plasma is surrounded by a thin scrape-off layer (SOL), delimited by the last closed flux surface, which consists of open magnetic field lines. Charged particles that escape the hot core are predominantly transported along these open field lines until they are deposited on a vessel tile or cool sufficiently to recombine. Without exhaust heat and power flux mitigation, machines that will operate at power levels relevant to electricity generation, such as ITER [1], SPARC [2] and STEP [3], would have their exhaust plasma facing components destroyed. One mitigation strategy is to utilise a detached plasma configuration. A detached plasma is one with such significant energy dissipation between the midplane SOL and divertor target that the poloidal plasma boundary is situated upstream of the target [4-6]. Detachment is driven by impurity radiation, plasma recombination and charge exchange, and is influenced by magnetic field geometry, ion recycling and gas pumping [6]. The associated drastic reduction in energy and particle flux to the target has immense benefits for power exhaust handling, as energy in the hot exhaust is radiated away mostly upstream of the detachment front and dispersed over a wide area. This is in stark contrast to the attached scenario, in which energetic plasma particles deposit their energy locally on divertor target tiles. Machine designs that enable access to the detached plasma regime are therefore critical for upcoming near-power-plant-level and demonstration devices.
Several present-day tokamaks exist with stated aims to investigate advanced divertor configurations and detachment control methodologies. For example, the Mega Ampere Spherical Tokamak Upgrade (MAST-U) has been designed in part to enable research into the Super-X divertor configuration [7, 8]. Predictive SOLPS-ITER [9] simulations have shown that the characteristic extended SOL magnetic field paths and increased flux expansion of the Super-X divertor, which lead to increased radiative cooling and an increased plasma wetted area, can drastically reduce exhaust particle and heat flux on divertor tiles [10-12]. Recent experimental results are in line with the SOLPS-ITER predictions, showing significantly reduced heat loading on the divertor target [13, 14] and migration of hydrogenic [15, 16] and impurity [16] radiation fronts upstream of the target. The Super-X divertor configuration also expands the parameter region in which stable diverted plasmas can be obtained [14, 17], which could enable longer pulses and improved core confinement.
Numerical simulation of the SOL and divertor has significant utility. For example, the database of SOLPS simulations constructed to support the physics basis of the ITER divertor was used to predict the target peak heat flux, suggest impurity seeding parameters for plasma detachment, and motivate strong edge-localised mode control measures [18]; simulations of the Super-X divertor strongly influenced and formed the physics basis for the design of the MAST-U divertor [10, 11, 19, 20]; and the effect of magnetic geometry in the divertor on detachment control has been studied with simulations for the MAST-U Super-X [21] and other devices [22].
A significant challenge when simulating SOL physics lies in its computational intensity. This prevents, or at least hinders, the use of such calculations in several settings that are important for the development and operation of next-generation tokamak devices, such as data-driven control policy optimisation [23], digital twin development [24], and reactor design space exploration [25]. Each workflow has its own time-to-solution constraints, but all require, or would greatly benefit from, orders of magnitude speedup compared with the typical time taken to generate simulated results.
The construction of emulators (or surrogate models) can drastically speed up the process of obtaining quantities of interest [26], including in edge modelling [27, 28], which can help to enable the data-intensive tasks listed above. Emulators are created by devising a statistical model that can reproduce the output quantities of interest of an expensive-to-evaluate simulator, based on sampled input values, to some required level of accuracy. The exact form of the predictive model can vary, and its choice depends on numerous factors, especially the required flexibility and the availability of training and evaluation data.
This paper presents a data set of 3367 SOL simulations in MAST-U-like conditions and describes a machine learning-based emulator of the simulations that is able to rapidly and accurately predict quantities of interest given the same input conditions as the simulations. The numerical methods used to simulate SOL plasmas, data set generation, and emulator development are described in section 2. Performance of the emulator, interpretation of its predictions, and a detailed parameter scan of upstream conditions are given in section 3. A summary is presented in section 4.

Data generation and discussion
The Hermes-3 [29] code was used to generate a data set of simulations modelling multi-species tokamak plasmas of various configurations. Hermes-3 is an open-source simulation tool that solves the plasma and neutral fluid equations in 1D, 2D or 3D; it extends the divertor modelling code SD1D [30] to multiple dimensions and facilitates the inclusion of multiple charged and neutral species (enabling the separate solving of electron and ion fluid equations). The model implemented in this work was designed to closely resemble conditions and geometry attainable in the MAST-U tokamak, and a single flux tube of a diverted deuterium plasma originating at the outboard midplane and terminating at the divertor target was simulated. Recombination, ionisation and charge exchange interactions were included. A fixed carbon impurity was used to represent the intrinsic carbon in the tokamak and to provide a diagnosable radiation profile [21, 27]. Carbon sputtering from the divertor target can affect detachment onset and stability [31]. The lack of simulated sputtering in the model used in this work can be considered a limitation, although some of the effects of target sputtering will be captured by the fixed carbon impurity. A proportional-integral controller was used to modulate the upstream deuterium ion density, which drove simulations to steady state. While there are limitations to modelling divertor processes in 1D [32, 33], such models can nevertheless provide insight into many of the physical processes in the SOL that drive detachment [30, 34, 35] and prove useful in design and optimisation tasks [30].
Several simulation input parameters were varied to generate diverse steady-state solutions. These input parameters, the symbols denoting them, and their ranges in the data set are given in table 1. A mapping from connection length, L, to flux tube area expansion factor, A, was designed to make A positively correlated with L. The length from the upstream midplane boundary to the X-point was set to a constant 10 m. Throughout this work, flux expansion refers to the total flux expansion between the X-point and target, and there is no flux expansion between the midplane and X-point.
To reduce the time taken to reach a steady-state solution, the input space was initially coarsely sampled using a Sobol sequence [36], which generates a low-discrepancy sampling of the space. The outputs of the corresponding converged simulations were used as a basis for launching a much larger number of subsequent simulations, whose input parameters were generated by pseudo-random sampling of the input parameter space. For each simulation in the randomly-sampled set, the initial conditions were taken from the output of the Sobol-sampled simulation whose input parameters were closest, in Euclidean distance in standardised space, to those of the given randomly-sampled simulation. Tests showed the average speedup using this method to be around a factor of 3 compared with starting from uniform initial conditions, which is a significant time and resource saving when this data generation method is deployed at scale.
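The warm-start selection described above can be sketched as follows. This is a minimal pure-Python illustration, not the production workflow: it assumes each simulation's inputs are given as a plain list of floats, standardises all inputs, and returns the index of the Sobol-sampled simulation nearest (in Euclidean distance) to a new randomly-sampled input. The function names are illustrative, not taken from any actual codebase.

```python
import math

def standardise(points):
    """Scale each input dimension to zero mean and unit variance."""
    dims = len(points[0])
    means = [sum(p[d] for p in points) / len(points) for d in range(dims)]
    stds = [
        math.sqrt(sum((p[d] - means[d]) ** 2 for p in points) / len(points)) or 1.0
        for d in range(dims)
    ]
    return [[(p[d] - means[d]) / stds[d] for d in range(dims)] for p in points]

def nearest_warm_start(sobol_inputs, new_input):
    """Index of the Sobol-sampled simulation whose standardised inputs are
    closest in Euclidean distance to the new randomly-sampled inputs; its
    converged output would seed the new simulation's initial conditions."""
    scaled = standardise(sobol_inputs + [new_input])
    *scaled_sobol, scaled_new = scaled
    dists = [math.dist(s, scaled_new) for s in scaled_sobol]
    return dists.index(min(dists))
```

In the full pipeline the distance would be computed over all six input parameters; here any dimensionality works.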
The simulation data set consists of n = 3367 converged simulations. Of interest to this work are the following two derived quantities: electron temperature at the divertor target, T e,t , and carbon radiation front position, y RC,max . These quantities are important for the diagnosis and control of tokamak plasma detachment. In particular, the goal of plasma detachment is to reduce the power and heat flux loads on the plasma-facing components of the divertor, which is in part accomplished by minimising T e,t . The radiation front position is often used as an indicator of the detachment front location [16], and is a diagnosable quantity that may be used in detachment feedback control [37]. Furthermore, deep detachment can lead to radiative collapse disruptive events, so balancing the location of the detachment front is imperative. The distribution of these quantities in the data set is shown in figure 1, with the carbon radiation front position normalised to connection length. T e,t ranges from 0.37 eV to 200.61 eV with mean 28.65 eV and median 11.64 eV, and y RC,max /L ranges from 0.34 to >0.99 with mean 0.91 and median 0.99. The upper bound of y RC,max /L is not equal to one because the discrete simulation grid coordinate system is cell-centred. Both distributions are skewed towards configurations with relatively low T e,t and carbon radiation front positions that are close to the target, although deeply detached and strongly attached plasmas are also represented.
A visualisation of the data set is shown in figure 2. The central plot was generated by first linearly scaling the carbon radiation and electron temperature profiles to the range [0, 1]. These vectors were then joined to create a list of length twice the number of grid points, g, in the simulations. These lists were then stacked to form a 2g × n matrix. The UMAP algorithm [38] was used to reduce the dimensionality of the feature space from 2g to 2 so that it could be visualised. A description of dimensionality reduction and UMAP is given in appendix A. The reduced-dimensional space is also coloured by T e,t . This step is useful for observing general trends in the data and identifying discrete clusters of features. For example, the dimensionality-reduced data is dominated by a continuous stream of data points that range from high-target-temperature, low-radiation plasmas with carbon impurity radiation profiles that peak at or very near to the divertor target (labelled 'a'), through to plasmas with low T e,t but carbon impurity radiation peaks near the target (labelled 'b'), and on to plasmas characterised by low T e,t and carbon impurity radiation profiles that are strongly peaked away from the divertor target (labelled 'c'). Simulations associated with regions 'a', 'b' and 'c' can be loosely designated as attached, weakly detached (or detaching) and strongly detached plasmas, respectively. Plasmas in regions 'd' and 'e' have low T e,t and radiate most strongly around the X-point and upstream of the X-point, respectively. The electron temperature and carbon impurity radiation profiles for two simulations are shown in the plots on the left and right of figure 2.
On the left are shown the profiles of a typical attached plasma, with high T e,t that is a significant fraction of the upstream temperature and an impurity radiation profile that peaks at the divertor target. This type of plasma configuration can result in serious damage to material components, as the plasma exhaust energy deposition occurs in a concentrated area. The right two plots show a typical deeply detached plasma with the characteristic peaked carbon impurity radiation profile and associated large drop in electron temperature, where the plasma locally loses a significant portion of its energy due to the high rates of recombination and radiation. In this kind of plasma configuration, the exhaust energy would be radiated away and deposited over a large area, thus mitigating material damage.
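The profile preprocessing used before dimensionality reduction, min-max scaling of each profile to [0, 1] followed by concatenation, can be sketched as below. This is an illustrative numpy version (function name and array layout are assumptions): each simulation becomes one row of length 2g, which is the samples-as-rows layout expected by most manifold-learning libraries (the text's 2g × n matrix is the transpose of this).

```python
import numpy as np

def stack_profiles(radiation, temperature):
    """Min-max scale each simulation's carbon radiation and electron
    temperature profiles to [0, 1], then concatenate them so that each
    simulation becomes one row of length 2g (g = number of grid points).

    radiation, temperature: arrays of shape (n_simulations, g).
    """
    def minmax(profiles):
        lo = profiles.min(axis=1, keepdims=True)
        hi = profiles.max(axis=1, keepdims=True)
        # Guard against flat profiles (hi == lo) to avoid division by zero.
        return (profiles - lo) / np.where(hi > lo, hi - lo, 1.0)

    return np.hstack([minmax(radiation), minmax(temperature)])

# The stacked matrix could then be reduced to 2D for plotting, e.g. with the
# umap-learn package (an assumption -- any manifold-learning reducer would do):
#   import umap
#   embedding = umap.UMAP(n_components=2).fit_transform(stacked)
```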

Emulator development
In this section, the process of training a neural network emulator of the plasma model outlined in section 2.1 is described. The purpose of the emulator is to rapidly and accurately obtain steady-state quantities of interest given some initial conditions, without having to run a complete simulation.
To prepare the data for statistical analysis, the data set was divided into three disjoint subsets using unbiased sampling: a training set (D train ), a validation set (D valid ), and a test set (D test ), with relative cardinalities of 0.7, 0.15, and 0.15, respectively. The training set's input parameters (n D + ,u , P u , L, A, F rec,t and F Ne ), denoted X train , were normalised to a mean of zero and a variance of one, giving X̃ train , and its output variables (T e,t and y RC,max /L), denoted Y train , were linearly transformed to fit exactly within the range [0, 1], giving Ỹ train . This data preconditioning step stabilises and speeds up gradient-based learning methods [39]. The functions used to scale the input and target variables of D train can be used to transform the corresponding variables in D valid and D test (or any other reasonable data) without further fitting, and values can be inversely transformed to retrieve physically meaningful units.
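The fit-on-train-only scaling described above can be sketched as a small helper. This is an illustrative sketch (the class name and method names are assumptions, not from any actual codebase): the standardisation and min-max parameters are fitted on the training set alone, reused on validation/test inputs, and inverted to recover physical units from network outputs.

```python
import numpy as np

class TrainScaler:
    """Fit standardisation (inputs) and [0, 1] min-max (outputs) on the
    training set only; reuse the fitted parameters for validation/test data
    and invert predictions back to physical units."""

    def fit(self, X_train, Y_train):
        self.x_mean = X_train.mean(axis=0)
        self.x_std = X_train.std(axis=0)
        self.y_min = Y_train.min(axis=0)
        self.y_rng = Y_train.max(axis=0) - self.y_min
        return self

    def transform_x(self, X):
        # Zero mean, unit variance per input feature.
        return (X - self.x_mean) / self.x_std

    def transform_y(self, Y):
        # Linear map of each output feature onto [0, 1].
        return (Y - self.y_min) / self.y_rng

    def inverse_y(self, Y_scaled):
        # Map network outputs back to physical units (eV, normalised position).
        return Y_scaled * self.y_rng + self.y_min
```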
There are many hyperparameters associated with the topology, functions and training of neural networks [40]. These hyperparameters vary in the magnitude of their impact on the training process and the accuracy of the trained model, and it is not possible to know a priori which hyperparameter configuration will lead to an optimally performant model. The landscape of unique neural network hyperparameter configurations is vast, and searching the space is complicated by the mixed nature of hyperparameter types, which can be real, integer or categorical.
Many techniques exist for searching the hyperparameter configurations of statistical models. In this work, the state-of-the-art tree-structured Parzen estimator (TPE) optimisation algorithm [41] implemented in the Optuna library [42] was used. TPE is a Bayesian algorithm that uses historical knowledge of hyperparameter configuration performance to predict the expected performance of future configurations, and thus select promising candidates. This method was chosen for its sample efficiency, robustness to various hyperparameter types, implementation availability, and ability to handle multi-objective optimisation.
To make use of TPE for neural network hyperparameter optimisation, a configurable feedforward neural network and training method was designed. The main elements of the network and training process are shown in figure 3, and the configurable hyperparameters and their ranges are given in table 2. Scaled input values from X̃ train were passed to the network, which had a configurable depth and width (denoted 'a' and 'b', respectively, in figure 3). Each hidden layer of the network has the same number of nodes, and the nonlinear activation function in each hidden node was configurable (label 'c'). The output layer of the network consists of two neurons: one to predict T̃ e,t and the other to predict ỹ RC,max /L. The numerical values returned by the output node predicting ỹ RC,max /L are transformed by a sigmoid function to constrain the values between 0 and 1, and the outputs predicting T̃ e,t are transformed by a rectified linear unit (ReLU) to set the minimum possible prediction to 0. These output functions impose simple physical constraints on the network outputs, ensuring that predictions are always meaningful (i.e. do not have a target electron temperature less than zero or a radiation front position outside the bounds of the flux tube).
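The two-headed output structure described above can be sketched as a bare numpy forward pass. This is a minimal illustration, not the trained emulator: fixed-width hidden layers with a configurable activation, then a ReLU head for the scaled temperature and a sigmoid head for the scaled front position. All names are assumptions.

```python
import numpy as np

def forward(x, weights, biases, activation=np.tanh):
    """Forward pass of a fixed-width feedforward network with two output
    heads: a ReLU head for scaled target electron temperature (>= 0) and a
    sigmoid head for scaled radiation front position (in (0, 1)).

    weights/biases: lists of layer parameters; the final layer has 2 outputs.
    """
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = activation(h @ W + b)            # hidden layers, configurable width
    out = h @ weights[-1] + biases[-1]       # raw two-neuron output layer
    t_e = np.maximum(out[..., 0], 0.0)           # ReLU: temperature >= 0
    y_rc = 1.0 / (1.0 + np.exp(-out[..., 1]))    # sigmoid: position in (0, 1)
    return t_e, y_rc
```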
During training, the mean squared error between network predictions, Ŷ, and standardised Hermes-3-simulated outputs, Ỹ, was calculated and used as the loss. The algorithm used for the network optimiser (label 'd' in figure 3) was configurable from the list of options given in table 2. The loss was also calculated for predictions on X̃ valid (which was not used for training) at each iteration of training, and this calculation was used in two ways. Firstly, the learning rate (which had a configurable initial value, label 'e') was annealed by reducing it by a factor of 10 when there had been no improvement in the validation loss over the last 5 epochs of training. This strategy enables the use of a higher learning rate early on to avoid local minima, while decreasing the learning rate later in training to aid stability [43]. Secondly, the training process was autonomously terminated when the validation loss did not decrease for 10 epochs, which mitigated over-fitting. The plots on the right-hand side of figure 3 show the training and validation loss and the learning rate. Vertical lines indicate the iterations at which the learning rate was annealed due to plateauing of the validation loss.
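The plateau-based annealing and early-stopping logic can be sketched as a single decision function evaluated once per epoch. This is a simplified illustration (the function name and the dict interface are assumptions; a production scheduler would also reset its patience counter after each annealing step).

```python
def training_schedule(val_losses, patience_anneal=5, patience_stop=10):
    """Given the history of per-epoch validation losses, decide whether to
    anneal the learning rate (no improvement for 5 epochs) or terminate
    training (no improvement for 10 epochs)."""
    best_epoch = min(range(len(val_losses)), key=val_losses.__getitem__)
    epochs_since_best = len(val_losses) - 1 - best_epoch
    return {
        "anneal": epochs_since_best >= patience_anneal,  # divide LR by 10
        "stop": epochs_since_best >= patience_stop,      # terminate training
    }
```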
The metric used to measure the performance of a trained model for the purposes of hyperparameter optimisation was a summation of the coefficient of determination of target electron temperature and radiation front position measured on the validation set, given by equation (1):

M = \sum_f \left[ 1 - \frac{\sum_i \left( \tilde{y}_f^{(i)} - \hat{y}_f^{(i)} \right)^2}{\sum_i \left( \tilde{y}_f^{(i)} - \bar{y}_f \right)^2} \right],    (1)

where f runs over the two standardised output features, the (i) superscript denotes individual data set values (over which the summations run), and \bar{y}_f denotes the average value of standardised feature f. Measuring performance on the unbiased validation set, and not on the training set, gives an estimate of the generalisability of the model to unseen data. The entire hyperparameter tuning operation was terminated when the validation performance of the 10 most performant models in the experiment plateaued, which minimised computational resource waste and allowed a hands-off approach to model training and tuning.
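The summed coefficient-of-determination metric is straightforward to compute directly; a minimal pure-Python sketch (function name assumed for illustration):

```python
def summed_r2(y_true, y_pred):
    """Sum of the coefficient of determination (R^2) over output features.
    y_true, y_pred: lists of rows, one row per validation sample, with one
    column per standardised output feature (here T_e,t and y_RC,max / L)."""
    n_features = len(y_true[0])
    total = 0.0
    for f in range(n_features):
        truth = [row[f] for row in y_true]
        pred = [row[f] for row in y_pred]
        mean = sum(truth) / len(truth)
        ss_res = sum((t - p) ** 2 for t, p in zip(truth, pred))
        ss_tot = sum((t - mean) ** 2 for t in truth)
        total += 1.0 - ss_res / ss_tot
    return total
```

A perfect two-output model scores 2; a model that always predicts the feature mean scores 0.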
Figure 4 shows the cumulative best model metric improving monotonically as the number of trial hyperparameter configurations grows. The performance improvement eventually levels off, at which point the hyperparameter optimisation process is halted. Due to the strong dependence of neural network hyperparameters on the data used to train the model, the hyperparameters discovered during this optimisation process are specific to the training data set described in section 2.1 [44]. However, the hyperparameter optimisation routine described above is not computationally expensive compared with running even a single 1D Hermes-3 simulation, and so may be performed readily if the data set is modified, enhanced or changed entirely.

Overall emulator performance
Following hyperparameter optimisation, the best-performing model was chosen as the one with the highest metric score on the validation set. The hyperparameters found for this model are given in table 3. The best-performing trained model was then evaluated on the unbiased test data set, and the resulting performance metrics are given in table 4. Prediction performance on the test set is summarised in the calibration plots shown in figure 5, where the dashed line indicates perfect predictions. The metrics and calibration plots both provide evidence that the model is able to make good predictions of T e,t and y RC,max /L. There are a small number of outliers in the predictions of T e,t where the model makes a significant underestimation, which occurs in some instances with low values of n D + ,u . The risk posed by these outlier predictions can be mitigated by performing further numerical experiments with higher-fidelity codes before deployment. It is also highly likely that gathering further training data would improve the prediction accuracy and potentially eliminate these outliers [44].
Inference on the model takes around 1 ms and can be performed in batches. This is a significant reduction compared with the several core-hours typically taken to generate solutions by direct simulation.
An ablation on data set size, n, was performed by uniformly sampling n between 100 and 3367 (the number of samples in the full D train data set). For each value of n, that number of random samples was drawn from D train and used to train a model from scratch with the hyperparameters given in table 3. The procedure was performed five times per value of n, and the mean and standard deviation of the resulting coefficient of determination values measured on D valid are shown in figure 6 by the points and shaded regions, respectively.
Although the physics model used as the basis for the data set in this work is reduced, the general trends evident in simulations with, for example, kinetic neutrals are conserved [45]. It is therefore reasonable to expect that a similar number of simulations with more complete physics would be required to obtain the same levels of accuracy. With modern computing capability, it should be possible to produce a data set of simulations with high physics fidelity that would enable the training of a machine learning-based emulator for use in downstream tasks such as experiment steering, digital twinning, and nonlinear control policy development.

Interpretability analysis
The Shapley additive explanations (SHAP) [46] method was used to examine the influence that each input feature has on predictions made by the best-performing model. SHAP uses a game-theoretic approach to explain the impact of a given feature value on the output of a machine learning model in comparison to the average prediction of the model (which is equivalent to the model's prediction when no feature information is supplied). In this way, SHAP is able to provide both global and local explanations of the impact of a given feature on model output. Global explanations describe the overall behaviour of the model, while local explanations provide interpretations of model decisions for specific instances. Explanations are given as SHAP values, which quantify the contribution of a feature to the explained model's output and provide an indication of how much a feature value pushes the model's prediction away from a baseline. A more thorough description of SHAP and SHAP values is given in appendix C.
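To make the idea behind SHAP values concrete, the exact Shapley attribution for a tiny model can be computed by brute force: for each feature, average the change in model output when that feature's value is added to every possible coalition of the other features, with features outside the coalition replaced by background-sample values. This is a didactic sketch, not the SHAP library (which uses far more efficient approximations); all names are illustrative.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, background):
    """Exact Shapley values for one input x. `model` maps a list of feature
    values to a scalar; `background` is a list of reference inputs whose
    values stand in for 'missing' features (the interventional baseline)."""
    n = len(x)

    def value(coalition):
        # Average model output with coalition features fixed to x and the
        # remaining features drawn from the background data set.
        total = 0.0
        for bg in background:
            z = [x[i] if i in coalition else bg[i] for i in range(n)]
            total += model(z)
        return total / len(background)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            for S in combinations(others, k):
                w = factorial(k) * factorial(n - k - 1) / factorial(n)
                phi[i] += w * (value(set(S) | {i}) - value(set(S)))
    return phi
```

By construction the values sum to the difference between the model's prediction for x and its baseline (background-average) prediction, which is exactly the decomposition figures 7-9 visualise.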
Figure 7 shows a summary of SHAP value distributions for T e,t and y RC,max /L predictions for each input feature, measured on the training set. Each point corresponds to a simulation from D train and is coloured according to the magnitude of the feature value normalised to the range of that feature in D train , which is referred to as the standardised feature value. For example, an n D + ,u data point with a standardised feature value of 1 has a real value of 2 × 10^18 m^-3 ; table 1 may be referred to for feature ranges. The left panel of figure 7 pertains to predictions of T e,t , and the right panel to predictions of y RC,max /L. SHAP provides a SHAP value for every feature and every prediction, so each row (which corresponds to a given feature) contains |D train | points. The points have been allowed some vertical freedom to give an indication of density. For example, there are significantly more instances with a SHAP value for n D + ,u between −0.1 and 0.05 than there are with a SHAP value between 0.4 and 0.45. It is clear from figure 7 that n D + ,u has, on average, the greatest influence on model output when predicting both T e,t and y RC,max /L, followed by P u . F rec,t has very little impact on model output for all values, while A, L and F Ne have a moderate impact on average. In general, there is an inverse relationship between the input values of n D + ,u , A, L and F Ne and the model's output of T e,t and y RC,max /L. The opposite relationship holds for P u .
The relationships can be interrogated further by examining the SHAP value dependence on given input values, which is shown in figure 8 for the prediction of T e,t based on the most impactful input parameters on average: n D + ,u , P u , A and L. Each point again represents a simulation in D train and is coloured by the value of n D + ,u , except the SHAP values for n D + ,u , which are coloured by P u . There is a prominent inverse relationship between n D + ,u and T e,t at low values of n D + ,u . Interestingly, this relationship levels off above approximately 2 × 10^19 m^-3 , beyond which the SHAP value takes on a relatively consistent negative value. This suggests that, beyond this threshold, a further increase in n D + ,u has little impact on the value of T e,t .
The remaining plots of figure 8 reveal a positive relationship between the input parameter P u and its influence on the model's predictions of T e,t , and the inverse for L and A.
Colouring by n D + ,u shows that, for high values of n D + ,u , the value of T e,t has significantly less variance than for low values of n D + ,u . Phrased differently, large values of upstream plasma density reduce the flexibility of other input parameters to vary the target electron temperature. The bow-tie structure of the relationships suggests that around P u ≈ 3 × 10^7 W m^-2 , A ≈ 1.75 and L ≈ 22.5 m, the model is not significantly influenced away from predicting the baseline (expected) value of T e,t .
SHAP can also be used to explain the influence of input parameter values on model predictions for a single set of inputs. Two examples are shown in figure 9, which demonstrates the impact of input parameters on the model's T e,t prediction for an attached (top) and detached (bottom) plasma. The input parameters used here are the same as those used for the simulations shown in figure 2. The plots are to be interpreted by beginning at the baseline prediction and following the influence of each parameter on the model's prediction, ordered by magnitude. For the detached plasma, the model's prediction of T e,t is based most strongly on P u , n D + ,u and A, which all contribute negative modifications to the baseline and result in a predicted T e,t of 2.6 eV. The values of A and P u for the attached plasma have a negative influence on the model's output of T e,t relative to the baseline. However, the value of n D + ,u has such a strong positive influence that the model ultimately outputs a predicted T e,t of 54.1 eV, significantly above the baseline.

A high-resolution density and power scan
To demonstrate the predictive capability of the trained emulator, two high-resolution parameter scans of n D + ,u and P u were performed: one with conventional divertor geometry, the other with Super-X divertor geometry. In both cases, F rec,t was set to 0.999 and F Ne to 0.03. n D + ,u was varied in the range [0.2, 2.5] × 10^19 m^-3 and P u in the range [0.26, 1.08] × 10^7 W m^-2 . The Super-X divertor had L = 25 m and A = 2.5, while the conventional divertor had L = 15 m and A = 1.25.
A total of 1 × 10^6 sets of inputs were passed to the model, evenly distributed in the density-power space. Inference took a total of 600 ms. Results are shown in figure 10, where the left column shows T e,t and the right column shows y RC,max /L. The upper two plots are results from the Super-X divertor geometry, while the lower two are from the conventional divertor geometry. This demonstrates the capability of the model to provide real-time predictions of quantities of interest given varying shot parameters.
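The grid construction and batched evaluation behind such a scan can be sketched as follows. This is an illustrative numpy version with the density and power ranges from the scan above; `emulator` is a placeholder for any vectorised callable (not the trained network itself), and the function name is an assumption.

```python
import numpy as np

def density_power_scan(emulator, n_points=1000):
    """Build an evenly spaced grid over the upstream density and power ranges
    used in the scan, and evaluate the emulator on the whole batch at once.
    `emulator` maps an (N, 2) input array to an (N, 2) output array
    (T_e,t and y_RC,max / L)."""
    n_u = np.linspace(0.2e19, 2.5e19, n_points)   # upstream density, m^-3
    p_u = np.linspace(0.26e7, 1.08e7, n_points)   # upstream power flux, W m^-2
    grid_n, grid_p = np.meshgrid(n_u, p_u)
    inputs = np.column_stack([grid_n.ravel(), grid_p.ravel()])
    outputs = emulator(inputs)                    # one batched call
    return outputs.reshape(n_points, n_points, -1)
```

With n_points = 1000 this produces the 10^6 input combinations of the scan in a single batched inference call.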

Summary and discussion
We have developed a large database of 1D, steady-state Hermes-3 simulations in MAST-U-like conditions. The data set covers the parameter space of attached, weakly detached and strongly detached plasmas, and contains measured quantities of interest for detachment diagnosis and control.
A machine learning-based emulator was designed to predict target electron temperature and carbon impurity radiation front position given the same set of variable input conditions that are passed to the Hermes-3 simulations. Through thorough hyperparameter optimisation and careful regularisation, the resulting model is highly accurate and able to operate in near-real time. The model is valid in attached and detached regimes, which presents an advantage over analytical formulations such as the two-point model [47]. While modifications to the two-point model exist that enable the inclusion of power and momentum loss terms between the upstream and downstream boundaries, these require significant assumptions about the loss terms, which can limit their utility [48]. Furthermore, discrepancies also exist between two-point model solutions and more complete physical models, such as SOLPS [34, 49]. Use of the emulator described in this work as a fast predictor of target quantities of interest, in place of a two-point model formulation, avoids the above limitations. The data set used to train the neural network may also be of interest for developing analytical models or regression analyses, a brief example of which is given in appendix B.
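For context, the basic (loss-free, conduction-limited) two-point model referred to above relates upstream and target conditions through three standard relations; this is the textbook form, included here only as a reminder of the comparison baseline:

```latex
% Pressure balance along the flux tube (static upstream, sonic target flow):
n_u T_u = 2 n_t T_t
% Parallel heat conduction between upstream and target:
T_u^{7/2} = T_t^{7/2} + \frac{7}{2} \frac{q_\parallel L}{\kappa_{0e}}
% Sheath-limited power deposition at the target:
q_\parallel = \gamma \, n_t \, k T_t \, c_{st}, \qquad c_{st} = \sqrt{2 k T_t / m_i}
```

Here u and t subscripts denote upstream and target quantities, q_\parallel is the parallel heat flux density, \kappa_{0e} the electron conductivity coefficient, \gamma the sheath heat transmission coefficient, and c_{st} the target sound speed. The emulator avoids the assumptions (e.g. no volumetric power and momentum losses) baked into these relations.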
The SHAP method was used to aid interpretability of the emulator model. Global feature importance analysis suggested that upstream density is generally the dominant parameter in determining target electron temperature and carbon impurity radiation front position, but upstream power flux, magnetic geometry and neon impurity seeding also have significant impacts. The effect of the target ion recycling fraction was found to be negligible. Local feature importance analysis was also demonstrated.
Finally, to demonstrate the utility of the emulator for rapidly exploring parameter space, a 2D scan of upstream density and power flux was performed in Super-X and conventional geometries to measure the impact on target electron temperature and carbon impurity radiation front position.
Future work will include data from more complete physics models (such as Hermes-3 2D or SOLPS), which will allow for the direct application of the resulting emulator to advanced design and control tasks, such as nonlinear control policy design and digital twin development.

Appendix A. Dimensionality reduction for data visualisation with UMAP
Dimensionality reduction refers to the use of unsupervised learning algorithms to transform high-dimensional unlabelled data to a lower-dimensional space. It is typically used to engineer features for statistical learning models and/or to enable the visualisation of high-dimensional data sets in two or three dimensions [50]. We focus on the latter use case in the remainder of this section as it is pertinent to the work presented in this paper. The UMAP algorithm used in this work is in the class of manifold learning dimensionality reduction techniques, which assume that high-dimensional data sets lie close to a lower-dimensional manifold (the so-called manifold hypothesis) and aim to learn the low-dimensional manifold.
In section 2.1, the data to be visualised is the set of carbon radiation and electron temperature profiles for every simulation in the data set. Each of these profiles has an associated value at every point on the simulation grid. It stands to reason that there are similarities between the radiation and temperature profiles of similar SOL plasmas. For example, attached plasmas will typically exhibit a radiation profile that rises to a peak at the divertor target and a temperature profile without steep gradients and with comparable upstream and downstream temperatures. Conversely, the radiation profiles of detached plasmas typically show a prominent peak upstream of the divertor target and a large corresponding drop in temperature. Furthermore, if radiation and temperature profiles were generated by randomly assigning values to grid points, only a vanishingly small proportion would be realistic profiles obtainable in a real-world tokamak. Therefore, the manifold hypothesis holds, and it is likely that the profiles represented in the data set lie close to a manifold of lower dimensionality than the number of grid points used in the simulations.
UMAP seeks to group similar samples (in this case, concatenated radiation and temperature profiles) in the high-dimensional input space using a graph representation, and then to project the resulting graph to a lower-dimensional space (in this case, 2D) so that it can be visualised while preserving the important features of, and relations between, the high-dimensional inputs. A UMAP-reduced data representation (such as that shown in figure 2) can be used to develop intuition about the data used to generate the visualisation, including the identification of discrete clusters, relationships between data, and variability within groups.
Information on the mathematical formulation of UMAP can be found in the reference paper [51], and a detailed tutorial is provided in the project's documentation [52].

Appendix B. Comparison with scalings
Given the large simulated data set described in section 2.1, a question arises: can the data be used to arrive at a scaling law that applies to the parameter regime encompassing the data set? A brief analysis of this sort is described in this section, but a more detailed study is left as future work.
Taking all the input parameters described in table 1 into account, a typical power law expression [47] relating the target electron temperature to the input parameters takes the form

T_e,t = C ∏_i x_i^α_i,   (B1)

where the x_i are the input parameters, C is a constant and the α_i are scaling exponents (an analogous expression is used for y_RC,max/L). Taking logarithms gives the linear form

ln T_e,t = ln C + Σ_i α_i ln x_i,   (B2)

which was fit to the training data. The fit was evaluated by calculating the coefficient of determination (R²) associated with predictions on X_valid, giving R²(T_e,t) = −1.58 and R²(y_RC,max/L) = 0.39. These values, along with the calibration plots shown in panels 'a' and 'b' of figure 11, show that a scaling governed by equation (B1) cannot capture the relationship between the input parameters and output values when considering the entire parameter space covered in the data.
The scaling analysis may instead be limited to the attached and detached regimes separately. Taking the detachment threshold to be when T_e,t falls below 10 eV [48], disjoint subsets were created from the D_train and D_valid sets that contained only attached or detached plasmas. Linear models of the form given in equation (B2) were fit to each of the training subsets, and the corresponding validation subsets were used to calculate coefficient of determination values as before, which are summarised in table 5. The coefficient values resulting from the fits are given in table 6. Calibration plots for the attached and detached validation subsets are shown in panels 'c' and 'd' and panels 'e' and 'f' of figure 11, respectively. Separating the attached and detached regimes improves the predictive power of the scalings, particularly for attached plasmas. However, the accuracy of predictions made by the scalings is significantly lower than that of the neural network-based model presented in the main body of this work. Furthermore, the scaling approach is not constrained to making physically meaningful predictions; for example, it sometimes predicts radiation front positions downstream of the divertor target. An advantage of the scaling approach is that its lower flexibility makes it more easily interpretable, so the model does not require the SHAP analysis presented in section 3.2.
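The fitting procedure described above can be sketched as follows. The input parameters, exponents and noise model here are synthetic stand-ins, used only to show how a power law of the form (B1) reduces, after taking logarithms, to an ordinary least-squares problem whose quality can be assessed with the coefficient of determination.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in for the training data: three positive input
# parameters and a target generated from a known power law plus
# multiplicative noise (hypothetical exponents and constant).
n = 500
X = rng.uniform(0.5, 5.0, (n, 3))
true_alpha = np.array([-2.0, 1.5, 0.5])
true_C = 3.0
y = true_C * np.prod(X ** true_alpha, axis=1) * rng.lognormal(0.0, 0.05, n)

# Taking logs turns the power law into a linear model in (ln C, alpha),
# which ordinary least squares can fit directly.
A = np.column_stack([np.ones(n), np.log(X)])
coef, *_ = np.linalg.lstsq(A, np.log(y), rcond=None)
log_C, alpha = coef[0], coef[1:]

# Coefficient of determination, evaluated in the original (unlogged) space.
y_pred = np.exp(A @ coef)
r2 = 1.0 - np.sum((y - y_pred) ** 2) / np.sum((y - np.mean(y)) ** 2)
print(np.exp(log_C), alpha, r2)
```

With noise-free power-law data the recovered exponents would match exactly; the low R² values reported above for the real data set reflect that the underlying relationship is not well described by a single power law across the full parameter space.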
Overall, the neural network-based approach has the advantages of significantly improved accuracy, being bounded to give physically meaningful output values, and being able to make predictions in both attached and detached regimes. With the addition of SHAP analysis, the model is also interpretable.

Appendix C. Shapley additive explanations (SHAPs)

This section describes SHAPs, the method used in the main text to aid interpretation of emulator predictions. An overview of Shapley values, which are computed by SHAP, is given first.

C.1. Shapley values
To understand SHAPs, it is helpful to first build intuition with Shapley values [53]. Shapley values are a concept within cooperative game theory, and in particular coalition formation.
In coalition formation, players form coalitions and the contributions of individual players are quantified based on their impact on the result of a game. In the context of machine learning, the process of generating Shapley values is a model-agnostic method for determining the influence that an individual feature (a player) has on the prediction made by a model (the game). More specifically, for a single input vector passed to a given model, Shapley values answer the following question: 'How much does the value of each feature contribute to the difference between the model's prediction based on these features and the expected prediction of the model given a representative set of inputs?' For a trained model, the representative set of inputs used to determine the average prediction should come from the same distribution underlying the data set used to train the model. The Shapley value for a given feature value is its average marginal contribution across all coalitions of features, given by

φ_i = Σ_{S ⊆ N\{i}} [ |S|! (|N| − |S| − 1)! / |N|! ] ( f(S ∪ {i}) − f(S) ),

where f(X) denotes predictions by the model f given input features X, and S is a coalition (or subset) of the set of all features N. All permutations of features are considered when calculating the Shapley value, which nullifies the effect of correlations between features at the cost of exponential computational complexity. To mitigate this, the values may be calculated by considering only a subset of all possible coalitions.
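The coalition average above can be written directly in code. The sketch below computes exact Shapley values for a toy linear 'model' with three features, where features outside the coalition are held at a baseline value; the weights, input and baseline are illustrative. For a linear model the closed-form result phi_i = w_i (x_i − b_i) provides a check.

```python
from itertools import combinations
from math import factorial

def shapley_values(value, n_features):
    """Exact Shapley values: weighted average of each feature's marginal
    contribution value(S | {i}) - value(S) over all coalitions S not
    containing i."""
    players = range(n_features)
    phi = [0.0] * n_features
    for i in players:
        others = [j for j in players if j != i]
        for size in range(len(others) + 1):
            for S in combinations(others, size):
                weight = (factorial(len(S)) * factorial(n_features - len(S) - 1)
                          / factorial(n_features))
                phi[i] += weight * (value(set(S) | {i}) - value(set(S)))
    return phi

# Toy "game": a linear predictor in which features outside the coalition
# are replaced by a baseline value (hypothetical numbers throughout).
w = [2.0, -1.0, 0.5]
x = [1.0, 3.0, -2.0]
baseline = [0.0, 1.0, 0.0]

def value(S):
    return sum(w[j] * (x[j] if j in S else baseline[j]) for j in range(len(w)))

phi = shapley_values(value, 3)
print(phi)  # for a linear model, phi_i = w_i * (x_i - baseline_i)
```

The triple loop over coalitions makes the exponential cost mentioned above explicit: for n features, 2^(n−1) coalitions are evaluated per feature, which is why subsampling is needed in practice.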

C.2. Shapley additive explanations
SHAP is a collection of methods for calculating Shapley values for machine learning model predictions. The conventional Shapley value calculation described in appendix C.1 requires a new machine learning model to be trained for every possible permutation of features, which is computationally intractable in most cases. SHAP makes the calculation of Shapley values for machine learning models feasible by including a kernel-based method. In this approach, a local linear model is trained to approximate the output of the model to be explained, weighted towards the feature subsets that are most likely to influence the output of the model. This drastically reduces the number of times a model must be evaluated when calculating Shapley values, as well as the computational expense of each model evaluation.
Furthermore, the notion of a null player (one who has opted out of the game) does not directly translate to statistical learning, where predictive models typically require a fixed number of features. SHAP addresses this limitation by replacing features that are not included in a coalition with values drawn from another randomly sampled instance. This step is repeated many times to average out the effect of the particular instances drawn.
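The replacement scheme just described, filling features outside a coalition from a randomly sampled background instance and averaging over many draws, can be sketched as a Monte Carlo estimator. This is a simplified permutation-sampling variant rather than the SHAP library's kernel method, and the model and data are synthetic stand-ins.

```python
import numpy as np

def sampled_shapley(model, x, background, n_samples=2000, rng=None):
    """Monte Carlo Shapley estimate: features outside a growing random
    coalition are filled in from a randomly drawn background row, and
    each feature's marginal contribution is averaged over many draws."""
    if rng is None:
        rng = np.random.default_rng()
    n = x.size
    phi = np.zeros(n)
    for _ in range(n_samples):
        z = background[rng.integers(len(background))]  # random replacement row
        order = rng.permutation(n)                     # random feature ordering
        for i in range(n):
            j = order[i]
            mask_with = np.isin(np.arange(n), order[:i + 1])
            mask_without = np.isin(np.arange(n), order[:i])
            with_j = np.where(mask_with, x, z)         # coalition plus feature j
            without_j = np.where(mask_without, x, z)   # coalition without j
            phi[j] += model(with_j) - model(without_j)
    return phi / n_samples

rng = np.random.default_rng(0)
w = np.array([2.0, -1.0, 0.5])
model = lambda v: float(w @ v)   # toy linear model standing in for the emulator
background = rng.normal(0.0, 1.0, (200, 3))
x = np.array([1.0, 3.0, -2.0])

phi = sampled_shapley(model, x, background, rng=rng)
print(phi)  # approximates w_i * (x_i - mean(background_i))
```

Increasing `n_samples` shrinks the Monte Carlo error at linear cost, in contrast to the exponential cost of the exact coalition sum.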
SHAP values exhibit the following properties, which are useful for the interpretation of machine learning model predictions.

Additivity
The explanation of the influence of a set of features on the prediction of a model is a linear combination of the explanations for the individual features.

Local accuracy
When added to the base prediction, the local feature explanations sum to the model's prediction for the given input.

Missingness
Missing features make no contribution to the explanation.
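The local accuracy property can be verified directly for a linear model, where the Shapley values are known in closed form (phi_i = w_i (x_i − E[x_i])). The weights, background means and input below are hypothetical.

```python
# Hypothetical linear model weights, background feature means, and input.
w = [2.0, -1.0, 0.5]
mean_x = [0.1, 1.2, -0.3]
x = [1.0, 3.0, -2.0]

f = lambda v: sum(wi * vi for wi, vi in zip(w, v))
base = f(mean_x)   # expected prediction over the background distribution
phi = [wi * (xi - mi) for wi, xi, mi in zip(w, x, mean_x)]

# Local accuracy: base prediction plus summed explanations recovers f(x).
print(abs(base + sum(phi) - f(x)) < 1e-12)  # True
```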

Figure 1. Distributions of target electron temperature and carbon impurity radiation front position.

Figure 2. Visualisation of the data set. Centre: UMAP-reduced space of electron temperature and carbon radiation profiles. Left, right: electron temperature and carbon radiation profiles for typical attached and detached plasmas.

Figure 3. Schematic of a configurable neural network and training process. Plots on the right show training progress and the adaptive learning rate schedule. Labels 'a'-'e' are described in the text.

Figure 4. Cumulative best model metric (defined in equation (1)) improvement as a function of hyperparameter optimisation trial number.

Figure 5. Calibration plots of target electron temperature and carbon impurity radiation front position.

Figure 6. Variation of the coefficient of determination for target electron temperature and carbon impurity radiation front position with data set size, for a model and training process with fixed hyperparameters.

Figure 7. Overview plot of the distribution of SHAP values in the prediction of target electron temperature (left) and carbon impurity radiation front position (right).

Figure 8. Dependence on feature value of the feature importance in the prediction of target electron temperature.

Figure 9. Waterfall plots demonstrating local model interpretability of the target electron temperature prediction for an attached (top) and detached (bottom) scenario.

Figure 10. Variation of target electron temperature (left) and carbon impurity radiation front position (right) with upstream density and power flux, for Super-X (top) and conventional (bottom) divertor geometries.

Figure 11. Calibration plots for simulated T_e,t (left panels) and y_RC,max/L (right panels) against predictions based on a scaling. Panels 'a' and 'b' relate to the scalings derived from all simulations in the training set. Panels 'c' and 'd' show attached plasmas, and panels 'e' and 'f' show detached plasmas.

Table 1. Data set parameter ranges.

Table 3. Hyperparameters of the best-performing model.

Table 4. Performance metrics of the chosen model measured on the unbiased test data set.

Table 5. Coefficient of determination values for the scalings derived from attached and detached training subsets.

Table 6. Coefficient values for T_e,t and y_RC,max/L scalings derived in attached and detached regimes. See equation (B1) for symbol definitions.