Predict the last closed-flux surface evolution without physical simulation

One of the main challenges in developing effective control strategies for the magnetic control system in tokamaks has been the difficulty of obtaining the evolution of the last closed-flux surface (LCFS) from control commands. We have developed a data-driven model that combines a predictive model with a surrogate model for physics simulation programs. This model is capable of predicting the LCFS without relying on physical simulation codes. To address the data characteristics of the LCFS, we propose a specialized discretization approach that achieves dimensionality reduction. Furthermore, because the control references are excluded from the inputs, the model can be seamlessly integrated into the control system to provide real-time LCFS prediction. In comprehensive testing and multifaceted evaluation, the model reaches similarities of 95% or above, meeting practical requirements.


Introduction
Controlling the distribution of magnetic fields is a core research area in tokamak physics. It is necessary to keep the plasma confined. However, magnetic control is not trivial, especially for advanced configurations, because the resulting distribution of magnetic fields is determined by the interaction between complex plasma state evolution and a wide range of actuator inputs. The last closed-flux surface (LCFS) is the magnetic field boundary between the confined plasma and the open field lines that interact with the divertor and vacuum wall in a tokamak. The shape and position of the LCFS determine plasma equilibrium, stability, transport, and divertor performance. As such, accurately reconstructing the LCFS is essential for understanding and optimizing plasma performance in fusion experiments. The interaction between plasma and external circuits can be described by nonlinear partial differential equations. However, controller design techniques are based on ordinary differential equation (ODE) models that are usually linear, time-invariant, and of low order [1].
Original content from this work may be used under the terms of the Creative Commons Attribution 4.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.
Although the magnetic control system has strict requirements on time and accuracy, typical physical workflows struggle to balance efficiency and accuracy. For first-principle models, the accuracy depends on the completeness of the physical processes involved, and the tokamak discharge process exhibits non-linear, multi-scale, multi-physics characteristics. Therefore, tools that can efficiently and reliably reconstruct the evolution of magnetic fields are crucial for both designing experiments and developing robust control strategies [2][3][4][5].
Magnetic field reconstruction can be approached through two methods: physics-driven and data-driven. Traditional approaches are physics-driven methods that tackle a time-varying, non-linear, high-dimensional inverse problem. They begin by predicting a set of currents and voltages for a group of actuator coils (such as poloidal field coils) based on the experimental design [4,6,7]. Then, using simulation codes [5,8,9], the tokamak plasma equilibrium is estimated, and based on this, the voltages of the actuator coils are adjusted to achieve the desired objectives. In the past few decades, extensive research has been conducted on physics-driven methods for magnetic field reconstruction, leading to the development of various simulation codes, such as equilibrium fitting (EFIT) [10][11][12], LIUQE [13], and RAPTOR [14]. These physics simulation methods are often effective, but they rely heavily on accurate models and assumptions, which may introduce uncertainties. Moreover, when the magnetic structure of the tokamak changes, a significant amount of effort and expertise from physicists is required to adapt the methods to the new conditions.
To overcome these bottlenecks, or as a supplementary approach, an increasing number of researchers have turned their focus toward data-driven methods. In recent years, various machine learning (ML) approaches have been explored in magnetic confinement fusion research for magnetic field reconstruction or prediction, and they have been applied in various scenarios, including equilibrium reconstruction and solvers [15][16][17][18] and plasma control [19][20][21][22][23][24][25][26]. From neural networks [27] to reinforcement learning [4,28,29], various data-driven ML methods have achieved promising results in the field of magnetic confinement. However, few studies [4,30] have focused on LCFS modeling and prediction. The work in [30], which is similar to ours, proposed two models, a real-time model and an offline model, each with its own advantages and applications. That work outputs the magnetic field measurements and then generates the equilibrium reconstruction through the EFIT code. This process requires additional inference time, resulting in decreased efficiency. In our model, we directly predict the LCFS by estimating discrete points of the curve at specific angles in polar coordinates. The model combines an EFIT surrogate model and a magnetic field measurement prediction model. It is more accurate and efficient because it avoids the error accumulation of two separate models. Furthermore, the work in [30] includes the shape reference and the nominal current of the poloidal field coils in the inputs. These signals serve as control objectives for the magnetic field shape and may exhibit a strong causal relationship with the LCFS. As a result, the constructed data-driven model may place significant weight on the shape reference input while treating other physical inputs as supplementary adjustments, rather than uncovering relationships between low-dimensional physical quantities. We want to construct an approximate model of the nonlinear dynamical system that approximates the underlying physical processes rather than attending to the references. Hence, we improved the input parameters and eliminated the influence of reference signals, enabling the model's use within real-time control systems.
In this work, we utilized large-scale data from the Experimental Advanced Superconducting Tokamak (EAST) [31][32][33] to train a 1D Shifted Windows Transformer model (1D-SWIN Transformer) [30] that effectively integrates the entire discharge process with subsequent EFIT simulation calculations. This model enables real-time reconstruction of the LCFS magnetic field. The 1D-SWIN Transformer efficiently extracts multi-scale features from sequences and remains computationally fast even for long sequences. Due to its non-sequential structure, parallel computation is easy to implement, and the model's computational complexity scales nearly linearly with the sequence length n.
This model can integrate seamlessly with a plasma control system (PCS) and facilitate immediate magnetic field control. As a physics-enhanced data-driven model, it can predict future magnetic field evolution, and it runs without relying on any physics simulation codes. It provides essential information about the LCFS at the subsequent time step, significantly enhancing magnetic field control. The model can accurately forecast the magnetic field evolution throughout the entire tokamak discharge process and provide the LCFS, taking eleven diagnostic signals and two actuator signals (detailed in table 1) as inputs: actual plasma current $I_p$, normalized beta $\beta_n$, toroidal beta $\beta_t$, poloidal beta $\beta_p$, electron density $n_e$, stored energy $W_{mhd}$, loop voltage $V_{loop}$, elongation at plasma boundary $\kappa$, q at magnetic axis $q_0$, q at 95% flux surface $q_{95}$, in-vessel coil No. 1 current IC1, and poloidal field coils voltage PF.

ML model
Our ML model follows a hierarchical architecture similar to that of [30], utilizing a variant of the SWIN transformer known as the 1D shifting window attention mechanism to model the dependency and interaction between long sequence inputs and outputs.
The Transformer [34] is composed entirely of attention mechanisms, which effectively reduce the distance between any two positions in a sequence to a constant. This allows computations at time step t to be independent of the previous time step t − 1, resulting in better parallelism than the sequential structure of RNNs. It also addresses the issue of information loss during sequential computations. As a variant of the Transformer, the 1D-SWIN Transformer [30] adopts a layered structure similar to convolutional neural networks. It leverages gradually expanding receptive fields to extract multi-scale features. Moreover, the shifting operation enhances the computational capacity for particularly long sequences, such as high-sampling-rate actuator signals, while keeping the computational complexity nearly linear in the sequence length n. This provides an efficient solution for processing long sequences while maintaining powerful feature extraction capabilities. The 1D shifting window attention mechanism with a 'window size' of 1 is well suited for real-time prediction, achieving a single-step inference time of ∼0.6 ms.
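To make the shifting-window idea concrete, the following is a minimal sketch of how a 1D sequence could be partitioned into attention windows, with and without a cyclic shift. It is not the architecture of [30]: the function names, the cyclic roll, and the cost count are illustrative assumptions, and a full implementation would also mask positions that wrap around the sequence ends.

```python
import numpy as np

def window_partition_1d(x, window_size, shift=0):
    """Split a (length, channels) sequence into non-overlapping windows.

    A non-zero shift cyclically rolls the sequence first, so that window
    boundaries fall at different positions in alternating layers.
    """
    length, channels = x.shape
    if length % window_size != 0:
        raise ValueError("length must be a multiple of window_size")
    if shift:
        x = np.roll(x, -shift, axis=0)
    return x.reshape(length // window_size, window_size, channels)

def attention_cost(length, window_size):
    """Number of pairwise attention scores when attention stays inside windows."""
    return (length // window_size) * window_size ** 2

# Hypothetical toy sequence: 8 time steps, 1 channel.
seq = np.arange(8, dtype=float).reshape(8, 1)
regular = window_partition_1d(seq, window_size=4)           # windows [0..3], [4..7]
shifted = window_partition_1d(seq, window_size=4, shift=2)  # windows [2..5], [6,7,0,1]
```

Because attention is evaluated only inside each window, the number of scores grows as length × window_size rather than length squared, which is one way to read the near-linear complexity stated above.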

Data selection
In this study, we selected 11190 shots from the EAST tokamak (discharges in the range #53825-81692, from 2015-2018) [31][32][33] to construct the overall dataset. We use the same data processing approach as in previous work [30]. The dataset is divided in chronological order, with 80% of the data allocated to the training set and the remaining 20% split between the validation set and the test set, of which the validation set takes the smaller share.
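A chronological split of this kind can be sketched as follows. The 80% training fraction comes from the text, while the validation fraction used here (5%) is a placeholder assumption, since the exact validation and test proportions are not stated.

```python
def chronological_split(shot_numbers, train_fraction=0.8, val_fraction=0.05):
    """Split shots by increasing shot number into train, validation and test sets.

    val_fraction is a hypothetical value; the source only says that the
    validation set is the smaller part of the remaining 20%.
    """
    shots = sorted(shot_numbers)
    n_train = int(round(len(shots) * train_fraction))
    n_val = int(round(len(shots) * val_fraction))
    train = shots[:n_train]
    val = shots[n_train:n_train + n_val]
    test = shots[n_train + n_val:]
    return train, val, test

# Illustrative shot list only, not the real dataset.
train, val, test = chronological_split(range(53825, 53925))
```

Splitting by shot number rather than at random keeps later discharges out of the training set, so the test set probes how the model behaves on experiments performed after those it was trained on.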
As indicated in table 1, the following variables have been chosen as inputs: actual plasma current $I_p$, normalized beta $\beta_n$, toroidal beta $\beta_t$, poloidal beta $\beta_p$, electron density $n_e$, stored energy $W_{mhd}$, loop voltage $V_{loop}$, elongation at plasma boundary $\kappa$, q at magnetic axis $q_0$, q at 95% flux surface $q_{95}$, in-vessel coil No. 1 current IC1, and poloidal field coils voltage PF. The output is the LCFS in its discretized representation. The inputs comprise essential diagnostic signals and actuator signals from the tokamak, both of which influence the LCFS. Notably, the absence of control references in the inputs allows for seamless integration into the real-time control system.

Discretization of LCFS
From a physics perspective, the LCFS is commonly represented using the poloidal cross-sections of the flux surfaces, which correspond to a two-dimensional curve. How to represent this two-dimensional curve effectively while minimizing information loss and retaining essential features is a question worth considering. For neural network models, using a continuous curve expression to represent the 2D curve as input or output is difficult, because it is challenging for the model to learn the distribution characteristics solely through equation parameters. Hence, discretizing the curve into a set of points becomes necessary. This raises the question of how to obtain and represent these points effectively while preserving the LCFS's distinctive features. This issue can be seen as the inverse problem of curve parameterization, wherein we seek an appropriate way to transform the continuous curve into a discrete set of points. To tackle this, we can draw parallels with three primary methods of curve parameterization, namely the uniform parametrization, the cumulative chord length parametrization, and the 'centripetal model' parametrization [35,36].

Table 1 (partial; columns: signal, physical meaning, number of channels). Internal inductance: 1 channel. $q_0$, q at magnetic axis: 1 channel. $q_{95}$, q at 95% flux surface: 1 channel. PCS commands: IC1, in-vessel coil No. 1 current.
Let the LCFS be represented as $f(x)$. By analogy with the uniform parametrization, we take n discrete points at a fixed step $\Delta = x_i - x_{i-1}$, which yields a set of points evenly spaced along the x-axis. However, since the LCFS is a closed curve, this approach overly emphasizes symmetric points and may affect model parameter convergence. Additionally, it can overlook significant curve features with higher curvatures, leading to substantial errors. Drawing inspiration from the cumulative chord length parametrization, we can instead set $\Delta = \int_{x_{i-1}}^{x_i} f(t)\,\mathrm{d}t$ and, starting from a point, use a fixed arc length as the step size to obtain discrete points. However, this method involves complex arc length calculations and, when used as an output, each point is relative to the previous adjacent point, potentially leading to error accumulation.
Finally, by analogy with the 'centripetal model' parametrization, we choose to set $\Delta = \theta_i - \theta_{i-1}$, where $\theta_i$ is the angle of the curve at point $(x_i, f_i)$ relative to the center point. In this method, we leverage the shape characteristics of the LCFS as a closed curve by placing it in a polar coordinate system. As shown in figure 1, we take the magnetic surface center as the origin and use a fixed angle $\Delta = 1^\circ$ as the step size to discretize the curve into 360 points, each represented by its radial coordinate R. The fixed angle is implicit in the data layout, allowing us to represent the points as a one-dimensional array instead of two-dimensional coordinates. Consequently, we transform the two-dimensional curve distribution into a one-dimensional vector that can be used as the model output, facilitating accurate predictions and rapid convergence.
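The fixed-angle discretization can be illustrated with a short sketch. It assumes the boundary is available as a list of points in the poloidal plane and that every ray from the chosen center crosses the boundary exactly once. The function names, the linear interpolation, and the ellipse used as a stand-in boundary are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def discretize_lcfs(r_points, z_points, r_center, z_center, n_angles=360):
    """Resample a closed boundary at fixed polar angles around a center.

    Returns the distance from the center at angles k * 360 / n_angles degrees,
    k = 0 .. n_angles - 1. Assumes each ray from the center crosses the
    boundary once (star-shaped curve).
    """
    dr = np.asarray(r_points, dtype=float) - r_center
    dz = np.asarray(z_points, dtype=float) - z_center
    theta = np.degrees(np.arctan2(dz, dr)) % 360.0  # angle of each boundary point
    rho = np.hypot(dr, dz)                          # distance from the center
    grid = np.arange(n_angles) * (360.0 / n_angles)
    # period=360 lets the interpolation wrap around the closed curve
    return np.interp(grid, theta, rho, period=360.0)

def reconstruct_lcfs(rho, r_center, z_center):
    """Map the 1D radial vector back to 2D points on the same angular grid."""
    grid = np.radians(np.arange(len(rho)) * (360.0 / len(rho)))
    return r_center + rho * np.cos(grid), z_center + rho * np.sin(grid)

# Stand-in boundary: an ellipse with illustrative dimensions.
t = np.linspace(0.0, 2.0 * np.pi, 500, endpoint=False)
rho = discretize_lcfs(1.85 + 0.45 * np.cos(t), 0.8 * np.sin(t), 1.85, 0.0)
r_rec, z_rec = reconstruct_lcfs(rho, 1.85, 0.0)
```

Because the angular grid is fixed, only the 360 radial values need to be stored or predicted, and each value is independent of its neighbors, which avoids the point-to-point error accumulation noted for the arc-length approach.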

Model training
It is essential to perform basic data preprocessing before model training. We utilize the MapReduce [38] approach to calculate the mean and variance for standardizing all the data, including the LCFS data obtained from the physical simulation program.
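The map-reduce pattern for these statistics can be sketched in a few lines. Each chunk of data is mapped to a small tuple of partial sums, and the tuples are reduced by addition. The chunking and function names are assumptions for illustration, and the sketch runs in a single process rather than on a cluster.

```python
import numpy as np
from functools import reduce

def map_stats(chunk):
    """Map step: per-chunk count, sum and sum of squares."""
    chunk = np.asarray(chunk, dtype=float)
    return chunk.size, chunk.sum(), np.square(chunk).sum()

def reduce_stats(a, b):
    """Reduce step: partial statistics combine by element-wise addition."""
    return a[0] + b[0], a[1] + b[1], a[2] + b[2]

def global_mean_std(chunks):
    """Mean and (population) standard deviation over all chunks."""
    n, s, sq = reduce(reduce_stats, map(map_stats, chunks))
    mean = s / n
    var = max(sq / n - mean ** 2, 0.0)  # guard against tiny negative round-off
    return mean, np.sqrt(var)

# Hypothetical chunks, e.g. one signal stored shot by shot.
chunks = [np.array([1.0, 2.0, 3.0]), np.array([4.0, 5.0]), np.array([6.0])]
mean, std = global_mean_std(chunks)
standardized = [(c - mean) / std for c in chunks]
```

The sum-of-squares form is simple to parallelize but can lose precision when the mean is large relative to the spread; a pairwise variance merge would be a more robust variant.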
As analyzed in section 2.1, the input dimension of our real-time model is 384, which includes the previous system output. We use the teacher forcing technique during training to accelerate the learning process of the model.
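The difference between training with the previous true output fed back (teacher forcing) and free-running inference can be sketched as below. This is a simplified per-step loop, not the sequence model itself. The split of the 384 inputs into 360 LCFS values plus 24 signal channels is an inference from the numbers in the text, and `toy_model` is a hypothetical stand-in for the trained network.

```python
import numpy as np

N_LCFS = 360     # radial values of the discretized LCFS
N_SIGNALS = 24   # assumed: 384 total inputs minus the 360 LCFS values

def run_sequence(step_model, signals, lcfs_init, lcfs_true=None):
    """Predict the LCFS step by step.

    With lcfs_true given (training), the true LCFS of the previous step is
    fed back (teacher forcing). Without it (inference), the model's own
    previous prediction is fed back.
    """
    prev = lcfs_init
    outputs = []
    for t in range(signals.shape[0]):
        x = np.concatenate([prev, signals[t]])  # 360 + 24 = 384 inputs
        pred = step_model(x)
        outputs.append(pred)
        prev = lcfs_true[t] if lcfs_true is not None else pred
    return np.stack(outputs)

def toy_model(x):
    """Hypothetical stand-in: previous LCFS plus the mean of the signals."""
    return x[:N_LCFS] + x[N_LCFS:].mean()

signals = np.ones((3, N_SIGNALS))
lcfs0 = np.zeros(N_LCFS)
free_run = run_sequence(toy_model, signals, lcfs0)
forced = run_sequence(toy_model, signals, lcfs0, lcfs_true=np.zeros((3, N_LCFS)))
```

In the free-running case the toy prediction drifts from step to step, while the teacher-forced run is reset to the true value at every step, which is why teacher forcing tends to stabilize and speed up training.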
Our model is implemented on the CentOS 7 operating system and executed on 8x P100 GPUs. Similar to the previous work [30], we employ a customized masked mean squared error (MSE) loss function during model training. In this loss, x is the batch of experimental sequence data, y is the batch of predicted sequences, and $x^i_j$, $y^i_j$ are the jth point values of the ith experimental and predicted sequences. $f^i$ is a signal data existence vector for the ith experimental sequence: $f^i$ equals 1 when the sequence exists and 0 otherwise, and it is used to mask a signal that does not have original data. A second mask covers the invalid length of each sequence. This loss function therefore masks the signals that do not have original data and the sequence segments that are zero-padded during model training. This prevents the model from training on sequences without actual target values and on meaningless zero-padding at the end, which improves the accuracy and speed of the training process. Furthermore, we utilize the bucket algorithm [39] for training acceleration and the Tree of Parzen Estimator algorithm [40] for architecture hyperparameter search. We also experiment with various optimizers and schedulers, and ultimately determine the optimal hyperparameter set shown in table 2.
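A masked MSE with the two masks described above might look like the following sketch. The normalization by the number of valid entries and the argument names are assumptions; the exact form used in [30] may differ.

```python
import numpy as np

def masked_mse(x, y, exists, lengths):
    """Mean squared error over valid entries only.

    x, y    : (batch, time) experimental and predicted sequences
    exists  : (batch,) 1 if the sequence has original data, else 0
    lengths : (batch,) number of valid (non-padded) time steps per sequence
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    time_index = np.arange(x.shape[1])
    # True for time steps before the end of each sequence, False for padding
    length_mask = time_index[None, :] < np.asarray(lengths)[:, None]
    mask = length_mask * np.asarray(exists, dtype=float)[:, None]
    n_valid = mask.sum()
    if n_valid == 0:
        return 0.0
    return float((mask * (x - y) ** 2).sum() / n_valid)

# Second sequence has no original data; first sequence has one padded step.
x = [[1.0, 2.0, 0.0], [5.0, 5.0, 5.0]]
y = [[1.0, 4.0, 9.0], [0.0, 0.0, 0.0]]
loss = masked_mse(x, y, exists=[1, 0], lengths=[2, 3])
```

Only the two valid entries of the first sequence contribute here, so the large errors in the padded step and in the missing signal do not affect the loss.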

Results
The model was trained, validated, and tested using experimental data from the EAST campaigns between 2016-2020, with discharge numbers in the range #53825-81692 [31][32][33]. The inputs and outputs are described in section 2.2. We employed various similarity metrics to evaluate the model's performance from different perspectives.
Figure 2 illustrates the prediction results of our model for the LCFS of EAST discharge #73978. This discharge lasts ∼50 s, with a sequence length of ∼$5 \times 10^4$. As depicted in figure 2, our model accurately reconstructs the different LCFS configurations during the ramp-up, flat-top, and ramp-down phases of the plasma current over the entire discharge.
The Pearson correlation coefficient is often used to measure model similarity [30,41,42], but it is not suitable for our model. This coefficient is commonly used for vector data, while our model deals with matrices. Specifically, our model outputs the distribution of the LCFS over a long duration, and although the discretization reduces the two-dimensional curve representing the LCFS to a one-dimensional vector, we are still predicting a time series. Consequently, the output remains a two-dimensional matrix with a time dimension. Moreover, the Pearson correlation coefficient primarily measures the correlation between predicted values and ground truth, whereas we are more concerned with the magnitude of the errors between them. We have therefore employed multiple methods to measure the similarity of the predicted two-dimensional curve distribution over the time series of the LCFS.
Firstly, we employ the 1-norm of the relative error matrix as a measure of similarity [43], defined in equation (3). In equation (3), S is the similarity, and X and Y are the predicted and target values, both matrices of dimension m × n, where m is the sequence length and n is the sequence dimension. $x_{ij}$ and $y_{ij}$ are the ith point values of the jth predicted and experimental sequences. In this context, S ∈ [0, 1], where a value of 1 indicates a perfect match between the predicted and true values, while a value of 0 indicates an error larger than the ground truth.
The 1-norm considers the entire LCFS at a single time step as a whole, effectively representing our prediction accuracy for the complete LCFS at that specific moment. As shown in figure 3, the average similarity of the test set during the flat-top phase reaches 95%. This indicates that our model performs well in predicting the distribution of the LCFS during this critical phase.
Then, we utilize the infinity norm (∞-norm) of the relative error matrix as a measure of similarity, defined in equation (4). In equation (4), the symbols have the same meaning as in equation (3).
Under our discretization approach, from a physical perspective, the infinity norm of the error matrix focuses on the error at a specific angle along the LCFS curve. By considering the values at that position across different time steps as a whole, it effectively reflects the model's accuracy in predicting over longer time scales. As shown in figure 4, the similarity of the test set exceeds 95%. This indicates that our model performs well in predicting the distribution of the LCFS over extended periods, which is crucial for plasma control and stability during tokamak operation.
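The two norm-based similarity measures can be sketched with standard matrix norms. The form used here, one minus the ratio of the error norm to the target norm, clipped at zero, is an assumption chosen to match the stated range S ∈ [0, 1]; the exact definitions in equations (3) and (4) may differ, for example by normalizing element-wise.

```python
import numpy as np

def norm_similarity(pred, target, order):
    """S = max(0, 1 - ||pred - target|| / ||target||) with a matrix norm.

    order=1      : matrix 1-norm (largest absolute column sum)
    order=np.inf : matrix infinity norm (largest absolute row sum)
    """
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    err = np.linalg.norm(pred - target, ord=order)
    ref = np.linalg.norm(target, ord=order)
    return max(0.0, 1.0 - float(err / ref))

# Hypothetical 2 x 2 example, not data from the paper.
target = np.ones((2, 2))
pred = np.array([[1.1, 1.0], [1.0, 0.9]])
s_one = norm_similarity(pred, target, 1)
s_inf = norm_similarity(pred, target, np.inf)
```

Which of the two norms aggregates over angles and which over time steps depends on how the m × n matrix is laid out, so the row and column orientation should be checked against the paper's convention before comparing numbers.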
With the same predicted values, we still achieve a high level of similarity based on the infinity norm, which focuses on a specific position along the entire time sequence of the LCFS curve, even during the ramp-up and ramp-down phases. This demonstrates that the model's predictions can be effectively utilized in real-time control systems. The similarity of some discharges is less than 0.6. Through detailed review, we found that the impurity radiation of these discharges is high, reflecting that the tokamak wall conditions are poor. Poor wall conditions generally mean that discharges are challenging to reproduce and control.

Discussion and conclusion
By employing advanced dimensionality reduction techniques, we have enhanced the previous model architecture to achieve convergence in predicting a two-dimensional curve. The resulting model retains the linear computational complexity of its predecessor while directly simulating the magnetic field distribution during the discharge process. The outcome is a more efficient and accurate model that effectively couples the prediction of discharge process signals with subsequent simulation calculations.
Unlike previous works, our study concentrates solely on modeling and simulating the complex physical process of discharges. It excludes the control references from the inputs, in line with the conditions of use within real-time control systems. We have also proposed a suitable data discretization method for the last closed-flux surface (LCFS), enabling a multi-angle evaluation of model accuracy for different application scenarios. This approach allows us to better assess how well the model predicts the actual distribution and how closely the predicted curve aligns with the ground truth data.
Our model directly predicts the LCFS and can be utilized without physical simulation programs. This feature allows for easy integration into control systems. Furthermore, our model can be utilized as a simulator in reinforcement learning applications [4,44]. Obtaining a robust reinforcement learning model hinges on the simulator's ability to compute rapidly and closely match real-world conditions. The current model can readily serve as such a simulator, providing valuable support for reinforcement learning applications.

Figure 1. Model usage process in real-time control. (a) Model section: the model takes the LCFS output from the previous time step, the current plasma state obtained from the device, and the control commands as input to predict the LCFS for the next time step. (b) Control command references: the prediction is compared with the target LCFS, enabling the model to provide more accurate control commands for the next time step. (c) Experiment section: the controller sends control commands to the tokamak device, and the device returns the detected current state.

Figure 3. Similarity distribution of predicted results on the test set, based on the 1-norm similarity metric. The test set (see section 2.2) consists of discharges in the range #79342-81692 and some long-time discharges, for a total of 1094 shots.

Figure 4. Similarity distribution of predicted results on the test set, based on the ∞-norm similarity metric. The test set (see section 2.2) consists of shots in the range #79342-81692 and some long-time shots, for a total of 1094 shots.

Table 1. The input and output signals of the models.

Table 2. Hyperparameters of our model. The model architecture can be found in figure 2.