Development of a neural network model for peeling–ballooning stability analysis in the KSTAR tokamak pedestals

A neural network model, MISHKA-NN, is developed to mitigate the computational burden associated with linear ideal magnetohydrodynamic (MHD) stability analysis of the pedestal based on the peeling–ballooning (P–B) model. By utilizing both 1D plasma profiles (current density, pressure gradient, and safety factor) and 0D parameters (plasma geometry, total current, and toroidal mode number), the model predicts the linear growth rate of an edge-localized ideal MHD instability in a given equilibrium state. By enabling the prediction of each instability within a second, the model reduces the time required for plotting a pedestal P–B stability diagram (j−α diagram) from approximately 100 CPU hours to a few CPU minutes. Notably, even though parametric pressure and current profiles and plasma boundary shapes are used for the training dataset, the model shows a satisfactory level of performance in benchmarking the j−α diagram for an equilibrium reconstructed from a KSTAR tokamak experiment. We anticipate that the model will serve as a versatile alternative to 2D linear MHD stability codes, alleviating their numerical costs.


Pedestal and edge-localized mode
A pedestal at the edge of the plasma is one of the key players in high-performance tokamak operations based on H-mode [1]. Formation of a pedestal structure can improve global performance via profile stiffness [2].
However, a pedestal usually coexists with a bursty instability named the edge-localized mode (ELM), which periodically relaxes the pedestal temperature and density profiles. The most commonly observed type-I ELM [3] is foreseen to have deleterious effects on future reactor-scale tokamaks, based on extrapolation of the estimated heat pulse towards the divertor plate in ITER [4,5]. Consequently, there have been numerous studies to uncover the physics behind ELMs (not only type-I ELMs, but also the other types) in order to suppress or predict their occurrence.
The most widely accepted hypothesis for the origin of type-I ELMs is that ideal magnetohydrodynamic (MHD) instabilities localized at the edge of the plasma are the main triggers [6]. Studies related to this hypothesis can be effectively incorporated into the peeling-ballooning (P-B) model [7,8]. In the picture drawn by this model, a pedestal keeps cycling between stable and unstable regions defined by linear ideal MHD. During the evolution of a pedestal, the slope of the pedestal pressure profile is constrained by edge turbulence such as the kinetic ballooning mode, which sets a soft limit on the growth of the pedestal [9].
In contrast, the P-B mode (PBM) can be considered a hard limit, which sets the ultimate upper bound on the pedestal height. The soft limit can differ from plasma to plasma, as operating conditions may vary [10]. The hard limit, however, is quite universal compared to the soft limit, regardless of pedestal characteristics such as the normalized ion Larmor radius (ρ*_i) or collisionality (ν*_e). According to the PBM, any pedestal that crosses the ideal MHD stability boundary will induce an ELM and return to an MHD-stable state. We note that resistivity can be introduced into the linear MHD stability analysis of the pedestal [11], enabling the description of additional types of ELMs, such as type-III ELMs. However, we focus on linear ideal MHD stability in this paper.

Peeling-ballooning stability and j − α diagram
Utilizing the PBM, we can analyze the stability properties of an arbitrary pedestal in a unified way. By perturbing the pressure gradient and the current density of a pedestal, it is possible to generate neighboring equilibria close to the reference equilibrium [12]. Then, by checking the MHD stability of all the equilibria, we can draw a P-B stability diagram, or so-called j − α diagram. Here, j and α represent the current density and the normalized pressure gradient, respectively. The value of α at each point of the diagram is usually set to the maximum value in the pedestal region for each equilibrium. Likewise, the value of j is the maximum value of the current density profile in the pedestal region [13], occasionally the average of the maximum and the value at the edge (separatrix), ((j_max + j_edge)/2) [12], and at times it is derived from the average over the pedestal [14]. An example of such a diagram is shown in figure 1.
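To make the construction concrete, the grid of neighboring equilibria behind a j − α diagram can be sketched as scale factors applied to the reference pedestal current and pressure gradient. The grid size and span below are illustrative choices, and the stability evaluation of each point is left to an external equilibrium solver and MHD code.

```python
import numpy as np

def build_jalpha_grid(j_ref, alpha_ref, n_pts=8, span=0.5):
    """Return (j, alpha) pairs covering a grid of neighboring equilibria
    around the reference point (j_ref, alpha_ref). Each pair would be fed
    to an equilibrium solver and a linear MHD stability code to color one
    point of the j-alpha diagram. n_pts=8 gives 64 points, within the
    typical 50-100 points per diagram quoted in the text."""
    scales = np.linspace(1.0 - span, 1.0 + span, n_pts)
    return [(j_ref * sj, alpha_ref * sa) for sj in scales for sa in scales]
```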
The j − α diagram visualizes the state of a pedestal. It shows whether the pedestal is stable or unstable to MHD instabilities, and the distance between the operational point and the stability boundary can be thought of as a margin for further growth of the pedestal. For example, if an operational point is located at the black star in figure 1, we can assume that the pedestal can still evolve to sustain more pressure. Meanwhile, an ELM may occur for the white star in the unstable region.
In many cases, the most unstable toroidal mode number (n) is also plotted on the diagram, to show which kind of MHD mode will lead to an ELM at that point [6,12,13]. Examining the radial mode structure is the most definitive method for assessing the characteristics of the dominant mode [6]. Nevertheless, by inspecting the location of the equilibrium on the diagram and the most unstable n, it is typically possible to deduce the mode's characteristics without checking the mode structures one by one.
Usually, pedestals sitting at the nose of the boundary are unstable to the PBM with intermediate n around 5-15, which is dominant at moderate ν*_e. The PBM is induced by coupling between the current density-driven peeling components and the pressure gradient-driven ballooning components. In the upper left region of the boundary in figure 1, ν*_e is relatively low, and a large bootstrap current can be induced compared to the other domains of the diagram. The peeling mode (PM) is driven by the current density, and low n ≃ 3−7 becomes most dominant on this side. Lastly, the right side of the boundary is typically dominated by the high-n ≳ 10−20 ballooning mode (BM) driven by the pressure gradient, because the bootstrap current is suppressed by the high ν*_e and is too low to induce peeling-type instabilities [9].

Motivation
To analyze the stability of a type-I ELMy H-mode pedestal, 2D linear MHD stability codes are widely used for the numerical calculation of the MHD instabilities [15][16][17][18]. We have used MISHKA-1, which was first developed to calculate the linear growth rates of global MHD instabilities [15]. It is also broadly used in pedestal stability analysis [6,12,19]. The code is usually executed for a series of n selected in a range 3 ≲ n ≲ 30 per equilibrium. Execution of the code is numerically tolerable because a linear MHD code like MISHKA-1 takes much less time than non-linear codes; a single linear stability calculation for each n usually takes a few minutes (≃10^3 s). However, plotting a j − α diagram can be quite a numerical burden because one diagram may contain 50-100 points. In total, it costs ≳100 CPU hours to draw a single stability diagram.
Our study aims to mitigate this calculation time using a neural network (NN) technique. The NN approach has gained substantial attention and demonstrated its utility across diverse domains within the tokamak community [20][21][22][23][24], including prediction of the stable pedestal structure [25].
In this paper, the development of the NN model MISHKA-NN is described. The model has been designed to replace linear MHD codes in the analysis of the ideal MHD stability of the pedestal for the KSTAR tokamak [26]. The paper also demonstrates its application to drawing the j − α diagram. The concept and a prototype have already been introduced in [27]. The key idea is to map the equilibrium (especially that of the pedestal) to the linear growth rates for each n calculated by MISHKA-1. For brevity, the code MISHKA-1 and its NN model MISHKA-NN will be abbreviated as M-1 and M-NN, respectively, throughout the paper.
The remainder of the paper is organized as follows. First, we introduce the methods for generating and collecting the parametric equilibria that constitute the training dataset in section 2. In section 3, the model architecture and training process are described. In section 4, the validation and application of M-NN are covered with some discussions. Lastly, the final section provides a summary and conclusion.

Dataset collection
To assemble the massive training dataset, equilibria are generated by assigning random coefficients to parametric profiles. The following subsections describe the configurations of these profiles and provide detailed explanations of the routines used to collect the dataset.

Parametric profiles and boundaries
The pressure profile is created using the parametric formula in equation (1) [28], where p_0 and p_1 refer to the pressure components for the core and the pedestal, respectively. ψ_n is the normalized poloidal flux, which is set to 0 at the magnetic axis and 1 at the plasma surface. ψ_ped, ψ_mid, and Δ are the radial position of the pedestal top, the pedestal center, and the pedestal width in units of ψ_n, respectively. α_p0 and α_p1 are exponents that control the shape of the core pressure component, and u is the Heaviside step function.
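As an illustrative sketch (not the authors' exact formula), equation (1) can be implemented assuming the common EPED-style form: a tanh pedestal step whose height is set by p_1, plus a truncated-polynomial core component of height p_0 that the Heaviside function u switches off outside ψ_ped.

```python
import numpy as np

def pressure_profile(psi_n, p0, p1, psi_ped, psi_mid, delta,
                     alpha_p0, alpha_p1):
    """Sketch of the parametric pressure profile, assuming the standard
    EPED-style form; the exact expression in equation (1) may differ."""
    # pedestal component: tanh step centred at psi_mid with width delta
    ped = p1 * (np.tanh(2.0 * (1.0 - psi_mid) / delta)
                - np.tanh(2.0 * (psi_n - psi_mid) / delta))
    # core component, active only inside the pedestal top (Heaviside u);
    # clip keeps the base non-negative for non-integer exponents
    core = np.where(
        psi_n < psi_ped,
        p0 * (1.0 - np.clip(psi_n / psi_ped, 0.0, 1.0) ** alpha_p0) ** alpha_p1,
        0.0,
    )
    return ped + core
```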
Next, we introduce a parametric current density profile in equation (2), which consists of three terms. The first term represents the core part of the profile, where j_0 is a coefficient determining the height, and α_j0 and α_j1 are exponents that control the peakedness of the term. The second term resembles a sigmoid function and is introduced to imitate typical current density profiles around ψ_n ≃ 0.8−0.9. Within this term, j_1 and σ_1 serve as parameters controlling the height and width of the term, respectively. The last term is a bootstrap current proxy proportional to the pressure gradient, where j_2 and σ_2 modulate the height and width of the component, respectively. By adjusting the parameters above, it is possible to closely imitate the bootstrap current models in [29,30]. With equation (2), exaggerated or underestimated current density profiles can also be generated. Even though such profiles do not resemble the shape of a current density profile that satisfies neoclassical theory, they are included in the dataset to provide neighboring data around realistic equilibria in j − α space.
Lastly, the plasma boundary is generated using equation (3). Here, θ represents the poloidal angle, while R and Z denote the radial and vertical coordinates of the boundary points. The variables R_0 and a correspond to the major and minor radii, and κ, δ_u, and δ_l denote the plasma elongation, upper triangularity, and lower triangularity, respectively. The profiles and boundaries are generated from random seeds in the first loop of algorithm 1. Figures 2 and 3 show examples of profiles and boundaries within the dataset. The normalized current density (j_N) in figure 2(b) is the current density normalized to I_p/A, where I_p is the total plasma current and A is the plasma poloidal cross-sectional area. The α and safety factor (q) profiles in figures 2(c) and (d) are acquired by solving the equilibrium using CHEASE [31]. The definition of α follows [32], where V represents the plasma volume enclosed by the poloidal flux ψ, and R_c denotes the geometric center of each ψ surface.
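A hedged sketch of equation (3), assuming a standard Miller-like D-shaped parameterization in which the triangularity switches between δ_u and δ_l according to the sign of Z; the authors' exact expressions may differ.

```python
import numpy as np

def boundary(theta, R0, a, kappa, delta_u, delta_l):
    """Sketch of a D-shaped plasma boundary (Miller-like form, an
    assumption). delta switches between the upper and lower triangularity
    depending on the sign of Z, i.e. the sign of sin(theta)."""
    delta = np.where(np.sin(theta) >= 0.0, delta_u, delta_l)
    R = R0 + a * np.cos(theta + np.arcsin(delta) * np.sin(theta))
    Z = kappa * a * np.sin(theta)
    return R, Z
```

With KSTAR-like numbers (R_0 ≈ 1.8 m, a ≈ 0.5 m, used here purely as an example), the curve traces the outer midplane at θ = 0 and reaches ±κa at the top and bottom.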

Routine for dataset collection
A routine for collecting the dataset is illustrated by the pseudocode in algorithm 1.
In the first loop, parametric pressure, current density, and boundary shapes are randomly generated by selecting coefficients such as p_0, j_0, R_0, ... in equations (1)-(3) using a random number generator (RNG), whose parameter ranges are set to cover typical KSTAR tokamak operational regimes.
At this stage, random numbers are drawn within the preset parameter ranges, so some unintended equilibria may also be generated. For example, an equilibrium with a very high edge safety factor (q_a ≳ 15) or an extremely low axis safety factor (q_0 ≪ 1) can be generated by a combination of coefficients at the extremes of the parameter ranges. Given the complexity of presetting the RNG to exclude all potentially unrealistic combinations, these unrealistic cases are filtered out as follows. During the numerical equilibrium calculations performed by CHEASE, some of the previously mentioned unrealistic cases are preemptively removed because CHEASE usually fails to converge for them. Another subroutine is implemented to further eliminate improbable equilibria based on the safety factor criteria mentioned above.
During each iteration, five equilibria are created, sharing the same core components (the first terms in equations (1) and (2)) and the plasma boundary, but featuring different edge profiles (the remaining terms in equations (1) and (2)). This approach is chosen to enhance dataset diversity and capture various edge characteristics, so that the NN can effectively learn the complex relation between the growth rates of instabilities and the edge profiles.
In the second loop, M-1 is executed for fifteen randomly selected n in the range 3 ⩽ n ⩽ 30 for each equilibrium that has passed the first loop. The probability of selecting each n is adjusted to favor low n, composing a set of n values similar to those in [6,9]. This allows the dataset to have higher resolution in the domains where it is needed most.
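The biased n-selection can be sketched as weighted sampling without replacement; the 1/n weighting below is an illustrative assumption, not the authors' exact distribution.

```python
import numpy as np

def sample_mode_numbers(rng, k=15, n_min=3, n_max=30):
    """Draw k distinct toroidal mode numbers with a bias toward low n.
    The 1/n weighting is an illustrative choice standing in for the
    distribution used in the paper."""
    n_all = np.arange(n_min, n_max + 1)
    weights = 1.0 / n_all
    weights /= weights.sum()
    return np.sort(rng.choice(n_all, size=k, replace=False, p=weights))
```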
For an unstable case, M-1 returns the growth rate normalized to the Alfvén frequency (γ_N = γ/ω_A), which is then added to the output dataset. Here, γ_N, γ, and ω_A refer to the normalized growth rate, the growth rate, and the Alfvén frequency, respectively. On the other hand, the code does not calculate the negative growth rate for a stable case; it terminates once it reaches the maximum iteration limit before detecting an instability. In this case, the equilibrium is assumed to have no positive growth rate for that n, and γ_N = 0 is recorded in the output dataset.
In the context of pedestal stability analysis, the term 'stable' can refer to two situations: one characterized by γ_N = 0, indicating absolute stability where the actual growth rate is negative, and the other with a marginally positive growth rate (technically unstable) below the critical threshold γ_N^crit, assumed to be linearly stabilized. The criterion for γ_N^crit is somewhat arbitrary. The mode is considered unstable if γ_N > 0.03 [33,34], or more strictly γ_N > 0.001 [12], or often a criterion based on diamagnetic stabilization is used [8,9,33]. In the remaining sections of the paper, the term 'stable' refers to all the cases inside the stability boundary, γ_N ⩽ γ_N^crit. We have used γ_N^crit = 0.03.
The q profile of each equilibrium within the input dataset is slightly adjusted so that it keeps x_0 (x_0 = m_0 − nq_a) constant for every n in the n-scan, where m_0 is defined as m_0 = Int[nq_a] + 1, representing the first rational surface outside the plasma [7,16]. This process brings minor changes to the equilibrium but enables M-1 to obtain consistent results by avoiding overestimation of the effect of the peeling components. This technique is commonly utilized in pedestal stability analysis [35,36]. In this study, x_0 is set to 0.3.
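The x_0-preserving adjustment can be sketched as a rescaling factor for the q profile, computed from the edge safety factor q_a and the mode number n (a minimal sketch of the scheme described above, not the authors' implementation).

```python
import math

def q_rescale_factor(qa, n, x0=0.3):
    """Return the factor by which the q profile is multiplied so that
    x0 = m0 - n*qa stays fixed, with m0 = Int[n*qa] + 1 the first
    rational surface outside the plasma."""
    m0 = math.floor(n * qa) + 1       # Int[n*qa] + 1
    qa_new = (m0 - x0) / n            # enforce m0 - n*qa_new = x0
    return qa_new / qa                # multiply q(psi_n) by this factor
```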
The aforementioned loops are executed sequentially, producing the initial raw database. Subsequently, following a postprocessing step intended to eliminate ill-converged stability calculations, a final dataset consisting of approximately 113 400 sets of input and output data is assembled. To visualize the distribution of the collected equilibria, the 2D kernel density estimation (KDE) technique is employed, utilizing a Gaussian kernel. Figure 4 depicts the estimated density distribution of the input dataset in j − α space. The region with a high concentration of data points (dark color) predominantly covers the operational regimes of the KSTAR tokamak. For extreme conditions, however, the density of data points tends to be relatively lower.

Design and training of the NN
This section covers the design of the inputs, a description of the model architecture, preprocessing of the datasets, and the training scheme.

Inputs for the NN
M-NN is a regression model that predicts γ_N of the edge-localized ideal MHD mode for the equilibrium represented by the input data. The inputs in this study include 0D parameters and 1D plasma profiles, which are listed in table 1. The first five parameters in table 1 are chosen to define the plasma geometry and boundary shape. The shaping parameters (κ and δ_u,l) are selected as inputs because they are known to notably affect the P-B stability properties [37]. The three 1D profile inputs for M-NN consist of j_N, α, and q. I_p is included to provide a normalization factor for the j_N profile. Here, R_0, a, κ, and δ_u,l are the inputs for CHEASE, generated from equation (3), while I_p, j_N, α, and q are the outputs of CHEASE. n is not an equilibrium-related parameter, but is added to the inputs to give M-NN information about the instability.
In the previous study [27], the j_N and α profiles were compressed into a few parameters by Gaussian function fitting, and the q profile was represented solely by q_a. This approach enables efficient compression of the profiles, facilitating the design of a straightforward input layer. However, the compression results in some loss of pedestal information. Therefore, in this study, the 1D profiles are fed to M-NN through 1D convolutional layers. Convolutional layers (Conv layers) and convolutional NNs are extensively utilized in deep learning due to their ability to extract crucial features from images (or time-series data) while maintaining the interrelations among neighboring pixels (or time steps). Typically, pooling layers follow Conv layers, reducing the dimensions of the data while retaining vital information. In many cases, Conv layers are positioned before fully connected (FC) layers, which carry out classifications or inferences, effectively acting as preprocessors [38].
The 1D profiles can be conceptualized as a 1D image with three channels (similar to the RGB channels of a 2D color image), and each ψ_n surface can be treated as a pixel within the image. To decrease the dimension of the 1D image, each profile is chosen to include only 32 points in the range 0.75 ⩽ ψ_n ⩽ 1.00, which represents the pedestal and the intermediate region between the edge and the deep core.
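This preprocessing step can be sketched as resampling the three profiles onto a common 32-point grid and stacking them as channels; the use of linear interpolation here is an assumed implementation detail.

```python
import numpy as np

def to_input_image(psi_n, j_N, alpha, q, n_pts=32):
    """Resample the three 1D profiles onto n_pts uniformly spaced points
    in 0.75 <= psi_n <= 1.00 and stack them as a (n_pts, 3) 'image' with
    one channel per profile (an illustrative preprocessing sketch)."""
    grid = np.linspace(0.75, 1.00, n_pts)
    channels = [np.interp(grid, psi_n, prof) for prof in (j_N, alpha, q)]
    return np.stack(channels, axis=-1)
```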

Model architecture
The overall model architecture is illustrated in figure 5. The model can be divided into two main parts: a preprocessing unit and a prediction unit.
The preprocessing unit is on the left side of the dashed line in figure 5. It consists of a series of Conv and pooling layers, which are responsible for extracting features from the 1D profiles. The Conv layers employ same-padding, which prevents the loss of edge information (ψ_n = 0.75 and ψ_n = 1.00) during convolutions. Schematic representations of the Conv kernels are provided adjacent to each layer in the form of compact white boxes. The kernel sizes are indicated by the height of the stacks of boxes (1× and 3× in figure 5 also refer to the kernel sizes). The first numbers below each Conv layer represent the profile dimension (height in the figure), while the second ones correspond to the number of profiles (depth in the figure) generated by the kernels used in each layer. Max-pooling (MP) layers with a pooling size of 2 and a stride of 2 are chosen as the pooling layers for M-NN. The MP layers reduce the dimension of the profiles by half by selecting the maximum of two neighboring values in each profile. By stacking multiple Conv layers with small kernel sizes (1 and 3, rather than 5 or greater), it becomes feasible to attain a level of non-linearity comparable to that of a more complicated model employing larger kernel sizes [39]. This implies that the model can perform tasks of similar complexity with a reduced number of parameters, a characteristic that is advantageous from the standpoint of generality.
The colors indirectly signify the degree of compression and non-linear mixture of information, growing darker in the direction of the data flow. Following the arrows in the figure, it can be seen that the dimension of the profiles decreases while the number of profiles increases. This can be interpreted as the information contained in the three original profiles on the leftmost layer being non-linearly fused and reorganized into more compact latent profiles. The latent profiles in each subsequent layer contain useful features for determining the stability of the equilibrium. After traversing four Conv layers and two MP layers, 64 final latent profiles, each with a quarter of the initial dimension, are generated, as each MP layer reduces the dimension of the profiles from 32 to 16 and then from 16 to 8.
Subsequently, these profiles are averaged profile-wise using a global average pooling (GAP) layer, resulting in 64 parameters. These parameters are then transformed into nine 0D-like parameters through an FC layer (the pale yellow FC layer to the right of the GAP layer). These nine parameters are the final output of the preprocessing unit, and they are fed into the next unit along with the 0D parameters.
The prediction unit, located on the right side of the dashed line in figure 5, consists of the input layer (pale yellow), two hidden layers (pale green), and the output layer (pale red). All the layers in this unit are FC layers, and the size of each layer (number of neurons) is indicated below it. The nine parameters from the preprocessing unit and the seven 0D parameters listed in table 1 are concatenated to compose the input layer. Then, two hidden layers, with sizes of 256 and 32, perform the prediction based on the inputs. Dropout layers are added after each hidden layer to mitigate overfitting [40]. Finally, the output layer prints the intermediate prediction made by the hidden layers. The intermediate (γ_N) predictions are then postprocessed, which will be covered in the next subsection. The architecture of M-NN includes 21 402 internal parameters, and the two units possess nearly equal numbers of parameters.
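A hedged Keras sketch of the described architecture is given below. The overall layout follows the text (four Conv layers, two MP layers, GAP, a nine-parameter bottleneck, concatenation with seven 0D inputs, and 256/32 hidden layers), while the intermediate channel widths and the dropout rate are assumptions.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

def build_mishka_nn():
    """Illustrative sketch of the two-unit architecture; layer counts
    follow the description in the text, exact widths are assumptions."""
    # --- preprocessing unit: 1D profiles as a 3-channel "image" ---
    profiles = keras.Input(shape=(32, 3), name="profiles")
    x = layers.Conv1D(16, 1, padding="same", activation="relu")(profiles)
    x = layers.Conv1D(16, 3, padding="same", activation="relu")(x)
    x = layers.MaxPooling1D(2)(x)              # 32 -> 16
    x = layers.Conv1D(32, 3, padding="same", activation="relu")(x)
    x = layers.MaxPooling1D(2)(x)              # 16 -> 8
    x = layers.Conv1D(64, 3, padding="same", activation="relu")(x)
    x = layers.GlobalAveragePooling1D()(x)     # 64 latent profiles -> 64 values
    x = layers.Dense(9, activation="relu")(x)  # nine 0D-like parameters

    # --- prediction unit: concatenate with the seven 0D inputs ---
    params = keras.Input(shape=(7,), name="zero_d")
    h = layers.Concatenate()([x, params])      # 9 + 7 = 16 inputs
    h = layers.Dense(256, activation="relu")(h)
    h = layers.Dropout(0.2)(h)                 # rate is an assumption
    h = layers.Dense(32, activation="relu")(h)
    h = layers.Dropout(0.2)(h)
    # ReLU on the output rectifies internally learned negative rates to 0
    out = layers.Dense(1, activation="relu")(h)
    return keras.Model([profiles, params], out)
```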

Training of the NN
M-NN is designed and trained using the Keras API [41]. When designing the model, two essential features must be determined: an activation function, which introduces non-linearity to the model, and a loss function, which represents the value the model aims to minimize throughout the training process. For M-NN, the rectified linear unit (ReLU) function, f(x) = max(0, x), is chosen as the activation function for all layers [42]. The loss is the mean squared error (MSE), defined as MSE = Σ(y − y_0)²/N, where y, y_0, and N refer to a predicted value, a true value, and the total number of samples, respectively. To prevent the network from becoming overly reliant on specific parameters, all input values are rescaled to lie within the range of 0 to 1.
M-NN is constructed as a composite of 32 sub-models. Each constituent sub-model of the M-NN ensemble undergoes separate training. Owing to the stochastic nature of the training process and the independence between sub-models, their predictions collectively constitute a set of diverse growth rates. The ultimate output of M-NN is obtained by calculating the average over the set. The ensemble ambiguity is established through the standard deviation of the sub-outputs [43]. This strategy is known as ensemble averaging, a widely utilized technique that aims to improve performance and partly alleviate the overfitting tendencies of individual sub-models. Although various methods exist to optimize the weights for averaging, we have chosen simple equal-weight averaging. This choice is straightforward, yet it exhibits a satisfactory level of performance.
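The equal-weight ensemble average and its ambiguity can be sketched in a few lines:

```python
import numpy as np

def ensemble_predict(sub_preds):
    """Equal-weight ensemble average over sub-model predictions; the
    standard deviation across sub-models serves as the ensemble
    ambiguity. sub_preds has shape (n_models, n_samples)."""
    sub_preds = np.asarray(sub_preds)
    return sub_preds.mean(axis=0), sub_preds.std(axis=0)
```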
Each sub-model is trained using a sub-dataset randomly generated from the original dataset. Moreover, to perform the final test of M-NN, it is essential to set aside a portion of the dataset that has not been utilized by any of the ensemble components. Thus, the original dataset is partitioned into multiple subsets according to the following logic and guidelines.
The original output dataset comprises two output classes: γ_N = 0 and γ_N > 0. In the context of binary classification, the ratio between the two classes is roughly 40:60, which can be regarded as suitable for classification tasks. However, M-NN functions as a regression model, designed to predict γ_N values rather than merely classifying stable/unstable cases. In this situation, an imbalance in the distribution of values can pose training challenges: the class with γ_N > 0 follows a continuous distribution, whereas the class with γ_N = 0 stands out as a single, significantly predominant group. So, the original dataset is first split into the two groups of γ_N = 0 and γ_N > 0. Then, 5% of each group is set aside first as the test dataset.
After excluding the test dataset, a total of 32 sub-datasets are generated from the remaining pools. To partially address the imbalance between the groups, only 75% of the group with γ_N = 0 is randomly sampled each time to assemble a sub-dataset, whereas the entire group with γ_N > 0 is included. Following the generation of 32 distinct sub-datasets, each containing a randomly selected 75% of the γ_N = 0 data, each sub-dataset is subsequently divided into training and validation datasets with a ratio of 75:25. A total of 32 distinct pairs of training and validation datasets are thus prepared, along with one final test dataset. The size of each dataset is described in table 2, and the histogram of the test dataset is presented in figure 6(a).
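The splitting scheme can be sketched as follows; the index bookkeeping and the random generator seed are illustrative details.

```python
import numpy as np

def make_subdatasets(stable_idx, unstable_idx, n_sub=32, seed=0):
    """Sketch of the splitting scheme: 5% of each class is held out as
    the test set first; each of the n_sub sub-datasets then takes all
    unstable samples plus a random 75% of the stable ones, and is split
    75:25 into training and validation indices."""
    rng = np.random.default_rng(seed)

    def hold_out(idx, frac=0.05):
        idx = rng.permutation(idx)
        k = int(len(idx) * frac)
        return idx[:k], idx[k:]          # (test part, remaining pool)

    test_s, pool_s = hold_out(stable_idx)
    test_u, pool_u = hold_out(unstable_idx)
    subs = []
    for _ in range(n_sub):
        s = rng.choice(pool_s, size=int(0.75 * len(pool_s)), replace=False)
        sub = rng.permutation(np.concatenate([s, pool_u]))
        cut = int(0.75 * len(sub))
        subs.append((sub[:cut], sub[cut:]))  # (train, validation)
    return np.concatenate([test_s, test_u]), subs
```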
To further optimize training efficiency, the γ_N values within the output datasets are preprocessed based on the following assumption, aimed at redistributing the data. In P-B stability analysis, the primary task is to determine whether the equilibrium is stable or not. From this perspective, M-NN needs higher accuracy in the relatively low growth rate domain (0 < γ_N ≲ 0.06), which is close to the stability thresholds, while predicting the approximate growth rate is acceptable in the deeply unstable domain with γ_N ≳ 0.06. Hence, the raw γ_N values in the initial output datasets are reshaped to form a denser distribution within the range 0 ⩽ γ_N ≲ 0.06, thereby imposing a higher penalty on M-NN for making inaccurate predictions within this range. Outside of this range, a more relaxed distribution is applied using the sigmoid-based transformation in equation (6). Here, k is a factor determining the degree of concentration near γ_N^crit. The final term is included to preserve the labeling of γ_N = 0 in the transformed distribution: γ̃_N = 0. The transform with k = 10 and γ_N^crit = 0.03 is visualized in figure 6(b). Upon examining both the transformation relation (black curve) and the derivative of the transform function (red curve), it becomes evident that the new distribution is concentrated in the low-growth-rate region. In this study, the value of k is set to 10, after some tuning to achieve the optimal degree of transformation. For 16 sub-models, γ_N^crit is set to 0, while for the remaining 16 sub-models, it is set to 0.03. Each sub-model learns the transformed growth rate values, and their outputs are inverse-transformed prior to ensemble averaging, as depicted in the rightmost procedure (the arrow with the Inv sign above it) in figure 5.
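Since equation (6) itself is not reproduced here, the snippet below is a hedged reconstruction consistent with the description: a sigmoid in γ_N centred at γ_N^crit with steepness k, minus a constant final term so that γ_N = 0 maps to γ̃_N = 0, together with the inverse applied to sub-model outputs before averaging.

```python
import math

def transform(g, k=10.0, g_crit=0.03):
    """Hedged reconstruction of the sigmoid-based reshaping of gamma_N;
    the constant final term enforces transform(0) = 0."""
    return (1.0 / (1.0 + math.exp(-k * (g - g_crit)))
            - 1.0 / (1.0 + math.exp(k * g_crit)))

def inverse_transform(t, k=10.0, g_crit=0.03):
    """Invert the transform above (applied to sub-model outputs
    before ensemble averaging)."""
    s = t + 1.0 / (1.0 + math.exp(k * g_crit))
    return g_crit - math.log(1.0 / s - 1.0) / k
```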
By adjusting the distribution through equation (6), M-NN is directed to learn efficiently within the domain with γ_N > 0. Nonetheless, a fundamental issue regarding the output dataset remains unresolved. Even though each equilibrium is a unique state with different stability properties, all equilibria with γ_N = 0 are mapped to γ̃_N = 0, unlike the equilibria with γ_N > 0. Due to this remaining imbalance, M-NN may still not be able to train efficiently near γ_N = 0.
The ReLU activation is employed to address this matter. In a typical model architecture, the output layer does not incorporate any activation. However, by applying the ReLU activation, M-NN can learn negative growth rates for equilibria labeled with γ̃_N = 0, effectively treating them as if they were γ̃_N < 0. In other words, M-NN can guide itself to internally remap the inputs from equilibria initially mapped to γ̃_N = 0 to alternative negative values instead. These negative values are then rectified back to 0 to align with the original label of γ̃_N = 0 in the final step. This is indicated by the ReLU sign next to the pale red box in figure 5. Consequently, M-NN can learn a more natural distribution, even within the domain with γ̃_N = 0. The negative predictions generated by one sub-model of the M-NN ensemble are visualized in figure 6(c), along with its positive predictions, which correspond to γ̃_N > 0. The figure illustrates that M-NN recognizes the output dataset as being extended into the negative γ̃_N region, rather than being truncated at 0.
Lastly, log n is used as an input to M-NN instead of raw n. This preprocessing effectively rescales the distance in n-space, concentrating more on the low-n region. The function of this scheme can be explained through the following example, which consists of two cases. In the first case, the most unstable mode is an n = 5 PM, while M-NN predicts it to be dominated by an n = 10 PBM. In the second case, the dominant mode is an n = 25 BM, while M-NN predicts dominance by an n = 30 BM. Making a mistake in the first case is more critical because it can alter the characteristics of the instability. On the other hand, an error in the second case has a relatively minor effect, despite the same level of inaccuracy. Along with the previously discussed biased probability distribution for selecting n in section 2.2, this measure encourages M-NN to make accurate predictions in the low-n domain, while permitting relatively larger errors in the high-n domain.

Training results
Every sub-model comprising the M-NN ensemble is trained using the Adam optimizer [44], and the training progress of M-NN is visualized in figure 7. The implementation of the early-stopping method, which concludes training when the validation loss stops decreasing over a specific number of iterations, leads to distinct epochs at which each sub-model halts training. Nevertheless, all sub-models undergo training for a minimum of 600 epochs. In figure 7, it is evident that both losses generally exhibit a monotonic decrease, implying that the ensemble has effectively learned the dataset and stopped before overfitting the training dataset.
To illustrate the performance of M-NN, a 2D KDE plot of the benchmark against the test dataset is shown in figure 8. Given the closely aligned distribution along the y = x line, with R² = 0.990, M-NN can be deemed well-trained and suitable for real-world applications.

Validation in plotting j − α diagrams
To validate the general applicability of M-NN in P-B stability analysis, j − α diagrams for two equilibria are presented as examples in figure 9. The growth rates of ten toroidal mode numbers (n = 3, 5, 6, 8, 10, 12, 15, 20, 25, and 30) are computed. To enhance the visibility of distinctions between the diagrams produced by M-1 and M-NN, the colormap is more segmented than in figure 1. The blue-spectrum (dark blue and blue) colors denote the area containing the stable equilibria with growth rates below the stability threshold, while the purple and red-spectrum (red and dark red) colors indicate regions that have surpassed the threshold. In these diagrams, the stability threshold is defined as γ_N = 0.03. The reference equilibria are denoted by white squares in the diagrams, accompanied by error bars representing 20% uncertainties. The dimensions of the error bars correspond to the measurement uncertainties typically observed in the other diagrams [12,45].
The diagrams in figures 9(a) and (b) are constructed using parametric density and temperature profiles. An equilibrium reconstructed from shot 16545 [45] of the KSTAR tokamak is employed as the reference equilibrium for the diagrams in figures 9(c) and (d). The current density profiles of both equilibria are calculated from the Sauter current model [29].
It can be seen that M-NN offers satisfactory predictions for the stability boundary and the contours in both cases, notably the equilibrium's marginally unstable state sitting on the nose of the P-B side boundary in figure 9(d). In both diagrams, the deviations from the reference diagrams can be deemed acceptable when taking into account the typical measurement uncertainties in the pedestal region, which are represented by the error bars.
Figure 10 presents detailed results of the n-scan. In figure 10(a), stars mark three equilibria in the j − α space, which are selected for the presentation of the n-scan results. In figures 10(b) and (c), we plot the current density (j_ϕ) and α profiles of the selected equilibria. Figure 10(d) displays the growth rates predicted by M-NN (dashed squares) and those calculated by M-1 (solid circles). The M-NN predictions show good agreement with the growth rates calculated by M-1, yielding an R² value of 0.91. M-NN also accurately predicts the most dominant n at each equilibrium. As we progress from the mint star (peeling side) to the yellow star (P-B side) and further to the magenta star (ballooning side), the M-NN predictions closely follow the M-1 results, with the dominant n exhibiting an increasing trend.

Error analysis
In this subsection, the error analysis results are presented. Figure 11 displays the density distributions of errors between values calculated by M-1 and those predicted by M-NN. The error density for the entire test dataset (black solid curve) exhibits a peak near 0, indicating that M-NN is optimized for accurate predictions in general. The dashed curves show that the spread of the distribution increases gradually as the domain transitions from relatively low γ_N to high γ_N (from green to blue and then to red), while the peak remains close to 0. This observation suggests that M-NN provides more accurate predictions in the low-γ_N domain, as it is intentionally designed to enhance its accuracy near stability boundaries.
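The per-domain error densities of figure 11 amount to binning the absolute errors by the ground-truth γ_N value. A sketch, with illustrative domain edges (the actual domain boundaries used in the paper are not reproduced here):

```python
from bisect import bisect_right

def errors_by_domain(gamma_m1, gamma_nn, edges):
    """Group absolute prediction errors |gamma_N^1 - gamma_N^NN| by the
    M-1 (ground-truth) value. `edges` are the gamma_N boundaries
    separating the domains; returns one error list per domain,
    ready for density estimation.
    """
    bins = [[] for _ in range(len(edges) + 1)]
    for truth, pred in zip(gamma_m1, gamma_nn):
        bins[bisect_right(edges, truth)].append(abs(truth - pred))
    return bins

# Three samples falling into the low, middle, and high domains:
low, mid, high = errors_by_domain(
    gamma_m1=[0.01, 0.05, 0.20],
    gamma_nn=[0.02, 0.04, 0.25],
    edges=[0.03, 0.10],
)
```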

Reductions in numerical cost
The decrease in the overall time needed to plot a j − α diagram when using M-NN is elucidated here. The total computation time is approximately proportional to T_E + N_S T_S, where N_S represents the number of n values per equilibrium, and T_E and T_S denote the typical times required for the neighboring-equilibrium calculation and a single stability calculation, respectively. T_E and T_S are on the order of 10² s and 10³ s, respectively.
By employing M-NN, T_S can be decreased to ≲1 s. The utilization of M-NN can also lead to an indirect reduction in T_E: the 1D profiles and 0D parameters for M-NN can be acquired from equilibrium solvers using coarse grids, whereas input files for M-1 require higher-resolution equilibria. Thus, generating inputs for M-NN typically incurs a computational cost approximately ten times lower than that for M-1, reducing T_E to roughly 10 s. It is noteworthy that with M-NN, the bottleneck shifts from the stability calculation to the equilibrium calculation.
When considering a typical number of equilibria and a range of n values per equilibrium in the j − α diagram, generating a single diagram using M-1 requires approximately 100-200 CPU hours, whereas M-NN reduces the total computation time to a few CPU minutes. When parallelizing across 32 cores, assigning one core to each sub-model of M-NN, the computation time is further reduced to a few tens of seconds, whereas M-1 still demands several hours.
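The cost model T_E + N_S T_S can be turned into a quick order-of-magnitude estimate. The equilibrium count below is an assumption chosen only to land in the quoted range, not a figure from the paper:

```python
def diagram_time(t_e, t_s, n_equilibria, n_modes):
    """Total wall time (seconds) for one j-alpha diagram: each
    equilibrium needs one neighboring-equilibrium calculation (t_e)
    plus one stability calculation (t_s) per toroidal mode number,
    i.e. n_equilibria * (T_E + N_S * T_S)."""
    return n_equilibria * (t_e + n_modes * t_s)

# Order-of-magnitude comparison with the time scales from the text
# (T_E ~ 10^2 s, T_S ~ 10^3 s for M-1; T_E ~ 10 s, T_S ~ 1 s with
# M-NN). The equilibrium count of 50 is illustrative.
m1_hours = diagram_time(t_e=100, t_s=1000, n_equilibria=50, n_modes=10) / 3600
mnn_minutes = diagram_time(t_e=10, t_s=1, n_equilibria=50, n_modes=10) / 60
```

With these assumed inputs the M-1 estimate falls in the quoted 100-200 CPU-hour range, while the M-NN estimate drops to minutes, dominated by the equilibrium term rather than the stability term.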

Conclusion
Plotting the j − α diagram has become a common practice for characterizing the stability of an H-mode plasma pedestal. However, producing such diagrams requires a substantial number of instability calculations using 2D linear MHD stability codes such as MISHKA-1. To mitigate this computational burden, we have developed M-NN using NN techniques.
M-NN predicts the linear growth rate of an edge-localized ideal MHD instability with a specific toroidal mode number (n) for a given equilibrium. The current version is specifically designed to accelerate analysis for the KSTAR tokamak. Parametric pressure profiles, current density profiles, and plasma boundaries are generated randomly to mimic typical KSTAR-like numerical equilibria, forming the training dataset. M-NN can also be extended by integrating new input parameters, such as resistivity, and by incorporating additional numerical equilibria representing other tokamaks into the dataset. Further validation would also be possible using a more recent bootstrap current model [46]. Through these enhancements, M-NN can be readily applied to linear MHD stability analysis in other tokamak pedestals.
M-NN takes 1D radial equilibrium profiles and 0D plasma parameters as inputs. It is composed of two main components: the preprocessing unit, which extracts meaningful features from the 1D profiles, and the prediction unit, which uses the extracted features along with the 0D parameters to predict the growth rate. Suitable transformations and techniques are employed to ensure that M-NN functions effectively in pedestal stability analysis. The paper includes cases where M-NN is applied to plotting j − α diagrams, demonstrating its notable overall performance. In addition, detailed n-scan results confirm its accurate prediction capabilities.
To create a j − α diagram, equilibrium recalculation and stability calculation must be performed many times. By employing M-NN, the computational times for the equilibrium and stability calculations are decreased by factors of approximately 10 and 1000, respectively, and the primary computational bottleneck shifts from the stability calculation to the equilibrium calculation. As a result, the overall computational cost can be reduced from the order of 10² CPU hours to below 1 CPU hour.
Finally, we propose potential future works and applications. One promising direction is the utilization of M-NN for real-time pedestal stability analysis. Achieving further reductions in equilibrium calculation time could pave the way for real-time pedestal stability analysis based on P-B theory. For example, the physics-informed neural network technique [47] could potentially address the bottleneck posed by the equilibrium calculations. This might enable advanced real-time ELM prediction, effectively serving as a dashboard for pedestal stability.
Furthermore, the integration of M-NN into other workflows related to pedestal stability will be pursued in the future. Examples of such workflows include expediting the prediction of pedestal height based on the EPED model [9] and exploring the feasibility of operational regimes such as Super H-mode [48] in the KSTAR tokamak. These workflows require a similar number of stability calculations to plotting a j − α diagram, making them attractive future applications of M-NN.

Figure 1. An example of a j − α diagram. The x-axis and y-axis correspond to the normalized pedestal pressure gradient and current density, respectively. On each side of the stability boundary, the most dominant modes are indicated, with PM, PBM, and BM representing Peeling Mode, Peeling-Ballooning Mode, and Ballooning Mode, respectively. Stars represent two equilibria with different pedestal pressure gradients and current densities.

Figure 2. The profiles of five equilibria from the input dataset, plotted within the range 0.75 ⩽ ψ_n ⩽ 1.00: plasma pressure p (a), normalized current density j_N (b), normalized pressure gradient α (c), and safety factor q (d). The colors of the profiles correspond to each selected equilibrium.

Figure 3. Examples of the boundaries from the input dataset. R and Z indicate the radial and vertical positions in a poloidal cross-section, respectively.

Figure 4. The 2D KDE plot of the equilibria constituting the input dataset in j − α space. Here, j_N^max and α^max refer to the maximum values of the current density and the normalized pressure gradient within the pedestal region. The colormap visually represents the relative density of points across the 2D space.

Figure 5. Schematic representation of the M-NN model architecture. The preprocessing and prediction units are demarcated by dashed lines. Labels indicate each layer's name and intended function. The preprocessing unit receives the three 1D profiles as input, while the seven 0D parameters are fed directly into the prediction unit. The inputs are listed in the legend. Input scaling is not depicted.

Figure 6. (a) The histogram of the output dataset for the final test, with an enlarged version in the top-right corner. The x-axis represents the growth rate, and the y-axis the frequency of the bins. (b) Depiction of the transformation (black) and its gradient (red); the red curve is in arbitrary units. The x-axis and y-axis refer to the original and transformed growth rates, respectively. (c) An instance of the intermediate predictions, γ̃_N, prior to the ReLU activation. The histogram is constructed for the same dataset as in (a), with the x-axis expressed in transformed growth-rate units. γ_N = 0 of (a) is expanded into the negative γ̃_N domain.

Figure 7. Log-log plot illustrating the trends of the averaged training and validation losses of the M-NN ensemble, with the shaded region indicating the variance of the loss values. The x-axis represents the trained epochs, and the y-axis the MSE loss.
The R² values in figures 9(b) and (d) denote the similarity between figures 9(a) and (b) and between figures 9(c) and (d), respectively. These values also indicate a high level of prediction capability, despite some discrepancies.

Figure 8. A 2D KDE of the regression results of M-NN against the test dataset. The x-axis and y-axis represent γ_N calculated by M-1 (ground truth) and predicted by M-NN (prediction), respectively. The R² value is provided in the lower-right corner. The colormap indicates the relative density of each data point.

Figure 9. j − α diagrams plotted for a KSTAR-like numerical equilibrium in (a) and (b), and for KSTAR shot 16545 in (c) and (d). The diagrams on the left are generated by M-1 (reference), while those on the right are produced by the M-NN predictions. The boundaries drawn by M-1 are also shown in (b) and (d) as black dashed lines for comparison. In each diagram, the x-axis and y-axis represent the normalized pressure gradient α and the normalized current density j_N, respectively. The colorbar on the right indicates the magnitude of γ_N. The stability boundary lies between the blue and purple regions. Dominant n values are indicated at each unstable point. R² values of the plotted growth rates are displayed to show the prediction scores of M-NN.

Figure 10. Selected equilibria are denoted by stars in the j − α space in (a). The corresponding current density profiles are presented in (b), and (c) displays the normalized pressure gradient profiles. The results of the n-scan are presented in (d): solid circles represent the growth rates calculated by M-1, and dashed squares the predictions made by M-NN, with shades indicating the uncertainty of the M-NN predictions. Each profile can be matched to the star of the same color in (a).

Figure 11. Estimated densities (in arbitrary units) of the absolute errors in predicting the growth rates. The black solid curve represents the error density for the entire test dataset, while the dashed curves indicate the error densities within each domain as labeled in the legend. γ_N^1 and γ_N^NN represent the growth rates calculated by M-1 and predicted by M-NN, respectively.

Algorithm 1. Pseudo-code for the dataset collection.
1: temporary input dataset, input dataset, output dataset initialized
2: criterion ← set by certain q_0 and q_a
⋮
exists/not exist ← call(MISHKA-1; equilibrium, j-th n, x_0 = 0.3)
⋮
24: add equilibrium and n to input dataset

Table 1. The parameter list of M-NN. The dimension of each input parameter is given in the final column.

Table 2. Sizes of the original, training, validation, and test output datasets.