Neural network identification of the weakly coherent mode in I-mode discharge on EAST

The improved energy confinement mode (I-mode) is widely considered an important operation regime for ITER. I-mode implementation depends on specific basic plasma parameters and certain operation conditions, which are discovered through statistical analysis of plasma characteristics from a large number of I-mode discharges on a tokamak. The extraction of I-mode plasma characteristics is complicated, time-consuming, and limited by the sampling rate of the measured signals. Experimental observation of the I-mode is accompanied by the appearance of a weakly coherent mode (WCM). However, it takes much time to accurately scan and quantify WCM characteristics when analyzing many I-mode discharges. Recently, a neural network identification method was developed as an I-mode detector to traverse a whole database as a replacement for manual identification. Two fully connected neural network models were trained on the Experimental Advanced Superconducting Tokamak (EAST) I-mode database, using the spectrum of the propagation velocity of density perturbations from Doppler backward scattering and the electron density measured by a polarimeter-interferometer system. An accuracy of 98.30% in identifying WCMs in I-mode discharges is achieved with the WCM classification model. In addition, the regime classification model successfully distinguishes between the low confinement mode (L-mode), I-mode, and high confinement mode (H-mode) with 96.03% accuracy. Finally, ablation experiments were performed on the regime classifier, showing that there is potential for further performance improvement with the future use of recurrent neural network (RNN) models.


Introduction
The improved energy confinement mode (I-mode) was first observed on the ASDEX Upgrade tokamak in the late 1990s [1] and was subsequently observed and investigated on Alcator C-mod [2] and DIII-D [3]. It has been found that the I-mode has a high energy confinement performance similar to that of the H-mode and a lower particle confinement similar to that of L-mode plasma, which avoids impurity accumulation and erosion of the plasma-facing components of fusion devices. Moreover, I-mode plasma with no edge localized modes [4] is naturally suitable for long-pulse steady-state operation in a high-magnetic-field machine, since it is not sensitive to the heating power range.
The operation of the I-mode was confirmed with the Experimental Advanced Superconducting Tokamak (EAST) database in 2019 [5] and has been widely investigated both experimentally and in simulations [6,7]. Experimental observations indicate that the I-mode is often detected after and while a weakly coherent mode (WCM) appears [2,5,7-11]. When transitioning from the I-mode phase to the H-mode, the broadband fluctuations sharply decrease and the WCM promptly disappears.
This suggests that the I-mode can be identified by the appearance of the WCM when analyzing plasma behaviors.
The WCM is considered an electromagnetic mode and exists in the pedestal region near the scrape-off layer of I-mode plasma. In the EAST experiment, the WCM can be observed in the 25-150 kHz range in the spectrum of the propagation velocity of the phase of the density perturbation along the perpendicular magnetic field direction [5,12]. Specifically, the relative frequency broadening of the time-averaged spectrum is much larger for the WCM than for the edge coherent mode in the pedestal region. The traditional method for identifying the WCM depends on short-time Fourier transform magnitude spectra, obtained by analyzing magnetic signals or density fluctuations with frequencies up to several kilohertz. While the 10 MHz sampling rate of Doppler backward scattering (DBS) brings higher resolution to the spectrum, the time needed to analyze and process each time slice manually also increases significantly.
Hence, traditional identification is generally achieved through naked-eye observation, which is not only time-consuming and laborious but also susceptible to subjective factors. In addition, owing to the uncertainty of the WCM peak amplitude and its corresponding frequency in the spectrum, it is difficult to identify the WCM by setting a frequency threshold value.
With over 100 000 pulses in the EAST database, it is impossible to visually check each time slice of each discharge spectrum one by one. By scanning early discharges, it is possible to find unlabeled WCMs and I-modes in other parameter spaces that may contain new physics for future studies. Efforts to use machine learning to process the complex physics in tokamaks have been ongoing and have yielded many results. Machine learning algorithms such as random forests and neural networks trained on experimental data are widely used for disruption prediction on different tokamaks, such as JET [13], ASDEX Upgrade [14] and DIII-D [15]. Recent attempts to use reinforcement learning for control have also been implemented on TCV [16] and KSTAR [17]. Various machine learning algorithms have been used in recent years for regime identification on TCV [18,19], COMPASS [20] and Alcator C-mod [21]. These works focus on classifying regimes based on one-dimensional time series of diagnostic signals, whereas in this paper the time-frequency spectrum is used as the input for regime classification. Consequently, a neural network method is proposed to replace the conventional method for identifying the WCM in I-mode plasma. For this purpose, two fully connected neural network models, named the WCM classifier and the regime classifier, are trained to identify plasma regimes in tokamaks. The WCM classifier determines whether a WCM exists in a time slice, while the regime classifier determines the confinement regime (L-, I-, or H-mode) of that slice.
The results show that the WCM classifier achieves 98.30% identification accuracy on a test set of 2703 samples. By reasonably constructing the feature vectors, the regime classifier achieves 96.03% accuracy on the test set. Finally, ablation experiments indicate that the regime classifier can be improved in the future by choosing suitable RNN models.
This paper is organized as follows. The basic architecture and key parameters of neural networks are introduced briefly in section 2.1. The method of splitting the dataset and the data preprocessing are described in detail in sections 2.2-2.4. The training and analysis of the two models are discussed in sections 3.1 and 3.2, respectively. Finally, a summary is presented in section 4.

Neural network algorithm
A fully connected neural network is composed of an input layer, an output layer, and several hidden layers in between. A simple neural network with 16 × 12 × 10 × 1 nodes is shown in figure 1. A feature vector is fed into the input layer on the left, computed in the hidden layers, and finally output by the output layer as the classification result. In this paper, the feature vectors consist mainly of diagnostic data. The weighted sum of the outputs of the preceding layer, passed through an activation function, is used as the input to the next layer. Common activation functions are the sigmoid function [22,23], the rectified linear unit (ReLU) [24,25] and the softmax function [26,27], which are defined as

y = 1 / (1 + e^(-x)),  (1)

y = max(0, x),  (2)

f_i(z) = e^(z_i) / Σ_j e^(z_j),  (3)

where x and y are the scalar inputs and outputs of the activation function, respectively, while z and f are the vector inputs and outputs, respectively.
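As a concrete illustration, the three activation functions and a forward pass through the small 16 × 12 × 10 × 1 network of figure 1 can be sketched in a few lines of NumPy. The weights here are random placeholders, not the trained values from the paper.

```python
import numpy as np

def sigmoid(x):
    # Eq. (1): squashes a scalar (or array elementwise) into (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    # Eq. (2): passes positive inputs unchanged, zeroes negatives
    return np.maximum(0.0, x)

def softmax(z):
    # Eq. (3): normalizes a vector into a probability distribution
    e = np.exp(z - np.max(z))  # shift by the max for numerical stability
    return e / e.sum()

def forward(x, weights, biases, activations):
    """One forward pass through a fully connected network."""
    a = x
    for W, b, act in zip(weights, biases, activations):
        a = act(W @ a + b)
    return a

# 16 -> 12 -> 10 -> 1 network as in figure 1 (random weights for illustration)
rng = np.random.default_rng(0)
shapes = [(12, 16), (10, 12), (1, 10)]
weights = [rng.normal(size=s) for s in shapes]
biases = [np.zeros(s[0]) for s in shapes]
y = forward(rng.normal(size=16), weights, biases, [relu, relu, sigmoid])
```

Because the last activation is a sigmoid, the single output node yields a value in (0, 1) that can be thresholded into a binary classification.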

Data split
In machine learning, the data need to be divided into training data and test data, and special attention must be paid to the construction of the test set, which is used for the final evaluation of the model's performance. It is crucial to ensure sufficient independence between the test set and the training set, so as to prevent information leakage from causing the model performance to be estimated too optimistically. With these considerations, the data from December 2016 to July 2017 were selected as the training set, while the data from July 2018 are used as the test set. The discharges from different campaigns were obtained under different wall conditions and diagnostic device states, so the generalization ability of the model and its robustness to different experimental conditions can be ensured.
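The campaign-based split can be expressed as a simple date filter. The shot numbers and dates below are hypothetical placeholders (only #77427 and #77572 are mentioned later in the paper); the point is that the split boundary is a campaign date, not a random shuffle.

```python
from datetime import date

# Hypothetical shot records: (shot number, campaign date)
shots = [
    (70001, date(2016, 12, 5)),
    (71500, date(2017, 3, 14)),
    (73200, date(2017, 7, 2)),
    (77427, date(2018, 7, 10)),
    (77572, date(2018, 7, 21)),
]

train_end = date(2017, 7, 31)                               # training: Dec 2016 - Jul 2017
test_start, test_end = date(2018, 7, 1), date(2018, 7, 31)  # test: Jul 2018

train = [s for s, d in shots if d <= train_end]
test = [s for s, d in shots if test_start <= d <= test_end]
```

Splitting by campaign rather than by random sampling keeps all time slices of one discharge on the same side of the split, which is what prevents information leakage between nearly identical neighboring slices.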

Database for the WCM classifier
In this section, the characteristics of the experimentally observed WCM accompanying the appearance of the I-mode are first presented and the dataset is visualized. The results of the data visualization provide a reference for the subsequent data preprocessing, which is then described in detail, including data augmentation and data dimensionality reduction.

Background and data visualization.
Figure 2 shows the transition from the L-mode phase to the I-mode phase at t = 2.2 s; the lower-frequency mode is the edge temperature ring oscillation [7,10]. The broadband frequency mode is located at 40-80 kHz, which indicates the appearance of the WCM in the I-mode plasma, while the density n_e is almost unchanged. The stored energy W_MHD increased by 35% until t = 3.35 s, when the I-H transition into the H-mode occurs.
During the I-mode phase, the density n_e maintains a stable level. After entering the H-mode phase, the WCM disappears, and the electron density n_e and stored energy W_MHD increase significantly.
Since the three phases have completely different characteristics, it is theoretically possible to identify the L-, I-, and H-modes by training a neural network with sufficient data. Figure 3 shows the power spectrum of the u_⊥ perturbation for time slices during t = 2.9 s to t = 3.0 s from the same channel as that used in figure 2, panel (c), where u_⊥ refers to the propagation velocity of the phase of the density perturbation along the perpendicular magnetic field direction. There is a significant bulge in the 40-80 kHz range (the lower frequencies are not critical for identifying the I-mode and are therefore not plotted). The appearance of the WCM is established when at least one of the DBS signals on channels 5 to 8 (corresponding to 67.5 GHz, 70 GHz, 72.5 GHz and 75 GHz) exhibits WCM characteristics. For this reason, 181 amplitude data points with frequencies ranging from 20 to 200 kHz (with an interval of 1 kHz) are processed and fed into the neural network as features for WCM identification.
Visualizing the dataset before starting the training process gives an understanding of the distribution of the data, which can provide a reference for data preprocessing. Figure 4 shows the average amplitude of each discharge in the dataset, defined as

Ā = (1 / ((f_2 − f_1)(t_2 − t_1))) ∫_{t_1}^{t_2} ∫_{f_1}^{f_2} A(f, t) df dt,

where f_1 and f_2 are taken as 10 kHz and 200 kHz, respectively (the amplitudes of modes below 10 kHz are too large, and the WCM does not occur in such a low range, so they are omitted), and t_1 and t_2 are the starting and ending points of each shot. It can be noticed that the average amplitude of the test set is significantly lower than that of the training data, i.e. the two distributions are not quite the same, mainly because there are only a few tens of I-mode discharges. After using the data augmentation method described in section 2.3.2, the range of the training data was significantly expanded to include the range of the test set.
In addition, visualizing the features of the two labels of a binary classification problem helps in understanding the difference in the feature distributions. For this purpose, we define the intensity of the WCM relative to the background as shown in figure 5, dividing the WCM from the background at 60% of the peak within 25 kHz-184 kHz and characterizing the relative intensity by the ratio of the areas. Figure 6 shows that the distribution ranges of the two labels are the same, except that the overall intensity of the WCM samples is higher. This indicates that the dataset contains not only significant WCMs but also weak WCMs, in addition to samples with high background intensity. Training on these samples with different characteristics together helps guarantee the robustness of the model.
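A minimal sketch of such a relative-intensity metric is given below. The exact area definition used in figure 5 is not spelled out in the text, so this implementation (area of the spectrum above the 60%-of-peak cut, divided by the area below it) is an assumption; the frequency band and the 60% fraction follow the paper.

```python
import numpy as np

def relative_wcm_intensity(freq_khz, amp, f_lo=25.0, f_hi=184.0, frac=0.60):
    """Ratio of the spectral area above frac*peak (the 'bulge') to the
    area below that cut (the background), within [f_lo, f_hi] kHz.
    This is a sketch; the paper's exact area definition may differ."""
    band = (freq_khz >= f_lo) & (freq_khz <= f_hi)
    a = amp[band]
    thresh = frac * a.max()
    above = np.clip(a - thresh, 0.0, None).sum()  # area above the cut
    below = np.minimum(a, thresh).sum()           # area below the cut
    return above / below

# Synthetic spectrum: flat background plus a broadband bump at 40-80 kHz
f = np.arange(20.0, 201.0)                        # 1 kHz spacing, 20-200 kHz
background = np.full_like(f, 1.0)
bump = 2.0 * np.exp(-((f - 60.0) / 15.0) ** 2)
ratio_wcm = relative_wcm_intensity(f, background + bump)
ratio_flat = relative_wcm_intensity(f, background)
```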

Data augmentation: Gaussian noise.
A sufficient amount of data ensures a variety of features in the training process, which improves the generalizability of the neural network, i.e. the ability of the trained network to make correct judgments on unseen data. Data augmentation increases the amount of data by generating copies of existing data with slight modifications or by creating new synthetic data from existing data [28]. It can reduce overfitting [29,30] when the amount of data is insufficient. In image recognition, shifting, cropping, and mirroring images are simple and effective data augmentation methods [31]. Adding Gaussian noise to the original samples is also a common method. As can be seen in figure 4, the average amplitude of the test set is significantly lower than that of most of the training data, which may be due to the different device states between campaigns (wall condition, diagnostic device state, etc). As can be seen in figure 3, recognizing a WCM requires focusing on whether a broadband bulge is visible in the spectrum, rather than on the amplitude itself. It is therefore possible to add Gaussian noise N_1 ∼ N(0, σ_1) to the spectral data of each sample to vary its amplitude sufficiently without destroying the shape of its waveform; in this paper, each sample gains an additional 10 new samples through this noise. However, this may introduce a potential problem: as the amount of data increases substantially, the fine structure of the waveform may be extracted as a feature by the model, whereas the WCM is a broadband perturbation and the fine structure should not matter for classification. Therefore, another Gaussian noise N_2 ∼ N(0, σ_2) with σ_2 < σ_1 is added to each feature, so that the fine structure of each sample is not exactly the same while the overall shape is preserved.
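The two-level noise scheme can be sketched as follows. The σ values here are illustrative, not the ones used in the paper, and the interpretation of N_1 as a single per-copy amplitude offset (versus N_2 as a per-bin perturbation) is our reading of the text.

```python
import numpy as np

def augment_with_noise(sample, n_copies=10, sigma1=0.5, sigma2=0.05, seed=0):
    """Per-sample augmentation sketch (sigma values are illustrative):
    N1 ~ N(0, sigma1) shifts the whole spectrum's amplitude per copy,
    N2 ~ N(0, sigma2) with sigma2 < sigma1 perturbs each frequency bin,
    so copies share the overall shape but not the fine structure."""
    rng = np.random.default_rng(seed)
    copies = []
    for _ in range(n_copies):
        offset = rng.normal(0.0, sigma1)               # one N1 draw per copy
        fine = rng.normal(0.0, sigma2, sample.shape)   # one N2 draw per bin
        copies.append(sample + offset + fine)
    return np.stack(copies)

spectrum = np.ones(181)   # placeholder for a 181-point amplitude spectrum
aug = augment_with_noise(spectrum)
```

With 10 noisy copies per original, each sample becomes 11 samples, matching the ×11 expansion quoted in section 2.3.4.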

Data augmentation: temporal window movement.
The specific implementation used in this section is as follows: a window of 10 ms in length is sampled over a stable current interval, and each window moves backward by 3 ms; i.e. if the first sample is averaged over 0-10 ms, then the second sample is averaged over 3-13 ms, which means that consecutive samples overlap by 7 ms. DBS data with a sampling rate of 10 MHz can thus be amplified by data augmentation to obtain a large number of samples. Since spectrograms in the transition phase are usually difficult to identify, more aggressive data augmentation is used there to improve the identification accuracy. In this paper, the transition phase is defined as the 50 ms at the beginning and end of each regime. In the transition phase, the window is moved backward by only 1 ms each time, compared with 3 ms in the non-transition phase. After data augmentation, 144 308 samples corresponding to 10 ms time slices are obtained from the 111 discharges in the training set; the percentages of the labels are shown in table 1, where the two labels are balanced.
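The variable-stride windowing described above can be sketched as a simple generator of window start times. The stride values (3 ms steady, 1 ms in the 50 ms transition phases) follow the paper; treating the first and last 50 ms of the interval as the transition phases is an assumption for this example.

```python
def window_starts(t0_ms, t1_ms, transition_ms=50, win_ms=10,
                  step_ms=3, transition_step_ms=1):
    """Start times of 10 ms windows over [t0_ms, t1_ms): 3 ms stride in the
    steady phase, 1 ms stride within the 50 ms transition phase at each end."""
    starts, t = [], t0_ms
    while t + win_ms <= t1_ms:
        starts.append(t)
        in_transition = (t < t0_ms + transition_ms) or (t + win_ms > t1_ms - transition_ms)
        t += transition_step_ms if in_transition else step_ms
    return starts

# Windows over a 200 ms regime: dense near both ends, sparse in the middle
s = window_starts(0, 200)
```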

Dimensionality reduction.
For a machine learning algorithm, more data features do not necessarily lead to higher accuracy. Excessive information, for example, fine WCM amplitude features, introduces noise and interferes with efficient training, which may slow down training or even cause the neural network to overfit and generalize poorly. The performance of the trained neural network model is significantly improved if the data are preprocessed empirically before training and as few data features as possible are used while retaining the key information. Common dimensionality reduction methods are principal component analysis and linear discriminant analysis. There is no standard procedure for dimensionality reduction; different methods (or even combinations of them) need to be applied in a targeted manner, based on the characteristics of the input and the type of problem. For WCM identification, the problem is essentially identifying whether there is a distinct bulge between 20 and 150 kHz. The key to identifying the WCM is the contour of the spectrum rather than its fine structure. Therefore, instead of feeding the amplitude at every frequency point (1 kHz interval) into the neural network as the feature vector, the spectrum can be outlined with only a small number of data points. Thus, two data points are skipped after each selected point in figure 3. Whether a dimensionality reduction method is appropriate cannot be known before training; as will be demonstrated in section 3.1, the above dimensionality reduction method for the DBS signals is appropriate for this binary classification problem.
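Keeping one point and skipping two is simply a stride-3 slice, which reduces the 181 amplitude points (20-200 kHz at 1 kHz) to the 61 features per channel quoted in section 2.3.4, i.e. an effective 3 kHz sampling of the contour:

```python
import numpy as np

# 181 amplitude points from 20 to 200 kHz at 1 kHz spacing
freqs = np.arange(20, 201)   # kHz
amps = np.random.default_rng(1).random(181)  # placeholder amplitudes

# Keep one point, skip two: an effective 3 kHz sampling of the contour
reduced = amps[::3]
reduced_freqs = freqs[::3]
```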

Dataset overview and WCM classifier design.
According to the previous description, it is necessary to determine whether the spectrograms of the DBS signals from the 5th to 8th channels exhibit WCM characteristics in a specific time period. After applying the above dimensionality reduction method to the amplitude data in the spectrograms of the four channels, four feature sequences each containing 61 amplitude features are obtained. The feature vector of one sample, obtained after concatenation, is a sequence of length 244 (4 × 61). Since WCM identification is a binary classification problem, the label 1 denotes samples with a WCM and 0 denotes samples without one. At this point, the training data are prepared.
The 111 discharges in the training set yielded 69 200 time slices without data augmentation, which were augmented as described in section 2.3.3 to yield 144 308 samples, and then 1 587 388 (144 308 × 11) samples after adding Gaussian noise. The architecture of the WCM classifier contains a 244-node input layer and a 1-node output layer, corresponding to the 244 input features and the binary classification result, respectively. The network contains two hidden layers with 200 and 8 nodes, respectively, and the activation functions are ReLU-ReLU-sigmoid.
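The stated architecture (244 → 200 → 8 → 1, ReLU-ReLU-sigmoid) can be sketched directly; the weights below are random placeholders, not the trained parameters, so the output is an untrained probability only.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# 244 -> 200 -> 8 -> 1 fully connected WCM classifier (untrained sketch)
shapes = [(200, 244), (8, 200), (1, 8)]
W = [rng.normal(scale=0.1, size=s) for s in shapes]
b = [np.zeros(s[0]) for s in shapes]

def wcm_classifier(features):
    """features: length-244 vector (4 DBS channels x 61 downsampled amplitudes)."""
    h = relu(W[0] @ features + b[0])
    h = relu(W[1] @ h + b[1])
    return sigmoid(W[2] @ h + b[2])[0]   # probability that a WCM is present

# Concatenate the four 61-point channel spectra into one feature vector
channels = [rng.random(61) for _ in range(4)]
x = np.concatenate(channels)
p = wcm_classifier(x)
```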

Database for the regime classifier

Background and data visualization.
Using only the WCM signal is not enough to distinguish the different transition types; for example, the density change after the disappearance of the WCM plays an important role in determining whether an L-H transition or an L-I transition has occurred. The WCM changes and density changes corresponding to the different transition types are listed in table 2, showing that the changes are unique for the six transitions; therefore, by combining the information from the DBS diagnostic with the density from the polarimeter-interferometer (POINT), it is theoretically possible to distinguish between the six transitions. The statistics of the density distributions of the three regimes are shown in figure 7. It can be seen that the densities of the L-mode and I-mode are similar, while the density of the H-mode is in general significantly higher than the other two. However, it should be noted that a small number of H-mode densities overlap with the L-mode and I-mode densities; these samples mainly correspond to the H-mode during the density ramp-up stage, so the regime still needs to be judged by a model trained with a large amount of data, rather than by simply setting a series of thresholds.

Feature vector.
Each sample corresponds to a slice of length 10 ms, so the smallest unit of regime identification is 10 ms. The composition of the feature vector is shown in figure 8. According to the description in section 2.4.1, the samples used for regime identification should contain both WCM information and density change information. The density information from POINT [32] is used to compose the feature vector.
The change in density is represented by adding the density information of the 50 ms before and after the target sample (10 ms), for a total of 110 ms of density time series, to the feature vector. For example, to identify the regime between 3.0 s and 3.01 s, the density time series between 2.95 s and 3.06 s is used. Since the regime identification of the target sample needs to be compared with the previous samples, it is necessary to add the regime of the previous sample (part (a) in figure 8) and the WCM labels of the previous three samples (part (c)), each indicated by a 0 or 1 in the output of the WCM classifier. The three WCM labels are added to increase redundancy and reduce the effect of identification mistakes by the WCM classifier.
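Assembling the 62-feature vector of figure 8 is a concatenation of the four parts. The paper does not state the density sampling within the 110 ms window; the 55 points used below (i.e. a 2 ms sampling, so that 3 + 55 + 3 + 1 = 62) are our assumption to make the sizes consistent with the 62-node input layer.

```python
import numpy as np

def build_feature_vector(prev_regime_onehot, density_110ms, prev_wcm_labels, wcm_label):
    """62-feature vector following figure 8:
    (a) one-hot regime of the previous sample, 3 values
    (b) normalized density over 110 ms (50 ms before + 10 ms target + 50 ms after);
        55 points assumes a hypothetical 2 ms sampling
    (c) WCM labels (0/1) of the 3 previous samples
    (d) WCM label (0/1) of the target sample"""
    parts = [np.asarray(prev_regime_onehot, float),
             np.asarray(density_110ms, float),
             np.asarray(prev_wcm_labels, float),
             np.asarray([wcm_label], float)]
    v = np.concatenate(parts)
    assert v.size == 62, "expected 3 + 55 + 3 + 1 features"
    return v

x = build_feature_vector([0, 1, 0],                  # previous sample: I-mode
                         np.linspace(0.8, 1.0, 55),  # normalized density trace
                         [1, 1, 1],                  # WCM seen in last 3 samples
                         1)                          # WCM in target sample
```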

Dataset overview and regime classifier design.
The discharges described in section 2.2 are still used as training samples, but the feature vectors and labels need to be redefined because the task has changed from a binary classification problem to a multiclass classification problem. As the problem is defined as a three-class classification problem, i.e. identifying the L-, I-, and H-modes, one-hot encoding, a widely used encoding method in multiclass classification, is introduced. The labels, encoded with 3 bits, are defined as shown in table 3.
Based on the above analysis, the feature vector composition is shown in figure 8. The 62 cells in the figure correspond to the 62 nodes in the input layer. The three nodes in the output layer correspond to the 3-bit one-hot encoding. Two hidden layers with 20 and 10 nodes, respectively, were set up manually, and ReLU-ReLU-softmax was used as the activation function.
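The regime classifier (62 → 20 → 10 → 3, ReLU-ReLU-softmax) then maps a feature vector to a probability over the three one-hot classes. As before, the weights here are random placeholders rather than trained values.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0.0, x)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# 62 -> 20 -> 10 -> 3 regime classifier (untrained weights for illustration)
shapes = [(20, 62), (10, 20), (3, 10)]
W = [rng.normal(scale=0.1, size=s) for s in shapes]
b = [np.zeros(s[0]) for s in shapes]

# One-hot positions as in table 3, e.g. I-mode -> [0, 1, 0]
LABELS = ["L-mode", "I-mode", "H-mode"]

def regime_classifier(x):
    h = relu(W[0] @ x + b[0])
    h = relu(W[1] @ h + b[1])
    return softmax(W[2] @ h + b[2])   # probability over the 3 regimes

probs = regime_classifier(rng.random(62))
regime = LABELS[int(np.argmax(probs))]
```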

WCM classifier
There is a trade-off between the TPR and the TNR (1 − FPR). The ROC curve shown in figure 9 represents the relationship between the TPR and the FPR; with a reasonable threshold setting, the final model obtains an accuracy of 98.30% on the test set. The corresponding confusion matrix is shown in figure 10.
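The "reasonable threshold setting" step can be sketched as a scan over candidate thresholds on the classifier's sigmoid output, picking the one that maximizes accuracy; the scores and labels below are synthetic, not the paper's data, and the paper may select its threshold by a different criterion.

```python
import numpy as np

def best_threshold(scores, labels, grid=None):
    """Scan candidate thresholds and return the one maximizing accuracy.
    The TPR/FPR at each threshold are the points tracing the ROC curve."""
    grid = np.linspace(0.0, 1.0, 101) if grid is None else grid
    best_t, best_acc = 0.5, -1.0
    for t in grid:
        pred = scores >= t
        acc = np.mean(pred == labels)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc

rng = np.random.default_rng(0)
labels = rng.random(2703) < 0.5                             # synthetic WCM / non-WCM
scores = np.clip(labels + rng.normal(0, 0.3, 2703), 0, 1)   # noisy classifier scores
t, acc = best_threshold(scores, labels)
```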
The performance of the spectrum at different resolutions was tested, with the results listed in figure 11, which helps to find the input information for optimal classification. The chosen resolution avoids both overfitting due to excessive noise information and feature-extraction failure caused by losing too much information. The results show that the features selected in this paper better characterize the WCM, meaning that the bulge in the spectrum as shown in figure 3 is simply and accurately outlined.

Regime classifier
The confusion matrix in figure 12 shows that the model can accurately distinguish the L-, I-, and H-modes. Among the 1621 I-mode samples in the test set, only 19 samples are incorrectly identified, an error rate much lower than those of the L-mode and H-mode. This indicates that using the WCM as an important feature of the I-mode significantly helps I-mode identification. Moreover, more samples were misidentified as L-mode than as H-mode, which is consistent with the experimental observation that the density of the I-mode is almost the same as that of the L-mode, while the significantly higher density when the WCM disappears can be an important sign of the I-H transition. This further indicates that the feature vector construction shown in figure 8 contains almost all the information necessary to characterize the three regimes. The feature vector contains information from four parts, and studying the contribution of each part to the performance is beneficial for subsequent work on improving the model. Therefore, ablation experiments for the four components were implemented, with the results listed in table 4. As can be seen in the table, ablating the regime of the previous sample has the greatest impact on the model performance. This indicates that the model depends heavily on the preceding information, and in the future we will apply a Bi-LSTM or transformer model to better utilize this information for identification. However, training the memory and forget gates in LSTM networks requires more data, so that the model can determine how long before and after a transition the information should be attended to, i.e. memorize the key information and forget the noise. Special attention needs to be paid to the ablation result for the density information corresponding to part (b): the 90% accuracy may suggest that the density changes are unimportant. However, classification using only WCM information does distinguish most of the I-mode samples (which have a WCM), and the remaining non-WCM samples (900 L-mode and 175 H-mode samples) can lead to an overestimation of the model's ability to classify the L-mode due to this imbalance. Because all discharges containing the I-mode in the dataset were obtained with an unfavorable configuration, there are fewer H-mode samples. In fact, density information is very important for distinguishing the L-mode from the H-mode. In summary, obtaining more samples containing the I-mode can both improve the generalization ability of an LSTM and make the samples of the three modes more balanced after careful dataset construction. The next focus is to find more discharges containing the I-mode with the initially trained model after a scan of the database.
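Mechanically, a feature ablation zeroes out one part of the input vector and re-evaluates the model on the masked features. The slice boundaries below assume the (a)(b)(c)(d) layout of figure 8 with a hypothetical 55-point density segment; only the masking step is shown, since the trained model is not reproduced here.

```python
import numpy as np

# Feature layout of figure 8: slices for parts (a)-(d) in the 62-vector
# (the 55-point density segment is an assumed layout)
PARTS = {"a_prev_regime": slice(0, 3),
         "b_density": slice(3, 58),
         "c_prev_wcm": slice(58, 61),
         "d_target_wcm": slice(61, 62)}

def ablate(X, part):
    """Zero out one part of the feature matrix to measure its contribution:
    the accuracy drop after ablation indicates the part's importance."""
    Xa = X.copy()
    Xa[:, PARTS[part]] = 0.0
    return Xa

X = np.random.default_rng(0).random((5, 62))   # 5 placeholder samples
Xa = ablate(X, "b_density")
```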
The two models are coupled to identify shots #77427 and #77572 in the test set; their timing diagrams are shown in figure 14. In figure 14(a), the WCM appears in the spectrum at t = 4.2 s, at which time W_MHD also starts to rise; the model identifies that the L-I transition occurs at this point and that the plasma stays in the I-mode until the WCM disappears at t = 7.3 s. In panel (a5), it can be seen that the WCM intensity changes during t = 6.8 s-6.9 s, at which time W_MHD also fluctuates, and it can be observed in panel (a2) that the WCM classifier accurately identifies WCMs of different intensities. The generalization ability of the WCM classifier derives from the different distributions of WCMs in the training set.
Figure 14(b) represents a discharge that contains all three regimes and in which the misidentified samples are representative. As can be seen from panel (b5), the WCM features are very weak, yet the model still accurately identifies the WCM during t = 2.5 s-4 s. This is mainly due to the inclusion of WCMs of various intensities in the training set. However, when the I-H transition is about to occur, several consecutive samples are incorrectly identified as non-WCM, causing the model to predict these samples as L-mode. As can be seen in panel (b2), although occasional samples are identified as non-WCM, the model still considers the plasma to be in the I-mode. Based on these two points, it can be inferred that the WCM-label redundancy from the previous samples added to the feature vector helps the model's robustness. It is not until t = 4 s, when D_α begins to drop suddenly, that the discharge enters the H-mode, accompanied by a significant increase in W_MHD. It should be noted that at t = 7.55 s, the model incorrectly predicts that the discharge enters the H-mode, whereas it is actually in the L-mode; the analysis suggests that there is a small increase in density at this point, which may be recognized by the model as an important indication of an I-H transition. However, the density does not continue to increase after this point, suggesting that considering only the density change within 50 ms may not be sufficient. In future work, the use of RNNs has great potential to enlarge the model's field of view and thus obtain a more global judgment.

Summary
The I-mode has been widely studied because of its many good properties and will play an increasingly important role as the magnetic field rises. However, there are only a few hundred pulses with I-mode phases among the more than 100 000 pulses on EAST. Finding I-modes on EAST in different parameter spaces will be of great help for subsequent studies of new physics. However, it is difficult to precisely quantify WCM characteristics in DBS spectra, which are important indicators of the appearance of the I-mode, and a more advanced identification method that can traverse the entire database and replace manual identification needs to be developed. For this purpose, a fully connected neural network method is introduced, and two models are trained with processed DBS data and POINT density data to identify the WCM and the L-, I-, and H-modes with high accuracy.
Compared with the original one-dimensional time series, the Fourier transform of diagnostic signals provides richer information with greater potential, but it also poses a greater challenge for data preprocessing. Overlapping computation of time slices and the addition of Gaussian noise as data augmentation improve the generalization ability of the model. Experiments with downsampling of the spectrum showed that a sampling rate of 3 kHz most efficiently characterizes the WCM. The data preprocessing method applied to the spectrum in this paper could provide a reference for other diagnostic signals to be fed into machine learning models.
Finally, the two models were coupled and applied to two discharges in the test set (#77427, #77572) to demonstrate the identification of the L-, I-, and H-modes and to perform a typical error analysis. Combined with the results of the ablation experiments, it was found that the identification of a time slice is affected by the preceding and following information. Future work will study how to better utilize this information without being misled by it.
In a follow-up study, the trained models will be applied to scan the EAST database for unlabeled I-mode discharges, and in the future, neural network models will be trained to identify the L-, I-, and H-modes as well as additional modes using only conventional diagnostic information, such as the current, stored energy, and other data.

Figure 1 .
Figure 1. The basic architecture of a neural network, containing an input layer and an output layer, as well as two hidden layers.

Figure 2 .
Figure 2. Typical I-mode discharge on EAST. Time evolution of (a) the line-averaged density n_e, (b) the stored energy W_MHD, and (c) the u_⊥ perturbation. The dashed lines mark the transitions between the L-, I-, and H-mode phases.

Figure 3 .
Figure 3. Power spectrum for t = 2.9 s-3.0 s in shot #75514; blue circles indicate the features of the input after dimensionality reduction.

Figure 4 .
Figure 4. Discharge data from different campaigns.

Figure 5 .
Figure 5. Dividing the background of the power spectrum at 60% of the peak height.

Figure 6 .
Figure 6. Distribution of the intensity of the sample peaks of the dataset relative to the background in the four-channel DBS signals.

Figure 8 .
Figure 8. The composition of the feature vector input into the neural network; each cell corresponds to a scalar. (a) The one-hot encoding corresponding to the mode of the previous sample. (b) The normalized 10 ms density of the target sample and the preceding and subsequent 50 ms densities. (c) The WCM labels of the three samples preceding the target sample (denoted by 0 or 1). (d) The WCM label of the target sample.

Figure 9 .
Figure 9. ROC curve of the WCM classifier on the test set.

Figure 10 .
Figure 10. Confusion matrix of the WCM classifier on the test set.

Figure 11 .
Figure 11. Accuracy on the test set using different combinations of downsampling rates and data augmentation methods.

Figure 12 .
Figure 12. Confusion matrix of the regime classifier on the test set.

Table 4 .
Ablation of the different parts and their accuracy on the test set.

Figure 13 .
Figure 13. The types of diagnostic data input and the output results of the two models. (Part (a) in figure 8 is not drawn in this figure.)
The timing of the different types of diagnostic data input to the two models and the results of the model output are shown in figure 13 as a demonstration. The output of the WCM classifier is used as part of the regime classifier input. Modular classifiers form the final identification model; this structure is more convenient for debugging and error checking, which are rare in relatively black-box neural networks. It also reserves the possibility of combining other modules for more complex identification tasks in future work.

Figure 14 .
Figure 14. Neural network identification results for two discharges in the test set. From top to bottom: the evolution of the line-averaged density n_e from POINT, the WCM identification results (with color fills indicating the existence of a WCM), the stored energy W_MHD, the H_α recycling, and the u_⊥ perturbation from DBS.

Table 1 .
Percentage of WCM and non-WCM samples among the 144 308 samples augmented from the 111 discharges.

Table 2 .
WCM and density changes corresponding to different transitions.

Table 3 .
The percentages of the three classes in the database and the corresponding labels in the form of one-hot encoding.