Cross-tokamak disruption prediction based on domain adaptation

Data-driven disruption prediction models need many disruptive discharges, but such discharges are very costly to acquire in future tokamaks. This poses an inherent contradiction for disruption prediction research. In this paper, we demonstrate a novel approach, based on domain adaptation (DA), that predicts disruptions in a future tokamak using only a few of its discharges. The approach aims to predict disruptions by finding a feature space that is universal to all tokamaks. The first step, called physics-guided feature extraction (PGFE), uses the existing understanding of physics to extract physics-guided features from the diagnostic signals of each tokamak. The second step aligns a small amount of data from the future tokamak (target domain) with a large amount of data from existing tokamaks (source domain) using a DA algorithm called CORrelation ALignment (CORAL). This is the first attempt to apply DA to the cross-tokamak disruption prediction task. PGFE has been successfully applied in J-TEXT to predict disruptions with excellent performance. Because it extracts features that are less device-specific, PGFE can also reduce the data volume requirements, thereby establishing a solid foundation for cross-tokamak disruption prediction. We have further improved CORAL into supervised CORAL (S-CORAL) to make it better suited to feature alignment in the disruption prediction task. To simulate the case of an existing and a future tokamak, we selected J-TEXT as the existing tokamak and EAST as the future tokamak; the two differ greatly in the ranges of their plasma parameters. Using S-CORAL improves the disruption prediction performance on the future tokamak. Through interpretability analysis, we found that the knowledge learned by the disruption prediction model through this approach is more similar to that of a model trained on a large data volume from the future tokamak. 
This approach provides a lightweight, interpretable and data-efficient way to predict disruptions with a small data volume from the future tokamak by aligning features.

Original content from this work may be used under the terms of the Creative Commons Attribution 4.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.

Introduction
In future tokamaks such as ITER [1], DEMO [2] and SPARC [3], disruption is considered a catastrophic event that must be reliably avoided or mitigated [4][5][6]. Data-driven disruption prediction, benefiting from decades of data accumulated during tokamak operation, is a highly feasible approach. Numerous data-driven disruption predictors have been developed on JET [7][8][9][10][11][12], ASDEX-U [13], DIII-D [14,15], C-Mod [14,16], JT-60U [17], HL-2A [18,19], EAST [20][21][22], and J-TEXT [23][24][25], achieving high accuracy on their own tokamaks. However, the high-performance operation of future tokamaks makes unmitigated disruptions very costly, so it is impractical to accumulate a large number of disruptive discharges for training such models. The large gap in device size and operational regime between future and existing tokamaks also makes predictors trained on existing tokamaks less reliable when applied directly to future tokamaks. Many efforts have been made to address this issue. Adaptive learning, which builds a predictor from scratch, has been considered on JET and ASDEX-U to address disruption prediction in newly deployed tokamaks and has yielded promising results [26][27][28][29]. Deep learning-based disruption predictors have achieved favourable results in cross-tokamak disruption prediction by mixing data from two [30] or three [31] different tokamaks. For predictions across parameter regimes, the 'Scenario adaptive' approach has successfully used high-parameter data from existing tokamaks mixed with low-parameter data from the target tokamak to predict high-parameter discharges of the target tokamak [32].
Transfer learning [33] is a strong candidate for training cross-tokamak disruption predictors with limited target tokamak data. Current approaches mainly train on mixed data to predict disruptions in a new tokamak. However, data mixing is only the most basic method in transfer learning, and more advanced transfer learning methods can improve cross-tokamak disruption prediction. Domain adaptation (DA) [34] addresses problems in which the source and target domains share the same features and categories but have different feature distributions. DA has been widely applied in many areas of machine learning research, such as computer vision (CV) [35], natural language processing (NLP) [36], and brain-computer interfaces (BCI) [37]. However, it is rarely used in magnetic confinement fusion, especially in disruption prediction. Recently, applying a DA algorithm based on maximum mean discrepancy (MMD) to disruption prediction on EAST significantly improved the model's performance under different wall conditions [38]. Cross-tokamak disruption prediction is also a typical application scenario for DA, under the assumption that the disruption mechanisms are the same in all tokamaks.
DA algorithms can therefore help in exploring a new cross-tokamak disruption prediction approach for future tokamaks.
Our team has developed a deep model for cross-tokamak disruption prediction using the freeze-and-finetune technique [39]. However, there is still room to improve cross-tokamak disruption prediction performance. Deep learning-based predictors generally require more, and more diverse, data for the pre-trained model to generalize. Data from a single tokamak such as J-TEXT alone are not sufficient to train a generalized pre-trained model for disruption prediction. Deep learning-based predictors are also inherently difficult to interpret because of the complexity of deep learning. Compared with deep learning-based disruption predictors, decision tree-based disruption predictors are more interpretable, require less data and consume fewer computational resources. However, decision tree-based models rely more on expert experience and knowledge for selecting and processing input features. Detailed analysis of the input features was also crucial to the impressive performance of the Classification And Regression Trees (CART) based adaptive predictors [28]. Extracting features through expert knowledge can partially align the data distributions of different tokamaks. An interpretable disruption predictor based on physics-guided feature extraction (IDP-PGFE) [24] has achieved excellent accuracy on J-TEXT by applying physics-guided feature engineering to the raw diagnostic data. Although IDP-PGFE can be trained with limited data, it performs poorly on smaller datasets and is still difficult to apply directly to future tokamaks. Therefore, a cross-tokamak approach is needed to reduce the amount of target tokamak data that a disruption predictor requires. IDP-PGFE has two advantages for cross-tokamak disruption prediction. (1) The input features of IDP-PGFE are diagnostic-independent and carry physical information. Physics-guided feature extraction (PGFE) processes the raw diagnostic signals from different tokamaks into a unified format, which to some extent aligns certain feature information across tokamaks. Because PGFE does not rely on a trained feature extractor, no target tokamak data are needed to train the feature extraction. (2) IDP-PGFE has a certain level of interpretability, which helps researchers understand what the model has learned. This can give researchers greater confidence in applying the disruption predictor to future tokamaks and may provide insights into potential improvements. These advantages give IDP-PGFE inherent strengths in cross-tokamak disruption prediction tasks.
However, tokamaks can differ significantly in size, operational regime and even configuration, and these differences will be encountered when transferring disruption predictors from existing tokamaks to future tokamaks. In this work, we selected J-TEXT as the existing tokamak and EAST as the future tokamak. J-TEXT is a medium-sized tokamak with a circular cross-section and a full-carbon wall. A standard J-TEXT discharge lasts only 0.7-0.8 s, and all discharges are ohmic. In contrast, EAST is a larger tokamak with an elliptical cross-section and a metal wall. A standard EAST discharge lasts 7-8 s, long-pulse discharges can last tens of seconds, and EAST also operates in H-mode. Therefore, compared with an existing tokamak such as J-TEXT, EAST can be treated as a future tokamak. Although PGFE reduces some of the differences in data distribution between tokamaks, the remaining differences between the devices may result in distinct decision boundaries. Consequently, even when PGFE is applied to cross-tokamak disruption prediction, transfer learning may still be needed to make cross-tokamak prediction more effective. CORrelation ALignment (CORAL) [40] is a simple, widely used and efficient DA method. CORAL minimizes domain shift by aligning the second-order statistics of the source and target distributions, without training or tuning any hyperparameters. CORAL aligns the source and target domains in a 'frustratingly easy' way, which is lightweight and more interpretable. Therefore, in this work, we adopt CORAL as the DA method for the cross-tokamak disruption prediction approach.
In this paper, we present a novel approach for cross-tokamak disruption prediction based on DA. It is the first attempt to apply DA techniques to the task of cross-tokamak disruption prediction. To simulate scenarios in which future and existing tokamaks differ significantly, J-TEXT is considered an existing tokamak and EAST a future tokamak. Section 2 gives an overview of the cross-tokamak approach, including the application of PGFE on both J-TEXT and EAST and the adaptation of CORAL to make it more suitable for disruption prediction tasks. Section 3 introduces the J-TEXT and EAST datasets used in this work. Section 4 presents the cross-tokamak results on EAST, showing the improvement in disruption prediction performance obtained by applying CORAL. In section 5, we investigate why S-CORAL performs well in cross-tokamak disruption prediction. Section 6 summarizes the work and discusses future plans for DA in disruption prediction research and prospects for cross-tokamak disruption prediction.

The structure of the cross-tokamak disruption prediction
This section describes the structure of the cross-tokamak disruption prediction approach based on CORAL, which consists of four components: a feature extractor, a DA module, a disruption classifier and an explainer. Compared with IDP-PGFE, there is an additional DA module for aligning features. As shown in figure 1, the first step uses PGFE to pre-process the raw signals into diagnostic-independent, disruption-related features. Then, CORAL is applied to align the features, mapping the knowledge (second-order statistics) of the target domain onto the source-domain data. Next, a decision tree-based model called Dropouts meet Multiple Additive Regression Trees (DART) [41] is trained on the mapped data. The trained disruption predictor can then be applied directly in the target domain. Finally, the trained model is analysed for interpretability using SHapley Additive exPlanations (SHAP) [42].
DART and SHAP have been discussed extensively in our previous work, and the detailed PGFE algorithm has also been introduced in prior research. This paper therefore focuses on the application of PGFE on EAST and highlights the differences.

PGFE application on both J-TEXT and EAST
The diagnostic systems of each tokamak are designed for its own engineering and physics requirements. Therefore, it is nearly impossible to use raw diagnostic signals directly for cross-tokamak disruption prediction. In previous studies on cross-tokamak disruption prediction, different data pre-processing methods [28,30,31] have been used to standardize the diagnostic data from different tokamaks into a consistent input format. In this work, PGFE, as a feature extractor, extracts disruption-related features and standardizes the input format at the same time.
An analysis of EAST disruption classes [21] found that impurity radiation, density limits, vertical displacement events (VDEs) and MHD instabilities are also the main causes of disruptions on EAST. Therefore, as in IDP-PGFE, the physics-guided features are extracted based on MHD instabilities, radiation, density-related disruptions and basic plasma control system (PCS) signals [24]. Because the diagnostics differ between J-TEXT and EAST, the parameters of the feature extraction algorithm must be adjusted, or the algorithm redesigned, specifically for EAST. Therefore, PGFE is not a single algorithm designed for one specific tokamak; rather, a dedicated algorithm must be designed for each tokamak to obtain consistent or analogous physics features. The features for J-TEXT and EAST are listed in table 1. In the second column, the name of each diagnostic is followed by the numbers of channels used for feature extraction: the first number is the number of channels used in J-TEXT and the second is the number used in EAST.
MHD instability-related features have been shown to contribute significantly to the disruption prediction task in J-TEXT. However, because the algorithm assumes a circular cross-section, as in J-TEXT, it is difficult to extract the mode_number_m (MNM) feature on EAST. In future research, if Mirnov data from the entire cross-section of EAST become available, new algorithms can be designed to compute MNM on EAST. Radiation-related and density-related features are mainly provided by soft x-ray (SXR) [43,44] arrays, absolute extreme ultraviolet (AXUV) [45][46][47] arrays and polarimeter-interferometer arrays (the far-infrared three-wave polarimeter-interferometer, FIR [48], in J-TEXT and the POlarimeter-INTerferometer, POINT [49], in EAST). Cross-tokamak disruption prediction from J-TEXT to EAST demonstrates an advantage of PGFE in handling array signals: even though the numbers of SXR, AXUV and polarimeter-interferometer channels differ between J-TEXT and EAST, PGFE transforms them into the same set of features. Four new features, the n = 1 phase, v_loop, $B_t/I_P$ and I_P_diff, are additionally extracted. The phase after natural locking may be affected by the inherent differences between the error fields of J-TEXT and EAST. Therefore, the n = 1 phase is included in this work; like the n = 1 amplitude [50], it can be calculated by fitting the signals of four locked mode detectors. Because the wall conditions of EAST and J-TEXT differ, it is difficult to assess the impurity situation using the CIII signal alone. Therefore, v_loop is introduced to reflect the overall impurity condition. The feature $B_t/I_P$ approximately represents the boundary safety factor, while I_P_diff expresses the variation of the plasma current. As a result, 90 diagnostic channels in J-TEXT and 65 diagnostic channels in EAST are extracted into 25 features. The extraction algorithms and significance of most features have been discussed extensively in [24]. The sampling rate of all features is 1 kHz, the same as the sampling rate of the plasma current in EAST.
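The n = 1 amplitude and phase from four locked mode detectors can be illustrated as a least-squares sinusoidal fit over toroidal angle. This is a minimal sketch under our own assumptions, not the implementation of [50]: the fit model $\delta B(\phi) = c_0 + A\cos(\phi - \phi_0)$, the detector angles (here assumed equally spaced at 0°, 90°, 180° and 270°) and the function name `fit_n1_mode` are hypothetical.

```python
import numpy as np

def fit_n1_mode(signals, phi):
    """Fit dB(phi) = c0 + a*cos(phi) + b*sin(phi) at every time step.

    signals: array (n_times, n_detectors) of locked-mode detector readings
    phi:     toroidal angles (rad) of the detectors (hypothetical positions)
    Returns the n = 1 amplitude sqrt(a^2 + b^2) and phase atan2(b, a), so that
    dB(phi) = c0 + amplitude * cos(phi - phase).
    """
    phi = np.asarray(phi, dtype=float)
    signals = np.atleast_2d(np.asarray(signals, dtype=float))
    # Design matrix with offset, cos and sin columns: 4 detectors, 3 unknowns
    design = np.column_stack([np.ones_like(phi), np.cos(phi), np.sin(phi)])
    # Solve all time steps at once: each column of signals.T is one time step
    coef, *_ = np.linalg.lstsq(design, signals.T, rcond=None)
    a, b = coef[1], coef[2]
    return np.hypot(a, b), np.arctan2(b, a)

# Synthetic single time step: offset 2, amplitude 3, phase 0.5 rad
phi = np.deg2rad([0.0, 90.0, 180.0, 270.0])
amp, phase = fit_n1_mode(2.0 + 3.0 * np.cos(phi - 0.5), phi)  # amp ≈ 3.0, phase ≈ 0.5
```

Passing a feature-rate time series of shape (n_times, 4) returns one amplitude and one phase per sample, which matches the per-sample feature layout at 1 kHz.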

DA module: CORAL
The goal of DA is to bridge the gap between the source and target domains by transferring knowledge learned from the source domain to the target domain. This transfer of knowledge enables the model to generalize well and make accurate predictions on the target domain despite the differences between the domains. CORAL [40] is an efficient, lightweight and interpretable DA method. It minimizes domain shift by aligning the second-order statistics (covariances) of the source and target data distributions, so that the features from different domains have more similar distributions. CORAL aligns the distributions by re-colouring the whitened source features with the covariance of the target distribution. It involves two main computations: (1) calculating the covariance statistics of each domain and (2) applying the whitening and re-colouring linear transformation to the source features. A classifier is then trained on the transformed source features. A brief derivation and the assumptions underlying CORAL follow. Suppose that the source-domain data are $D_S = \{\vec{x}_i\}$, $\vec{x}_i \in \mathbb{R}^D$, and the target-domain data are $D_T = \{\vec{u}_i\}$, $\vec{u}_i \in \mathbb{R}^D$, where $\vec{x}_i$ and $\vec{u}_i$ are the $D$-dimensional input feature representations. A linear transformation $A$ is applied to the original source features, and the Frobenius norm is used as the matrix distance metric, so that the distance between the second-order statistics (covariances) of the source and target features is minimized [40]:

$$\min_A \left\| C_{S'} - C_T \right\|_F^2 = \min_A \left\| A^\top C_S A - C_T \right\|_F^2,$$

where $C_S$ and $C_T$ are the covariance matrices of the source and target features, $C_{S'}$ is the covariance of the transformed source features $D_S A$, and $\|\cdot\|_F^2$ denotes the squared matrix Frobenius norm. Since $C_S$ and $C_T$ are symmetric matrices, singular value decomposition (SVD) gives $C_S = U_S \Sigma_S U_S^\top$ and $C_T = U_T \Sigma_T U_T^\top$. The optimal solution for $A$ can then be derived as [40]:

$$A^* = U_S \Sigma_S^{+\frac{1}{2}} U_S^\top \, U_{T[1:r]} \Sigma_{T[1:r]}^{\frac{1}{2}} U_{T[1:r]}^\top,$$

where $r = \min(r_{C_S}, r_{C_T})$, $r_{C_S}$ and $r_{C_T}$ denote the ranks of $C_S$ and $C_T$, respectively, the subscript $[1:r]$ selects the $r$ largest singular values and their singular vectors, and $\Sigma^{+}$ is the Moore-Penrose pseudoinverse of $\Sigma$ [40].
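The closed-form CORAL solution of [40], $A^* = U_S \Sigma_S^{+1/2} U_S^\top U_{T[1:r]} \Sigma_{T[1:r]}^{1/2} U_{T[1:r]}^\top$, can be checked numerically. The following NumPy sketch is an illustration under our own assumptions (the function name `coral_matrix`, the rank tolerance `tol` and the use of `np.cov` for the covariances are not taken from [40] or from the authors' code). It builds $A^*$ from the SVDs of $C_S$ and $C_T$; when both covariances have full rank, the covariance of $D_S A^*$ equals $C_T$.

```python
import numpy as np

def coral_matrix(source, target, tol=1e-10):
    """Closed-form CORAL transform A* so that A*^T C_S A* approximates C_T.

    source: (n_s, D) source-domain features, target: (n_t, D) target-domain features.
    """
    c_s = np.cov(source, rowvar=False)
    c_t = np.cov(target, rowvar=False)
    # For symmetric PSD matrices, SVD gives C = U diag(s) U^T, s in descending order
    u_s, s_s, _ = np.linalg.svd(c_s)
    u_t, s_t, _ = np.linalg.svd(c_t)
    r = min(int(np.sum(s_s > tol)), int(np.sum(s_t > tol)))
    # Sigma_S^{+1/2}: pseudo-inverse square root, zero along null directions of C_S
    inv_sqrt = np.zeros_like(s_s)
    nonzero = s_s > tol
    inv_sqrt[nonzero] = 1.0 / np.sqrt(s_s[nonzero])
    whiten = u_s @ np.diag(inv_sqrt) @ u_s.T
    # Re-colour with the r leading components of the target covariance
    recolour = u_t[:, :r] @ np.diag(np.sqrt(s_t[:r])) @ u_t[:, :r].T
    return whiten @ recolour
```

The transformed source features are `source @ coral_matrix(source, target)`; as in [40], no mean shift is applied, which is reasonable when the features of each tokamak have already been z-score normalized.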
In this work, the data from the source tokamak are mapped onto the data from the target tokamak: the source tokamak data are whitened and then re-coloured with the covariance of the target tokamak data. A new model is then trained on the transformed source data for the target machine. The target data are fed directly into this model, so no transformation is needed when predicting on the target tokamak.
CORAL was originally designed as an unsupervised domain adaptation (UDA) algorithm [34]; however, in cross-tokamak disruption prediction the target data are actually labelled. Therefore, we extended CORAL into a supervised domain adaptation (SDA) version, called supervised CORAL (S-CORAL), and applied it in this paper. The idea is similar to approaches for designing SDA in machine learning research [37], namely to take label information into account when CORAL minimizes the distance between the covariance matrices of the two domains. If CORAL were applied directly in our work, the non-disruptive and disruptive data from the source domain would be whitened together and then re-coloured using the combined non-disruptive and disruptive data from the target domain. This would not exploit the label information, which is wasteful for disruption prediction tasks with clearly defined positive and negative samples. In contrast, S-CORAL whitens the non-disruptive and disruptive source-domain data separately and then re-colours them separately using the non-disruptive and disruptive target-domain data, respectively. This allows S-CORAL to exploit the labels by aligning disruptive and non-disruptive samples separately between the source and target domains. To distinguish CORAL from S-CORAL and compare their performance, we refer to the original CORAL as unsupervised CORAL (U-CORAL) in this paper. The flowcharts of S-CORAL and U-CORAL are shown in figure 2.
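The difference between U-CORAL and S-CORAL can be sketched in a few lines of NumPy. This is a minimal illustration that assumes full-rank covariances (a small floor `eps` on the eigenvalues is added only for numerical stability) and the label convention 1 = disruptive, 0 = non-disruptive; the helper names `coral_align`, `u_coral` and `s_coral` are hypothetical and not taken from the authors' code.

```python
import numpy as np

def _sym_power(c, p, eps=1e-12):
    """Matrix power C^p of a symmetric PSD matrix via its eigen-decomposition."""
    vals, vecs = np.linalg.eigh(c)
    vals = np.clip(vals, eps, None)
    return (vecs * vals**p) @ vecs.T  # V diag(vals^p) V^T

def coral_align(src, tgt):
    """Whiten src with C_S^{-1/2}, then re-colour it with C_T^{1/2}."""
    c_s = np.cov(src, rowvar=False)
    c_t = np.cov(tgt, rowvar=False)
    return src @ _sym_power(c_s, -0.5) @ _sym_power(c_t, 0.5)

def u_coral(xs, xt):
    """Unsupervised CORAL: all source samples aligned to all target samples."""
    return coral_align(xs, xt)

def s_coral(xs, ys, xt, yt):
    """Supervised CORAL: align each class (0 = non-disruptive, 1 = disruptive) separately."""
    out = np.empty_like(xs, dtype=float)
    for label in (0, 1):
        out[ys == label] = coral_align(xs[ys == label], xt[yt == label])
    return out
```

In both variants only the source (J-TEXT) features are transformed; a classifier is trained on the output and then applied to the untransformed target (EAST) features. The per-class split is the only change S-CORAL makes, so its cost is the same as U-CORAL.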

Dataset description
The introduction briefly outlined the differences between J-TEXT and EAST; compared with the 'existing tokamak' J-TEXT, EAST can be treated as a 'future tokamak'. This section first introduces the two tokamaks in detail and then describes the datasets selected for training, validation and testing.
J-TEXT is a medium-sized tokamak with a major radius $R = 1.05$ m and a minor radius $a = 0.25$ m [51]. Typical J-TEXT discharges, in the limiter configuration, are characterized by a plasma current ($I_P$) of approximately 200 kA, a toroidal field ($B_t$) of around 2.0 T, a pulse length of 700-800 ms, plasma densities ($n_e$) of $1$-$7 \times 10^{19}\ \mathrm{m^{-3}}$ and an electron temperature ($T_e$) of about 1 keV [52]. The typical resistive time scale in J-TEXT is about 25 ms ($\tau_R \approx 25$ ms). EAST is an ITER-like fully superconducting tokamak with a major radius $R = 1.85$ m and a minor radius $a = 0.45$ m [53]. Typical EAST discharges, in the divertor configuration, are characterized by a plasma current ($I_P$) of approximately 450 kA, a toroidal field ($B_t$) of around 1.5 T, a pulse length of approximately 10 s and a $\beta_N$ of around 2.1. The typical resistive time scale in EAST is longer than 500 ms ($\tau_R \geqslant 500$ ms) [52].
The J-TEXT dataset and its split are similar to our previous works [24,52]; the split of the datasets is shown in table 2. For J-TEXT, 1354 discharges (188 disruptive) are selected as the training set, 160 discharges (80 disruptive) as the validation set and 220 discharges (110 disruptive) as the test set. For EAST, the dataset and its split are similar to our previous work [52], and the training, validation and test sets are again selected randomly. For the full-EAST-dataset model, 1896 discharges (355 disruptive) are selected as the training set and 120 discharges (60 disruptive) as the validation set [52]. For the cross-tokamak models, 110 discharges (10 disruptive) are selected as the training set. The same test set of 360 discharges (180 disruptive) is used for both the full-EAST-dataset model and the cross-tokamak models to ensure a fair comparison.
Data close to the current quench (CQ) time of disruptive discharges are considered disruption precursor data and serve as positive samples for the model. All flat-top data of non-disruptive discharges serve as negative samples. The disruption precursor data are determined by a time threshold before the CQ time. This threshold can be selected manually [54] or by statistical analysis, either uniformly for all discharges [14] or individually for each discharge [13,55]. Our previous works applied an automatic approach that selects the time threshold giving the best model performance. By scanning the time threshold from 5 ms to 50 ms in J-TEXT and from 5 ms to 500 ms in EAST, the thresholds were selected as 25 ms before the CQ time for J-TEXT and 125 ms before the CQ time for EAST. Negative (non-disruptive) samples far outnumber positive (disruptive) samples. Most negative samples come from stable plasma discharges and are therefore similar to one another, especially in long-pulse EAST non-disruptive discharges. Therefore, to balance the dataset, we randomly dropped a portion of the negative samples and increased the weights of the disruptive samples.
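The labelling and balancing steps above can be sketched as follows. This is a minimal illustration, not the authors' code: sample times are assumed to be integer milliseconds on the 1 kHz feature grid, samples of disruptive discharges earlier than the threshold window are assumed to be discarded, and the kept fraction of negatives and the positive weight are hypothetical values.

```python
import numpy as np

def label_shot(times_ms, t_cq_ms=None, flat_top_ms=None, threshold_ms=125):
    """Select and label the samples of one discharge.

    Disruptive shot (t_cq_ms given): samples in [t_cq - threshold, t_cq) get label 1.
    Non-disruptive shot: samples inside the flat-top window (start, end) get label 0.
    Earlier samples of disruptive shots are not used (assumption).
    """
    t = np.asarray(times_ms)
    if t_cq_ms is not None:
        mask = (t >= t_cq_ms - threshold_ms) & (t < t_cq_ms)
        label = 1
    else:
        start, end = flat_top_ms
        mask = (t >= start) & (t <= end)
        label = 0
    return mask, np.full(int(mask.sum()), label)

def balance(X, y, keep_negative=0.2, positive_weight=5.0, seed=0):
    """Randomly drop negatives and up-weight positives (both ratios hypothetical)."""
    rng = np.random.default_rng(seed)
    pos = np.flatnonzero(y == 1)
    neg = np.flatnonzero(y == 0)
    kept = rng.choice(neg, size=int(round(keep_negative * neg.size)), replace=False)
    idx = np.sort(np.concatenate([pos, kept]))
    weights = np.where(y[idx] == 1, positive_weight, 1.0)
    return X[idx], y[idx], weights
```

With the EAST threshold of 125 ms, a disruptive discharge contributes 125 positive samples at 1 kHz; the J-TEXT case uses `threshold_ms=25`.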

The training of the disruption prediction models
In this paper, five models have been trained. The first two are self-tokamak disruption prediction models, as distinct from the cross-tokamak models. The J-TEXT model shows the performance in the source domain and serves as the basis for the cross-tokamak models. The EAST model, trained on the full EAST training set shown in table 2, represents the peak performance. These two models adopt the structure of IDP-PGFE [24] rather than the structure in figure 1. The third model is the mixing data model, trained on a mixture of the J-TEXT training set and the cross-tokamak EAST training set in table 2; it also uses the structure of IDP-PGFE. The last two models are CORAL models, which use the structure in figure 1. After PGFE is applied to the entire dataset, CORAL aligns the J-TEXT training set with the cross-tokamak EAST training set in table 2. For validation, we use only the J-TEXT validation set. Once training is complete, the EAST test set is used for prediction. Finally, a thorough SHAP analysis is conducted to determine what the models have learned and where cross-tokamak disruption prediction could be improved.

Predictive performances of the models
This section presents the predictive performance of the various models. Section 4.1 first shows the self-tokamak models of J-TEXT and EAST as benchmarks for the cross-tokamak models. Section 4.2 then compares the mixing data model, the unsupervised model and the supervised model for cross-tokamak disruption prediction. The hyperparameters of each best-performing model in this section are determined by hyperparameter search.
Disruption prediction is a binary classification task whose performance is often evaluated using a confusion matrix. However, in the context of disruption prediction, true positives (TP) and false negatives (FN) are defined in a specific way. For J-TEXT, a predicted disruption with a warning time of more than 10 ms is considered a TP (less than 10 ms is considered an FN) [24]. For EAST, a predicted disruption with a warning time of more than 30 ms is considered a TP (less than 30 ms is considered an FN) [52]. The features of the two tokamaks are normalized independently using the z-score method. Note that, in the normalization for cross-tokamak disruption prediction on EAST, only the 110 known discharges were used to calculate the normalization parameters, to simulate a real application scenario.
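The normalization rule above can be sketched as follows: the z-score parameters of each tokamak are fitted once, on the samples that would be available in practice (for cross-tokamak prediction on EAST, the samples of the 110 known discharges), and then reused unchanged for every later sample, including the test set. The function names and the guard for constant features are our own assumptions.

```python
import numpy as np

def fit_zscore(X_known, eps=1e-12):
    """Per-feature mean and standard deviation from the known (training) samples."""
    mu = X_known.mean(axis=0)
    sigma = X_known.std(axis=0)
    # Avoid division by zero for features that are constant in the known data
    sigma = np.where(sigma < eps, 1.0, sigma)
    return mu, sigma

def apply_zscore(X, mu, sigma):
    """Normalize any later data (e.g. the test set) with the frozen parameters."""
    return (X - mu) / sigma
```

Fitting separately per tokamak keeps J-TEXT statistics from leaking into the EAST features, and freezing the EAST parameters avoids using test-set statistics that would not exist on a newly operating device.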

The self-tokamak models of J-TEXT and EAST based on PGFE
In this part, the self-tokamak models of J-TEXT and EAST were trained with the full training sets, and their ROC curves are shown in figure 3. The yellow and orange lines represent the ROC curves of the J-TEXT model and the EAST model, respectively. The AUC of the J-TEXT model is 0.971 and that of the EAST model is 0.936. The J-TEXT model does not perform as well as the previous IDP-PGFE model (AUC = 0.987) because some features are missing. In addition, to stay consistent with the EAST sampling rate, the J-TEXT model uses a sample interval of 1 ms, whereas the previous IDP-PGFE model used 0.1 ms, which may also affect model performance.
Figure 4 shows the accumulated percentage of disruptions predicted versus warning time, with a model threshold of 0.56 for the J-TEXT model and 0.93 for the EAST model. In this work, alarms with warning times shorter than 10 ms for J-TEXT and 30 ms for EAST are treated as tardy alarms. Therefore, the final performance is TPR = 93.64%, FPR = 8.18% with a tolerance of 10 ms for the J-TEXT self-tokamak model and TPR = 93.33%, FPR = 16.11% with a tolerance of 30 ms for the EAST self-tokamak model. The mean and median warning times of the J-TEXT model are both 0.14 s, while the EAST model has a mean warning time of 1.34 s and a median warning time of 0.75 s. Because EAST includes long-pulse discharges, the mean warning time can be strongly skewed; the median therefore better reflects the overall warning time of the EAST model.
As a result, PGFE achieves good performance for both the J-TEXT and EAST self-tokamak models by using the common physics-guided features shared by the two tokamaks. This provides a basis and reference for the subsequent cross-tokamak work. The reported TPR and FPR are obtained at a selected model threshold. For real-time application, the threshold should be adjusted according to the device and discharge conditions to obtain an acceptable TPR and FPR. To protect the tokamak, this work places a higher requirement on TPR, which may lead to a higher FPR on EAST.

Models of cross-tokamak disruption prediction from J-TEXT to EAST based on PGFE and CORAL
In this part, three models of cross-tokamak disruption prediction from J-TEXT to EAST, using the mixing data, U-CORAL and S-CORAL strategies, are described and analysed.
Although IDP-PGFE can be trained with limited data, it performs poorly on small datasets, such as 10 disruptive and 120 non-disruptive discharges in J-TEXT. Note that even when training with limited data from a single tokamak, a certain amount of validation data is still needed to prevent overfitting and select the best model. Therefore, training with limited data alone is not suitable for scenarios such as 10 disruptive and 100 non-disruptive discharges in EAST.
The five cases, including the three cross-tokamak models, are listed in table 3. In the 'EAST data' column, the numbers give the total number of discharges (number of disruptive discharges). Case 1 is a baseline for cross-tokamak disruption prediction, in which the J-TEXT self-tokamak model is tested directly on EAST data; this is called a zero-shot test. Case 2 trains a model on limited EAST data mixed with J-TEXT data, a strategy commonly used in cross-tokamak disruption prediction approaches. Case 3 uses CORAL directly to map the knowledge from the EAST data onto the J-TEXT data before training; we refer to its data strategy as CORAL and its CORAL strategy as unsupervised. Case 4 uses S-CORAL to map the knowledge from the disruptive and non-disruptive EAST data onto the J-TEXT data separately and then trains the model; its data strategy is also CORAL and its CORAL strategy is supervised. Case 5 is the EAST self-tokamak model, which serves as a benchmark representing the peak performance achievable on these datasets, as described in section 4.1.
The ROC curves of these five models are shown in figure 5 [...] disruption prediction with significant differences in device and discharge parameters, such as from J-TEXT to EAST, the performance of the mixing data model is still unacceptable. The U-CORAL model shows a slight improvement in prediction performance over the mixing data model, but the improvement is not significant (AUC from 0.764 to 0.797). In contrast, S-CORAL significantly improves the performance compared with the other strategies (AUC from 0.797 to 0.890) and narrows the performance gap to the EAST model.
Figure 6 shows the accumulated percentage of disruptions predicted versus warning time, with model thresholds of 0.93 for the EAST model, 0.01 for the mixing data model, 0.31 for the U-CORAL model and 0.58 for the S-CORAL model. The warning time tolerance is 30 ms for the EAST test set. To protect the tokamak, this work places a higher requirement on TPR: the model threshold is first chosen to keep the TPR above 90%, and then, among such thresholds, to achieve a lower FPR. However, for the mixing data model, the highest achievable TPR is 73.33%, at a model threshold of 0.01. Therefore, with a tolerance of 30 ms, the final performance is TPR = 73.33%, FPR = 27.78% for the mixing data model; TPR = 90.56%, FPR = 46.67% for the U-CORAL model (threshold 0.31, chosen to keep the TPR above 90%); and TPR = 90%, FPR = 25.56% for the S-CORAL model. Although the mixing data model achieves a lower FPR than U-CORAL, its highest TPR is only 73.33%. The AUC reflects the overall predictive performance, and the AUC of the U-CORAL model is higher than that of the mixing data model. The mixing data model has a mean warning time of 1.48 s and a median of 0.72 s; the U-CORAL model has a mean of 1.85 s and a median of 1.54 s; and the S-CORAL model has a mean of 1.43 s and a median of 0.74 s. Except for the U-CORAL model, the mean and median warning times of these models are similar to those of the EAST model.
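The threshold-selection rule described above (keep the TPR above 90%, then minimize the FPR, and fall back to the highest TPR when no threshold reaches 90%, as for the mixing data model) can be sketched at the discharge level. In this sketch, which reflects our assumptions rather than the authors' code, each discharge is a tuple of sample times in ms, model scores and the CQ time in ms (`None` for a non-disruptive discharge); the first sample whose score reaches the threshold raises the alarm, and a disruptive discharge counts as a TP when its warning time is at least the 30 ms tolerance. All function names are hypothetical.

```python
import numpy as np

def first_alarm_ms(times_ms, scores, thr):
    """Time of the first sample whose score reaches the threshold, or None."""
    idx = np.flatnonzero(np.asarray(scores) >= thr)
    return None if idx.size == 0 else int(np.asarray(times_ms)[idx[0]])

def shot_rates(shots, thr, tol_ms=30):
    """Discharge-level (TPR, FPR) at one model threshold."""
    tp = fn = fp = tn = 0
    for times_ms, scores, t_cq_ms in shots:
        alarm = first_alarm_ms(times_ms, scores, thr)
        if t_cq_ms is None:              # non-disruptive discharge
            fp, tn = (fp + 1, tn) if alarm is not None else (fp, tn + 1)
        elif alarm is not None and t_cq_ms - alarm >= tol_ms:
            tp += 1                      # alarm early enough
        else:
            fn += 1                      # missed or tardy alarm
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return tpr, fpr

def select_threshold(shots, thresholds, min_tpr=0.9, tol_ms=30):
    """Return (threshold, TPR, FPR) following the TPR-first selection rule."""
    results = [(thr, *shot_rates(shots, thr, tol_ms)) for thr in thresholds]
    eligible = [r for r in results if r[1] >= min_tpr]
    if eligible:
        return min(eligible, key=lambda r: r[2])
    # No threshold reaches min_tpr: take the highest TPR, then the lowest FPR
    return max(results, key=lambda r: (r[1], -r[2]))
```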
To better reflect practical disruption prediction, we further divide the True Positive Rate (TPR) and False Positive Rate (FPR) at the selected threshold into more detailed categories: success alarm rate (SAR), tardy alarm rate (TAR), early alarm rate (EAR), missed alarm rate (MAR) and false alarm rate (FAR). These categories were also used in the cross-tokamak disruption prediction study on JET and ASDEX-U [28]. For EAST, an alarm with a warning time shorter than 30 ms is treated as a tardy alarm, and one with a warning time longer than 3 s is treated as an early alarm [52]. As shown in table 4, compared with the full-data model, the S-CORAL model has higher TAR, MAR and FAR, a lower SAR, and an unchanged EAR. The EAR is similar for all models, possibly because some discharges in the test set have earlier causes of disruption that lead to early alarms. This requires more detailed interpretability research in the future.
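The category definitions for EAST can be illustrated with a short sketch. It assumes that SAR, TAR, EAR and MAR are normalized by the number of disruptive shots and FAR by the number of non-disruptive shots. The exact normalization used in table 4 and in [28] is not restated here, so treat that as an assumption; the names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple

TARDY_LIMIT = 0.030  # s: warning times below this are tardy alarms (EAST)
EARLY_LIMIT = 3.0    # s: warning times above this are early alarms (EAST)


def alarm_rates(shots: Iterable[Tuple[bool, Optional[float]]]) -> Dict[str, float]:
    """Split shot-level results into SAR, TAR, EAR, MAR (over disruptive
    shots) and FAR (over non-disruptive shots).

    Each shot is (disruptive, warning_time); warning_time is None when the
    model never alarmed. For a non-disruptive shot any value means an alarm.
    """
    counts: Counter = Counter()
    n_disr = n_non = 0
    for disruptive, warning in shots:
        if not disruptive:
            n_non += 1
            if warning is not None:
                counts["FAR"] += 1
            continue
        n_disr += 1
        if warning is None:
            counts["MAR"] += 1
        elif warning < TARDY_LIMIT:
            counts["TAR"] += 1
        elif warning > EARLY_LIMIT:
            counts["EAR"] += 1
        else:
            counts["SAR"] += 1
    rates = {k: counts[k] / n_disr for k in ("SAR", "TAR", "EAR", "MAR")}
    rates["FAR"] = counts["FAR"] / n_non
    return rates
```

Under these assumptions, SAR, TAR, EAR and MAR sum to one over the disruptive shots, and FAR coincides with the shot-level FPR.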
It can be concluded that applying CORAL outperforms mixing data, the method previously most widely used in cross-tokamak disruption prediction. The S-CORAL model further improves cross-tokamak disruption prediction, achieving a TPR of 90%, an FPR of 25.56% and an AUC of 0.89, which is close to the performance of the EAST self-tokamak model trained with full data. In terms of performance, therefore, cross-tokamak disruption prediction based on DA is a competitive approach for future tokamaks. We further investigated the performance of S-CORAL with an even smaller data volume, primarily fewer disruptive discharges. We retrained the S-CORAL model using 1, 3, 5 and 8 disruptive discharges, each paired with non-disruptive discharges in the same proportion as in the 10-disruptive-discharge case. The ROC curves of these models are shown in figure 7, and the AUC as a function of the number of disruptive discharges is shown in figure 8. With only one disruptive discharge as training data, the AUC of S-CORAL is only 0.726. When fewer than five disruptive discharges are used, the performance of S-CORAL improves significantly as their number increases; beyond five, the improvement slows down. For disruption prediction on a new tokamak, it may therefore be beneficial to intentionally design, at the beginning of operation, scenarios with less harmful disruptive discharges that cover a broader range of disruption types.
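The data-volume experiment can be set up by subsampling the target-domain training shots while keeping the disruptive to non-disruptive ratio fixed. A minimal sketch follows. It assumes a ratio of 10 (the full target training set has 100 non-disruptive and 10 disruptive discharges) and random sampling without replacement; how the shots were actually chosen is not stated, and the function name is hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def subsample_target_shots(disruptive: Sequence[int],
                           non_disruptive: Sequence[int],
                           n_disruptive: int, ratio: int = 10,
                           seed: int = 0) -> Tuple[List[int], List[int]]:
    """Draw n_disruptive disruptive shot IDs and ratio * n_disruptive
    non-disruptive shot IDs (random sampling is an assumption)."""
    rng = random.Random(seed)
    picked_d = rng.sample(list(disruptive), n_disruptive)
    picked_nd = rng.sample(list(non_disruptive), ratio * n_disruptive)
    return picked_d, picked_nd
```

Calling this for n_disruptive in (1, 3, 5, 8, 10) yields target training sets of the sizes used for figures 7 and 8 under the 10:1 assumption.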

Interpretability study of the cross-tokamak disruption prediction based on DA
This section describes the interpretability study, whose objective is to investigate why the S-CORAL model outperforms the mixing data and U-CORAL models. The study can also provide insights and experience for applications on other future tokamaks such as ITER and SPARC. Section 5.1 investigates how S-CORAL aligns the training data distributions of J-TEXT and EAST, which can be called intrinsic interpretability. Section 5.2 uses SHAP to compare the knowledge learned by the mixing data, U-CORAL and S-CORAL models on the test set with that learned by the EAST self-tokamak model trained with full data; this approach can be called post-hoc interpretability. A method for evaluating this difference is proposed, and it shows that S-CORAL indeed learns knowledge closer to that of the EAST self-tokamak model. Section 5.3 presents a disruptive discharge predicted by the S-CORAL model to analyse the disruption causes.

Data distribution analysis
Although PGFE maps the diagnostic signals of EAST and J-TEXT into physics-guided features, the large gap between the two tokamaks means that their decision boundaries will differ. For instance, the same physical phenomenon or parameter change might trigger a disruption warning in J-TEXT but not necessarily lead to a disruption in EAST. The purpose of normalization and CORAL is to align the data distribution of each feature as closely as possible. S-CORAL may perform better than U-CORAL and mixing data because it aligns more features better than the other two approaches do. The probability densities of four typical normalized features, n_e0, n = phase, SXR_array_skew and SXR_array_kurt, are shown in figures 9 and 10: figure 9 for non-disruptive data and figure 10 for disruptive data. The yellow, orange, green and navy-blue regions and lines represent the probability densities of the J-TEXT, EAST, U-CORAL and S-CORAL data, respectively. Because non-disruptive data greatly outnumber disruptive data, the non-disruptive distributions of U-CORAL and S-CORAL overlap almost completely, so the navy-blue and green lines in figure 9 are nearly indistinguishable; the zoomed subfigure in figure 9(d) shows the small difference between them. The distribution differences between the two tokamaks are much more significant in the raw signals than in the PGFE features, and DA is used to align the data distributions further. Compared with using PGFE alone, applying S-CORAL further reduces the differences between the data distributions.
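The alignment step of the original (unsupervised) CORAL can be sketched in a few lines of NumPy. The source covariance is whitened and then re-coloured with the target covariance, with an identity matrix added to both covariances as regularization, as in the original CORAL formulation. This sketch assumes the PGFE features of each tokamak have already been z-score normalized separately, so CORAL only has to match second-order statistics. The function names are illustrative and do not come from the authors' code.

```python
import numpy as np


def _sym_power(c: np.ndarray, power: float) -> np.ndarray:
    """Matrix power of a symmetric positive-definite matrix via eigendecomposition."""
    w, v = np.linalg.eigh(c)
    return (v * w**power) @ v.T


def coral(source: np.ndarray, target: np.ndarray, eps: float = 1.0) -> np.ndarray:
    """Re-colour source features (rows = samples) so that their covariance
    matches the target covariance; eps * I regularizes both covariances."""
    d = source.shape[1]
    cs = np.cov(source, rowvar=False) + eps * np.eye(d)
    ct = np.cov(target, rowvar=False) + eps * np.eye(d)
    # Whiten with Cs^(-1/2), then re-colour with Ct^(1/2).
    return source @ _sym_power(cs, -0.5) @ _sym_power(ct, 0.5)
```

With a very small eps, the covariance of the transformed source features approaches the target covariance. The default eps = 1 trades exact matching for numerical stability when the target domain has few samples.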
Figure 9 demonstrates that the non-disruptive data distributions of J-TEXT and EAST are already quite similar, which indicates that PGFE and normalization are sufficient to align the non-disruptive data of the two tokamaks effectively. The non-disruptive data distributions of U-CORAL and S-CORAL are the same, which indicates that the choice between the two CORAL variants does not affect the non-disruptive data distribution. In comparison, figure 10 shows that the disruptive data distribution of S-CORAL is closer to that of EAST than the U-CORAL distribution is, whereas the disruptive data distribution of U-CORAL is closer to that of J-TEXT. After alignment with S-CORAL, the disruptive data distribution therefore carries more information from the EAST data distribution. However, not all features require S-CORAL to align the disruptive data distribution, and not all features can be aligned effectively through PGFE and CORAL. In summary, the non-disruptive data distributions of more than half of the features can be aligned sufficiently by PGFE and normalization, while aligning the disruptive data distributions of most features additionally requires S-CORAL.
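The behaviour in figures 9 and 10 is consistent with a class-conditional reading of S-CORAL. Under that reading, J-TEXT non-disruptive samples are aligned to EAST non-disruptive samples and J-TEXT disruptive samples to EAST disruptive samples. Because non-disruptive samples dominate, the pooled covariance matched by U-CORAL is close to the non-disruptive one. That would explain why both variants give nearly identical non-disruptive distributions while only the class-wise variant pulls the disruptive data toward EAST. The sketch below implements this reading. It is an assumption for illustration, not the authors' exact S-CORAL algorithm (figure 2); names are hypothetical.

```python
import numpy as np


def _sym_power(c: np.ndarray, power: float) -> np.ndarray:
    """Matrix power of a symmetric positive-definite matrix."""
    w, v = np.linalg.eigh(c)
    return (v * w**power) @ v.T


def _coral_matrix(source: np.ndarray, target: np.ndarray, eps: float) -> np.ndarray:
    """CORAL transform Cs^(-1/2) Ct^(1/2) with eps * I regularization."""
    d = source.shape[1]
    cs = np.cov(source, rowvar=False) + eps * np.eye(d)
    ct = np.cov(target, rowvar=False) + eps * np.eye(d)
    return _sym_power(cs, -0.5) @ _sym_power(ct, 0.5)


def class_conditional_coral(xs: np.ndarray, ys: np.ndarray,
                            xt: np.ndarray, yt: np.ndarray,
                            eps: float = 1.0) -> np.ndarray:
    """Align each source class (0 = non-disruptive, 1 = disruptive) to the
    covariance of the same class in the target domain.

    Hypothetical reading of S-CORAL; every source label must also appear
    at least twice in the target labels so its covariance is defined.
    """
    out = np.empty_like(xs, dtype=float)
    for label in np.unique(ys):
        src = ys == label
        tgt = yt == label
        out[src] = xs[src] @ _coral_matrix(xs[src], xt[tgt], eps)
    return out
```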

SHAP analysis of the models
The data distribution analysis shows that the feature-based methods PGFE and CORAL can align the data distributions of J-TEXT and EAST. Because PGFE and CORAL are rule-based models, this is a kind of intrinsic interpretability [56] in interpretable machine learning. SHAP, in contrast, is an attribution-based approach and a kind of post-hoc interpretability.
SHAP provides global interpretability for the models by analysing how feature variations contribute to the model output. In this section, the EAST self-tokamak model trained with full data serves as the benchmark model. The three cross-tokamak models (mixing data, U-CORAL and S-CORAL) are compared with this benchmark, and the similarity of their global interpretability is used to determine which cross-tokamak model has learned more from the EAST data. For a fair comparison, SHAP is computed on the test set. Figure 11(b) shows the global SHAP values of different features and their relation to the feature values for the EAST self-tokamak model. The SHAP results on the test set can be understood as the knowledge that the model has learned and applies when distinguishing 'disruptive' from 'non-disruptive' samples in the test set. The features are ordered by their contribution, and the colormap represents the feature value (red for high, blue for low). The advantage of SHAP for global interpretability is that it provides not only the ranking of feature contributions but also whether variations of a feature contribute positively or negatively to the model's predictions. When analysing whether multiple models have learned similar knowledge, the sign of the contribution of feature variations matters more than the ranking of feature contributions. Even physicists may base their judgments about disruptions on various features, and different physicists may rank feature importance differently. However, regardless of the ranking, the direction in which a feature's variation pushes the prediction toward 'disruptive' or 'non-disruptive' should be the same. For instance, before a density limit disruption, not only is the density a critical feature, but MARFE and MHD instabilities are also crucial indicators. Whatever the feature importance ranking, the higher the density, the larger its contribution to disruption. The distribution of disruption types and causes in the dataset can also affect the ranking of feature contributions. Therefore, the direction in which feature variations contribute to 'disruptive' or 'non-disruptive' is more important than the ranking of feature contributions.
We designed an evaluation method to assess, from the global SHAP interpretability results, whether the knowledge learned by different models is similar. Its core logic is to count the features with similar change patterns. Because the EAST self-tokamak model was trained with the full EAST dataset and performed best, its learned knowledge about EAST disruptions is considered the most comprehensive and accurate, and it is therefore used as the benchmark for the three cross-tokamak models. If a feature's variation contributes to 'disruptive' or 'non-disruptive' in the same way in a cross-tokamak model as in the benchmark model, that feature scores +1. For example, in the self-tokamak model, the larger the value of v_loop, the higher its contribution to disruption; the S-CORAL model shows the same trend, so for S-CORAL the feature v_loop scores +1. Conversely, if the contribution differs from that in the benchmark model, the feature scores −1. For example, in the self-tokamak model, the larger the value of d_Z, the higher its contribution to 'disruptive', whereas in the S-CORAL model d_Z contributes to neither 'disruptive' nor 'non-disruptive'; for S-CORAL, d_Z therefore scores −1. The scores of all features are then summed to evaluate which model is most similar to the benchmark model.
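The scoring rule can be sketched as follows. The text does not state how the trend of each feature is extracted numerically from the SHAP results. As an assumption for illustration, the sketch summarises the trend as the sign of the Pearson correlation between feature values and SHAP values, and treats a weak correlation (|r| below the hypothetical cut-off min_abs_corr = 0.3) as 'no contribution'. Feature names in the usage are taken from the text; the trend labels are illustrative.

```python
from typing import Dict

import numpy as np


def shap_trend(feature_values: np.ndarray, shap_values: np.ndarray,
               min_abs_corr: float = 0.3) -> int:
    """+1 if SHAP rises with the feature value, -1 if it falls, 0 if there is
    no clear trend (the correlation measure and cut-off are assumptions)."""
    with np.errstate(invalid="ignore", divide="ignore"):
        r = np.corrcoef(feature_values, shap_values)[0, 1]
    if not np.isfinite(r) or abs(r) < min_abs_corr:
        return 0
    return 1 if r > 0 else -1


def feature_scores(benchmark: Dict[str, int],
                   candidate: Dict[str, int]) -> Dict[str, int]:
    """+1 for each feature whose trend matches the benchmark, -1 otherwise."""
    return {f: (1 if candidate[f] == benchmark[f] else -1) for f in benchmark}
```

For instance, with benchmark trends {"v_loop": 1, "d_Z": 1} and S-CORAL trends {"v_loop": 1, "d_Z": 0}, the per-feature scores are +1 and −1, and their sum is the model's total score.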
Figure 11(a) shows the feature scores of the three cross-tokamak models obtained from the similarity evaluation against the self-tokamak model. Columns 1, 2 and 3 of the table show the scores of the mixing data, U-CORAL and S-CORAL models on each feature, respectively. Except for the first and last rows, each row corresponds to a feature in figure 11(b), and the last row, with a red background, shows the total score. The S-CORAL model (score 7) is the most similar to the self-tokamak model, whereas the mixing data model scores −3 and the U-CORAL model scores 3. The skewness of the array signals measures the asymmetry of the distribution of the array signals about their mean [24] and can be regarded approximately as reflecting the extent of plasma displacement measured by the diagnostics. The features d_r, AXUV_skew, SXR_skew and DEN_skew score −1 in all three cross-tokamak models. This reflects that the plasma deviates from the centre in different directions when approaching disruption on EAST and on J-TEXT. A previous interpretability study on J-TEXT showed that the plasma usually tends to shift towards the low-field side (LFS) when approaching disruption (even if the discharge is salvageable), whereas on EAST the plasma tends to shift towards the high-field side (HFS). This might be related to differences in the plasma control systems of J-TEXT and EAST.

SHAP analysis on disruption causes
SHAP analysis can also be used to identify disruption causes. Our previous work, based on the time evolution of SHAP values within a discharge, provided a possible direction for experimental analysis of density limit disruptions affected by RMPs. In this section, we show a disruptive discharge predicted by the S-CORAL model to analyse its disruption causes. Figure 12 shows the prediction for disruptive discharge #86485 in EAST. The disruption occurs at 4.005 s, and the model alarms at 3.448 s with the threshold of 0.58 selected in this paper, giving a warning time of about 0.56 s. The disruption was caused by impurities that induced MHD instabilities, which in turn led to the final disruption; this is a typical disruption type on EAST [20,21]. At about 3.435 s, impurities caused a significant increase in v_loop, and the SHAP value of v_loop rose as well, causing the model to alarm at the 0.58 threshold. At about 3.585 s, impurities caused increases in both v_loop and Mir_Vpp, the SHAP values of both features rose, and the predicted output approached 1. Therefore, even if a threshold higher than 0.58 were chosen to achieve a lower FPR, the model could still alarm this disruption with sufficient warning time. This example shows that the transferred S-CORAL model can not only predict the disruption but also correctly recognize its causes.
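The alarm time and warning time in this example follow from a simple rule. The alarm is raised at the first sample whose model output reaches the threshold before the disruption time, and the warning time is the disruption time minus the alarm time; for #86485 this is 4.005 s − 3.448 s ≈ 0.557 s. The sketch below assumes single-sample threshold crossing, and the function name is hypothetical. Raising the threshold can only move the alarm to the same or a later sample, which is why a higher threshold trades warning time for a lower FPR.

```python
from typing import Optional, Sequence, Tuple


def alarm_and_warning(times: Sequence[float], scores: Sequence[float],
                      threshold: float, disruption_time: float
                      ) -> Tuple[Optional[float], Optional[float]]:
    """Return (alarm_time, warning_time) in seconds, or (None, None) if the
    output never reaches the threshold before the disruption."""
    for t, p in zip(times, scores):
        if t >= disruption_time:
            break
        if p >= threshold:
            return t, disruption_time - t
    return None, None
```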

Summary and future plan
This paper introduced a novel approach, based on DA, to predict disruptions in a future tokamak using only a few discharges. The approach is a light, interpretable cross-tokamak method that requires little data. It is the first attempt to apply DA to the task of cross-tokamak disruption prediction.
Cross-tokamak disruption prediction based on DA aligns a few data from the future tokamak (target domain) with a large amount of data from the existing tokamak (source domain) to train a machine learning model. We selected J-TEXT and EAST to simulate the existing and future tokamaks, respectively. PGFE, originally designed as a feature extractor for J-TEXT, has now been successfully implemented on EAST, where it yields a high-performance EAST self-tokamak model (AUC = 0.936, TPR = 93.33%, FPR = 16.11%) trained on a large amount of EAST data. This demonstrates that PGFE can be adapted to other tokamaks. PGFE extracts less device-specific features, which establishes a solid foundation for cross-tokamak disruption prediction. However, differences between tokamaks may lead to different decision boundaries for disruption. Therefore, CORAL, a DA algorithm, is used to transfer the disruption prediction model from J-TEXT to EAST. In this paper, CORAL is improved into an algorithm more suitable for the disruption prediction task, called supervised CORAL (S-CORAL); the original CORAL is referred to as unsupervised CORAL (U-CORAL). With limited EAST data (100 non-disruptive and 10 disruptive discharges), the commonly used mixing data method, which relies on PGFE alone to align features, fails to achieve good performance (AUC = 0.764, TPR = 73.33%, FPR = 27.78%). U-CORAL enhances disruption prediction on EAST, with a TPR of 90.56%, an FPR of 46.67% and an AUC of 0.797. S-CORAL further improves disruption prediction on the future tokamak, with a TPR of 90%, an FPR of 25.56% and an AUC of 0.89. The interpretability study explains why S-CORAL performs best among the three cross-tokamak models in this paper. The data distribution analysis shows that S-CORAL transforms the data distribution to be closer to EAST than to J-TEXT. Moreover, SHAP analysis was performed on the EAST self-tokamak model and on all three cross-tokamak models. We propose a method that uses SHAP analysis to evaluate whether a model has learned similar feature trends, and find that the S-CORAL model (score 7) learned knowledge more similar to the EAST self-tokamak model than the other two models (mixing data model −3, U-CORAL model 3). Based on the SHAP analysis, we hypothesize that differences in the control systems of different tokamaks may affect how well disruption prediction models transfer.
Although this paper proposes a light, interpretable cross-tokamak approach that requires little data, the approach still needs improvement. (1) Only J-TEXT and EAST are used to test this cross-tokamak disruption prediction approach; data from more tokamaks would help validate and improve it. (2) Although PGFE has been applied successfully on EAST, it still cannot extract generalized normalized features, so making PGFE applicable to most tokamaks requires continued, in-depth research. (3) PGFE could be improved or upgraded to better align features from different tokamaks, which requires a deeper understanding of the differences in physics, size, configuration and control system between J-TEXT and EAST. (4) The performance of the self-tokamak model still needs to be improved. Its FPR must be reduced further to ensure the economics of future tokamak operation, because a high FPR can significantly reduce discharge efficiency. (5) PGFE is not limited to decision-tree models; it could also be applied in deep learning. (6) Cross-tokamak disruption prediction models should require even fewer data from the future tokamak (target domain), ideally down to zero-shot testing. Therefore, our team would first like to improve PGFE and apply it to other tokamaks. We will also explore cross-tokamak disruption prediction approaches that require fewer data from the future tokamak.

Figure 1 .
Figure 1.The structure of the cross-tokamak disruption prediction based on DA.The pink modules represent the source (existing) tokamak, the orange modules represent the target (future) tokamak.The blue module is the feature extractor, PGFE.The purple module is the DA algorithm, CORAL.The grey module is the classifier, DART.The conch module is the explainer, SHAP.

Figure 2 .
Figure 2. The flowchart of S-CORAL and U-CORAL.The colours of the modules are consistent with figure 1.

Figure 3 .
Figure 3. The ROC curves of the self-tokamak models of J-TEXT and EAST. The FPR axis spans 0% to 50% and the TPR axis 50% to 100%. The yellow, orange and navy-blue lines represent the ROC curves of the J-TEXT and EAST models in this work and of IDP-PGFE in previous work, respectively.

Figure 4 .
Figure 4. The accumulated percentage of disruptions predicted versus warning time. The model threshold is 0.56 for the J-TEXT model (yellow) and 0.93 for the EAST model (orange). The red dashed line marks an accumulated percentage of 90%. The light-blue dashed lines mark warning times of 0.01 s (10 ms), 0.03 s (30 ms), 0.05 s (50 ms), 0.3 s (300 ms) and 1.5 s (1500 ms).

Figure 5 .
Figure 5. The ROC curves of the models of the five cases: case 1, light-blue, J-TEXT benchmark model by zero-shot test (AUC = 0.642); case 2, yellow, mixing data model (AUC = 0.764); case 3, green, U-CORAL model (AUC = 0.797); case 4, navy-blue, S-CORAL model (AUC = 0.890); case 5, orange, EAST model (AUC = 0.936). As in previous cross-tokamak studies, mixing data improves on the zero-shot test.

Figure 6 .
Figure 6. The accumulated percentage of disruptions predicted versus warning time. The model threshold is 0.93 for the EAST self-tokamak model (orange), 0.01 for the mixing data model (yellow), 0.31 for the U-CORAL model (green) and 0.58 for the S-CORAL model (navy-blue). The red dashed line marks an accumulated percentage of 90%. The light-blue dashed lines mark warning times of 0.03 s (30 ms) and 1.5 s (1500 ms).

Figure 7 .
Figure 7.The ROC curves of the models when using different numbers of disruptive discharges.The dark red, green, navy-blue, yellow, orange and light-blue lines represent using 10, 8, 5, 3, 1 and 0 disruptive discharges, respectively.

Figure 8 .
Figure 8.The AUC value versus the number of disruptive discharges of S-CORAL.

Figure 11 .
Figure 11. (a) The scores of the three cross-tokamak models from the similarity evaluation against the self-tokamak model. The last row of the table, with a red background, shows the total score. (b) The SHAP values of different features and their relation to the feature values for the full data trained EAST self-tokamak model. The width of each bar in the SHAP plot represents the number of samples. The colormap represents the feature value of each feature: red means high and blue means low.

Figure 12 .
Figure 12. The SHAP analysis for a disruptive discharge. (a) The navy-blue line represents the plasma current and the dark red line the predicted result. (b) The light-blue line represents the raw Mirnov probe signal. (c), (d) The orange lines represent the feature value (c) and the SHAP value (d) of Mir_Vpp; the green lines represent the feature value (c) and the SHAP value (d) of v_loop.

Table 1 .
Descriptions and symbols of all the features for J-TEXT and EAST [24].

Table 2 .
Split of datasets of the predictor.

Table 3 .
Five models of cross-tokamak disruption prediction.

Table 4 .
The detailed categories for the evaluation of disruption prediction.