Gaussian process-based quasi-coherent noise suppression in magnetic confinement devices with superconductors

Gaussian process (GP)-based technique suppressing quasi-coherent noises, i.e. structured noises, is developed which is more effective than conventional denoising techniques such as using frequency-domain filters. Superconducting devices like KSTAR, EAST, JT-60SA and ITER require separate sets of normal conducting magnetic coils inside the tokamak vacuum vessels to achieve a prompt control of fusion-grade plasmas in response to various fast and abrupt plasma activities such as vertical displacement events. Hence, these in-vessel control coils are typically operated with high-frequency switching power supplies which generate quasi-coherent noises. Semi-conductor based bolometers in KSTAR, for instance, are vulnerable to the quasi-coherent noise that makes a tomographic reconstruction for the 2D poloidal radiation map with the noise-contaminated signals flawed. By modeling the quasi-coherent properties of the noise as multivariate Gaussian distribution and generating the kernel function for the GP solely based on the measurements, the proposed method is able to suppress the noise whose performance is superior to the conventional filtering schemes. The method not only suggests an estimate of the denoised signal but also informs the consistent (with the measurements) uncertainty of the estimate at a level smaller than the standard deviation of the quasi-coherent noise. Performance of the method is confirmed with synthetic data containing the quasi-coherent noises, and it is applied to the measured data obtained by the KSTAR bolometers.


Introduction
Modern fusion devices, such as KSTAR [1,2], EAST [3], JT60-SA [4], and ITER [5,6], use superconductors for their primary magnetic coils, i.e. toroidal and poloidal field coils. These coils are typically placed outside the tokamak vacuum vessels for reasons like liquid helium cooling to maintain the superconducting states. Consequently, plasma controllability with the superconducting magnetic coils is limited, primarily due to their relatively slow response and the shielding effect of the conducting vacuum vessels.
To resolve the limitation on the plasma controllability, invessel control coils made of normal conductors are installed and used for active and fast feedback controls in case that undesirable abrupt plasma events such as the vertical displacement event [7][8][9] occur in superconducting tokamaks [10][11][12][13][14][15][16]. As the fast feedback controls are required, these invessel control coils are typically driven by high-frequency switching power supplies which generate noise in the form of quasi-coherent spikes. While quasi-coherent noise interferes with devices sensitive to electromagnetic fluctuations, such as the microwave cavity [17] and magnetic probe [18], the primary example addressed here is the semiconductorbased bolometers [19] installed in KSTAR. In this work, the quasi-coherent noise refers to a type of noise that exhibits some degree of regularity or periodicity, but not highly structured as in the case of fully coherent ones. Suppressing such a noise can be challenging as its properties are neither entirely random nor predictable. In the plasma physics domain, there was a similar attempt of separating coherent and incoherent structures of turbulence data using orthogonal wavelets [20].
We present an innovative and robust method that suppresses the quasi-coherent noise based on Gaussian process (GP) [21]. The method provides not only an estimate of the denoised signal but also the estimate of the uncertainty consistent with the measurements, which are critical information for solving illposed or ill-conditioned problems such as tomographic reconstructions, at a level smaller than the standard deviation of the quasi-coherent noise. Our proposed GP-based denoising method requires no prior knowledge on structures of the noise, and necessary information are fully obtained from the measured data themselves. Our method outperforms conventional denoising techniques such as frequency-domain filters and is general, thus applicable to any measured data suffering from quasi-coherent noises.
Note that the GP achieves non-parametric regression using a multivariate normal distribution and a covariance kernel function taking into account of correlations determined by the data distance. The GP is a widely adopted non-parametric model in the fusion community when a parametric model describing measured signals is not suitable or limited such as plasma profile fittings [22][23][24][25][26] or tomographic reconstructions [27][28][29][30].
We, first, verify the developed GP-based denoising method using noisy synthetic data that consist of known benchmark signals and the quasi-coherent noise taken from the actual KSTAR experimental data. Subsequently, we apply the denoising method to real data obtained by a KSTAR diagnostic system, and the semiconductor-based bolometers [19], measuring radiated power released during a plasma disruption and its mitigation, are selected as exemplary targets in this work since they are heavily influenced by the quasicoherent noise. This practical application exhibits significant improvements in data analyses as the method provides quantitatively both the estimate of the denoised signal and its uncertainty.
To further motivate necessity of our developed method, let us briefly describe the bolometer system [19] in KSTAR. It consists of a poloidal filtered AXUV (absolute extreme ultraviolet) photodiodes array (PFAA) with the sampling frequency of 200 kHz and is installed inside KSTAR vacuum vessel (see figure 1) to secure an appropriate viewing angle for a tomographic reconstruction of the 2D (poloidal cross-section) map of the radiated power during plasma disruption events. It picks up the quasi-coherent noise generated by the high-frequency switching power supplies as the in-vessel cables connected from the AXUV arrays act as antennas. Figure 2(a) is an example of the measured radiation data from PFAA_OD28, the 28th channel of the PFAA whose line of sight passes through the center of the plasmas, during KSTAR shot #29320 where a plasma disruption was induced by the shattered pellet injection [17]. The accompanying red line shows the rapid decrease (within ∼5 ms) of the plasma current due to the disruption. (a) Example of the measured radiation data from PFAA_OD28 during the plasma disruption in KSTAR shot #29320 with the temporal evolution of the plasma current in red. (b) and (c) are during the disruption and after the plasma is turned off, respectively, showing how the quasi-coherent noise generated by the high-frequency switching power supplies contaminates the measurements.
During the disruption as shown in figure 2(b), the average SNR (signal-to-noise ratio) is a mere 7, and even at the point of the strongest radiation, it reaches only about 30. Such levels of the SNR poses a challenge for the tomographic reconstruction as we demand estimation of the radiated powers on approximately 1000 pixels in a poloidal cross-section with about 40 lines of sight. Note that as the radiations produced during a plasma disruption is extremely intense, the pinhole diameter is set to 0.5 mm to prevent the signal saturation. Therefore, increasing the SNR with a larger pinhole diameter is not practical because it puts a heavy burden on the AXUV photodiodes.
This raises our motivation to develop a noise-suppression scheme. As will be discussed and shown, the noise picked up by the bolometer system is quasi-coherent, and we can obtain detailed characteristics of the noise (for instance, see figure 3) solely based on the data after the plasma is turned off as shown in figure 2(c) because the high-frequency switching power supplies are still operating during plasma-off phases. Such information allows us to model the quasi-coherent noise within the GP frame.
This work is structured as follows. In section 2, we describe how we model the denoised signal and the quasi-coherent noise using the GP. Basically, the GP model of the denoised signal relies on the argument of Bayesian Occam's razor, while that of the noise is obtained using the auto-covariance function. We, then, demonstrate our proposed GP-based denoising method against various synthetic data in section 3.1, which is followed by application to real KSTAR data in section 3.2. Our conclusion is presented in section 4.

GP for the denoised signal and the quasi-coherent noise
The GP regression is a nonparametric Bayesian regression with an assumption that data follow GP, that is defined as a collection of random variables exhibiting a joint Gaussian distribution [21]. Space and/or time are typically set as the domain of the GP, and we use time as the domain in this study since our method is developed for time-series signals although it is not limited to time. We represent the domain by a set of points T = {t 1 , t 2 , . . . , t n }, with each point t i in T corresponding to a random variable f(t i ). If the random vector f(T) = {f(t 1 ), f(t 2 ), . . . , f(t n )} exhibits a multivariate Gaussian distribution, then f(T) is a GP.
The GP can be characterized with a multivariate Gaussian distribution determined by the mean vector µ(T) and the covariance matrix K(T, T) as following: where the ijth element of the covariance matrix, namely K ij , is set by a kernel function, k(t i , t j ), with the input points t i and t j . Here, N (µ, Σ) simply means a random sample drawn from a normal distribution whose mean and variance are µ and Σ, respectively. The mean function µ(T) can assume various forms but generally takes averages of the data samples of interest or simply zeros. One of the most widely and commonly used forms for a kernel function is the squared exponential (SE) function [31], which is defined as with the hyperparameters σ f (overall scale) and λ t (length scale), representing a magnitude of the signal dispersion and a length scale of the smoothness, respectively. Consequently, smaller λ t causes the covariance to decrease rapidly with the data distance, resulting in a fine structure among the data; in other words, high frequency or high wavenumber components are allowed.
In this study, we model that the measured noisy signal y(T) is the sum of a noise-free signal f(T) and a noise process ϵ(T), y = f + ϵ, with all signals being GP. This represents an additive GP [32]. The noise, by definition, has a mean value of 0, and we denote the corresponding covariance matrix as Σ. Then, the joint probability distribution of y(T), f(T), and ϵ(T) is [32]  here the superscript T is the usual transpose operator. Before making any observations, the prior mean values of y and f , denoted as µ, are usually assumed to be a null vector, 0 [22]. Note that if the noise process is considered to be Gaussian white noise, then its covariance matrix, which we denote as Σ GW , is equal to the identity matrix multiplied by the variance of the noise σ 2 ϵ I. The posterior GP, which describes the probability density function (PDF) of the noise-free signal f(T) conditioned on the measured noisy signal y(T), can be obtained by using the conditional multivariate Gaussian distribution, as [31,32] p(f|y, T) = N (µ f|y , Σ f|y ), with where µ f|y is the predicted mean (or the estimate) of the noisefree signal, while Σ f|y is the covariance matrix of the prediction. The diagonal elements of Σ f|y is used to estimate the uncertainty associated with the mean. Thus, our final goal of this study is finding µ f|y and Σ f|y as they represent the noise-free (or denoised) signal and its associated uncertainty given the measured noisy signal with the kernel functions in the GP. The task, then, becomes how we can determine the measurement-based covariance matrices K and Σ for the denoised signal f and the quasi-coherent noise ϵ, respectively, which we describe in detail below.

GP model of the noise-free signal f with Bayesian
Occam's razor: determining K The Bayes' theorem dictates that the posterior distribution of the denoised signal f given the noisy measurements y and hyperparameters θ consisting of σ f and λ t in the GP domain T is and the covariance matrix K is completely determined by the hyperparameters θ since we use the SE function (equation (2)) as the kernel function for the noise-free signal f(T).
In addition, we can also construct a Bayesian representation for the hyperparameters as Thus, we can see that the GP model for f(T), i.e. the covariance matrix K, can be obtained through the hyperparameter optimization using a marginal likelihood maximization in equation (7), which corresponds to maximizing the evidence p(y|θ, T). This is because equation (8) tell us that the posterior distribution for the hyperparameters p(θ|y, T) is proportional to the likelihood p(y|θ, T), which is the evidence in equation (7), as we take the prior distribution for the hyperparameters p(θ|T) to be flat owing to the fact that we do not have any prior information on the hyperparameters. This is basically an application of Bayesian model selection based upon Occam's Razor [33], that allows us to determine a model, i.e. hyperparameters for our case, based on the measured noisy data taking into account of their complexity.
To compute the marginal likelihood p(y|θ, T) we integrate the product of the likelihood and the prior in equation (7) over all possible GP values (or models), that is ant the hyperparameters θ = {σ f , λ t } maximizing this quantity determine the kernel function, thus the covariance matrix K. By evaluating the marginal likelihood we are able to balance the data fit capability against its complexity, thereby preventing an overfitting problem while preserving appropriateness of the model for the measured data. Finally, the log marginal likelihood of the additive GP can be estimated as following [34]: The first term in the RHS quantifies goodness of the data fit, measuring the agreement between the observed data and the model predictions. The second term represents the complexity penalty, that is measuring the smoothness of the predictions and penalizing models with high complexities that can lead to overfitting. The third term is a normalization constant that depends on the number of the data points N and ensures that the log marginal likelihood is on the same scale for different models.
By calculating the marginal likelihood using only the noise GP model and comparing it with equation (10), it can be confirmed whether this method is effective in separating noise. If the covariance matrix of the marginal likelihood uses only Σ instead of (K + Σ), this serves as a measure of how well the noise model alone can explain the signal. If the marginal likelihood calculated using only Σ is similar to or larger than the marginal likelihood value obtained using (K + Σ), it may be difficult to apply this method. Conversely, if the marginal likelihood calculated using (K + Σ) is significantly larger than the marginal likelihood derived using the noise model alone, it implies that this method can be usefully applied.
This ends the description on how we find the GP model that best describes the statistical properties of the denoised signal based on Bayesian Occam's Razor.

GP modeling of the noise using autocovariance function
We now turn our attention to describe how the covariance matrix Σ for the quasi-coherent noise ε is obtained. Modeling the quasi-coherent noise as the GP presents challenges when employing an analytic kernel function due to the complexity of the noise characteristics, which may not be sufficiently captured by a simple analytic form.
On the other hand, if sufficiently large amount of experimental data containing the noise alone are available to analyze the characteristics of the noise, then we can construct the GP model of the noise, i.e. the covariance matrix Σ, numerically solely based on the data. Indeed, we have ample quasicoherent noise data from the bolometer system in KSTAR without the presence of the signals caused by plasmas. In fact, most of the diagnostic systems, if not all, do have noise data before and/or after plasma discharges in fusion devices. Our numerical approach for the covariance matrix is based on an assumption that the noise can be modeled as a stationary GP, which is a stochastic process where the distribution of the process at any given time depends only on the time difference between any two points, rather than on the absolute time itself. In such a case, we argue that the auto-covariance function of the noise data can be used to numerically generate the covariance matrix Σ.
Suppose that we have M i.i.d. (independent and identically distributed) samples, that is x m ∼ N (µ, Σ), where x m is a vector containing data drawn from the specified normal distribution with the subscript m running from 1 to M, meaning M repeated identical experiments. The maximum likelihood estimation (MLE) of the covariance matrix is, then, given by [35] This approach is similar to that of an ensemble average operator, which means that we require a large number of independent and identical experiments. We can, in fact, simplify the estimation of Σ MLE by realizing that it can be constructed with the auto-covariance function for a wide-sense stationary stochastic process [36]. Such a simplification causes no logical contradictions since the noise is modeled as a stationary GP.
To compute the auto-covariance function of the noise, R ϵ , let us denote the measured noise as where M is now the number of data points, which was the number of independent experiments in calculating Σ MLE above. Then, we have where k is also known as the time-delay in the context of statistical analyses, and it ranges from (−N + 1) to (N − 1). Of course, M must be significantly larger than N to achieve reasonable degree of accuracy, and N sets up the length of the covariance matrix.
With the auto-covariance function R ϵ , we can generate the covariance matrix for the GP model of the quasi-coherent noise. R ϵ (k) is the ij th element of the covariance matrix at time- We denote Σ Rϵ to represent the covariance matrix Σ determined by R ϵ , and it is clear that Σ Rϵ is a Toeplitz matrix with the values of: Σ Rϵ is a symmetric positive semidefinite matrix [37], so inverting the matrix may be numerically unstable. By adding a small positive value to the diagonal elements which is a common technique to improve the numerical stability of inverting a matrix, the resulting matrix is guaranteed to be positive definite. Thus, we add one-hundredth of R ϵ (0) to the diagonal elements of the matrix. We show an example of the extracted noise characteristics by considering the time period when the plasmas are completely turned off. Figure 3(a) shows the raw signal from PFAA_OD28, the 28th channel of the PFAA, from KSTAR shot #29320. Since there are no plasmas during this time interval, the signal contains only the noise information. Figures 3(b) and (c) are the auto-covariance and the autopower spectrum, respectively, which clearly show the quasicoherency of the noise with the 4 kHz-interval harmonics corresponding to the frequency of the switching power supplies. Having structured and non-Gaussian white noise, we see that the noise removal with a frequency-domain filtering scheme may not be applicable. Figure 3(d) shows the histogram density of the measured noise with an excellent Gaussian fit which demonstrate that the noise can be regarded as a sample drawn from a Gaussian distribution. Furthermore, figure 3(e), which is the histogram as a function of ϵ(t i ) and ϵ(t i+1 ), indicates that there exists an indisputable finite degree to which the noise aligns with the bivariate normal distribution. Note that the noise properties shown in figures 3(b)-(e) are obtained by using the measured data from 5.1 to 5.2 s of the KSTAR shot #29320.
We find that these characteristics of the quasi-coherent noise is consistent with the properties of the stationary GP, supporting usage of the auto-covariance function R ϵ to generate the covariance matrix Σ for the GP model of the noise, i.e. letting Σ = Σ Rϵ . Figure 4 shows the calculated Σ Rϵ for four different channels of the PFAA as examples. Notice that the covariance matrices are structured due to the quasi-coherency and that different channels have different covariance matrices.

Evaluation and application of the developed GP-based denoising method
As a reminder, our ultimate goal with the developed GP-based denoising method is to obtain the posterior GP p( f|y, T) with the predicted mean µ f|y , which is the denoised signal, and the covariance matrix of the prediction Σ f|y as in equations (4)-(6) purely based on the measurements such that µ f|y is as close to the true noise-free signal as possible with the reliable quantification of the uncertainty using Σ f|y that are consistent with measurements.
To assess the developed GP-based denoising method, we generate various synthetic data containing the quasi-coherent noise. We show that it outperforms conventional frequencydomain filtering schemes and that the estimated uncertainty of the denoised signal is reliable, consistent with the data and smaller than the standard deviation of the quasi-coherent noise. With such a confirmation, we apply our method to real KSTAR diagnostic data and compare the results obtained from various GP models.

Evaluation of the GP-based denoising method with synthetic data
Similar to what have been done for evaluating the reliability and/or applicability of statistical techniques using synthetic data or simulation data [20,[38][39][40][41][42][43] in fusion communities, we examine the developed GP-based denoising method with synthetic data. We generate the synthetic data using what are known as Blocks, Bumps and HeaviSine functions [44], which are commonly used to evaluate noise removal methods in signals processing. These types of functions are used to generate benchmark signals in various signal processing [20,[44][45][46][47] and machine learning algorithms [48,49], such as wavelet denoising methods and convolutional auto-encoding deep learning-based methods, due to their ability to present challenges such as non-smoothness, multiple local optima or abrupt changes.
The Blocks function is a piece-wise constant function characterized as,   (14), to assess the developed GP-based denoising method. Randomly selected real quasi-coherent noise data obtained from one of the AXUV array, i.e. PFAA_OD28, from 5.2 s to 5.5 s of KSTAR shot #29320 with a length of 10 ms are shown in the middle panels. The bottom panels show the noisy synthetic data which are sums of the true synthetic data and the noise data.
where t i and h i determine the location and height of a jump, respectively, and sgn is the sign function. The Blocks function exhibits abrupt changes, i.e. jumps, at specified points as shown in figure 5(a). This property makes it as a useful benchmark signal to test optimization and signal processing algorithms that are sensitive to such discontinuities. The Bumps function is a multi-modal function, containing multiple peaks, generated by the following expression: Here, the roles of t i and h i are same as for the Blocks function, while w i sets up the width of a peak. The Bumps function is non-linear and non-smooth, and it has multiple local optima as shown in figure 5(b). Note that the values of t i , h i and w i for both the Blocks and the Bumps functions are taken from the work done by Donoho and Johnstone [44] which introduced these functions. The HeaviSine function is a combination of a sine wave and a step function, characterized by its abrupt changes in amplitudes as This function, shown in figure 5(c), is useful for examining algorithms that are sensitive to sudden changes in the amplitude as it contains both smooth and abrupt changes. These benchmark signals act as the ground truth or the noise-free signals, and we add the quasi-coherent noise to them to mimic the measured noisy data. The quasi-coherent noise is randomly taken from the actual PFAA data of KSTAR shot #29320, i.e. PFAA_OD28, from 5.2 s to 5.5 s with a length of 10 ms, corresponding to 2000 data points, which is the number of data points we generate for the benchmark signals. Note that we generate the covariance matrix Σ Rϵ for the GP model of the noise by using the data from 5.1 s to 5.2 s (see section 2.2). We have chosen, on purpose, a different time-interval of the noise signal, i.e. from 5.2 s to 5.5 s, for the evaluation of the method to confirm generality of the covariance matrix. Figure 5 shows the true synthetic data, the quasi-coherent noise data and the noisy synthetic data from top to bottom panels, respectively.
To be able to calculate the posterior GP p(f|y, T), we also need to obtain the covariance matrix K for the GP model of the denoised signal, while we have Σ = Σ Rϵ (see equations (4)-(6)). As discussed in section 2.1, K is completely determined by the SE kernel function (equation (2)) with the hyperparameters σ f and λ t . Thus, using equation (10) we calculate the log marginal likelihood log p(y|θ, T) solely based on the generated three types of synthetic noisy data. Figure 6 shows the log-PDFs as a function of the hyperparameters. The black dots indicate where the PDFs are maximum, and we use the corresponding values of σ f and λ t to generate the covariance matrix K. Notice that the Blocks and the HeaviSine functions have the similar overall scales σ f , while the Blocks have a smaller length scale λ t . Additionally, the Bumps have the smallest overall scale σ f , while the length scale λ t is similar to that of the Bumps. These results are consistent with our intuition by looking at their structures as shown in figure 5. We emphasize that these hyperparameters are found based only on the noisy data without having any prior knowledge on the noisefree signals.
We now have all the necessary information to estimate the noise-free signal given the noisy data, that is to estimate the predicted mean µ f|y (equation (5)). Figure 7 shows the results of suppressing the quasi-coherent noise using our GPbased method for (a) the Blocks, (b) the Bumps and (c) the HeaviSine. It is evident that the estimated denoised signals (red line) from the noisy signals (cyan line) are very similar to the ground truth (blue dotted line). A modest discrepancy can be seen for a case of the HeaviSine function at two discontinuous points for a short period of time. This is owing to the fact that the GP model favors rather a large length scale λ t associated with the overall sinusoidal structure in the data. In fact, the log-PDF map shown in figure 6(c) shows that such discrepancy may exist as it has a wider contour compared to the other functions. Thus, if one has prior information that certain measured data are composed of a large scale structure with an abrupt change, similar to the HeaviSine function, then one could adjust the length scale to be slightly smaller. Overall, we see that the performance of the predictions using our method is outstanding.
To be quantitative on measuring the performance level of our method, we employ a commonly used figure of merit in signal processing, that is the root mean square (RMS) error defined as RMS = (1/N) 2 with the number of data points N, and f i andf i are the noise-free (ground truth) and the denoised signals, respectively. In estimating the RMS values for our GP-based method, we also estimate them for a conventional frequency-domain filtering scheme. Comparing with the filtering scheme can be tricky as one can consider a low-pass, a high-pass or even a band-pass filter. On the other hand, utilizing a high-pass or a band-pass filter means that certain prior information on the noise-free signal is available, while our GP-based method works without such information. We, therefore, examine a low-pass filter in this study to be compared with our method. To be thorough on the comparison, we scan the cut-off frequency of the low pass filter from 0.1 kHz to 50.1 kHz. Figure 8 shows the RMS values as a function of the cutoff frequency of the low-pass filter for (a) the Blocks, (b) the Bumps and (c) HeaviSine. As an indicator, the blue horizontal dashed lines reflect the standard deviations of the added quasi-coherent noises. The red horizontal dashed lines are the results from our GP-based method, and they, of course, do not depend on the cut-off frequency. The green lines are the results from the low-pass filter, and the green dots indicate where the RMS values are minimum. It is clear that our method provides the smallest possible RMS value that can be obtained from the optimal cut-off frequency. Considering that we do not have complete knowledge on the ground truth of the measured signals in real world cases, the optimal cutoff frequency can never be found. Therefore, we argue that our GP-based denoising method outperforms the conventional filtering schemes.
Another advantage of our GP-based denoising method is that it naturally quantifies the uncertainty of the predicted estimate consistent with the measurements, which is not typically available in other types of denoising schemes. This is possible because our method uses a GP model based upon the Bayesian inference. Figure 9 shows, for (a) the Blocks, (b) the Bumps and (c) the HeaviSine, the prediction errors (red line) computed as |f i −f i | and the uncertainty (black dashed line), which is estimated as twice the square root of the diagonal elements of Σ f|y in equation (6). The blue horizontal dashed lines indicate levels of the standard deviations of the quasicoherent noises. We see that the prediction errors are mostly below the uncertainty level. Thus, we confirm that our GPbased denoising method provides a reliable estimate of the uncertainty, which is smaller than the standard deviation of the quasi-coherent noise.
We have shown, by utilizing various types of synthetic data, that the developed GP-based denoising method can generate all the necessary information solely based on the measured noisy data without any prior knowledge on the noise-free signals and on the noise characteristics, so that it can predict the ground truth superior to a conventional low-pass filter scheme as well as provide a reliable uncertainty information. We, now, apply our method to real-world data obtained from KSTAR.

Application of the developed GP-based denoising method
We apply our GP-based denoising method to the measured data obtained by the PFAA_OD28, which is one of the channels in the AXUV array [19] installed inside the KSTAR vacuum vessel. The data are taken during a deliberately triggered plasma disruption by the shattered pellet injections [17] in KSTAR shot #29320.
We have a few different choices on how we generate the covariance matrix Σ for the GP model of the quasi-coherent noise, which consequently affects a choice of the hyperparameters for the covariance matrix K to model the noise-free signal because the log marginal likelihood for estimating the hyperparameters depends on Σ (see equation (10)). It, then, ultimately influences the predicted mean of the noise-free signal µ f|y as well as the covariance matrix of the prediction Σ f|y as in equations (4)- (6).
To illustrate such model selection effects, let us consider three different GP models for the noise. We form the first model for the noise by pretending that it is a Gaussian white noise, i.e. random noise, meaning that we let Σ = Σ GW where the off-diagonal terms of Σ are zeros with all the diagonal terms equal to R ϵ (0). Then, the hyperparameters for the noise-free signal are consistently found with Σ GW using equation (10), which turns out to be σ f = 6.6 × 10 −2 V and λ t = 1.6 × 10 −4 s. Figure 10(a) shows the predicted denoised signal (orange line) and its associated uncertainty (orange shade) together with the raw noisy data (blue line). We see that this GP model lacks the ability to capture an abrupt Figure 10. Three different results of the developed GP-based denoising method applied to experimental data measured by PFAA_OD28, the 28th channel of the AXUV array, from KSTAR shot #29320. (a) Gaussian white noise, which is inadequate for the quasi-coherent noise, is used to model the measured noise and to obtain the hyperparameters, (b) the noise is modeled as Gaussian white, while the hyperparameters are forcefully fixed to those values obtained by the quasi-coherent noise model, which introduces logical inconsistency, and (c) the quais-coherent noise model is properly used, and the hyperparameters obtained consistently. Orange line (shade) is the denoised signal (uncertainty), and the blue line is the raw measured data. The green circle indicates the abrupt change in the measured data. change around t = 5.0255 s marked with a green circle. This is attributed to the inappropriate values of the hyperparameters caused by inadequate modeling of the noise, i.e. modeling the quasi-coherent noise with a random Gaussian white noise. Figure 2(b) shows the abrupt change more clearly, and we argue that it is not a part of the noise since such a similar feature is also seen from other channels of the AXUV arrays. Note that unlike the synthetic data we do not know what the true noise-free signal is; therefore, to provide a safe margin we use the five times the predicted uncertainty compared to what is discussed in section 3.1.
In order to inspect situations where one might already have knowledge about the hyperparameters of the GP kernel function representing the signal due to a priori information, but where noise information is not properly provided, let us build the second model as follows. We, again, let Σ = Σ GW by assuming the Gaussian white noise, but with different values of the hyperparameters, that is σ f = 5.6 × 10 −2 V and λ t = 4.6 × 10 −5 s which are obtained from equation (10) by setting the quasi-coherent noise model properly, i.e. letting Σ = Σ Rϵ . We see that the length scale λ t is shorter by a factor of 3.4 compared to the first model, thus we expect that this model may be able to capture the abrupt change. Figure 10(b) is the result from the second model, and the model, indeed, is able to follow the abrupt change even with the incorrect covariance matrix Σ for the noise. Nonetheless, this is, in fact, an irrational model because it is logically and mathematically inconsistent in a sense that the covariance matrix Σ for the noise is different between for finding the hyperparameters and for finding the predicted mean (denoised signal) and its associated uncertainty, i.e. Σ = Σ Rϵ for the former and Σ = Σ GW for the latter. Such inconsistency has resulted in a rather large uncertainty as shown in figure 10(b).
Finally, the third model incorporates all the available information in a consistent way by letting Σ = Σ Rϵ for finding the hyperparameters, the predicted mean (denoised signal) and the uncertainty, whose result is shown in figure 10(c). We see that this model predicts the abrupt change and closely matches the measured noisy data with a relatively smaller uncertainty compared to the first and second models. As we have discussed in section 3.1, the uncertainty prediction is reliable; thus, it is beneficial to have such a consistent (with the measurements) constraint for performing a tomographic reconstruction based on the data measured by the AXUV array.
Lastly, we present auto-power spectra, in figure 11, of the measured noisy data (blue) and the denoised signals predicted by the aforementioned three different GP models in orange, green and red lines corresponding to the signals from Figure 11. Auto-power spectra of the measured noisy data (blue) and the denoised signals using the three different GP models in orange, green and red lines corresponding to the signals from figures 10(a)-(c), respectively. figures 10(a)-(c), respectively. Inspecting the auto-power spectrum of the noisy data (blue), one may hastily apply a lowpass filter with the cut-off frequency around 4-5 kHz which may generate a similar result to that with an assumption of the Gaussian white noise (orange line). We know that this fails to capture the abrupt change as in figure 10(a). With the proper hyperparameters, we see that higher frequency components are preserved as in green and red lines, while the red line, which is the fully consistent model, endeavors further into the highest allowed (by the GP model) frequency components.
Evaluation of the GP-based denoising method applied to the real measured data demonstrates that it is capable of suppressing the quasi-coherent noise without a loss of highfrequency information up to where the specified GP model allows. With the GP model consistently constructed based only on measured data, we can extract more information from the noisy data compared to frequency-domain filtering schemes and attain uncertainty of the denoised signal.

Conclusion
We have proposed and developed a new innovative noise suppression method by utilizing GP. While the developed GPbased denoising method can be applied to any kinds of noises, it is particularly effective and useful when nature of the noise is quasi-coherent. Such a noise is typically associated with highfrequency switching power supplies used for in-vessel control coils installed in superconducting magnetic confinement devices to control undesirable abrupt plasma instabilities.
Unlike a white noise, a quasi-coherent noise is difficult to be removed with a conventional frequency-domain filtering scheme. We have shown, using noisy synthetic data generated with various challenging benchmark signals, that the developed GP-based denoising method is superior to the filtering schemes in that our method suppresses the noise up to the level where the filtering schemes can achieve the best only if the true noise-free signal is known which is never possible with experimentally measured noisy data. Furthermore, our method also quantifies reliable and consistent (with the measurement) uncertainty of the denoised signal, which is another advantage as many of other denoising methods do not provide such information.
Our method is based solely on the measured noisy data without any assumptions except that the noise-free signal and the noise follow GP. This is yet another advantage because it means that no prior knowledge on the measurements, such as a bandwidth of the noise-free signal and external electromagnetic interferences, is necessary for our method. Such information are all consistently obtained from the measurement itself.
The developed GP-based denoising method basically consists of two important covariance matrices, where one is for characterizing the noise-free signal and the other is for the noise. The hyperparameters associated with the SE kernel function to construct the covariance matrix describing the GP model of the noise-free signal are found using Bayesian model selection based on Occam's Razor. For the covariance matrix characterizing the GP model of the noise, we utilize the auto-covariance function calculated based on the noise measurements, which means that our method is applicable to any kinds of noises as long as noise measurements are available. Using the auto-covariance function requires a condition that the noise is a stationary GP. Most of measured noises satisfy the condition, and, in fact, if they are not stationary GPs, then they may be not noises.
Application of our method to experimentally measured data from KSTAR promises benefits of our method. For instance, much of the knowledge in physics about fusion plasmas is obtained by solving ill-conditioned and/or ill-posed problems. A tomographic reconstruction for the poloidal radiation map using bolometers and a plasma equilibrium reconstruction with various magnetic probes are typical examples. Such problems heavily rely on the measurement constraints which means that fair and objective predictions of the noise-free signal as well as reliable and consistent uncertainty are critical, and our method is capable of doing so. Furthermore, there exists a rapid increase of interests on the integrated data analysis (IDA) based on Bayes' theorem in fusion communities, and using our method as a pre-processor can enhance IDA performance. Our method can also support validating results from numerical simulations more judiciously against measurements.