Data-driven modeling of noise time series with convolutional generative adversarial networks

Random noise arising from physical processes is an inherent characteristic of measurements and a limiting factor for most signal processing and data analysis tasks. Given the recent interest in generative adversarial networks (GANs) for data-driven modeling, it is important to determine to what extent GANs can faithfully reproduce noise in target data sets. In this paper, we present an empirical investigation that aims to shed light on this issue for time series. Namely, we assess two general-purpose GANs for time series that are based on the popular deep convolutional GAN architecture, a direct time-series model and an image-based model that uses a short-time Fourier transform data representation. The GAN models are trained and quantitatively evaluated using distributions of simulated noise time series with known ground-truth parameters. Target time series distributions include a broad range of noise types commonly encountered in physical measurements, electronics, and communication systems: band-limited thermal noise, power law noise, shot noise, and impulsive noise. We find that GANs are capable of learning many noise types, although they predictably struggle when the GAN architecture is not well suited to some aspects of the noise, e.g. impulsive time-series with extreme outliers. Our findings provide insights into the capabilities and potential limitations of current approaches to time-series GANs and highlight areas for further research. In addition, our battery of tests provides a useful benchmark to aid the development of deep generative models for time series.


I. INTRODUCTION
Noise, commonly defined as an unwanted, irregular disturbance, is a fundamental aspect of all real-world signals that arises from a multitude of natural and man-made sources. Furthermore, the unpredictable, stochastic nature of noise makes it a significant impediment to measurement, data analysis, and signal processing. Efforts to understand, mitigate, and harness the effects of noise over the last century have led to the extensive development of many physical and mathematical models, e.g., see [1]-[4] for overviews.
With recent advances in computational hardware and measurement equipment, it is now possible to collect, store, process, and analyze much larger quantities of data than previously possible. Consequently, flexible, data-driven methods for signal modeling and processing are increasingly becoming feasible in many areas of science and engineering [5]. One form of data-driven modeling that has rapidly progressed in recent years is deep generative modeling, a type of unsupervised machine learning that uses deep neural networks to implicitly represent complicated, high-dimensional data distributions defined by a target training set [6]-[9]. Deep generative models, and most notably, generative adversarial networks (GANs), have been used to successfully synthesize highly realistic images, audio, video, and text [10]-[13]. Moreover, because deep generative models can learn unknown, unstructured high-dimensional target distributions, they represent a potentially powerful class of methods for many data analysis and signal processing problems [6]-[9].
Since noise is a key component of realistic signals, and given the flourishing interest in GANs in particular, it is important to ask: To what extent are GANs capable of learning various noise types? Answering this question provides insight into potential applications and limitations of GANs and related generative models.
Prior related work includes the application of GANs to image denoising [14]-[18], image noise adaptation [19], [20], image texture synthesis [21]-[23], and underwater acoustic noise modeling [24]. Although the aforementioned investigations provide some evidence that GANs can learn noise and related distributions, they are limited to particular classes of noise and domain-specific applications. Further, because most prior related studies focus on images, they give little insight into time series, which are of primary interest in many domains.
The literature on GANs for time series has predominantly focused on audio applications. Many time-series GAN models leverage prior work on GANs for images by training the generator to produce an image-domain, time-frequency representation, such as a spectrogram, which is then mapped into a time series, e.g., [25]-[30]. Additionally, there has been some work on GANs that directly model time series, e.g., using recurrent neural networks [31], [32], or convolutional neural networks [25], [33].
In the present work, we empirically investigate the ability of general-purpose GANs for time series to learn noise modeled as a real-valued, discrete-time random process. Namely, as outlined in Section II, we examine four wide-ranging classes of noise commonly encountered in physical measurement, electronics, and communication: band-limited thermal noise, power law noise, shot noise, and impulsive noise. Within each noise class, we consider multiple random process models over a broad range of parameter values. The mathematical noise models that we consider include stationary, nonstationary, Gaussian, non-Gaussian, and long-memory random processes.
Our evaluations focus on two complementary GAN models for time series based on the popular deep convolutional generative adversarial network (DCGAN) [34] architecture: a direct time-series model, WaveGAN [25], and an image-domain model that uses a complex-valued, short-time Fourier transform (STFT) representation of the time series [28], [30]. Details are provided in Section III. The GAN architectures were selected for their general-purpose nature, relative simplicity, and straightforward implementation. A prior investigation assessed the effectiveness of these GAN models for synthetic baseband communication signals in the presence of additive white noise and signal distortions arising from stochastic communication channels [35].
Given the extraordinary number and breadth of noise models [1]-[4] and GAN architectures [10]-[13], it is not feasible to examine all possibilities, and our investigations are necessarily limited in scope. In particular, we do not aim to comprehensively evaluate all published GAN models for time series or to propose a single GAN architecture that works optimally for all noise types. Instead, our goal is to assess the effectiveness of simple, general-purpose, convolutional GAN models for time series. Nonetheless, to our knowledge, this investigation is the most extensive appraisal of GAN performance across a wide range of noise models thus far.
Our empirical studies yield new insights into the capabilities and potential limitations of current approaches to time-series GANs and highlight areas for further research. In addition, our battery of tests provides a useful benchmark to aid future developments. Python software implementing our experiments and evaluations as well as training datasets and results are publicly available [36], [37].

II. NOISE MODELS AND SIMULATION METHODS
In this section, we review the classical noise models and the simulation methods used to generate target distributions for our experiments. Specific parameter choices for our synthetic noise data sets are given in Section VI.
The mathematical models presented here were selected because they cover disparate, well-known noise types and because there are accurate, computationally efficient methods for simulation and parameter estimation. This set of noise models is not comprehensive, and descriptions of additional types of noise can be found in [1]-[4]. For simplicity, we focus on real-valued random processes.
Below, using standard notation, we denote the set of real numbers as R and the set of integers as Z. All simulated time series were 4096 samples long, which provided a good balance between realism and computational complexity. Throughout, unitless quantities, e.g., time, are used.

A. Band-Limited Thermal Noise
Thermal noise, also called Johnson-Nyquist noise, arises from the thermal motion of charge carriers inside an electrical conductor [1]-[3]. Thermal noise is commonly modeled as a zero-mean white process, i.e., a sequence of independent, identically distributed (i.i.d.) random variables with zero mean and finite variance [4], [38]. In the case of radio frequency electronics, thermal noise is band-limited by system components. For this reason, band-limited (or filtered) thermal noise is of interest in many contexts [4], [39].
To simulate band-limited thermal noise, we first generated a white standard normal sequence and then filtered it with a digital bandpass filter. Specifically, we applied a 40th order digital Butterworth filter, implemented using cascaded second-order sections and zero-phase filtering [40]. Frequency responses for the eight bandpass filters used to generate our target distributions are shown in Figure 1.
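The filtering procedure described above can be sketched with SciPy as follows. This is a minimal illustration and not the authors' published code: the band edges are hypothetical, and it assumes that "40th order" corresponds to SciPy's convention in which a bandpass design with parameter `order=20` yields a filter of order 40.

```python
import numpy as np
from scipy import signal


def bandlimited_thermal_noise(n_samples=4096, band=(0.1, 0.2), order=20, seed=None):
    """White standard normal noise passed through a zero-phase Butterworth bandpass.

    `band` holds hypothetical band edges in cycles per sample (Nyquist = 0.5).
    """
    rng = np.random.default_rng(seed)
    white = rng.standard_normal(n_samples)
    # SciPy's bandpass design doubles the order, so order=20 gives a
    # 40th-order filter under that convention (an assumption of this sketch).
    sos = signal.butter(order, band, btype="bandpass", output="sos", fs=1.0)
    # Forward-backward filtering with second-order sections gives zero phase.
    return signal.sosfiltfilt(sos, white)
```

Note that forward-backward filtering applies the magnitude response twice, so the effective roll-off is steeper than that of a single pass.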

B. Power Law Noise
Power law noise, also called colored, fractional, or fractal noise, arises in electronics as well as a diverse array of other physical phenomena [1]-[3], [41]. Power law noise is characterized by a power spectral density (PSD), S(f), that is proportional to a power of frequency, f, at low frequencies, i.e., S(f) ∝ |f|^η, where η is a real number. Specific integer powers are often associated with "colors of noise", e.g., random processes with η = −1, 0, and 1 are called pink, white, and blue noise, respectively [42, Ch. 3]. When η is near −1, the process is also known as 1/f noise, flicker noise, or excess noise [2], [41].
We consider two well-known mathematical models for power law noise: fractional Gaussian noise (FGN) and fractional Brownian motion (FBM) [43]. FGN can be interpreted as a generalization of discrete-time white Gaussian noise, and FBM can be interpreted as a generalization of the continuous-time Brownian motion (or Wiener) process [43]. The above models arise in the study of self-similar (or fractal) processes, as well as so-called long memory (or persistent) processes [38], [44], [45].
To synthesize discrete-time FGN and FBM, we implemented the exact approach of Perrin et al. [47], which utilizes the fast circulant embedding method [48] to generate FGN and applies cumulative summation to obtain discrete-time FBM. All FGN simulations set σ_Y^2 = 1. Examples of synthetic FBM time series are shown in Figure 2 (Left) for H = 0.2, 0.5, and 0.8. It can be seen that as the Hurst index increases, FBM tends to deviate further from the origin.
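For illustration, a generic circulant embedding sampler for FGN, followed by cumulative summation for FBM, might look like the sketch below. It uses the standard FGN autocovariance, γ(k) = (σ_Y^2/2)(|k + 1|^{2H} − 2|k|^{2H} + |k − 1|^{2H}), and is written in the spirit of the cited method rather than as a reproduction of it; function names are hypothetical.

```python
import numpy as np


def fgn_autocovariance(lags, hurst, sigma2=1.0):
    """Standard autocovariance of fractional Gaussian noise at integer lags."""
    k = np.abs(np.asarray(lags, dtype=float))
    return 0.5 * sigma2 * ((k + 1) ** (2 * hurst) - 2 * k ** (2 * hurst)
                           + np.abs(k - 1) ** (2 * hurst))


def simulate_fgn(n, hurst, sigma2=1.0, seed=None):
    """Draw one length-n FGN realization by circulant embedding."""
    rng = np.random.default_rng(seed)
    # First row of a symmetric circulant matrix of size 2n that embeds the
    # n x n FGN covariance matrix.
    gamma = fgn_autocovariance(np.arange(n + 1), hurst, sigma2)
    row = np.concatenate([gamma, gamma[-2:0:-1]])
    # The eigenvalues of a circulant matrix are the DFT of its first row.
    eigvals = np.clip(np.fft.fft(row).real, 0.0, None)  # guard against round-off
    m = row.size
    z = rng.standard_normal(m) + 1j * rng.standard_normal(m)
    # The real part of the transformed vector has the embedded covariance.
    return np.fft.fft(np.sqrt(eigvals / m) * z).real[:n]


def simulate_fbm(n, hurst, sigma2=1.0, seed=None):
    """Discrete-time FBM as the cumulative sum of FGN."""
    return np.cumsum(simulate_fgn(n, hurst, sigma2, seed))
```

For H = 0.5 the autocovariance vanishes at all nonzero lags, so the sampler reduces to white Gaussian noise and FBM reduces to an ordinary random walk.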

C. Shot Noise
Shot noise, also called Poisson noise or photon noise, arises from the random arrival of discrete charge carriers in electronics and photons in optics [1]-[4]. Shot noise can be modeled using a filtered Poisson process of the form
$$X(t) = \sum_{n=1}^{N(t)} A_n\, p(t - \tau_n),$$
where N(t), the number of events in the interval (0, t], is a homogeneous Poisson point process with event rate ν and event times, {τ_n} [3], [4], [49]-[51]. If N(t) = 0, then the sum is taken to be zero. Above, p(t) is a deterministic pulse function, and the pulse amplitudes, {A_n}, are independent, identically distributed, and independent of N(t). For a finite time interval of length T, the number of events, N, is Poisson distributed with mean νT, and the event times, {τ_1, τ_2, ..., τ_N}, are uniformly distributed on the interval [49, p. 140]. Following Theodorsen et al. [51], we assumed that the pulse amplitudes follow an exponential distribution with mean β. We considered two pulse functions, one-sided exponential and Gaussian, taken from Howard [4, p. 506]. Table I summarizes the pulse functions, where u(t) denotes the unit step function equal to one for t ≥ 0 and zero otherwise, and σ_d is a pulse duration parameter. Table I also lists the integrals ∫p(t) dt and ∫p²(t) dt of each pulse function, which are used for the event rate estimator introduced in Section V-C.
For a finite time interval, (0, T], the mean and autocovariance of the shot noise process are time-dependent, approaching steady-state values as t, T → ∞ [4]. Therefore, to approximate a weak-sense stationary discrete-time shot noise process, we generated a length 2L process of duration T = (2L − 1)Δt ≫ σ_d and then discarded the first L samples. Namely, defining a discrete-time grid t_m = mΔt, for m = 0, 1, ..., 2L − 1, we drew N from a Poisson distribution with mean νT, where T = (2L − 1)Δt. Next, we randomly generated N integers {m_1, m_2, ..., m_N} from a discrete uniform distribution on [0, 2L − 1] and drew {A_1, A_2, ..., A_N} independently from an exponential distribution with mean β. Then, we formed the impulse sequence
$$f[m] = \sum_{n=1}^{N} A_n\, \delta_{m, m_n},$$
where δ_{m,m_n} is a Kronecker delta function, and performed the discrete convolution of f[m] with the sampled pulse function, p(t_m), retaining the 2L samples in the middle of the convolution result. Last, we discarded the first L samples to remove any transients and approximate a steady-state realization of a length L discrete-time shot noise process. The validity of the steady-state simulated shot noise time series was verified by checking that there was close agreement between the empirical autocovariance function and the theoretical asymptotic autocovariance function [4]. For all shot noise simulations, we set
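The simulation steps above can be sketched as follows. The pulse shapes used here, p(t) = exp(−t/σ_d)u(t) and p(t) = exp(−t²/(2σ_d²)), are assumed forms for the one-sided exponential and Gaussian pulses, and the default parameter values are hypothetical.

```python
import numpy as np
from scipy import signal


def simulate_shot_noise(L=4096, rate=0.5, beta=1.0, sigma_d=2.0, dt=1.0,
                        pulse="exponential", seed=None):
    """Approximate steady-state shot noise of length L (sketch)."""
    rng = np.random.default_rng(seed)
    n_grid = 2 * L
    duration = (n_grid - 1) * dt
    n_events = rng.poisson(rate * duration)          # N ~ Poisson(nu * T)
    idx = rng.integers(0, n_grid, size=n_events)     # event indices on the grid
    amps = rng.exponential(beta, size=n_events)      # amplitudes with mean beta
    impulses = np.zeros(n_grid)
    np.add.at(impulses, idx, amps)                   # accumulates repeated indices
    # Pulse sampled on a grid symmetric about t = 0 (assumed pulse shapes).
    t = np.arange(-L, L + 1) * dt
    if pulse == "exponential":
        p = np.where(t >= 0, np.exp(-np.clip(t, 0, None) / sigma_d), 0.0)
    else:
        p = np.exp(-0.5 * (t / sigma_d) ** 2)
    # mode="same" keeps the 2L central samples of the full convolution.
    x = signal.fftconvolve(impulses, p, mode="same")
    return x[L:]                                     # drop the first L samples
```

Sampling the pulse on a grid centered at zero keeps the convolution output aligned with the impulse sequence, so the one-sided pulse remains causal after the central samples are retained.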

D. Impulsive Noise
Impulsive noise, consisting of random, large bursts of short duration arising from either naturally occurring or man-made sources, is a limiting factor for many communication scenarios [52]-[55], including wireless [56], [57], digital subscriber line [58], [59], power line [60], [61], and undersea acoustic environments [62], [63]. Many models for impulsive noise have been developed; see Shongwe et al. [64] for an overview. We focused on two well-studied impulsive noise models that were straightforward to implement and evaluate: the Bernoulli-Gaussian and symmetric alpha-stable models, described below. These models both define non-Gaussian, memoryless, white processes with a power spectrum that is constant across all frequencies. Impulse noise models with memory have also been proposed, e.g., see [61], [64], but such models are outside the scope of the present study.
A simple impulsive noise model that has been applied in many contexts is the Bernoulli-Gaussian (BG) model [52], [54], [55], [61], independently defined at each discrete time step as
$$X_{\mathrm{BG}} = N_w + B\, N_i, \tag{5}$$
where N_w and N_i are independent, zero-mean, normal random variables with variances σ_w^2 and σ_i^2, respectively, and B is a Bernoulli random variable with mean p, i.e., the probability that B = 1 is p, where p is called the impulse probability. Above, N_w corresponds to a thermal noise background and N_i is intermittent impulsive noise. The probability density function (PDF) for X_BG is the Gaussian mixture
$$f_{\mathrm{BG}}(x) = (1 - p)\, \mathcal{N}(x; 0, \sigma_w^2) + p\, \mathcal{N}(x; 0, \sigma_w^2 + \sigma_i^2), \tag{6}$$
where N(x; µ, σ^2) denotes the PDF for a normal distribution with mean µ and variance σ^2. We simulated independent BG noise at each time step using equation (5).
Another popular model for impulsive noise is the symmetric α-stable (SαS) family of distributions, a subclass of the stable (a.k.a. Levy α-stable) family of distributions, which are used to model heavy-tailed, non-Gaussian phenomena [53], [65]-[68]. The PDF of an SαS distribution can be succinctly expressed in terms of its characteristic function, which is parameterized by the characteristic exponent, α ∈ (0, 2], the scale parameter, γ > 0, and the location parameter, δ ∈ R. An SαS distribution is said to be "standard" if δ = 0 and γ = 1. The special cases α = 1 and α = 2 correspond to Cauchy and normal distributions, respectively. As α decreases, the PDF has a sharper peak and the tails become heavier [65], [66].
We considered discrete-time SαS processes where the value at each time step is an independent standard SαS random variable with parameter α. To simulate standard SαS variates, we used the 'pylevy' Python module [69], which implements a method of Chambers et al. [70] for generating stable random variables; see also [65]. Example time series are shown in Figure 3 (Right) for three values of α. Corresponding PDFs are plotted on a logarithmic scale on the right side of Figure 4.
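Both impulsive noise models are memoryless, so simulation reduces to drawing independent variates. The sketch below draws BG noise as a Gaussian background plus a Bernoulli-gated Gaussian impulse term, and draws standard SαS noise with `scipy.stats.levy_stable` (symmetry parameter set to zero) as a stand-in for the 'pylevy' module named in the text. The default values of `sigma_w` and `sigma_i` are hypothetical.

```python
import numpy as np
from scipy import stats


def simulate_bg_noise(n, p, sigma_w=1.0, sigma_i=10.0, seed=None):
    """Bernoulli-Gaussian noise: background plus Bernoulli-gated impulses."""
    rng = np.random.default_rng(seed)
    background = sigma_w * rng.standard_normal(n)
    impulses = sigma_i * rng.standard_normal(n)
    gate = rng.random(n) < p          # B = 1 with probability p
    return background + gate * impulses


def simulate_sas_noise(n, alpha, seed=None):
    """Independent standard symmetric alpha-stable variates (SciPy stand-in)."""
    rng = np.random.default_rng(seed)
    # beta = 0 selects the symmetric member of the stable family.
    return stats.levy_stable.rvs(alpha, 0.0, size=n, random_state=rng)
```

Because stable parameterizations differ between libraries, any scale parameter other than one should be checked against the convention of the library in use.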
Comparing the example time series plots in Figure 3, we see that the range of BG noise is fairly consistent across different impulse probabilities, p. On the other hand, the range of SαS noise varies by several orders of magnitude for different values of the characteristic exponent, α. These observations are consistent with the corresponding PDFs shown in Figure 4. Namely, because BG noise is a mixture of two Gaussian distributions, BG noise has rapidly decaying "light" tails, whereas SαS noise has slowly decaying "heavy" tails with a higher probability of extreme values [71].

III. GAN MODELS
We implemented two CNN-based GAN models for our experiments that are based on the widely used DCGAN model [34]: a 1-D convolutional model trained directly on time series, called WaveGAN [25], and a 2-D convolutional model trained on the complex-valued STFT. Both models were designed to generate time series of length 4096. Details on these models are given below. We start with a brief introduction to GANs.

A. Basic GAN Theory
Since the introduction of GANs in 2014, research on GANs and related deep generative modeling frameworks has developed quickly and spawned a large literature. For reviews, see [6]-[13].
Given a training set drawn from a high-dimensional target distribution, p_d, e.g., consisting of images or time series, the basic idea of a GAN is to train two deep neural networks, a generator network, G, and a discriminator network, D, together dynamically. The generator generates samples from a generator distribution, p_g, where the aim is to match the target distribution. The discriminator seeks to assess the realism of generated samples, i.e., determine if samples are 'real' or 'fake.' The generator is fed a random vector, z, drawn from a specified latent distribution, e.g., multivariate uniform, which it maps to a generated sample, G(z). The discriminator maps sample data, x, to the probability that the sample belongs to the target distribution, D(x).
The generator and discriminator networks are typically trained using a backpropagation implementation of stochastic gradient descent with a specified loss, or objective function [72]. Many different approaches to training GANs have been investigated to avoid failure modes such as inadequate convergence and mode collapse, where the generator output is insufficiently diverse. The GAN models that we investigated were trained with the widely used Wasserstein GAN loss with gradient penalty [73], which seeks to minimize the Wasserstein distance between the generated distribution and the target distribution [74]. Specifically, following [73], discriminator training aimed to minimize the objective function
$$L = \mathbb{E}_{\tilde{x} \sim p_g}\left[D(\tilde{x})\right] - \mathbb{E}_{x \sim p_d}\left[D(x)\right] + \lambda\, \mathbb{E}_{\hat{x}}\left[\left(\lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1\right)^2\right],$$
where
$$\hat{x} = \epsilon x + (1 - \epsilon)\tilde{x}, \quad x \sim p_d, \quad \tilde{x} \sim p_g, \quad \epsilon \sim U[0, 1].$$
Here, E[•] denotes mathematical expectation, a tilde (∼) indicates that a random variable follows a specified distribution, and U[0, 1] denotes the uniform distribution over the unit interval. Following the implementation recommendations of Gulrajani et al. [73], we set the gradient penalty coefficient, λ, equal to 10. Additional training details are provided in Section IV.

B. WaveGAN
WaveGAN [25] is a direct time series GAN designed for audio generation based on a 1-D flattened version of the 2-D DCGAN model [34]. Tables II and III outline our implementation of the WaveGAN generator and discriminator, respectively. In these tables, Dense, Conv 1-D, and Transpose Conv 1-D denote dense fully connected layers, one-dimensional convolutional layers, and transposed convolutional layers, respectively. Also, Tanh, ReLU, and LReLU indicate hyperbolic-tangent (Tanh), rectified linear unit (ReLU), and leaky rectified linear unit (LReLU) activation functions. The filter dimensions for convolutional layers correspond to kernel length, number of input channels, and number of output channels, respectively. Similarly, the filter dimensions for the dense layers correspond to input length and output length, respectively. The first output shape dimension, n, denotes the batch size. Compared to the original WaveGAN model, which was designed to produce time series of length 16 384, our only modification was to change the dense layer to support the 4096 length of our synthetic noise waveforms.
In the discriminator, WaveGAN includes an additional operation, called "phase shuffle," consisting of a random circular shift on the activation output of each convolutional layer. Our implementation applied a random circular shift between −2 and 2 time steps, as recommended by Donahue et al. [25].
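As a framework-agnostic illustration, the random circular shift described above can be written with NumPy as shown below. The (batch, time, channels) array layout and the choice of one independent shift per batch element are assumptions of this sketch.

```python
import numpy as np


def phase_shuffle(activations, max_shift=2, rng=None):
    """Randomly circular-shift each batch element along the time axis.

    activations: array with assumed layout (batch, time, channels).
    """
    rng = np.random.default_rng() if rng is None else rng
    # One integer shift per batch element, drawn uniformly from
    # {-max_shift, ..., max_shift}.
    shifts = rng.integers(-max_shift, max_shift + 1, size=activations.shape[0])
    return np.stack([np.roll(a, s, axis=0) for a, s in zip(activations, shifts)])
```

A circular shift preserves every activation value and only changes its position in time.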

C. STFT-GAN
The discrete STFT for a real-valued time series is calculated by dividing the time series into shorter segments of equal length, multiplying by a window function, and then calculating the one-sided discrete Fourier transform (DFT) on each segment [75], [76]. Unless stated otherwise, we used a Hann window of length 128 with 50% segment overlap, which for a 4096 length time series produces a (one-sided) STFT with dimensions of 65 × 65. In this case, the constant-overlap-add (COLA) constraint is satisfied, and the STFT can be inverted to obtain a time series of the original length [75].
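The STFT dimensions and invertibility stated above can be checked with `scipy.signal.stft` and `scipy.signal.istft`. The sketch below relies on SciPy's default zero padding at the boundaries and stacks the real and imaginary parts as two channels; the helper names are hypothetical.

```python
import numpy as np
from scipy import signal


def stft_two_channel(x, nperseg=128):
    """One-sided STFT with a Hann window and 50% overlap, as (freq, time, 2)."""
    _, _, Z = signal.stft(x, window="hann", nperseg=nperseg,
                          noverlap=nperseg // 2)
    return np.stack([Z.real, Z.imag], axis=-1)


def istft_two_channel(channels, nperseg=128):
    """Invert the two-channel representation back to a real time series."""
    Z = channels[..., 0] + 1j * channels[..., 1]
    _, x = signal.istft(Z, window="hann", nperseg=nperseg,
                        noverlap=nperseg // 2)
    return x
```

With the default boundary padding, a length-4096 input yields 65 frequency bins and 65 time segments, and the inverse transform returns a series of the original length.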
Tables IV and V outline the architectures for the STFT-GAN generator and discriminator, respectively, which are composed of five 2-D convolutional layers with 5 × 5 kernels. The notation in the tables is similar to that used previously, with Conv 2-D and Transpose Conv 2-D indicating two-dimensional convolutional and transposed convolutional layers, and n denoting the batch size. Because the discrete STFT is complex-valued, each STFT entry requires two channels, corresponding to the real and imaginary parts, respectively.
Consistent with the original Wasserstein GAN implementation [74], WaveGAN was trained with an imbalanced discriminator-generator update rule, where the discriminator weights were updated five times for each generator update. In contrast, STFT-GAN was trained with a balanced discriminator-generator update rule, where the discriminator weights were updated once for each generator update. The balanced update rule for STFT-GAN was selected based on limited tests carried out for a prior study [35], where we found that balanced updates yielded improved convergence for STFT-GAN.
Each model was trained with a target data set of size 2^14 = 16 384, for 500 epochs with a batch size of 128. These parameter values were found to be sufficient for convergent training across all experiments. The data accompanying this paper [37] include GAN training history files as well as plots of GAN loss and discriminator output during training.
Prior to training, target distribution training sets were scaled using feature min-max scaling, which scales each feature, i.e., time sample or pixel value, to the interval [−1, 1], the range of the hyperbolic tangent output activation of the generator. Specifically, minimum and maximum values of each feature were estimated over the training set of size 16 384. Because the generator's output activation is a hyperbolic tangent function, the raw generated data was in the range [−1, 1]. Raw generated data was rescaled using the inverse feature min-max transformation with the minimum and maximum values estimated from the training set. Therefore, the range of generated data was restricted to the range of the training dataset.
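A minimal sketch of feature min-max scaling and its inverse is given below, assuming the training set is stored as an array whose first axis indexes samples. The class name is hypothetical.

```python
import numpy as np


class FeatureMinMaxScaler:
    """Scale each feature (time sample or pixel) to [-1, 1] over a training set."""

    def fit(self, train):
        # Per-feature extremes estimated over the sample axis.
        self.lo = train.min(axis=0)
        self.hi = train.max(axis=0)
        return self

    def transform(self, data):
        return 2.0 * (data - self.lo) / (self.hi - self.lo) - 1.0

    def inverse_transform(self, scaled):
        # Maps tanh-range generator output back to the original data range.
        return 0.5 * (scaled + 1.0) * (self.hi - self.lo) + self.lo
```

Since generator outputs lie in [−1, 1], the inverse transformation cannot produce values outside the per-feature training extremes, which is the range restriction noted above.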

B. Quantile Data Transformation for Impulsive Noise
As we will see later, the impulsive noise types were particularly challenging for our baseline GAN models. Consequently, for the impulsive noise types, we also investigated replacing the feature min-max scaling of the target data with a quantile transformation [78, Sec. 7.4.1] applied independently to each channel to make the data approximately follow a standard normal distribution. The motivations for this transformation were twofold: (1) it ensured that the distribution of each channel was unimodal with "light" tails [71], and (2) it effectively limited the impact of outliers.
We implemented the quantile transformation using the "quantile_transform" method in the scikit-learn Python library [79]. This method is based on the formula Y = F_Y^{-1}(F_X(X)), where X is an input random variable with continuous cumulative distribution function (CDF) F_X(x), and Y is an output random variable with desired continuous CDF F_Y(y). In our case, F_Y(y) is the CDF for a standard normal distribution. The transformation formula follows from the fact that the random variable F_X(X) has a uniform distribution on the interval [0, 1] [78, Sec. 7.4.1]. In practice, to apply this method to a sample of X, F_X is replaced by the empirical CDF.
The quantile transformation for a given training set was estimated using 1024 uniformly spaced quantiles for each target distribution channel. Any data values exceeding the most extreme quantiles were clipped to those values. For WaveGAN, the transformation was fit directly to the time series values, whereas for STFT-GAN, the transformation was fit on the real and imaginary channels of the target STFT distribution separately. For both models, a scaled-tanh activation was used at the end of the generator to limit the absolute-maximum value of generated data to the absolute maximum of the target quantile-transformed distribution. Finally, the inverse quantile transformation was applied to each channel of the generated data to return it to the original range.
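To make the formula Y = F_Y^{-1}(F_X(X)) concrete, the sketch below implements a single-channel quantile transformation to a standard normal distribution with NumPy and SciPy. It is a from-scratch illustration and not the scikit-learn implementation; the class name and the small clipping constant `eps` are assumptions.

```python
import numpy as np
from scipy import stats


class NormalQuantileTransform:
    """Map one channel to an approximately standard normal distribution."""

    def __init__(self, n_quantiles=1024, eps=1e-7):
        self.probs = np.linspace(0.0, 1.0, n_quantiles)
        self.eps = eps  # keeps the normal quantile function finite

    def fit(self, values):
        # Empirical quantiles of the pooled channel values.
        self.quantiles = np.quantile(np.ravel(values), self.probs)
        return self

    def transform(self, x):
        # Empirical CDF by interpolation; np.interp clips inputs that lie
        # beyond the most extreme quantiles.
        u = np.interp(x, self.quantiles, self.probs)
        u = np.clip(u, self.eps, 1.0 - self.eps)
        return stats.norm.ppf(u)

    def inverse_transform(self, y):
        return np.interp(stats.norm.cdf(y), self.probs, self.quantiles)
```

For heavy-tailed inputs such as Cauchy variates, the transformed values have bounded magnitude, which illustrates how the transformation limits the influence of outliers.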
While the quantile transformation method is included in the commonly used scikit-learn Python library, to our knowledge, it has not been previously examined as a preprocessing step for GAN training.

V. EVALUATION METHODS
Performance evaluation of generative models, and GANs in particular, is a difficult problem and an active research area. Recent developments are summarized in two review papers by Borji [80], [81]. Two important aspects of generative model quality are fidelity, i.e., the degree of realism in generated samples, and diversity, i.e., how well generated samples capture the full range of variation of the target distribution [80], [82].
We assessed fidelity and diversity using general-purpose metrics introduced by Naeem et al. [82], described below. In addition, we further evaluated generative fidelity in terms of median power spectral density (PSD) and characteristic parameters for each noise type. Evaluations for each noise type were conducted using test sets of size 4096 from the target and generated time series distributions. In particular, the target distribution test sets were synthesized independently from the training sets.

A. Density and Coverage Metrics
In an effort to address shortcomings of other evaluation measures, Naeem et al. [82] proposed general-purpose metrics named density and coverage to assess generative model fidelity and diversity, respectively.
Suppose that a suitable metric space for the data is identified, and denote test samples of real (target) data as X_1, X_2, ..., X_N and fake (generated) data as Y_1, Y_2, ..., Y_M. For a given real data sample, X_i, let NND_k(X_i) be the distance from X_i to the kth nearest neighbor among the real data sample excluding itself, and let B(x, r) denote the ball centered at x with radius r. Also, let I[S] be the indicator function that equals one if the proposition S is true and zero otherwise.
For a given fake sample, Y_j, Naeem et al. [82] define density as the expected number of real sample neighborhoods that contain Y_j divided by the expected number of such neighborhoods when the target and generated distributions are the same. Namely, for a given test sample, Naeem et al. propose the estimator
$$\mathrm{density} = \frac{1}{kM} \sum_{j=1}^{M} \sum_{i=1}^{N} I\left[Y_j \in B\left(X_i, \mathrm{NND}_k(X_i)\right)\right], \tag{9}$$
where division by kM ensures that E[density] = 1 when the real and fake distributions are the same [82, Lemma 1]. Note that while density is always greater than or equal to zero, it may be greater than one, depending on the density of real data around the fake data. Density values close to one indicate excellent generative model fidelity. On the other hand, values near zero indicate poor fidelity. Naeem et al. [82] do not comment on how to interpret density values much larger than one, so additional assessments of generative fidelity are likely needed in that circumstance.
To evaluate generative diversity, Naeem et al. [82] define coverage as the fraction of real samples whose neighborhoods contain at least one fake sample. For a given test sample, Naeem et al. estimate coverage as
$$\mathrm{coverage} = \frac{1}{N} \sum_{i=1}^{N} I\left[\exists\, j \text{ such that } Y_j \in B\left(X_i, \mathrm{NND}_k(X_i)\right)\right]. \tag{10}$$
Because coverage is essentially the probability that a real sample is "close" to a fake sample, it is bounded between zero and one.
Moreover, they propose that the hyperparameter, k, should be selected to ensure that the expected value of the coverage estimator is close to one when the real and fake distributions are the same. In our evaluations, we used test sets of size M = N = 4096 and implemented the above density and coverage estimators with k = 10, implying that E[coverage] ≈ 0.999 when the target and generated distributions are identical.
Application of the above density and coverage metrics requires defining a suitable measure of distance between data points. We chose to use a normalized version of dynamic time warping (DTW) distance [83], [84], a widely used general-purpose distance measure for time series indexing, classification, and clustering [84]-[86] that is considered a "standard" elastic distance measure in the data mining community [86]. To compute DTW distances between time series, we used the "fast" methods from the 'dtaidistance' Python package [87], setting the maximal warping window size to 32. The window size parameter was selected based on computational feasibility considerations and limited preliminary experiments. To obtain a robust distance measure that was insensitive to data scaling, each time series was first normalized by its maximum absolute value prior to DTW estimation. Estimated DTW distances were then normalized by the window size to ensure values between 0 and 1.
Normalized DTW distances were computed as described above between each target time series as well as between each target and generated time series in the test sets of size 4096. Subsequently, density and coverage were estimated using Eq. (9) and Eq. (10), respectively, with k = 10.
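Given precomputed pairwise distances, the density and coverage estimators of Naeem et al. reduce to a few array operations. The sketch below accepts any distance matrices, so the normalized DTW distances described above could be supplied directly; the function and argument names are hypothetical.

```python
import numpy as np


def density_and_coverage(d_real_real, d_real_fake, k=10):
    """Density and coverage from precomputed distance matrices.

    d_real_real: (N, N) distances among real samples (zero diagonal).
    d_real_fake: (N, M) distances from each real sample to each fake sample.
    """
    # Distance from each real sample to its k-th nearest real neighbor,
    # excluding itself: column 0 of each sorted row is the zero self-distance.
    radii = np.sort(d_real_real, axis=1)[:, k]
    # inside[i, j] is True when fake sample j lies in the ball around real i.
    inside = d_real_fake <= radii[:, None]
    density = inside.sum() / (k * d_real_fake.shape[1])
    coverage = inside.any(axis=1).mean()
    return density, coverage
```

When the real and fake samples are drawn from the same distribution, density should be near one and coverage should approach one for moderate k, consistent with the discussion above.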
Approximate 95% confidence intervals for the density metric were estimated using the percentile bootstrap method [88], where bootstrap resampling with replacement was performed over the generated test sample 10 000 times. Limited experiments indicated that additionally bootstrapping over the target distribution sample resulted in bootstrap estimates that were uniformly lower than the original point estimate, so bootstrap resampling was therefore restricted to the generated sample only. Approximate 95% confidence intervals for the coverage metric were estimated using the classical Wilson score method for a binomial proportion [89].
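Both interval procedures are straightforward to sketch. Because the density estimator is an average over generated samples of the per-sample neighborhood counts divided by k, resampling the generated test sample amounts to resampling those per-sample contributions. The helper names below are hypothetical.

```python
import numpy as np
from scipy import stats


def wilson_interval(successes, n, level=0.95):
    """Wilson score interval for a binomial proportion."""
    z = stats.norm.ppf(0.5 + level / 2.0)
    p_hat = successes / n
    denom = 1.0 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half = z * np.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


def percentile_bootstrap_mean(per_sample, n_boot=10_000, level=0.95, seed=None):
    """Percentile bootstrap interval for the mean of per-sample contributions."""
    rng = np.random.default_rng(seed)
    per_sample = np.asarray(per_sample, dtype=float)
    # Resample with replacement and record the mean of each resample.
    means = np.array([rng.choice(per_sample, size=per_sample.size).mean()
                      for _ in range(n_boot)])
    alpha = 1.0 - level
    lo, hi = np.quantile(means, [alpha / 2, 1 - alpha / 2])
    return lo, hi
```

For coverage, `successes` would be the number of real samples whose neighborhoods contain at least one generated sample, and `n` would be the number of real samples.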
To our knowledge, the combination of the density and coverage metrics above with DTW distance for time series, as well as the procedures for confidence intervals, have not been previously proposed.

B. Power Spectral Density
The median PSD for each test set was estimated with the multitaper method, a versatile nonparametric approach [90], [91]. Specifically, we used the implementation in the Python 'Spectrum' package [92] with the time half-bandwidth parameter set to NW = 4, the first k = 7 Slepian sequences, the FFT length set to 4096, and the fast 'eigen' method for result weighting. These parameter choices are typical and were found to yield consistent results. After applying the multitaper method to estimate the PSD for each time series in the test set, we calculated the median value in each frequency bin. The uncertainties in the median PSD estimate across the test set were negligible in the context of our evaluations. This procedure was carried out on both target and generated distributions across all noise types.
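A simplified version of this procedure can be sketched with SciPy's Slepian (DPSS) windows. Unlike the 'eigen' weighting mentioned above, this sketch averages the eigenspectra with equal weights, and it leaves the PSD unscaled by constant factors; both simplifications are assumptions of the illustration.

```python
import numpy as np
from scipy.signal import windows


def median_multitaper_psd(series, nw=4.0, n_tapers=7):
    """Median multitaper PSD over a set of time series.

    series: array of shape (n_series, n_samples).
    Returns frequencies in cycles per sample and the median PSD per bin.
    """
    n = series.shape[1]
    # Slepian sequences with unit energy, shape (n_tapers, n_samples).
    tapers = windows.dpss(n, nw, Kmax=n_tapers)
    # Eigenspectra: squared magnitude of the one-sided FFT of each tapered series.
    spectra = np.abs(np.fft.rfft(series[:, None, :] * tapers[None], axis=-1)) ** 2
    psd = spectra.mean(axis=1)  # equal-weight average over tapers
    return np.fft.rfftfreq(n), np.median(psd, axis=0)
```

For unit-variance white noise, the resulting median PSD is approximately flat with a level slightly below one.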
Denote the one-sided median PSDs for the target and generated distributions as P_t(f_d) and P_g(f_d), respectively, where f_d ∈ [0, 0.5] is normalized digital frequency with units of cycles per sample. To evaluate the faithfulness of P_g relative to P_t, we used a one-sided version of Georgiou's "geodesic distance" for power spectra [93], defined as
$$d_g(P_g, P_t) = \sqrt{\frac{1}{0.5}\int_0^{0.5} \left[\log\frac{P_g(f_d)}{P_t(f_d)}\right]^2 df_d - \left(\frac{1}{0.5}\int_0^{0.5} \log\frac{P_g(f_d)}{P_t(f_d)}\, df_d\right)^2}.$$
In our evaluations, we used a natural logarithm, but the choice of logarithm is arbitrary. The geodesic distance can be interpreted as the length of a geodesic connecting points on a manifold of PSDs [94]. Technically, d_g is a pseudo-metric, because it is insensitive to scaling, i.e., d_g(P_g, P_t) = d_g(P_g, κP_t) for any κ > 0 [93]. Because the first term depends on the difference of log-transformed power spectra, it reflects differences in areas of both low and high power spectral density. We estimated the geodesic PSD distance by approximating the above formula on a discrete frequency grid.
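On a uniform frequency grid, the frequency averages in Georgiou's geodesic distance can be approximated by sample means, so the distance becomes the standard deviation of the log spectral ratio across bins. The sketch below makes that assumption explicit; the function name is hypothetical.

```python
import numpy as np


def geodesic_psd_distance(psd_g, psd_t):
    """Discrete approximation of the geodesic distance between two PSDs.

    Both inputs are positive arrays sampled on the same uniform frequency grid.
    """
    log_ratio = np.log(np.asarray(psd_g, dtype=float) / np.asarray(psd_t, dtype=float))
    # Mean of squares minus squared mean, each averaged over the frequency grid.
    variance = np.mean(log_ratio**2) - np.mean(log_ratio) ** 2
    return np.sqrt(max(variance, 0.0))  # guard against round-off
```

Multiplying either spectrum by a positive constant shifts the log ratio by a constant and leaves the result unchanged, which is the scale insensitivity noted above.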

C. Noise Model Parameters
For each noise type, except for band-limited thermal noise, we assessed how well the generated time series distribution matched the target distribution in terms of characteristic noise parameters. Later, boxplots are used to compare distributions of estimated noise parameters for target and generated time series distributions. Boxplots of parameter estimates for target distributions with known ground truth characterize the inherent bias and variability of the estimators and hence provide a basis for assessing generated data.
For power law noise distributions, we evaluated the accuracy of the Hurst index, H, using the well-studied "discrete variations" method [95]- [97] implemented with a second-order difference filter.
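The idea behind the discrete variations estimator can be illustrated with a short sketch. It assumes the common formulation in which the second-order difference filter (1, -2, 1) is dilated by a factor m, the mean squared filtered output scales as m^(2H), and H is obtained from a log-log regression over a few dilations. The choice of dilations and the function name are hypothetical, and the implementation in [95]- [97] may differ in detail. The input is assumed to be an FBM-type path; an FGN series would first be cumulatively summed.

```python
import numpy as np


def hurst_discrete_variations(path, scales=(1, 2, 4)):
    """Estimate the Hurst index of an FBM-type path.

    Assumption: a log-log least-squares fit over the given dilations of the
    second-order difference filter (1, -2, 1); the slope equals 2H.
    """
    path = np.asarray(path, dtype=float)
    log_m, log_v = [], []
    for m in scales:
        # Output of the filter (1, -2, 1) dilated by m.
        d2 = path[2 * m:] - 2.0 * path[m:-m] + path[:-2 * m]
        log_m.append(np.log(m))
        log_v.append(np.log(np.mean(d2 ** 2)))
    slope, _ = np.polyfit(log_m, log_v, 1)
    return slope / 2.0


# Illustrative check: classical Brownian motion has H = 0.5.
rng = np.random.default_rng(1)
brownian = np.cumsum(rng.standard_normal(2 ** 16))
h_est = hurst_discrete_variations(brownian)
```

A second-order filter removes linear trends from the path, which is one reason this filter order is often preferred over simple first differences.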
Under the assumption that the shot noise pulse amplitudes follow an exponential distribution, which is true for our target distributions, we assessed the shot noise event rate, ν, using the (apparently novel) estimator
$$\hat{\nu} = \frac{2\, \hat{\mu}_X^2\, I_2}{\hat{\sigma}_X^2\, I_1^2},$$
where $\hat{\mu}_X$ and $\hat{\sigma}_X^2$ are the estimated mean and variance of the shot noise time series, and where $I_1 = \int p(t)\, dt$ and $I_2 = \int p(t)^2\, dt$ are integrals of the known pulse function, p(t); see Table I. A derivation is given in the Appendix.
For each of the impulsive noise models, we evaluated two characteristic parameters. Namely, for BG noise, we assessed the impulse probability, p, and the scale parameter ratio, $\theta = \sqrt{\sigma_w^2 + \sigma_i^2}/\sigma_w$, which measures the relative dispersion of the mixture components; see equation (6). The BG parameters were estimated by fitting a two-component Gaussian mixture model using the iterative expectation maximization method implemented in the scikit-learn Python library [79]. To assess SαS noise, we estimated the characteristic exponent, α, and the scale parameter, γ, using the "fast" methods of Tsihrintzis and Nikias [98].
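A moment-based event rate estimator of this kind can be sketched numerically. The sketch relies on Campbell's theorem: for exponentially distributed amplitudes with mean β, the shot noise mean is νβ∫p(t)dt and its variance is 2νβ²∫p(t)²dt, so the ratio of the squared mean to the variance isolates ν. The pulse shape exp(-t), the sampling interval, and the function names are assumptions made for illustration and are not taken from Table I.

```python
import numpy as np


def estimate_event_rate(x, pulse, dt):
    """Moment-based event rate estimate, assuming exponential amplitudes."""
    i1 = np.sum(pulse) * dt        # discrete approximation of the integral of p(t)
    i2 = np.sum(pulse ** 2) * dt   # discrete approximation of the integral of p(t)^2
    return 2.0 * i2 * np.mean(x) ** 2 / (i1 ** 2 * np.var(x))


def simulate_shot_noise(nu, pulse, dt, n, beta=1.0, rng=None):
    """Simulate shot noise on a grid (hypothetical helper for this sketch)."""
    rng = np.random.default_rng() if rng is None else rng
    counts = rng.poisson(nu * dt, size=n)   # number of events in each bin
    marks = np.zeros(n)
    hit = counts > 0
    # A sum of k exponential(beta) amplitudes is gamma(k, beta) distributed.
    marks[hit] = rng.gamma(shape=counts[hit], scale=beta)
    x = np.convolve(marks, pulse)[:n]
    return x[pulse.size:]                   # discard the start-up transient


dt = 0.1
t = np.arange(0.0, 20.0, dt)
pulse = np.exp(-t)  # assumed one-sided exponential pulse
x = simulate_shot_noise(1.0, pulse, dt, 400_000, rng=np.random.default_rng(2))
nu_hat = estimate_event_rate(x, pulse, dt)
```

The estimate does not depend on the amplitude scale β, since β cancels in the ratio of the squared mean to the variance.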

A. Band-limited Thermal Noise
The eight digital bandpass filters shown in Figure 1 were used to simulate eight target data sets of band-limited thermal noise, where each data set contained noise limited to a single band. DTW density and coverage results are plotted in Figure 5 and estimated geodesic PSD distance is plotted in Figure 6 (Left), where the bands are ordered in terms of increasing center frequency. It is evident that STFT-GAN yielded uniformly better density, coverage, and PSD fidelity than WaveGAN. Median estimated PSDs for band number 3 are shown in Figure 6 (Right); other bands are similar. We see that STFT-GAN more closely tracked the target PSD out-of-band, whereas WaveGAN suffered from a limited dynamic range.

B. Power Law Noise
The power law noise models from Section II-B were evaluated for target Hurst indices of H = 0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, and 0.95. DTW density and coverage results are shown in Figure 7. PSD distance results and boxplots of estimated Hurst indices are given in Figure 8. For FGN, STFT-GAN performed well, achieving near-ideal density and coverage as well as excellent median PSD fidelity and Hurst indices. On the other hand, WaveGAN generally performed poorly on the density and coverage metrics, except for H = 0.95, and also exhibited inferior median PSD fidelity.
For FBM, in addition to the baseline 65 × 65 STFT size, we also tested an STFT dimension of 129 × 65, resulting from a window segment length of 256 with 75% overlap. We denote the baseline and modified models as STFT-GAN (65 × 65) and STFT-GAN (129 × 65), respectively. Examining DTW density and coverage results, performance was generally excellent except for the largest Hurst indices of 0.9 and 0.95, where all models exhibited a drop-off in DTW coverage, indicating inadequate sample diversity. In terms of median PSD distance and estimated Hurst indices, STFT-GAN (129 × 65), which had higher frequency resolution than the baseline model, achieved superior PSD and Hurst index fidelity over the full parameter range. Figure 9 compares the median PSDs for the H = 0.9 case, illustrating better PSD accuracy for STFT-GAN (129 × 65) at low frequencies.
Example target and generated time series for FBM with H = 0.5, which corresponds to the classical Brownian motion process, are plotted in Figure 10. Qualitatively, the example generated time series are consistent with a Brownian motion process.

C. Shot Noise
Target distributions defined by the shot noise model described in Section II-C were assessed with the two pulse types in Table I for event rates ν = 0.25, 0.5, 0.75, 1.0, 1.25, 1.5, 1.75, 2.0, 2.25, 2.5, 2.75, and 3.0. DTW density and coverage results are shown in Figure 11. PSD distance results and estimated event rate boxplots are given in Figure 12.
For shot noise with the one-sided exponential pulse type, WaveGAN exhibited very good DTW density and coverage, while STFT-GAN did poorly on those metrics. Both models had excellent median PSD fidelity and similar event rate performance.
On the other hand, for shot noise with the smoother Gaussian pulse type, STFT-GAN performed better overall than WaveGAN, with STFT-GAN exhibiting excellent DTW density, coverage, and PSD distance. In this case, WaveGAN achieved worse fidelity as measured both by DTW density and PSD distance.
Figure 13 compares median PSDs for the two pulse types when the target event rate is ν = 1. These plots illustrate the inability of WaveGAN to recover the larger PSD dynamic range for the Gaussian pulse type.
Representative example target and generated time series for one-sided exponential shot noise for a target event rate of ν = 0.25 are plotted in Figure 14. From this figure, it can be seen that WaveGAN correctly learned to generate non-negative shot noise time series, while STFT-GAN generated time series with occasional small negative values.

D. Impulsive Noise
Last, the ability of the two GAN models to learn impulsive noise defined by the BG and SαS models described in Section II-D was evaluated. Specifically, BG noise with σ_w = 0.1 and σ_i = 1, i.e., a scale parameter ratio of $\theta = \sqrt{\sigma_w^2 + \sigma_i^2}/\sigma_w \approx 10.05$, was assessed for impulse probabilities of p = 0.01, 0.05, 0.1, 0.15, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, and 0.9. Standard SαS noise with location and scale parameters equal to zero and one, respectively, was evaluated for characteristic exponents α = 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4, and 1.5. As described in Section IV, the WaveGAN and STFT-GAN models were trained with two different preprocessing schemes: (1) a baseline implementation using feature min-max scaling and (2) an implementation applying a quantile data transformation, which transforms each channel to an approximate standard normal distribution.
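The BG configuration can be checked numerically with a short sketch. It assumes that BG noise is the sum of a Gaussian background with standard deviation σ_w and, with probability p, a Gaussian impulse with standard deviation σ_i, so that samples follow a two-component zero-mean Gaussian mixture. The expectation maximization loop below is a simplified, zero-mean stand-in written for illustration; it is not the scikit-learn fit used in the evaluations, and all function names are hypothetical.

```python
import numpy as np


def simulate_bg(n, p, sigma_w=0.1, sigma_i=1.0, rng=None):
    """Draw n Bernoulli-Gaussian samples under the assumed mixture form."""
    rng = np.random.default_rng() if rng is None else rng
    impulses = rng.random(n) < p
    background = sigma_w * rng.standard_normal(n)
    return background + impulses * sigma_i * rng.standard_normal(n)


def fit_zero_mean_mixture(x, n_iter=200):
    """EM for a two-component zero-mean Gaussian mixture (simplified sketch).

    Returns the weight of the wide component and the ratio of standard
    deviations of the wide and narrow components.
    """
    x2 = np.asarray(x, dtype=float) ** 2
    p, v_bg, v_imp = 0.5, 0.25 * x2.mean(), 4.0 * x2.mean()
    for _ in range(n_iter):
        # E-step: log joint densities up to a shared additive constant.
        log_bg = np.log1p(-p) - 0.5 * (np.log(v_bg) + x2 / v_bg)
        log_imp = np.log(p) - 0.5 * (np.log(v_imp) + x2 / v_imp)
        resp = np.exp(log_imp - np.logaddexp(log_bg, log_imp))
        # M-step: update the mixture weight and the component variances.
        p = resp.mean()
        v_imp = np.sum(resp * x2) / np.sum(resp)
        v_bg = np.sum((1.0 - resp) * x2) / np.sum(1.0 - resp)
    return float(p), float(np.sqrt(v_imp / v_bg))


theta_true = np.hypot(0.1, 1.0) / 0.1  # sqrt(sigma_w^2 + sigma_i^2) / sigma_w
samples = simulate_bg(50_000, p=0.1, rng=np.random.default_rng(3))
p_hat, theta_hat = fit_zero_mean_mixture(samples)
```

For small p, only a few samples belong to the wide component, so the estimated scale ratio becomes noticeably more variable; this is worth keeping in mind when reading boxplots for the smallest impulse probabilities.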
Figures 15 and 16 present the aggregate results for Bernoulli-Gaussian noise. Both GAN models performed poorly with feature min-max scaling, as seen from the DTW density and coverage plots as well as the estimated impulse probability and scale parameter boxplots. In particular, DTW coverage indicates that both models experienced partial or full mode collapse for all but the largest target impulse probabilities, p.
WaveGAN clearly improved with the quantile data transformation, exhibiting excellent DTW coverage, although the DTW density metric was abnormally large.⁵ Also, WaveGAN accurately recovered the target impulse probability and scale ratio across most scenarios, except for the extreme p = 0.01 case, where the dispersion in scale ratio was very large. By contrast, the quantile data transformation did not appear to improve STFT-GAN performance.
Example target and generated time series for GANs with the quantile data transformation are shown in Figure 17 for the case of p = 0.05 BG noise. From these plots, it is evident that STFT-GAN failed to recover the correct background noise level relative to the impulsive component, while WaveGAN better matched the target distribution.
Aggregate results for SαS noise are given in Figures 18 and 19. The GAN models with feature min-max scaling suffered from mode collapse during training across all tests, as evidenced by the near-zero DTW coverage results. By contrast, the quantile data transformation preprocessing step enabled WaveGAN to avoid mode collapse during training, whereas STFT-GAN still suffered from poor diversity, as measured by DTW coverage. In terms of the fidelity metrics, WaveGAN clearly outperformed STFT-GAN, although the dispersion in the characteristic exponent was unacceptably large.
Example target and generated time series for GANs trained with a quantile data transformation on SαS noise with α = 1.0 are shown in Figure 20. From these plots, we see that WaveGAN produced short-duration impulses, whereas STFT-GAN produced impulses that were not as localized in time. These observations are consistent with the PSD distance results. Further, both models often produced time series with maximum impulse amplitudes that were too large, supporting the finding that they did not consistently recover the target characteristic exponent.

Fig. 1: Frequency response of each digital filter used to simulate band-limited thermal noise, indexed 0 through 7.

Fig. 2: Left: Example fractional Brownian motion time series with H = 0.2, 0.5, and 0.8 from top to bottom. Right: Example shot noise time series with one-sided exponential pulse type and event rates ν = 0.25, 0.5, and 2 from top to bottom.

and ∆t = 0.1.
Example synthetic shot noise time series with a one-sided exponential pulse function and event rates ν = 0.25, 0.5, and 2 are shown in Figure 2 (Right).
) with σ_w = 0.1 and σ_i = 1. Example time series are shown in Figure 3 (Left) for p = 0.01, 0.05, and 0.1. Corresponding PDFs are plotted on a logarithmic scale on the left side of Figure 4.

TABLE I: Pulse functions used to simulate synthetic shot noise.

TABLE IV: STFT-GAN generator architecture