Generative adversarial networks for data-scarce radiative heat transfer applications

Generative adversarial networks (GANs) are one of the most robust and versatile techniques in the field of generative artificial intelligence. In this work, we report on an application of GANs in the domain of synthetic spectral data generation for data-scarce radiative heat transfer applications, an area where their use has not been previously reported. We demonstrate the proposed approach by applying it to an illustrative problem within the realm of near-field radiative heat transfer involving a multilayered hyperbolic metamaterial. We find that successful generation of spectral data requires two modifications to conventional GANs: (i) the introduction of Wasserstein GANs (WGANs) to avoid mode collapse, and (ii) the conditioning of WGANs (yielding CWGANs) to obtain accurate labels for the generated data. We show that a simple feed-forward neural network (FFNN), when augmented with data generated by a CWGAN, significantly enhances its performance under conditions of limited data availability. In addition, we show that CWGANs can act as surrogate models with improved performance in the low-data regime with respect to simple FFNNs. Overall, this work highlights the potential of generative machine learning algorithms in scientific applications beyond image generation and optimization.


Introduction
Machine learning, a rapidly expanding area within computer science, focuses on advancing the foundations and technology that enable machines to learn from data [1][2][3][4]. Deep learning, on the other hand, is a subset of machine learning techniques that uses artificial neural networks (ANNs) to model and solve complex data-driven problems, such as image and speech recognition [5, 6], natural language processing [7], and autonomous driving [8], among many others. With the current explosion of data and the rapid development of improved hardware and algorithms, machine learning and deep learning are becoming crucial tools in many industries, including healthcare, finance, and manufacturing.
Motivated by this success, machine learning and deep learning techniques are attracting increasing attention from a variety of scientific disciplines beyond computer science, revolutionizing traditional approaches to the modeling and analysis of data-driven scientific problems. In physics, these techniques are employed to tackle complex problems [9], including the representation of quantum many-body wave functions [10], the discovery and identification of phase transitions in condensed-matter systems [11][12][13], the solution of statistical problems [14], the development of novel quantum information technologies [15], the modeling of gravitational waves [16], and the design of nanophotonic devices with novel or improved functionalities [17][18][19][20]. Machine learning algorithms have been effectively used for the accelerated discovery and design of new materials and molecules in the fields of materials science and chemistry [21, 22], being instrumental in molecular dynamics simulations [23], in predicting chemical reactions [24], and in modeling the quantum mechanical energies of molecules [25]. In the realm of biology, the applications of machine learning and deep learning are also vast, including breakthroughs in gene expression prediction tasks, prediction of micro-RNA targets, and novel single-cell methods [26]. Overall, despite significant challenges, such as interpretability and transferability, machine learning and deep learning are playing an increasingly pivotal role in advancing a broad variety of scientific research methodologies.
In this context, generative adversarial networks (GANs), a powerful subset of generative machine learning, have emerged as a versatile tool for creating new data instances that closely resemble a given training set [27]. This innovative paradigm involves a two-player adversarial setup where a generative ANN strives to produce data instances that are indistinguishable from the training set, while a discriminative network attempts to distinguish between the instances generated by the generative network and the real data [28]. The competing nature of these two networks drives the generative network to produce increasingly realistic data, pushing the boundaries of what is achievable with generative models. This technology has been instrumental across a broad range of scientific disciplines, including physics, chemistry, and biology. In physics, GANs have been used for simulating complex systems and predicting outcomes of experiments, with examples in high energy physics [29], condensed matter physics [30][31][32], nanophotonics [33, 34], and cosmology [35]. In the field of chemistry, GANs have been harnessed to generate novel chemical structures and predict their properties [36], thereby accelerating the process of drug discovery and materials design [37]. In biology, GANs have been employed in a variety of tasks, including protein engineering [38] and generating biological imaging data [39]. This myriad of applications demonstrates the significant potential of GANs in transforming scientific research by providing a powerful tool for hypothesis generation, experimental design, and data augmentation where empirical data is scarce or expensive to obtain. Despite their significant success in image generation, the use of GANs has been mostly limited to this area. It would be highly desirable to expand their application to the generation of scientific numerical data. In this regard, recent efforts have been devoted to adapting GANs for the generation of synthetic spectral data [40][41][42][43], but the reported methods are often not well suited for application in many scientific areas.
In this work, we introduce a novel application of GANs for synthetic spectral data generation in data-scarce radiative heat transfer applications. We particularly focus on an illustrative problem in the research area of near-field radiative heat transfer, involving a multilayered hyperbolic metamaterial. We explore the use of a conditional Wasserstein GAN (CWGAN) for data augmentation and investigate its impact on the predictive capabilities of a feed-forward neural network (FFNN). We find that the successful production of spectral data requires two main changes to standard GANs. Firstly, the implementation of Wasserstein GANs (WGANs) is necessary to counteract mode collapse, and secondly, these WGANs need to be conditioned to yield accurate labels for the generated data. We demonstrate that a simple FFNN, when augmented with data produced by a CWGAN, notably improves its performance under conditions of data scarcity. Furthermore, we illustrate that CWGANs can serve as efficient surrogate models in low-data regimes. Overall, this research contributes to highlighting the potential of generative AI algorithms in applications extending beyond the conventional realm of image generation.
This work is organized as follows. In section 2, we review the fundamentals of the primary generative adversarial frameworks underlying this work. Section 3 discusses the basic principles of the particular physical problem we utilize to exemplify our approach. In section 4, we present and discuss the results obtained from applying the generative adversarial method detailed in section 2 to the specific example problem outlined in section 3. Finally, in section 5 we sum up the conclusions of this work.

Generative adversarial route to synthetic spectral data generation
In this section, we review the basics of the three main generative adversarial frameworks that form the basis of this work, namely, GANs, WGANs and CWGANs. Here, we assume that the origin of the real spectra used in these approaches is completely general, coming from any physical, chemical or biological process (in the following sections we discuss the application to a particular problem within the realm of near-field radiative heat transfer). Figure 1 shows a schematic representation of the underlying architecture of a general GAN algorithm. It comprises two main parts: a generator network and a discriminator network (represented as orange and blue rectangles, respectively, in figure 1). The generator and the discriminator are two interconnected networks in the GAN system. The generator generally begins with random noise (z in figure 1) and uses it to create new data samples (new spectra in our application). The discriminator, in turn, takes these spectra and calculates the likelihood that each one originates from the actual training data set. The two networks have competing objectives. The generator's goal is to create spectra that perfectly mirror the distribution of the training data. In doing so, it aims at generating spectral data so convincingly authentic that the discriminator cannot tell it apart from the real training data. Meanwhile, the discriminator aims at distinguishing between the actual training data and the data fabricated by the generator. Both networks are trained together in a competition until a Nash-type equilibrium is reached and the training process ends [27]. At that stage, the generator is producing spectra that the discriminator can no longer reliably classify as 'real' or 'fake', signaling the end of the training process. In our application, the process begins with the combination of geometrical data (W) and a random vector (z), both of which are fed into the generator. The generator's role is to create a synthetic, or 'fake', spectrum from these inputs. The generated spectrum is then introduced to the discriminator, along with the real spectra and the geometrical data. The discriminator's task is to discern between the fake and real spectra. The conclusions drawn by the discriminator are subsequently used to guide the training of both the generator and discriminator networks, enhancing their ability to generate realistic spectra and to distinguish between real and synthetic data, respectively.
Mathematically, the GAN configuration can be formulated as a minimization problem of the generator and discriminator loss functions, suitably written in terms of differentiable functions representing the discriminator and the generator network models, D(x; θ_d) and G(z; θ_g), respectively (θ_d and θ_g are the parameters of the corresponding ANNs; in what follows, for the sake of clarity, we do not include this dependence in the equations). We start by focusing on the loss function of the discriminator, L_D(G, D), which can be written as [27]

$L_D(G, D) = -\mathbb{E}_{x\sim p_{\rm data}(x)}[\log D(x)] - \mathbb{E}_{z\sim p_z(z)}[\log(1 - D(G(z)))], \quad (1)$

where E stands for expectation over either the training samples, x, or the input noise variables, z (characterized respectively by a probability distribution p_data(x) and a prior distribution p_z(z)). From equation (1), and considering that D outputs a single scalar representing a probability, we can see that the training of the discriminator aims to minimize the likelihood of mistaking a real sample for a fake one or a fake sample for a real one (first and second terms, respectively, on the right-hand side of equation (1)). However, there are significant performance issues associated with this original choice of the loss function L_D(G, D), including the difficulty of reaching the above-mentioned Nash equilibrium state (each network is updated independently, and given the competitive nature of the generator and the discriminator, in general, there is no clear point at which to stop the training) and the so-called mode collapse (arising when a network fails to generalize accurately to all regions of the training data distribution). To address these issues, a different strategy to train GANs was introduced, the so-called WGANs [44]. One of the main modifications of WGANs with respect to the original GAN architecture is the presence of a critic model, C(x), instead of the discriminator model. Importantly, C(x) outputs a score instead of a probability, which, in turn, allows us to define a new loss function for the critic, L_C(G, C),
given by

$L_C(G, C) = \mathbb{E}_{z\sim p_z(z)}[C(G(z))] - \mathbb{E}_{x\sim p_{\rm data}(x)}[C(x)]. \quad (2)$

This loss function summarizes well some of the main advantages of WGANs. WGANs help address the training challenges of GANs by using the Wasserstein distance (or earth mover's distance) as the loss function instead of the Jensen-Shannon divergence used in original GANs. In addition, the value of the critic in WGANs provides a meaningful loss metric, i.e. a meaningful measure of the distance between the real and generated data distributions. This is unlike the original GANs, where the discriminator's output does not correlate well with the quality of the generated samples. A pivotal aspect of WGANs is the need for the critic to operate within the set of so-called 1-Lipschitz functions, a critical component of the model. Lipschitz functions are functions for which there exists a real-valued constant such that, for every pair of points, the absolute difference in function values is bounded by this constant times the absolute difference of input values. When this constant is 1, the functions are known as 1-Lipschitz. The 1-Lipschitz constraint is crucial as it bounds how much a function's output can change with small variations in input, ensuring the function does not change too abruptly. In WGANs, this is vital to ensure that the critic provides meaningful and stable gradients for the generator to learn from, facilitating a more reliable learning process. Originally, weight clipping was proposed to enforce this 1-Lipschitz condition. However, this sometimes resulted in convergence failure. To counter these issues, a more robust technique known as the gradient penalty was introduced. This method involves adding a loss term that keeps the L2 norm (a measure of vector length) of the critic's gradient with respect to its input close to a value of 1 [45]. This approach helps keep the critic's function within the 1-Lipschitz constraint, enhancing the stability and performance of WGANs. Incorporating these improvements, the
full loss for the critic in a WGAN now reads

$L_C(G, C) = \mathbb{E}_{z}[C(G(z))] - \mathbb{E}_{x}[C(x)] + \lambda\,\mathbb{E}_{\hat{x}}\big[(\|\nabla_{\hat{x}} C(\hat{x})\|_2 - 1)^2\big], \quad (3)$

where $\|\cdot\|_2$ is the L2 norm and λ is a weight parameter for the gradient penalty (throughout this work we assume λ = 10, and $\hat{x} = x + \alpha(G(z) - x)$ is an interpolated point between a real sample and a generated sample on which to calculate the gradient, with α ∈ [0, 1]). As a final remark, proper training of the WGAN requires the critic to be trained ahead of the generator, so that for each training step of the generator the critic is updated n_train times (following [44], we chose n_train = 5 in all our models).
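As an illustration, the critic loss of equation (3) can be sketched in a few lines of PyTorch (a minimal sketch, not the code used in this work; the function and variable names are our own):

```python
import torch

def critic_loss_wgan_gp(critic, x_real, x_fake, lambda_gp=10.0):
    """WGAN critic loss with gradient penalty, cf. equation (3)."""
    # Wasserstein terms: high score for generated samples raises the loss,
    # high score for real samples lowers it
    loss = critic(x_fake).mean() - critic(x_real).mean()
    # Random interpolation between real and generated samples
    alpha = torch.rand(x_real.size(0), 1)
    x_hat = (x_real + alpha * (x_fake - x_real)).requires_grad_(True)
    # Gradient of the critic score with respect to the interpolated input
    grads = torch.autograd.grad(critic(x_hat).sum(), x_hat, create_graph=True)[0]
    # Penalize deviations of the gradient L2 norm from 1 (1-Lipschitz constraint)
    penalty = ((grads.norm(2, dim=1) - 1.0) ** 2).mean()
    return loss + lambda_gp * penalty
```

In a training loop this loss would be minimized n_train = 5 times per generator update, as stated above.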
As for the generator loss function, L_G(G, C), an important aspect to realize is that for synthetic spectral data generation our focus is on a regression problem. This implies that we need greater control over the output than that obtained just by acquiring a random but realistic sample. It is therefore crucial to ensure that the generated data accurately corresponds to the correct system parameters that yield that response. To achieve this, we need to condition the WGAN, leading to the creation of a CWGAN [46]. In this work, we implement this by adding an extra loss term quantifying the mean absolute error (MAE) between the conditioned generated example and the training example corresponding to the ground truth of the condition. Accounting for this conditioning, L_G(G, C) can be expressed as

$L_G(G, C) = \mathbb{E}\big[\,|x - G(z|W)|\,\big] - \mathbb{E}\big[C(G(z|W))\big], \quad (4)$

where x is the training example corresponding to the system parameters W, and G(z|W) is a generated example conditioned on the same parameters W. The first term on the r.h.s. of equation (4) corresponds to the above-discussed conditioning procedure, while the second term is associated with the coupling of the generator and the critic.
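Under the same sketching assumptions as before (hypothetical helper names; the critic receives the spectrum concatenated with the condition), the conditional generator loss of equation (4) could read:

```python
import torch

def generator_loss_cwgan(generator, critic, x_real, cond):
    """CWGAN generator loss, cf. equation (4): MAE conditioning term
    plus the term coupling the generator to the critic."""
    x_fake = generator(cond)              # spectrum conditioned on the parameters W
    mae = (x_fake - x_real).abs().mean()  # conditioning term (first term)
    # Critic coupling (second term): the generator tries to raise its critic score
    coupling = -critic(torch.cat([x_fake, cond], dim=1)).mean()
    return mae + coupling
```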

Illustrative problem: near-field radiative heat transfer spectra in multilayer hyperbolic metamaterials
In this section we provide an overview of the fundamentals of the specific problem we use to illustrate the proposed approach. We have chosen a physical problem in the context of near-field radiative heat transfer involving multilayer hyperbolic metamaterials [47]. Despite the specific character of this class of systems, the chosen problem can be considered both representative of the types of problems that our approach can address effectively and complex enough to showcase the versatility of our method.
In the realm of thermal radiation, a key breakthrough has been achieved with the experimental verification that the limit established by the Stefan-Boltzmann law for radiative heat transfer between two bodies can be significantly exceeded by reducing their distance to less than the thermal wavelength λ_Th (∼10 µm at room temperature) [48]. At such near-field distances, the heat transfer process is enhanced through evanescent waves (photon tunneling), an effect not encompassed by the conventional Stefan-Boltzmann framework. This mechanism governs the near-field radiative heat transfer (NFRHT) when the intervening separations are exceedingly narrow [49][50][51][52]. To further augment NFRHT, numerous strategies have been introduced, with a particular emphasis on exploiting surface modes found within multilayered structures. The creation of hyperbolic metamaterials, achieved by alternating layers of dielectric and metallic materials, has garnered considerable attention [53][54][55][56][57][58][59][60][61][62]. It has been demonstrated that hybridizing surface modes across metal-dielectric interfaces significantly enhances NFRHT, surpassing that observed with two parallel infinite plates [60].
The model system considered in this work is sketched in figure 2(a). It consists of two identical multilayered structures separated by a gap d_0. Each structure comprises N layers, alternating between a metal with permittivity ϵ_m and a lossless dielectric with permittivity ϵ_d (ϵ_d = 1 is assumed throughout this work).
The metal layers are characterized by a Drude model permittivity, $\epsilon_m(\omega) = \epsilon_\infty - \omega_p^2/[\omega(\omega + i\gamma)]$, where ϵ_∞ represents the infinite-frequency permittivity, ω_p is the plasma frequency, and γ is the damping rate. We set the parameters ϵ_∞ = 1, ω_p = 2.5 × 10^14 rad s^−1, and γ = 1 × 10^12 rad s^−1, which yield a surface plasmon frequency that closely matches the surface phonon polariton frequency observed at the interface between SiC and vacuum. The thickness of each layer, indexed as i, is represented by d_i, which can vary within a predefined range (details to be provided subsequently).
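For concreteness, the Drude permittivity of the metal layers can be evaluated as in the following minimal numerical sketch, using the parameter values quoted above (the function name is our own):

```python
import numpy as np

EPS_INF, OMEGA_P, GAMMA = 1.0, 2.5e14, 1.0e12  # parameters used in this work (rad/s)

def drude_permittivity(omega):
    """Drude model: eps(omega) = eps_inf - omega_p^2 / (omega^2 + i*gamma*omega)."""
    return EPS_INF - OMEGA_P**2 / (omega**2 + 1j * GAMMA * omega)
```

As a sanity check, near the surface plasmon frequency ω_p/√(ϵ_∞ + ϵ_d) = ω_p/√2 the real part of the permittivity approaches −ϵ_d = −1, the condition for a surface mode at a metal-vacuum interface.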
To study the radiative heat transfer across the gap of the system, we adopt the analysis of [47, 60] and use the well-established theory of fluctuational electrodynamics [63, 64], with a particular emphasis on near-field effects. In such a regime, radiative heat transfer takes place primarily through TM- or p-polarized evanescent waves. The heat transfer coefficient (HTC) between the structures, representing the linear radiative thermal conductance per unit area, can be expressed as [65]

$h = \int_0^\infty \frac{d\omega}{2\pi}\,\frac{\partial \Theta(\omega,T)}{\partial T} \int_0^\infty \frac{dk}{2\pi}\,k\,\tau(\omega,k),$

where ω is the frequency and k is the magnitude of the wave vector parallel to the surface planes. Here, $\Theta(\omega,T) = \hbar\omega/(e^{\hbar\omega/k_B T} - 1)$ is the average thermal energy of a mode at frequency ω (T is the temperature). The transmission of p-polarized evanescent modes is given by [60]

$\tau(\omega,k) = \frac{4\,[\mathrm{Im}\,r_p(\omega,k)]^2\, e^{-2 q_0 d_0}}{\big|1 - r_p^2(\omega,k)\, e^{-2 q_0 d_0}\big|^2},$

where the function r_p(ω, k) represents the Fresnel reflection coefficient for p-polarized evanescent waves, transitioning from vacuum to a layer, and $q_0 = \sqrt{k^2 - \omega^2/c^2}$ is the normal component of the wave vector in vacuum. The coefficients were numerically computed using a scattering matrix method [66]. Although we considered the impact of s-polarized modes, they are negligible for the gap sizes investigated.
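The thermal factors entering the HTC can be evaluated numerically as in the sketch below (SI constants; the derivative ∂Θ/∂T is written in terms of x = ħω/k_BT, and the function names are our own):

```python
import numpy as np

HBAR, KB = 1.054571817e-34, 1.380649e-23  # reduced Planck and Boltzmann constants (SI)

def mean_thermal_energy(omega, T):
    """Theta(omega, T) = hbar*omega / (exp(hbar*omega / (kB*T)) - 1)."""
    x = HBAR * omega / (KB * T)
    return HBAR * omega / np.expm1(x)

def dtheta_dt(omega, T):
    """Temperature derivative of Theta entering the HTC: kB * x^2 * e^x / (e^x - 1)^2."""
    x = HBAR * omega / (KB * T)
    return KB * x**2 * np.exp(x) / np.expm1(x)**2
```

In the classical limit ħω ≪ k_BT, Θ → k_BT and ∂Θ/∂T → k_B, which provides a convenient sanity check.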
The significance of studying near-field radiative heat transfer (NFRHT) in these layered structures lies in the ability to tailor heat exchange by adjusting layer thicknesses. Specifically, in Drude metal parallel plates (with a 10 nm vacuum gap at T = 300 K), NFRHT is predominantly influenced by two cavity surface modes [60], emerging from the hybridization of the surface modes supported by the metal-vacuum interfaces. This enhancement over a traditional bulk system is substantial across various gap distances, as detailed in [60].

Table 1. Thickness combinations of the representative samples of the spectral heat transfer coefficient, h_ω, shown in figure 2(c).
Of special interest for this work is the spectral HTC, h_ω, defined as the HTC per unit of frequency: $h = \int_0^\infty h_\omega\, d\omega$. To create the initial dataset of real spectra (which later on will be augmented by our generative adversarial approach), we apply the above-described theoretical framework to compute a total of 6561 h_ω spectra. The thicknesses d_i of each layer were varied between 5 and 20 nm, and every spectrum contains 200 frequency points in the range ω ∈ [0.3, 3] × 10^14 rad s^−1. As discussed in the next section, this dataset of spectra will be split in different proportions into training and test sets. Figure 2(c) shows several representative samples of h_ω spectra, corresponding to the thickness combinations listed in table 1.
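Given a spectrum sampled on this 200-point grid, the total HTC follows by numerical quadrature; a simple trapezoidal sketch (the grid construction and names are our own):

```python
import numpy as np

OMEGA = np.linspace(0.3e14, 3.0e14, 200)  # frequency grid of the dataset (rad/s)

def total_htc(h_omega, omega=OMEGA):
    """Integrate the spectral HTC h_omega over frequency: h = int h_omega d(omega)."""
    # Trapezoidal rule written out explicitly for portability across NumPy versions
    return float(np.sum(0.5 * (h_omega[1:] + h_omega[:-1]) * np.diff(omega)))
```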
The spectra displayed in figure 2(c) show the broad variety of spectral features that can be obtained from the studied system (from double broad peaks with very narrow resonances in between, to single narrow peaks, or to two resonant peaks separated by a gap). This set of spectra essentially serves as a comprehensive blueprint for the whole adversarial approach, guiding it on the characteristics and features that should be exhibited in the synthetic data. Hence, this allows the proposed approach to capture a wider range of underlying patterns and relationships, which in turn allows it to generate a more realistic and diverse array of synthetic spectral data.
In this work we leverage the robust theoretical models of radiative heat transfer, which have been validated by experimental data across various studies (see, for instance, [49][50][51][52]). The fidelity of these models provides a solid foundation for the datasets used to train our GANs, underscoring the reliability of our generated data. While we recognize the merit of experimental validation, our focus is on the application of GANs to address specific challenges within data-scarce scenarios in this field.

Results and discussion
We proceed in this section to report on the results obtained when applying the generative adversarial approach summarized in section 2 to the specific illustrative problem described in section 3. We begin by providing a strong quantitative justification of the necessity of using a trained CWGAN for this problem, instead of a plain CGAN (conditional GAN; the details of the specific architecture used in each case are provided below). Figures 3(a) and (b) show, respectively, the ability of a trained CGAN and a trained CWGAN to reproduce the training set, projecting the data onto two dimensions via a principal component analysis (PCA) [67], which retains most of the training data structure thanks to its >90% reproduction rate (RR).
The PCA calculations shown in figure 3 were done as follows. First, we performed a singular value decomposition (SVD) of the covariance matrix Σ: [U, S, V] = SVD(Σ), with $\Sigma = \frac{1}{m} X^T X$ [68]. Here, U and V are unitary matrices, S is the singular value matrix, m is the total number of data examples, and X is the data matrix, containing in each column one data example x. To obtain the reduced two-dimensional (2D) representation of the data, we calculated a reduced matrix U_reduced retaining the first two columns of the U matrix obtained from the SVD, and used it as the projection matrix, $\tilde{x} = U_{\rm reduced}^T x$, where $\tilde{x}$ is the 2D representation of a data example. The RR of the whole 2D-PCA analysis is then defined as the ratio of the first two singular values over all the N singular values obtained: $\mathrm{RR} = (s_1 + s_2)/\sum_{i=1}^{N} s_i$. The conditional GAN and the conditional WGAN share the same architectural design, the specifics of which are detailed further on. Both networks underwent identical training conditions, with the complete spectral dataset partitioned into 80% for training and 20% for validation. The key distinguishing factor lies in their respective loss functions (L_D(G, D) for the CGAN and L_C(G, C) for the CWGAN). The CGAN employs equation (1) along with a generator loss which is the sum of the first term on the r.h.s. of equation (4) (the conditional term) and the second term on the r.h.s.
of equation (1) with opposite sign. Meanwhile, the CWGAN utilizes the loss defined in equation (4) for the generator and equation (3) for the critic. Notably, as depicted in figure 3(a), the CGAN manifests evident signs of mode collapse, rendering it unable to reproduce most examples beyond the principal cluster structures in the PCA. In contrast, the CWGAN is capable of replicating most of the complexities within the training set. These results can be considered a novel illustration, in the context of synthetic generation of spectral data, of the key role played by Wasserstein's loss function in creating more robust generative adversarial approaches.
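The 2D-PCA projection and reproduction rate described above can be sketched as follows (a minimal sketch with data examples stored as rows, a minor transposition with respect to the column convention used in the text):

```python
import numpy as np

def pca_2d(X):
    """Project data (one example per row) onto the first two principal
    components; return the 2D representation and the reproduction rate."""
    Xc = X - X.mean(axis=0)             # center the data
    cov = Xc.T @ Xc / X.shape[0]        # covariance matrix Sigma
    U, S, _ = np.linalg.svd(cov)        # [U, S, V] = SVD(Sigma)
    X2d = Xc @ U[:, :2]                 # projection on the first two columns of U
    rr = S[:2].sum() / S.sum()          # RR: first two singular values over all
    return X2d, rr
```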
Figure 4 summarizes the specific CWGAN architecture we have found to be most efficient for the studied problem. The generator (left side of figure 4) is composed of 4 hidden fully connected layers with an increasing number of neurons in each layer (we consider 50, 100, 150, and 200 neurons for the four layers, represented as green blocks, labeled FC1-FC4, in figure 4). The generator model takes the condition as input (the 8 values for the thicknesses of the layers) and returns a generated h_ω spectrum (sampled at 200 frequencies). Consistently with the results in [69, 70], we found that we did not need to feed a random data distribution z into the generator for operation: 20% dropout layers between all layers of the generator (i.e. connections between two consecutive neurons are dropped with a 20% chance) provide enough variability for this task (the dropout operations are represented as orange blocks in figure 4). On the other hand, the critic (right side of figure 4) takes both a sample spectrum and the condition (either generated or from the training distribution) in parallel processing lines of a single hidden layer (with 150 and 50 neurons, respectively; FC5 and FC6 in figure 4). Then, it concatenates and processes the information through two additional hidden layers (FC7 and FC8 in figure 4, with 100 and 50 neurons, respectively) to output a single number, the score. All hidden layers feature a scaled exponential linear unit (selu) activation function. In all models discussed in this work, as a pre-processing step, we also calculate the logarithm of the spectra, subtract the mean of both the input parameters and the spectra, and divide both the system parameters and the spectra by the standard deviation (this normalization operation is represented by grey blocks in figure 4). Finally, in both the generator and the critic a final linear activation layer is added to ensure the output has the correct size (red blocks in figure 4).
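A minimal PyTorch sketch of the architecture just described (layer sizes taken from the text; class names, module choices, and initialization details are our own assumptions, and the input normalization step is omitted):

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Sketch of the generator: 8 layer thicknesses in, 200-point spectrum out."""
    def __init__(self):
        super().__init__()
        layers, sizes = [], [8, 50, 100, 150, 200]  # FC1-FC4
        for n_in, n_out in zip(sizes[:-1], sizes[1:]):
            layers += [nn.Linear(n_in, n_out), nn.SELU(), nn.Dropout(0.2)]
        layers += [nn.Linear(200, 200)]  # final linear layer sets the output size
        self.net = nn.Sequential(*layers)

    def forward(self, w):
        return self.net(w)

class Critic(nn.Module):
    """Sketch of the critic: parallel spectrum/condition branches, then a head."""
    def __init__(self):
        super().__init__()
        self.spec = nn.Sequential(nn.Linear(200, 150), nn.SELU())  # FC5
        self.cond = nn.Sequential(nn.Linear(8, 50), nn.SELU())     # FC6
        self.head = nn.Sequential(nn.Linear(200, 100), nn.SELU(),  # FC7
                                  nn.Linear(100, 50), nn.SELU(),   # FC8
                                  nn.Linear(50, 1))                # scalar score
    def forward(self, x, w):
        return self.head(torch.cat([self.spec(x), self.cond(w)], dim=1))
```

Note that keeping the generator's dropout active at sampling time is what supplies the variability otherwise provided by the random vector z.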
Note that, in the development of our CWGAN tailored for multilayer hyperbolic metamaterials, particular attention has been paid to the physical viability of the generated spectra. The training dataset comprises theoretically sound, validated spectra, ensuring a foundation rooted in physical reality. In addition, the CWGAN architecture is designed to integrate physical parameters, which govern the generation process to conform with the constraints of hyperbolic metamaterials, which further guarantees the generation of spectra that are physically viable.

Figure 4 (caption). The generator is constructed of four fully connected layers (FCs) (green blocks), dropout layers placed after each FC layer (orange blocks) and a final linear activation layer (red block). The critic section simultaneously accepts HTC spectra produced by the generator, as well as true spectra, and the geometrical parameters as inputs, providing a scalar score as output. The critic model consists of a single FC layer for each parallel input branch, followed by a concatenation layer that combines information from both input branches into a single vector. Afterwards, two more FC layers and a final linear activation layer produce the score. In both the generator and critic sections, normalization of the input data is applied as a preprocessing step (grey blocks).
In this context, we acknowledge various applications of GANs in spectral data generation across different domains. Thus, for instance, de Oliveira et al [43] indicated the limited utility of GANs compared to supervised generative networks in generating artificial gamma ray spectra, a finding that contrasts with our demonstration of GANs' efficiency in synthesizing spectral data for radiative heat transfer. Audebert et al [40] showcased the generation of annotated hyperspectral samples using GANs, yet their broad categorization approach differs from our precise conditioning using continuous variables. Teng et al [41] expanded a laser-induced breakdown spectroscopy spectral database with GANs without specific conditioning, unlike our method, which integrates accurate geometrical parameters. Lastly, Zhu et al [42] employed boundary equilibrium generative adversarial networks for biological sample spectra synthesis, utilizing an autoencoder discriminator and lacking conditioning, a method not suited for our application. These studies collectively highlight the importance of domain-specific GAN adaptations, affirming that a method effective in one context may not translate directly to another due to fundamental differences in application scope, requirements and objectives.
We focus now on analyzing the evolution with the training steps of the loss functions of the critic and the generator (L_C(G, C) and L_G(G, C), respectively), as obtained by training the architecture displayed in figure 4. The generator loss contains the term coupling the generator to the critic (second term on the r.h.s. of equation (4)). As described above, for large values of the training step (≳10^4), this latter term necessarily follows its counterpart term in the critic loss function (so, as also pointed out above, their difference is maintained and a stationary value of L_C(G, C) is reached).
Therefore, since we do not expect L_G(G, C) to reach a stationary value, the MAE becomes the key magnitude to monitor the quality of the learning process of the generator (note that this is also in accordance with the overall goal of the proposed application: creating artificial spectral samples indistinguishable from the real ones). Indeed, as shown in the inset of figure 5(b), the MAE computed for the studied problem converges to values <0.2 for the larger training step values considered in our calculations.
Next, we proceed to quantify our model's proficiency in reproducing the spectral heat transfer coefficient (HTC) of the studied physical system, based on the model architecture illustrated in figure 4. For this analysis, our focus is on two distinctive metrics, concentrating on two different aspects of the spectra.The first, the per-point relative mean error (L point ), gauges the model's capability to accurately represent each point in the spectrum through a relative error analysis.The second, the integral relative mean error (L integ ), is predominantly influenced by the model's accuracy in reproducing the primary spectral characteristics, notably the resonances.Their respective definitions are as follows: ´( where N is the number of spectra, m the number of points in each spectrum, y i is a data example and ŷi is the corresponding generated example by the model.Note that we undo all pre-processing operations to perform these calculations, and that all integrals are performed over frequency points.We start our analysis by establishing a baseline outcome for comparison using a simple FFNN, comprised of five hidden layers, each hosting 200 neurons, characterized by a selu activation function, and a final linear activation layer.An analogous network has been previously shown to effectively model this identical physical system given ample data [47] -consequently, we anticipate that this particular network will serve as a suitable benchmark.Another important aspect is the apparent similarities between the generator and the FFNN.We aim at demonstrating that this specific simple design, given a sufficiently large dataset, can accurately replicate the spectral characteristics inherent to the type of systems under study, without resorting to any data augmentation technique.To accomplish this, we split the original dataset, which consists of 6561 h ω spectra, allocating 80% for training set and the remaining 20% for the validation set.Our findings reveal that after 50 000 
training iterations (epochs), using the Adam optimization algorithm with a constant learning rate of 3 × 10⁻⁴, this simple neural network is proficient in replicating the system's spectra, reaching L_point and L_integ values for the validation set of 3.61% and 1.45%, respectively (note that these results are comparable to those reported in [47]).
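The two error metrics can be sketched in a few lines of Python. This is a minimal implementation under our own assumptions: the spectra are stored as (N, m) arrays, and the integrals are evaluated with the trapezoidal rule, which the text does not specify.

```python
import numpy as np

def error_metrics(y_true, y_pred, omega):
    """Per-point (L_point) and integral (L_integ) relative mean errors.

    y_true, y_pred : arrays of shape (N, m), real and generated spectra
                     (with all pre-processing already undone).
    omega          : array of shape (m,), the frequency grid.
    Returns both metrics as percentages.
    """
    # Per-point relative error, averaged over all m points of all N spectra.
    l_point = np.mean(np.abs(y_true - y_pred) / np.abs(y_true))

    # Frequency integrals via the trapezoidal rule (our quadrature choice).
    dw = np.diff(omega)
    i_true = np.sum((y_true[:, 1:] + y_true[:, :-1]) / 2 * dw, axis=1)
    i_pred = np.sum((y_pred[:, 1:] + y_pred[:, :-1]) / 2 * dw, axis=1)
    l_integ = np.mean(np.abs(i_true - i_pred) / np.abs(i_true))

    return 100 * l_point, 100 * l_integ
```

A uniform 10% overestimation of every point, for instance, yields L_point = L_integ = 10%, which is a quick sanity check of the implementation.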
We then proceed to assess the capability of the CWGAN as a numerical engine for spectral data augmentation. To do so, we follow a two-step approach. First, once the CWGAN has been trained, we use it to generate a number of additional spectra to include in the training set. Second, we retrain the above-described FFNN with this enlarged dataset to create an augmented FFNN. In this work, we generate a total of 10 000 new spectra for the data augmentation process, which are added to the training set. This particular number of additional spectra was chosen after observing that our results converge when between 5000 and 10 000 extra spectra are added to the original dataset. Once the generator has been trained, this augmentation of the original dataset is computationally inexpensive. Note also that, once trained, the generator can be isolated from the overall model and employed for inverse design tasks, following the methodology delineated in our prior research [47].
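Conceptually, the augmentation step can be sketched as follows. This is a sketch under stated assumptions: the generator's calling convention `generator(W, z)`, the latent dimension, and the uniform sampling of new geometrical parameters within the training range are all our choices, not details taken from the paper.

```python
import numpy as np

def augment_training_set(generator, X_train, Y_train,
                         n_new=10_000, latent_dim=100, rng=None):
    """Enlarge (X_train, Y_train) with n_new synthetic spectra produced by a
    trained conditional generator mapping (W, z) -> HTC spectrum."""
    rng = np.random.default_rng() if rng is None else rng

    # Sample new geometrical parameters W uniformly within the range spanned
    # by the training data (an assumption; any sampling scheme could be used).
    lo, hi = X_train.min(axis=0), X_train.max(axis=0)
    W_new = rng.uniform(lo, hi, size=(n_new, X_train.shape[1]))

    # Draw the random latent vectors fed to the generator.
    z = rng.normal(size=(n_new, latent_dim))
    Y_new = generator(W_new, z)

    # The augmented FFNN is then retrained on the concatenated dataset.
    return np.concatenate([X_train, W_new]), np.concatenate([Y_train, Y_new])
```

Because sampling the generator is cheap once it is trained, this step adds negligible cost compared with retraining the FFNN itself.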
Following this approach, we first found that, when using the training/validation split considered so far (80%/20%), the FFNN and the augmented FFNN show similar performance, both in terms of the per-point and the integral relative mean errors (values of approximately 3.6% and 1.5%, respectively, are obtained for both models). This suggests that, as anticipated, a sufficiently large dataset minimizes the impact of augmenting the original data, making the effect of data augmentation barely noticeable. However, this conclusion changes when a different data scenario is considered. To modify the amount of data available to the models, we progressively reduced the size of the training set by transferring part of it to the validation set, and retrained both the simple and augmented FFNNs from scratch (training/validation splits ranging from 80%/20% to 1%/99% were considered; note that the data augmentation is performed separately for each split ratio).
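The sweep over split ratios can be organized along these lines. The grid of validation fractions shown is illustrative (the paper does not list the exact values considered), and the retraining itself is indicated only by comments.

```python
import numpy as np

def split_dataset(n_total, val_fraction, seed=0):
    """Shuffle the example indices once and split them into training and
    validation subsets according to the requested validation fraction."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_total)
    n_val = int(round(val_fraction * n_total))
    return idx[n_val:], idx[:n_val]  # train indices, validation indices

# Illustrative sweep from a data-rich to a data-scarce scenario.
for frac in (0.20, 0.50, 0.70, 0.90, 0.99):
    train_idx, val_idx = split_dataset(6561, frac)
    # Here one would retrain the simple FFNN from scratch on train_idx,
    # redo the CWGAN data augmentation for this particular split, retrain
    # the augmented FFNN, and evaluate both models on val_idx.
    assert len(train_idx) + len(val_idx) == 6561
```

The key point, reflected in the comments, is that the augmentation is redone per split, so the generator never sees data that has been moved to the validation set.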
Figure 6 summarizes the main results of this analysis using the two error metrics considered in this work (panels (a) and (b) correspond to L_integ and L_point, respectively; blue dots (lines) display the results for the simple FFNN, whereas black dots (lines) correspond to the CWGAN-augmented FFNN). As seen, in terms of both L_integ and L_point, the simple and augmented FFNNs perform similarly when the validation set fraction is smaller than approximately 70% (we can therefore ascribe that interval to a sufficiently-large-training-set scenario). However, in the case of L_integ (figure 6(a)), further increasing the validation set fraction (i.e. further reducing the training set size) leads the augmented FFNN to perform increasingly better than the simple FFNN as the training data shrink. In the limiting case of a 99% validation set fraction, the simple FFNN yields L_integ = 13.2%, whereas the augmented FFNN reduces that error to roughly half (L_integ = 6.8%). The improvement is less dramatic for the per-point relative mean error (figure 6(b)), but we still observe gains in the low-data scenario (for the extreme case of a 99% validation set fraction, L_point = 18.2% is obtained for the simple FFNN, while the augmented FFNN yields L_point = 16.8%). Overall, these numerical results suggest that models for radiative heat transfer trained on synthetic data created by the generator can be more accurate, under conditions of data scarcity, than models trained exclusively on real data. A similar advantage of synthetically trained models has also been reported in the context of video representations based on synthetic data (see, for instance, [71]).
Finally, to conclude the present analysis, we compare the performance of the trained CWGAN as a surrogate model for generating h_ω spectra with that of the simple and augmented FFNNs. This application arises from the conditioning modification we introduced to conventional GANs (see equation (4) and the corresponding discussion in section 2). This conditioning allows us to accurately keep track of the geometrical parameters associated with each spectrum created by the generator of our CWGAN model, so that, once trained, the generator can be decoupled from the whole architecture and used as a standalone surrogate model. The corresponding numerical results are also displayed in figure 6 (red dots and lines correspond to the results of the CWGAN used as a surrogate model). As observed, for the integral relative mean error (figure 6(a)) the CWGAN performs worse than both the simple and the augmented FFNNs when the validation set fraction is below approximately 70% (i.e. the regime we previously labeled as the sufficiently-large-training-set scenario for both FFNNs). However, in the low-data regime (validation set fractions larger than 70%) the CWGAN surrogate gradually improves on the results of the simple FFNN, eventually matching the improvement reached by the augmented FFNN in the extreme case of a 99% validation set fraction. Regarding the per-point relative mean error (figure 6(b)), the CWGAN surrogate performs significantly worse than both the simple and the augmented FFNNs for validation set fractions below 70%, but beyond that point it converges towards the results of the augmented FFNN in the low-data regime. Ultimately, this comparison of the surrogate role of all considered models reinforces our previous conclusion that the CWGAN is more resilient to the reduction of training data than the simple FFNN, making it an efficient architecture in low-data scenarios in this context as well.

Conclusions
In this work we have studied the application of GANs to synthetic spectral data generation in the domain of radiative heat transfer, an area where their use has not been previously reported. Our main focus has been an illustrative problem in the domain of near-field radiative heat transfer involving a multilayered hyperbolic metamaterial. We have analyzed the use of a CWGAN for data augmentation and studied its influence on the predictive capacities of a FFNN. Our results reveal that generating spectral data effectively entails two main modifications to traditional GANs: firstly, incorporating WGANs to prevent mode collapse, and secondly, conditioning these WGANs to secure accurate labels for the generated data. We have demonstrated that a basic FFNN, augmented with data produced by a CWGAN, substantially improves its performance in scenarios where data is scarce. Moreover, we have shown that CWGANs can function as superior surrogate models when data is limited. Overall, this work contributes to the research area of generative AI algorithms' applicability beyond the conventional field of image generation. We believe that our findings advance the understanding and applicability of generative AI algorithms in data-constrained contexts and could stimulate further research on the application of generative AI in a variety of scientific scenarios.

Figure 1 .
Figure 1. Schematic representation of a conditional generative adversarial network (CGAN) employed to produce synthetic spectral data. The process begins with the combination of geometrical data (W) and a random vector (z), both of which are fed into the generator. The generator's role is to create a synthetic, or 'fake', spectrum from these inputs. The generated spectrum is then introduced to the discriminator, along with the real spectra and the geometrical data. The discriminator's task is to discern between the fake and real spectra. The conclusions drawn by the discriminator are subsequently used to guide the training of both the generator and discriminator networks, enhancing their ability to generate realistic spectra and to distinguish between real and synthetic data, respectively.

Figure 2 .
Figure 2. (a) Schematic representation of the physical system under study. It features a multilayered hyperbolic metamaterial comprised of two identical structures, each alternating between metal (depicted in grey) and dielectric (blue) layers that extend infinitely. The two systems are separated by a distance d_0 = 10 nm. Every layer has a thickness d_i, and both systems are backed by a metallic substrate. (b) Transmission of evanescent waves as a function of the frequency (ω) and the parallel wavevector (k), considering the case where there are 8 layers per system and all d_i = d_0 = 10 nm. The transmission pattern exhibits a series of lines nearing unity, resulting from the hybridization of surface plasmon polaritons (SPPs) within the metallic layers. (c) Sample spectra of heat transfer coefficients, h_ω, for several representative combinations of layer thicknesses, which are specified in table 1.
polaritons (SPPs) of the two metal-vacuum interfaces, yielding two near-unity contours in τ_p(ω, k). Introducing internal layers of specific thicknesses allows for NFRHT enhancement through multiple surface states, as shown in figure 2(b) for an N = 8 layer structure (4 metallic and 4 dielectric layers).

Figure 3 .
Figure 3. Reproduction of the training set using (a) a CGAN, showing in grey the training set and in green the CGAN reproduction of the set; and (b) the CWGAN, using the same color scheme. The results were obtained by applying PCA to the corresponding spectra and plotting the first two components, z_1 and z_2 (with a >90% reproduction rate).
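The comparison in figure 3 can be reproduced along the following lines. This is a minimal SVD-based PCA written with numpy alone (the paper does not state which PCA implementation was used); the components are fitted on the real training spectra and both real and generated sets are projected onto them.

```python
import numpy as np

def project_spectra(real, fake, n_components=2):
    """Fit PCA on the real spectra and project both the real and the
    generated ('fake') spectra onto the first n_components axes.

    Returns the two projections plus the fraction of variance captured,
    which corresponds to the >90% reproduction rate quoted in figure 3.
    """
    mean = real.mean(axis=0)
    # Principal axes from the SVD of the mean-centered training data.
    _, s, vt = np.linalg.svd(real - mean, full_matrices=False)
    comps = vt[:n_components]
    explained = (s[:n_components] ** 2).sum() / (s ** 2).sum()
    return (real - mean) @ comps.T, (fake - mean) @ comps.T, explained
```

Plotting the two returned projections against each other (z_1 vs z_2) then reproduces the kind of overlap comparison shown in the figure.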

Figure 4 .
Figure 4. Schematics of the architecture used for the Conditional Wasserstein Generative Adversarial Network (CWGAN) employed in this work. The generator and critic sections are shown in the left and right parts of the figure, respectively. As shown, the generator accepts geometrical parameters as input and produces a heat transfer coefficient (HTC) spectrum as output. It is constructed of four fully connected (FC) layers (green blocks), dropout layers placed after each FC layer (orange blocks), and a final linear activation layer (red block). The critic section simultaneously accepts HTC spectra produced by the generator, as well as true spectra, and the geometrical parameters as inputs, providing a scalar score as output. The critic model consists of a unique FC layer for each parallel input branch, followed by a concatenation layer that combines the information from both input branches into a single vector. Afterwards, two more FC layers and a final linear activation layer produce the score. In both the generator and critic sections, normalization of the input data is applied as a preprocessing step (grey blocks).
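The two-branch critic structure described in the caption can be illustrated with a minimal numpy forward pass. This is only a structural sketch: the layer widths are illustrative, a ReLU stands in for the activation (which the caption does not restate), and dropout and input normalization are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def fc(n_in, n_out):
    """Random weights and zero biases for one fully connected (FC) layer."""
    return rng.normal(scale=1 / np.sqrt(n_in), size=(n_in, n_out)), np.zeros(n_out)

def relu(x):
    # Stand-in activation for the hidden FC layers.
    return np.maximum(x, 0.0)

def critic_forward(spectrum, params, layers):
    """Figure 4 critic: one FC layer per input branch, concatenation,
    two further FC layers, and a final linear layer giving an unbounded
    scalar score (a WGAN critic uses no sigmoid at the output)."""
    (ws, bs), (wp, bp), (w1, b1), (w2, b2), (wo, bo) = layers
    h = np.concatenate([relu(spectrum @ ws + bs),   # spectrum branch
                        relu(params @ wp + bp)])    # geometry branch
    h = relu(h @ w1 + b1)
    h = relu(h @ w2 + b2)
    return (h @ wo + bo).item()

# Illustrative sizes: a 300-point HTC spectrum and 4 geometrical parameters.
m, p = 300, 4
layers = [fc(m, 64), fc(p, 64), fc(128, 64), fc(64, 32), fc(32, 1)]
score = critic_forward(rng.normal(size=m), rng.normal(size=p), layers)
```

Feeding the geometrical parameters into a dedicated branch is what makes the critic conditional: the score depends on whether a spectrum is plausible *for those particular parameters*, not merely on whether it looks like a spectrum.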

Figure 5 .
Figure 5. Evolution of the critic and generator loss functions during the training process (L_C(G, C) and L_G(G, C), panels (a) and (b), respectively). The inset of panel (a) shows the dependence on the training step of the three contributions to L_C(G, C) (see equation (3) in the main text). The inset of panel (b) shows the evolution with the training step of the mean absolute error (MAE) associated with the synthetic spectra produced by the generator. Notice that in our framework the critic evaluates its loss 5 times per generator loss evaluation.
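The 5-to-1 update ratio mentioned in the caption (standard practice in Wasserstein GAN training, where the critic must stay close to optimality) can be sketched as a simple schedule generator; the function name and interface are ours.

```python
N_CRITIC = 5  # critic loss evaluations per generator loss evaluation (figure 5)

def training_schedule(n_generator_steps):
    """Yield ('critic' | 'generator', step) pairs in the alternating order
    used for WGAN training: N_CRITIC critic updates, then one generator
    update, repeated for the requested number of generator steps."""
    for step in range(n_generator_steps):
        for _ in range(N_CRITIC):
            yield ('critic', step)
        yield ('generator', step)
```

In an actual training loop, each 'critic' entry would trigger one critic gradient update on a fresh minibatch, and each 'generator' entry one generator update, which is why the critic loss curve in figure 5(a) contains five times as many evaluations as the generator curve in figure 5(b).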

Figure 6 .
Figure 6. (a) Evolution of the integral relative mean error for the validation set as a function of the validation set fraction of the total data. (b) Same as in (a), but using the per-point relative mean error metric. In both (a) and (b), blue and black dots (lines) show the results for a simple FFNN and a CWGAN-augmented FFNN, respectively, whereas red dots (lines) correspond to the results obtained by using the CWGAN as a surrogate model.