Predicting materials properties with generative models: applying generative adversarial networks for heat flux generation

In the realm of materials science, the integration of machine learning techniques has ushered in a transformative era. This study examines the application of generative adversarial networks (GANs) to generating heat flux data, a pivotal step in predicting lattice thermal conductivity within metallic materials. The heat flux data generated by the GANs shows a high degree of similarity to that calculated by molecular dynamics simulations. This study demonstrates the potential of artificial intelligence (AI) in understanding the complex physical meaning of data in materials science. By harnessing such AI to generate data that was previously attainable only through experiments or simulations, new opportunities arise for exploring and predicting properties of materials.


Introduction
In recent years, the advent of big data and artificial intelligence (AI) has revolutionized numerous fields, including materials science. It is increasingly evident that the paradigm of big-data-driven science will significantly shape the trajectory of materials science, with methodologies like machine learning (ML) gaining substantial traction within this field [1][2][3]. In essence, ML applied in materials science aims to quantitatively predict material properties from existing databases, albeit often at the expense of reduced physical insights [4]. Concurrently, with the progress in high-performance computing, molecular dynamics (MD) simulation, a computational technique widely used to investigate the behavior of atoms or molecules at the atomic level, has emerged as a powerful tool in materials science research [5]. By leveraging the efficiency of MD simulation in generating substantial amounts of physically meaningful data on desirable properties, the combination of MD simulation with ML approaches has emerged as a powerful method for property prediction in materials science [6][7][8][9][10][11].
Original content from this work may be used under the terms of the Creative Commons Attribution 4.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.
ML approaches can be broadly classified into supervised learning and unsupervised learning. Supervised learning involves training a model using labeled data, where input features are paired with corresponding target values. This allows the model to learn the relationship between the features and the target variable, enabling it to make predictions on new, unseen data. Supervised learning has shown promise in predicting and classifying material properties, as well as assisting in materials design. For example, Yang et al [6] utilized ML with training data from MD simulations to predict the Young's modulus of silicate glasses. Fukuya and Shibuta [7] employed a three-dimensional convolutional neural network to successfully identify solid and liquid atoms in the solid-liquid biphasic system of various elements in MD simulation. On the other hand, unsupervised learning involves analyzing unlabeled data to uncover hidden patterns, structures, or clusters without predefined target variables or labels. This type of learning is particularly useful for exploratory analysis, data visualization, and dimensionality reduction. Tsunawaki et al [8] applied unsupervised ML algorithms, specifically hierarchical clustering, to classify coordinates on the reaction pathway in a specific cluster of the dendrogram. This approach utilized structural and electronic characteristics obtained from MD simulations of catalytic reactions on metal nanoparticles.
Generative models are a specific branch of ML models that focus on creating new data samples. They enable the generation of new content, such as images, voices, text, and even scientific data, based on patterns and structures learned from existing datasets. In the field of materials science, generative models have shown great potential in generating valuable data and predictions. Kawada et al [12] developed MD-GAN (MD-Generative Adversarial Network), which combines a generative model with MD simulations to predict long-term dynamics from short-term data. By training a generative model on a limited dataset of MD simulations, MD-GAN can generate realistic and informative trajectories that extend beyond the limited simulation time. Moreover, Sase and Shibuta [13] recently proposed a novel method to predict multi-atom cooperative phenomena based on a deep generative model in combination with a recurrent neural network. In this approach, the time evolution of latent variables, derived by encoding MD data with a variational autoencoder, can predict microstructures that cannot be reproduced on the time scale of MD simulations. These new approaches have the potential to significantly reduce the computational cost of performing long-term simulations.
In this study, we propose a novel approach that applies a generative model to predict thermal properties from generated data. From MD simulation, we can obtain the heat flux data and then calculate the lattice thermal conductivity from it according to the Green-Kubo formula [14,15]. Here, we apply a specific generative model algorithm called generative adversarial networks (GANs) [16] to generate heat flux akin to MD simulation data and use the generated data to calculate the lattice thermal conductivity of metallic materials. It is worth noting that in metallic materials, heat flux is generated not only by lattice vibrations but also by the movement of free electrons [17]. Because MD simulations use Newton's equations of motion to describe the trajectory of atoms [18], the thermal conductivity calculated in MD simulations is limited to lattice thermal conductivity, and therefore it underestimates the actual thermal conductivity of metallic materials [19]. If we could accurately determine the full heat flux within the metallic material from MD simulations, we would be able to precisely calculate its thermal conductivity. Unfortunately, this information is not readily available. From an alternative perspective, we hypothesize that generative models could potentially generate the true heat flux, thereby enabling the calculation of the correct thermal conductivity. This study is meant as a first step toward the larger goal of generating true heat flux and predicting true thermal conductivity using a generative model.

MD simulation
Equilibrium molecular dynamics (EMD) provides an efficient method for calculating the thermal conductivity by MD simulation [19][20][21]. EMD simulation makes use of the Green-Kubo formula, which establishes a relationship between the transport coefficient of a non-equilibrium process and the fluctuation of the corresponding physical quantity in the equilibrium state. According to the Green-Kubo formula, the thermal conductivity of materials can be expressed as

$$\kappa_{uv} = \frac{V}{k_B T^2} \int_0^{\infty} C_{uv}(t)\, \mathrm{d}t,$$

where $\kappa_{uv}$ represents the thermal conductivity tensor, $V$ denotes the volume of the simulation system, $k_B$ is the Boltzmann constant, and $T$ represents the absolute temperature of the system. The heat flux autocorrelation function (HFACF) $C_{uv}(t)$ is calculated as

$$C_{uv}(t) = \langle J_u(0)\, J_v(t) \rangle.$$

The heat flux $\mathbf{J}$ is defined as the time derivative of the energy density moment and can be calculated using the following equation:

$$\mathbf{J} = \frac{1}{V} \left[ \sum_i e_i \mathbf{v}_i + \frac{1}{2} \sum_{i<j} \left( \mathbf{F}_{ij} \cdot (\mathbf{v}_i + \mathbf{v}_j) \right) \mathbf{r}_{ij} \right],$$

where $e_i$ represents the energy (sum of potential and kinetic energy) of atom $i$, $\mathbf{v}_i$ represents the velocity of atom $i$, $\mathbf{F}_{ij}$ represents the relative force between atoms $i$ and $j$, and $\mathbf{r}_{ij}$ represents the relative position between atoms $i$ and $j$, all of which can be obtained from MD simulation. For a three-dimensional isotropic system, the off-diagonal components of the thermal conductivity tensor are zero, allowing the thermal conductivity to be taken as the average value of the diagonal components as follows:

$$\kappa = \frac{1}{3} \left( \kappa_{xx} + \kappa_{yy} + \kappa_{zz} \right).$$

In this study, we used EMD simulation to calculate the lattice thermal conductivity of two representative metals, aluminum and copper, at 300 K. The embedded atom method potentials in [22] are employed as interatomic potentials. The simulations are performed using the Large-scale Atomic/Molecular Massively Parallel Simulator (LAMMPS) [23]. The simulation systems consisted of 14 × 14 × 14 unit cells, comprising 10 976 atoms. The velocity-Verlet method is used to integrate the classical equation of motion with a time step of 1.0 fs, and the Nose-Hoover thermostat [24,25] is employed to control temperature. The system was first simulated in a constant number of atoms, volume and temperature (NVT) ensemble to attain an equilibrium structure and then in a constant number of atoms, volume and total energy (NVE) ensemble for 9.8 × 10⁷ steps (98 nanoseconds), where the heat flux is calculated every 10 steps. For calculating the HFACF, the calculated heat flux is divided into equal time-series segments, each 100 picoseconds long. The correlation time runs from 0 to 10 picoseconds (aluminum) or 20 picoseconds (copper) for each segment. The average of all calculations is considered as the final thermal conductivity result.
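The segment-averaged Green-Kubo procedure described above can be illustrated with a minimal sketch. It assumes that one component of the heat flux is already available as a NumPy array sampled at a fixed interval; the function names `hfacf` and `running_kappa` are hypothetical, the input is a synthetic autoregressive series rather than MD output, and all quantities are in reduced units, so the numbers carry no physical meaning.

```python
import numpy as np

def hfacf(segment, n_lags):
    """Unnormalized <J(0) J(t)> for lags 0..n_lags-1, averaged over time origins."""
    n = len(segment)
    return np.array([np.mean(segment[: n - lag] * segment[lag:]) for lag in range(n_lags)])

def running_kappa(flux, seg_len, n_lags, dt, volume, k_b, temperature):
    """Segment-averaged HFACF and its running Green-Kubo integral."""
    n_seg = len(flux) // seg_len
    segments = flux[: n_seg * seg_len].reshape(n_seg, seg_len)
    acf = np.mean([hfacf(seg, n_lags) for seg in segments], axis=0)
    # Cumulative trapezoidal integral of the averaged HFACF from 0 to each lag.
    integral = np.concatenate(([0.0], np.cumsum(0.5 * (acf[1:] + acf[:-1]) * dt)))
    return volume / (k_b * temperature**2) * integral, acf

# Synthetic stand-in for one heat-flux component: an AR(1) series whose
# autocorrelation decays as 0.9**lag (NOT MD data; reduced units throughout).
rng = np.random.default_rng(0)
flux = np.zeros(20_000)
for i in range(1, flux.size):
    flux[i] = 0.9 * flux[i - 1] + rng.standard_normal()

kappa_t, acf = running_kappa(flux, seg_len=1000, n_lags=100, dt=1.0,
                             volume=1.0, k_b=1.0, temperature=1.0)
```

With real data, `dt` would be the 10 fs sampling interval, `seg_len` the number of samples in 100 picoseconds, and `n_lags` the number of samples in the chosen correlation time; the converged tail of `kappa_t` then plays the role of the running thermal conductivity.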

GANs
GANs are deep learning generative models that are capable of generating highly realistic data. The basic component of a GAN is the neural network. The neural network is a widely used discriminative model in ML that can establish complex relationships between input variables and the target variable [26]. A neural network is composed of interconnected layers of neurons. The first layer is the input layer, in which each neuron corresponds to one feature of the input variable. The middle layers are the hidden layers. A neural network typically consists of one or more hidden layers, where each layer extracts new features from its input and passes them on as the input of the subsequent layer. The hidden layers enable the network to learn and represent increasingly complex relationships within the data. The final layer is the output layer, which provides continuous prediction values in regression tasks or class probabilities in classification tasks. Each neuron functions like a perceptron [27], involving a linear connection function and a nonlinear activation function. During training, the backpropagation algorithm [28] is employed to calculate the errors of each node in a backward manner, starting from the output layer and moving towards the input layer. This process allows for the adjustment of all the weights associated with each node. The iteration continues until the weight parameters converge to reasonable values.
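As a small illustration of the neuron and weight-update picture above, the following sketch evaluates a single perceptron-like unit (a linear function followed by a sigmoid activation) and applies one gradient step obtained by the chain rule. The data, the learning rate of 0.5 and the function names are illustrative assumptions and are not taken from this study.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def neuron(x, w, b):
    """Perceptron-style unit: linear combination followed by a nonlinear activation."""
    return sigmoid(x @ w + b)

def bce(p, y):
    """Binary cross-entropy between predicted probabilities p and labels y."""
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

# Toy data (assumed for illustration): 8 samples, 3 features, binary labels.
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 3))
y = (x[:, 0] > 0).astype(float)
w, b = np.zeros(3), 0.0

loss_before = bce(neuron(x, w, b), y)
# For a sigmoid output with BCE loss, the error at the pre-activation is (p - y) / N;
# propagating it back through the linear function gives the weight and bias gradients.
err = (neuron(x, w, b) - y) / len(y)
w, b = w - 0.5 * (x.T @ err), b - 0.5 * err.sum()
loss_after = bce(neuron(x, w, b), y)
```

In a multi-layer network the same error term is propagated further back through each hidden layer, which is what the backpropagation algorithm automates.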
A GAN consists of two neural networks: a generator network and a discriminator network. The generator network takes random noise as input and produces generated data, while the discriminator network determines whether the data is real or generated, as depicted in figure 1. These two networks engage in a competitive process in which the discriminator network becomes more discerning over time, and the data generated by the generator network becomes increasingly realistic. In this study, a GAN is employed to generate time-series heat flux data of the kind obtained in MD simulations, which are high-dimensional data characterized by complex probability distributions. The original heat flux data consists of time series with a total length of 9.8 × 10⁶ data points (9.8 × 10⁷ steps, 98 nanoseconds) in each of three directions. To create the training dataset, we divide this large dataset into smaller time-series segments of length 1.0 × 10³. As a result, the entire dataset is reshaped into a matrix with dimensions of (2.94 × 10⁶, 1.0 × 10³), where each row corresponds to a segment of heat flux data. The architecture of the generator network and discriminator network is illustrated in figure 2.
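The reshaping of directional heat flux series into rows of fixed length can be sketched as follows. The sketch assumes non-overlapping segments and no normalization, neither of which is specified in the text, and it uses a small random array in place of the MD heat flux; `make_segments` is a hypothetical helper name.

```python
import numpy as np

def make_segments(flux_xyz, seg_len):
    """Cut each directional heat-flux series into non-overlapping rows of length seg_len."""
    rows = []
    for series in flux_xyz:  # one 1-D series per Cartesian direction
        n_seg = len(series) // seg_len
        rows.append(series[: n_seg * seg_len].reshape(n_seg, seg_len))
    return np.concatenate(rows, axis=0)

# Toy stand-in: three directions with 5000 points each (NOT MD data).
rng = np.random.default_rng(0)
flux_xyz = rng.standard_normal((3, 5000))
dataset = make_segments(flux_xyz, seg_len=1000)  # rows: x segments, then y, then z
```

Each row of `dataset` is one training sample for the GAN; here the toy input yields 15 rows of 1000 points.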
Each neural network consists of one input layer, one hidden layer and one output layer. In order to generate data analogous to heat flux data, the input of the generator network is noise data whose length (or dimension) is 1.0 × 10³. The hidden layer consists of 128 neurons, and the ReLU function is used for the propagation from the input layer to the hidden layer. The output of the generator network is data with a length of 1.0 × 10³, which is regarded as generated heat flux data. The discriminator network is used to judge the similarity between the generated data and the real heat flux data. The input of the discriminator network is heat flux data (real or generated) with a length of 1.0 × 10³. Similarly, a hidden layer consisting of 128 neurons with the ReLU function is included in the process of propagation. The sigmoid function is used for the propagation from the hidden layer to the output layer in order to output a discriminant probability between 0 and 1. Furthermore, we employ binary cross-entropy (BCE) loss as the loss function in the GAN, and we utilize the RMSprop optimization algorithm to optimize the network parameters. Technical details of the neural networks are given in appendix A. PyTorch (version 1.13.1), an open source ML library for Python, is used for the implementation of the GAN model. All computations are performed on Google Colaboratory, which allows the execution of Python code through a browser.
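A minimal PyTorch sketch of the two networks and one adversarial update is given below. Layer sizes, activations, the BCE loss and the RMSprop optimizer follow the description above; the learning rate, the batch size, the use of standard normal noise and the function name `train_step` are assumptions made for illustration, and the "real" batch is a random placeholder rather than MD heat flux.

```python
import torch
from torch import nn

SEG_LEN, HIDDEN = 1000, 128

# Generator: noise (length 1000) -> hidden (128, ReLU) -> generated segment (length 1000).
generator = nn.Sequential(nn.Linear(SEG_LEN, HIDDEN), nn.ReLU(), nn.Linear(HIDDEN, SEG_LEN))
# Discriminator: segment (length 1000) -> hidden (128, ReLU) -> probability of being real.
discriminator = nn.Sequential(
    nn.Linear(SEG_LEN, HIDDEN), nn.ReLU(), nn.Linear(HIDDEN, 1), nn.Sigmoid()
)

bce = nn.BCELoss()
opt_g = torch.optim.RMSprop(generator.parameters(), lr=1e-4)      # lr is an assumption
opt_d = torch.optim.RMSprop(discriminator.parameters(), lr=1e-4)  # lr is an assumption

def train_step(real):
    """One discriminator update followed by one generator update."""
    batch = real.shape[0]
    ones, zeros = torch.ones(batch, 1), torch.zeros(batch, 1)

    # Discriminator: label real rows as 1 and generated rows as 0.
    fake = generator(torch.randn(batch, SEG_LEN)).detach()
    loss_d = bce(discriminator(real), ones) + bce(discriminator(fake), zeros)
    opt_d.zero_grad()
    loss_d.backward()
    opt_d.step()

    # Generator: try to make the discriminator output 1 for generated rows.
    fake = generator(torch.randn(batch, SEG_LEN))
    loss_g = bce(discriminator(fake), ones)
    opt_g.zero_grad()
    loss_g.backward()
    opt_g.step()
    return loss_d.item(), loss_g.item()

torch.manual_seed(0)
real_batch = torch.randn(32, SEG_LEN)  # placeholder for rows of MD heat flux
loss_d, loss_g = train_step(real_batch)
with torch.no_grad():
    sample = generator(torch.randn(4, SEG_LEN))  # four generated segments
    score = discriminator(sample)                # discriminant probabilities
```

In practice `train_step` would be called repeatedly over mini-batches drawn from the segment matrix, and `sample` would be taken only after training.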

Lattice thermal conductivity from MD simulation
Figure 3 shows the heat flux of aluminum at 300 K from 0 to 100 picoseconds calculated by MD simulation, the normalized HFACF and the running thermal conductivity. The correlation time ranges from 0 to 10 picoseconds in one calculation, as shown by the grey lines, and the average of all calculations converges to approximately 7 W m⁻¹ K⁻¹ within the autocorrelation time, as shown by the red line. Similarly, figure 4 shows the MD simulation results for copper, where the correlation time ranges from 0 to 20 picoseconds in one calculation (grey lines) and the average of all calculations converges to approximately 14 W m⁻¹ K⁻¹ within the autocorrelation time (red line). Note that there is an inherent discrepancy between the lattice thermal conductivity from MD simulation and the experimentally measured thermal conductivity for metallic materials [19]. In metallic materials, heat is carried by both phonons and electrons, and thus the total thermal conductivity consists of two components: phonon (lattice) thermal conductivity and electronic thermal conductivity. However, MD simulation can only capture the phonon component since it employs classical Newton's equations of motion to describe atomic trajectories, neglecting explicit consideration of the electronic contribution. Therefore, the observed discrepancies between MD simulation results and experimental values are not indicative of inaccuracies in the interatomic potentials but rather reflect the inherent limitations of MD simulations in capturing the electronic contribution. This issue is not the focus of this study and is discussed elsewhere [19]. Our objective in this study is to explore the possibility of using generative models to generate data with real physical meaning, such as heat flux. This approach serves as the basis for generating, in further studies, heat flux data that predict the thermal conductivity of metallic materials more accurately than MD simulation.

Generation of heat flux by GAN
In figures 5 and 6, representative heat flux data of aluminum and copper, respectively, generated by the GAN are compared with those obtained from MD simulation. The generated data not only exhibits similar magnitudes but also captures the vibrational patterns with high similarity. Furthermore, we connect every ten consecutive segments of the generated data to form a longer time series of length 1.0 × 10⁴ (equivalent to 100 ps). The connected heat flux is compared with that obtained from MD simulation in figure 7, for aluminum and copper respectively. Heat flux may appear to fluctuate irregularly, but it exhibits underlying regular patterns or periodicities. These periods are crucial for autocorrelation calculations, as they capture the essential characteristics leading to the final thermal conductivity results.
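The connection of ten segments into one 100 ps series, together with a simple spectral check for periodicities of the kind mentioned above, can be sketched as follows. The 10 fs sampling interval follows from the 1.0 fs time step and the output every 10 steps; the helper names are hypothetical, and the input is a noisy 5 THz sine wave used purely as a stand-in for generated segments.

```python
import numpy as np

DT_FS = 10.0  # sampling interval: heat flux stored every 10 steps of 1.0 fs

def connect_segments(segments, n_join=10):
    """Join every n_join consecutive rows into one longer series."""
    n_long = segments.shape[0] // n_join
    return segments[: n_long * n_join].reshape(n_long, n_join * segments.shape[1])

def magnitude_spectrum(series, dt_fs=DT_FS):
    """One-sided FFT magnitude and the matching frequencies in THz."""
    freqs_thz = np.fft.rfftfreq(len(series), d=dt_fs * 1e-15) / 1e12
    return freqs_thz, np.abs(np.fft.rfft(series - series.mean()))

# Stand-in "generated" segments: a 5 THz sine cut into 1000-point rows plus weak noise.
rng = np.random.default_rng(0)
t_fs = np.arange(20 * 1000) * DT_FS
toy = np.sin(2 * np.pi * 5e12 * t_fs * 1e-15) + 0.1 * rng.standard_normal(t_fs.size)

long_series = connect_segments(toy.reshape(20, 1000))  # 2 series of 10 000 points (100 ps)
freqs, mag = magnitude_spectrum(long_series[0])
peak_thz = freqs[np.argmax(mag)]                        # dominant frequency of the toy signal
```

Ten rows of 1000 points at 10 fs per point give 10⁴ points, that is 100 ps, which matches the segment length used for the HFACF in the MD analysis.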
One effective approach to identifying these hidden periods is the fast Fourier transform (FFT). The corresponding FFT results of the heat flux data in figure 7 are displayed in figure 8. Details of the FFT are shown in appendix B. The magnitude in the FFT represents the strength of a specific frequency in the heat flux data. The direct comparison of heat flux and the analysis of the FFT results indicate that the generated and MD-derived heat flux exhibit seemingly random vibrations with similar amplitudes. Furthermore, the FFT results quantitatively demonstrate their similarity in vibration frequencies.

Prediction of lattice thermal conductivity from generated data
To further verify the similarity between the heat flux generated by the GAN and the heat flux obtained by MD calculation, we employ the Green-Kubo formula to calculate the thermal conductivity from the heat flux generated by the GAN. We use the connected heat flux data of 100 picoseconds with a correlation time of 10 picoseconds for aluminum and 20 picoseconds for copper, and then take the average of all calculations. The comparison between the lattice thermal conductivity computed from generated data and that from the MD simulation is presented in figure 9. Remarkably, not only are the final converged values close to each other (about 7 W m⁻¹ K⁻¹ for Al and 13-14 W m⁻¹ K⁻¹ for Cu), but the convergence processes of the MD simulation results and the GAN results are also strikingly similar. These results indicate that, for the purpose of the Green-Kubo calculation, the heat flux generated by the GAN is functionally equivalent to the heat flux computed through MD simulations. The reported values of lattice thermal conductivity from first-principles calculations [29,30] are approximately 6 W m⁻¹ K⁻¹ for Al and about 17 W m⁻¹ K⁻¹ for Cu, which are comparable to the values obtained in this study. Note that it is difficult to determine only the lattice thermal conductivity from experiments except at very low temperatures [31,32]. The GAN not only captures the external characteristics of the heat flux but also reproduces its physically significant features, namely its periodic vibrations and the correlation function that determines the material's thermal conductivity.

Conclusions
In this study, we examined the application of a generative model to generating heat flux data, which we initially calculated through MD simulation. Our results revealed that the heat flux generated by the GAN closely resembled the heat flux obtained through MD simulations. The GAN not only captured the external characteristics of the heat flux but also reproduced its fundamental properties, including its periodic vibrations and the autocorrelation function governing thermal conductivity. The significance lies in the fact that we have enabled AI to capture key concepts in materials science, such as heat flux and thermal conductivity. Moving forward, leveraging data on true thermal conductivity for supervised learning and enhancing existing models holds the potential to bring us closer to generating actual heat flux and achieving precise predictions of thermal conductivity in metallic materials, or of other important properties that are typically obtained through experiment.

Figure 2. The architecture of generator network and discriminator network for generating heat flux data.

Figure 3. Calculation of lattice thermal conductivity of Al at 300 K. (a) Heat flux calculated by MD simulation; (b) normalized HFACF and (c) running thermal conductivity. Grey and red lines represent the results of one calculation and their averages, respectively.

Figure 4. Calculation of lattice thermal conductivity of Cu at 300 K. (a) Heat flux calculated by MD simulation; (b) normalized HFACF and (c) running thermal conductivity. Grey and red lines represent the results of one calculation and their averages, respectively.

Figure 5. Representative data segments of heat flux of Al (a) generated by GAN and (b) from MD simulation.

Figure 6. Representative data segments of heat flux of Cu (a) generated by GAN and (b) from MD simulation.

Figure 7. Time series of heat flux of 100 ps generated by GAN (orange) and from MD simulation (blue) for (a) Al and (b) Cu.

Figure 8. Fast Fourier transform spectra of heat flux generated by GAN (orange) and from MD simulation (blue) for (a) Al and (b) Cu.

Figure 9. Lattice thermal conductivity estimated from heat flux generated by GAN (orange) and from MD simulation (blue) for (a) Al and (b) Cu.