Simulating lattice thermal conductivity in semiconducting materials using high-dimensional neural network potential

Emi Minamitani; Masayoshi Ogura; Satoshi Watanabe

doi:10.7567/1882-0786/ab36bc

Heat generation in semiconducting materials has become a critical problem in modern nanoscale electronics. As the size of electric devices decreases, the power density and device temperature increase, which is one of the major factors contributing to the degradation of device performance and reliability.¹⁾ To design semiconductor materials with better thermal manageability, efficient methods for the theoretical simulation of thermal conductivity are required.

The main carrier of heat in semiconductors is the phonon, which is a quantum of lattice vibration. Current methods of simulating lattice thermal conductivity can be classified into three categories:^2,3) (1) anharmonic lattice dynamics (ALD) in combination with phonon transport calculation using the Boltzmann transport equation (BTE) and Fourier's law,^4–8) (2) equilibrium molecular dynamics (EMD) using the Green–Kubo formula,^9–11) and (3) direct evaluation of the heat flux by nonequilibrium molecular dynamics (NEMD).^12,13) These theoretical frameworks require the accurate prediction of interatomic force in the solid. Density functional theory (DFT) calculation is one of the most well-established techniques for accurate force prediction, including that of the effect of changes in the electronic state with atomic displacement. However, the high computational cost limits the application of DFT calculation in thermal conductivity simulations. Although the combination of DFT calculations with ALD⁷⁾ or EMD¹⁴⁾ approaches has been successful in accurate prediction of the lattice thermal conductivities of semiconductor crystals, the application of this technique to systems with more complex structures such as defective or disordered ones is not realistic because of the rapid increase in computational cost with increasing system size. Direct combination of NEMD with DFT calculation is almost impossible because the system size used in NEMD simulations must be much larger to reproduce a reasonable temperature gradient.¹³⁾ Computational time can be reduced by using the empirical potential, but the accuracy is insufficient compared to that of DFT-based calculations.⁸⁾ An alternative simulation technique that can resolve the trade-off between the accuracy of force prediction and computational cost is urgently needed.

The application of machine learning techniques to solid-state physics has rapidly developed in recent years.^15,16) These techniques are promising for thermal conductivity simulation.¹⁷⁾ As shown in early works by Behler et al.^18,19) and subsequent studies,^20–26) the high-dimensional neural network potential (HDNNP) can describe the relation between the total energy of a system and its atomic arrangement. It is naturally expected that the force acting on atoms can also be described by the HDNNP because the derivative of the total energy with respect to the atomic displacement gives the force. Force prediction in semiconducting materials using the HDNNP or other machine learning techniques has already been reported in several studies,^27–29) and the phonon dispersions have been well reproduced in these reports.^28,29) However, the accuracy of the force prediction is limited to the order of 100 meV Å⁻¹, which is much higher than the accuracy of DFT calculations (a few tens of millielectronvolts per angstrom) required to simulate thermal conductivity. To the best of our knowledge, the HDNNP has never been applied to thermal conductivity simulation. Here, we show that much higher accuracy can be obtained by training HDNNP parameters with a focus on force fitting. We chose crystalline Si and GaN as representative semiconducting materials with one and two atom types, respectively. In both systems, we obtained an HDNNP that can predict the force with the accuracy of DFT. The obtained thermal conductivities are within 5.4% of those calculated by DFT.

In this study, we adopted the HDNNP model developed by Behler et al.,^16,17) in which the total energy of the system ( ${E}_{{\rm{tot}}}$ ) is expressed as the sum of the energy contributions from each atom ( ${E}_{i}$ ), i.e., ${E}_{{\rm{t}}{\rm{o}}{\rm{t}}}=\displaystyle {\sum }_{i}{E}_{i}.$ Here, we neglect the effect of the long-range electrostatic potential, and ${E}_{i}$ is determined by the local atomic environment, which is described by the cutoff and symmetry functions. The cutoff function determines the sphere of the local environment, whereas the symmetry functions represent the radial and angular distributions of neighboring atoms. The following cutoff function is used:

$\begin{eqnarray}&&\begin{array}{r}\begin{array}{r}{f}_{{\rm{c}}}\left({R}_{ij}\right)=\left\{\begin{array}{c}{\tanh }^{3}\left(1-{\textstyle {\textstyle \tfrac{{R}_{ij}}{{R}_{{\rm{c}}}}}}\right)\,{\rm{f}}{\rm{o}}{\rm{r}}\,\,{R}_{ij}\leqslant {R}_{{\rm{c}}}\\ \,0\,{\rm{f}}{\rm{o}}{\rm{r}}\,\,{R}_{ij}\gt {R}_{{\rm{c}}}\end{array}\right.,\end{array}\end{array}\end{eqnarray} \tag{ 1 }$

where ${R}_{{\rm{c}}}$ is the cutoff distance, and ${R}_{ij}$ is the interatomic distance. For the symmetry functions, we use the following three types of functional forms:

$\begin{eqnarray}&&{G}_{i}^{1}=\displaystyle \sum _{j\ne i}^{{N}_{{\rm{a}}{\rm{t}}{\rm{o}}{\rm{m}}}}{f}_{{\rm{c}}}\left({R}_{ij}\right),\end{eqnarray} \tag{ 2 }$

$\begin{eqnarray}&&{G}_{i}^{2}=\displaystyle \sum _{j\ne i}^{{N}_{{\rm{a}}{\rm{t}}{\rm{o}}{\rm{m}}}}{e}^{-\eta {\left({R}_{ij}-{R}_{{\rm{s}}}\right)}^{2}}{f}_{{\rm{c}}}\left({R}_{ij}\right),\end{eqnarray} \tag{ 3 }$

$\begin{eqnarray}&&{G}_{i}^{4}={2}^{1-\zeta }\displaystyle \sum _{j\ne i}^{{N}_{{\rm{a}}{\rm{t}}{\rm{o}}{\rm{m}}}}\displaystyle \sum _{k\ne i,j}^{{N}_{{\rm{a}}{\rm{t}}{\rm{o}}{\rm{m}}}}\left[{\left(1+\lambda \cos {\theta }_{ijk}\right)}^{\zeta }{e}^{-\eta \left({R}_{ij}^{2}+{R}_{ik}^{2}\right)}\,{f}_{{\rm{c}}}\left({R}_{ij}\right){f}_{{\rm{c}}}\left({R}_{ik}\right)\right].\,\end{eqnarray} \tag{ 4 }$

Here, ${N}_{{\rm{atom}}}$ represents the total number of atoms inside the cutoff sphere. The hyperparameters ${R}_{{\rm{c}}},{R}_{{\rm{s}}},\lambda ,\zeta ,$ and η, which should be set before the HDNNP is trained, must be tuned for better prediction performance. We manually tuned these hyperparameters. The setting used in this study was the one that returned the least root mean square error (RMSE) among the ∼20 patterns of tests for GaN and the ∼10 patterns for Si. The values of the hyperparameters used in this work are summarized in the supplementary material, available online at stacks.iop.org/APEX/12/095001/mmedia.

As a simple example, let us consider a neural network (NN) comprising a single hidden layer with ${N}_{{\rm{n}}}$ nodes and ${N}_{{\rm{s}}}$ input nodes associated with the symmetry functions. The atomic energy and force with respect to the atomic displacement along the Cartesian coordinate ${R}_{i}^{\nu }\left(\nu =x,y,z\right)$ output by the NN ( ${E}_{i}$ and ${F}_{i}^{\nu },$ respectively) are given by the following expressions:

$\begin{eqnarray}&&{E}_{i}={f}_{{\rm{a}}}^{2}\left[{w}_{01}^{2}+\displaystyle \sum _{j=1}^{{N}_{{\rm{n}}}}{w}_{j1}^{2}{f}_{{\rm{a}}}^{1}\left({w}_{0j}^{1}+\left\{\displaystyle \sum _{\mu =1}^{{N}_{{\rm{s}}}}{w}_{\mu j}^{1}{G}_{i}^{\mu }\right\}\right)\right],\end{eqnarray} \tag{ 5 }$

$\begin{eqnarray}&&{F}_{i}^{\nu }=-\displaystyle \frac{{\rm{\partial }}{E}_{{\rm{t}}{\rm{o}}{\rm{t}}}}{{\rm{\partial }}{R}_{i}^{\nu }}=-\displaystyle \sum _{j=1}^{N}\displaystyle \frac{{\rm{\partial }}{E}_{j}}{{\rm{\partial }}{R}_{i}^{\nu }}=-\displaystyle \sum _{j=1}^{N}\displaystyle \sum _{\mu =1}^{{N}_{{\rm{s}}}}\displaystyle \frac{{\rm{\partial }}{G}_{j}^{\mu }}{{\rm{\partial }}{R}_{i}^{\nu }}\displaystyle \frac{{\rm{\partial }}{E}_{j}}{{\rm{\partial }}{G}_{j}^{\mu }},\end{eqnarray} \tag{ 6 }$

where ${w}_{0j}^{k},$ ${w}_{\mu \ne 0j}^{k},$ and ${f}_{{\rm{a}}}^{k}$ are the bias, weight, and activation functions in the kth layer, respectively. N represents the total number of atoms in the system. These equations for atomic energy and force can be extended to deeper NNs and the subnet structure in the HDNNP.

The above HDNNP model was implemented in our homemade code, which is publicly accessible via a Github repository.³⁰⁾ Training of the weights and biases in the NN by back-propagation and the differentiation required for the force calculation were implemented using the Chainer^31,32) library. The loss function for the training procedure is defined as follows:

$\begin{eqnarray}&&L=\alpha \times {\rm{RMSE}}\left(\left\{{F}_{i}^{\nu }\right\}\right)+\left(1-\alpha \right)\times {\rm{RMSE}}\left(\left\{{E}_{i}\right\}\right),\end{eqnarray} \tag{ 7 }$

where RMSE({}) is the sum of the RMSEs between the HDNNP prediction and DFT training data. Note that the units for the RMSE of the energy and force are eV atom⁻¹ and eV Å⁻¹, respectively. Here, the loss function L is evaluated using the unitless RMSE values.

The training data sets for Si and GaN were generated from a combination of a classical molecular dynamics (MD) simulation using LAMMPS^33,34) and DFT calculation using the Vienna Ab Initio Simulation Package (VASP).^35–38) A crucial point for good training of HDNNP is the randomness of atomic configurations in the training data, which requires a long MD simulation and sparse sampling of MD trajectories. Because of this sparseness, performing the entire procedure using DFT is not efficient. Instead, we performed classical MD simulations with a well-established interatomic potential and then evaluated the energies and forces by DFT for structures sampled from the trajectories of classical MD. We first conducted a classical MD simulation using the Stillinger–Weber potential³⁹⁾ using LAMMPS for 10 000 steps at 0.001 ps time intervals at several temperatures. Here, we focus on the thermal properties; thus, we need an HDNNP that well expresses the forces acting on atoms in the atomic arrangements that may appear during thermal vibration. The data for configurations with a large atomic displacement and/or bond breaking are redundant for this purpose. Therefore, we selected temperatures between room temperature and the melting temperature for the classical MD simulation. For Si, we chose 300, 600, and 900 K, and for GaN, we chose 300, 700, 1100, 1500, 1900, 2300, and 2700 K.

Next, we randomly selected 100 snapshots from each MD trajectory and performed DFT calculations for the corresponding atomic configurations without structure relaxation by VASP with the projector augmented-wave potential.⁴⁰⁾ The LDA exchange–correlation functional was used for Si, and GGA-PBE⁴¹⁾ was used for GaN. The cutoff energy of the plane wave basis was set to 550 eV. In all these MD and DFT calculations, we chose supercell sizes of 64 atoms for Si and 32 atoms for GaN (see the supplementary material). To increase the variation in the atomic configurations in the training data and to extend the flexibility of HDNNP,⁴²⁾ we also included data obtained using slightly different lattice constants. The total numbers of atomic configurations in the DFT data sets were 2100 and 3500 for Si and GaN, respectively.

In the training procedure using the above data set, we used an NN topology with two hidden layers. The number of nodes in each layer was set to 500, and the hyperbolic tangent was adopted as the activation function. Adaptive moment estimation was chosen as the optimization algorithm.⁴³⁾

In Fig. 1, we compare the forces in the Si and GaN systems predicted by DFT and HDNNP. During training, 90% of the DFT data sets were used as training data, and the remaining 10% were used as validation data. We set the parameter α = 0.99 in the loss function in Eq. (7) so that the RMSE of the force is dominant in the training procedure. The final RMSE of force prediction in the validation data was 25.5 meV Å⁻¹ for Si and 37.8 meV Å⁻¹ for GaN. Even at a very low weight of the loss function, the final RMSE of the total energy prediction for the validation data reaches 32.7 meV atom⁻¹ for Si and 66.5 meV atom⁻¹ for GaN (see the supplementary material). We found that a lower value of α increases the RMSE of the force prediction, which indicates that focusing on force itself during training is important for more accurate force prediction.

Next, to check that the force accuracy is sufficient for the simulation of phonon-related thermal properties, we evaluated the phonon dispersions in Si and GaN crystals by a combination of HDNNP and phonopy.⁴⁴⁾ We used HDNNP to predict the forces for the irreducible displacement patterns given by phonopy. For comparison, we also performed a phonon dispersion calculation using a combination of VASP and phonopy.

The phonon dispersion curves obtained using HDNNP agree well with the DFT calculation results and previous reports^45,46) for both Si and GaN, as shown in Fig. 2. Note that we focus on the comparison between the HDNNP and DFT results, and we do not include the non-analytic correction for LO–TO splitting.⁴⁷⁾ Thus, the frequencies of the highest LO mode in GaN largely differ from the experimental values.

**Fig. 2.** (Color online) Comparison of phonon dispersions in (a) Si and (b) GaN obtained by HDNNP and DFT calculations. The supercell sizes for calculations of the second-order force constant were set to 2 × 2 × 2 for the conventional unit cell in Si and 3 × 3 × 2 for the primitive unit cell in GaN. The experimental data for comparison were obtained from Ref. 46 for Si and Ref. 45 for GaN.
Download figure:
Standard image High-resolution image

Next, we simulated the lattice thermal conductivity based on ALD by combining HDNNP and the phono3py package,⁴⁾ using a procedure similar to that for the calculation of phonon dispersion. The irreducible displacements for evaluating the third-order potential for three phonon processes were obtained using phono3py, and the forces acting on atoms in each displacement pattern were predicted using HDNNP. The phonon–phonon interaction strength and the corresponding lifetime were extracted from these force data; then, the lattice thermal conductivity (κ) was calculated using the single-mode relaxation time approximation (RTA) of the linearized phonon BTE. Figure 3 shows a comparison of the temperature dependence of the thermal conductivity obtained from the force predictions of HDNNP and VASP calculations. We note that in both calculations, the thermal conductivities in Si and GaN are underestimated compared with previous reports (approximately 150 W m⁻¹ K⁻¹ and 400 W m⁻¹ K⁻¹ at 300 K, respectively^8,48)). Three factors can be listed as plausible reasons for the underestimation: i) the small supercell size in the thermal conductivity simulation; ii) the DFT calculation setup, including the choice of pseudopotentials, exchange–correlation functional, and lattice constants; and iii) the RTA in the phonon BTE solution (iterative treatment is required to obtain the exact thermal conductivity^8,48)). Here, we focus on the reproducibility of the DFT calculation by the HDNNP; thus, we do not discuss this point in detail. The calculation results from HDNNP and DFT under the same simulation conditions for both Si and GaN are in good agreement, indicating the strong potential of HDNNP for application in thermal conductivity simulations. The deviation from the DFT calculation results is within 1% at 200 to 500 K for Si and within 5.4% from 200 to 1000 K for GaN. For example, for Si, the κ value at 300 K is estimated to be 110.4 W m⁻¹ K⁻¹ from the HDNNP results and 112.1 W m⁻¹ K⁻¹ from the DFT results. For GaN, the thermal conductivities along the in-plane direction, ${\kappa }_{\parallel },$ and along the out-of-plane direction, ${\kappa }_{\perp },$ at 300 K are 275.5 and 309.7 W m⁻¹ K⁻¹, respectively. These values are in good agreement with the DFT calculation results, which are ${\kappa }_{\parallel }$ = 274.2 and ${\kappa }_{\perp }$ = 325.5 W m⁻¹ K⁻¹.

**Fig. 3.** (Color online) Comparison of thermal conductivities (a) in Si, (b) along the in-plane (100) direction in GaN, and (c) along the out-of-plane (001) direction in GaN obtained by HDNNP and DFT calculations. The supercell sizes for calculations of the third-order force constant were set to 2 × 2 × 2 for the conventional unit cell in Si and 2 × 2 × 2 for the primitive unit cell in GaN. In the calculation of the linearized Boltzmann equation with the RTA, the Brillouin zone was sampled on an 11 × 11 × 11 mesh in all cases.
Download figure:
Standard image High-resolution image

Finally, we comment on the computational time required to calculate the thermal conductivity using DFT and HDNNP. For Si, the DFT calculation for a total of 111 displacement patterns required a total of 24 cores of two CPUs (Intel^® Xeon^® E5-2680v3) and 4.5 h, which reduced to 1 core of a CPU and 10 s for the HDNNP calculation. For GaN, the DFT calculations for a total of 582 displacements required 24 cores of the CPUs and 21.0 h, which reduced to 24 cores of the CPUs and 1.5 min for the HDNNP calculation. The computational time for the thermal conductivity simulation itself became less than 1/800 on using HDNNP. On the other hand, to obtain reliable HDNNP, considerable time is required to prepare the training data sets and tune the hyperparameters. In this study, the generation of the data set required 24 cores of the CPUs and 140.4 h for Si and 126.8 h for GaN. Hyperparameter tuning took 24 cores of the CPUs and ∼40 h for Si and ∼50 h for GaN. Thus, if we compute the thermal conductivity in perfect bulk crystals, HDNNP is not an efficient method. The situation changes when we treat a system with defects and/or impurities. Currently, the thermal conductivity of systems with defects and impurities is evaluated based on the first-order perturbation⁴⁹⁾ or ab initio Green's function technique^50–52) which can include the defect/impurity effect on harmonic interatomic force constants (IFCs). In contrast, the direct extension of DFT-based ALD for such systems is cumbersome because of the high computational cost of anharmonic IFCs. For example, the DFT calculation of anharmonic IFCs for a system with a single defect/impurity in a 64-atom supercell of Si requires calculations for ∼6418 displacement patterns, which will take 24 cores of CPUs and ∼430 h. If we can construct the data set including defects with reasonable computational time, and train an HDNNP that accurately reproduces the forces in such systems, the HDNNP will become beneficial for both Green's function based techniques and the evaluation of anharmonic IFCs. This would provide an alternative efficient method to investigate the thermal properties in complex systems with high accuracy. The flexibility and extensibility of the HDNNP in this direction remain to be elucidated.

Acknowledgments

The following financial support is acknowledged: JST PRESTO JPMJPR17I7 (E. M.), the JST "Materials Research by Information Integration" Initiative (MI2I) of the Support Program for Starting Up Innovation Hub (S. W.), and MEXT KAKENHI 17H05330 (S. W. and E. M.). We thank Prof. Togo for fruitful discussions on the connection between HDNNP and phonopy/phono3py. The calculations were performed using the computer facilities of the Institute for Solid State Physics, Information Technology Center, the University of Tokyo, and RIKEN (HOKUSAI GreatWave).

Simulating lattice thermal conductivity in semiconducting materials using high-dimensional neural network potential

Article metrics

Submit

Author e-mails

Author affiliations

ORCID iDs

Dates

Abstract

Acknowledgments

Simulating lattice thermal conductivity in semiconducting materials using high-dimensional neural network potential

Article metrics

Submit

Share this article

Author e-mails

Author affiliations

ORCID iDs

Dates

Abstract

Acknowledgments