Development of a Bayesian inference method for the analysis of X-ray reflectivity data

X-ray reflectivity (XRR) is an experimental method used in various fields of materials science to investigate the physical properties of solid surfaces and the structure of interfaces. However, it is difficult to evaluate the reliability of the estimates obtained with this method. In this study, we propose a method for analyzing XRR data using Bayesian inference. Bayesian inference allows the uncertainty of the estimates to be evaluated, which in turn allows the measurement limit to be assessed. We also show that estimation remains possible even for noisy data.


Introduction
Semiconductor devices, which support the advancement of the electronics society, are currently the key material for establishing Internet of Things and Artificial Intelligence societies. These devices are often constructed with multilayer thin films to optimize their performance. 3) Thus, the precise evaluation of these parameters is imperative.
X-ray reflectivity (XRR) is an experimental method used to study the properties of surfaces and interfaces. In this method, X-rays are irradiated at small angles onto the surface of a thin film, and the intensity of the reflected X-rays is measured. 4,5) 8) In the conventional analysis of XRR, an appropriate film structure model is assumed, and the model is fitted to experimental data by optimization using the least-squares method. 9,10) Least-squares fitting may converge to a local solution depending on the initial values, and convergence to the correct values of the model parameters requires an appropriate film structure model and the selection of good initial values. 11) Previous studies have proposed estimation methods using genetic algorithms (GAs) as a way to avoid local solutions. 12,13) However, it is difficult to evaluate the reliability of the obtained estimates with the GA approach. This issue affects not only the correctness of the parameter estimates, but also the correctness of the assumed film structure.
Therefore, we propose a new method for XRR analysis based on Bayesian inference. In the framework of Bayesian inference, the posterior probability distribution of the model parameters can be obtained given the data, which allows us not only to evaluate the uncertainty in the estimation results, but also to select the correct film structure model. 14) The posterior probability distribution can be estimated by exchange Monte Carlo sampling, which enables global estimation independent of the initial values. Bayesian inference is also known to be robust against noise and can be used to discuss the limits of what can be estimated from a measurement. 15) In this study, we conduct experiments under various noisy situations and show that robust estimation is possible.

Thin-film structure model and process of XRR data generation
The theoretical expression for the XRR is given by the following equation: 16,17)

$$R(\alpha; \theta) = \left| \frac{\tilde{B}_0}{\tilde{A}_0} \right|^2, \tag{1}$$

where $\alpha$ is the incidence angle of the X-rays and $\theta$ is the parameter set of the film structure model. $\tilde{A}_0$ and $\tilde{B}_0$ are the amplitudes of the incident and reflected components of the electric field in the vacuum layer, respectively. The amplitudes in layer $j$ and layer $j + 1$ are related by Eq. (2), where $n_j$ is the refractive index of layer $j$, $t_j$ is the thickness of layer $j$, and $k_0$ is the wave number in a vacuum. The reflectance is obtained by solving Eq. (2) layer by layer up to the surface, assuming that the electric fields in the substrate are $\tilde{A}_N = 1$ and $\tilde{B}_N = 0$. However, the actual measurement data for reflected X-rays are count data. Consequently, in this paper, the reflection intensity is used instead of the reflectivity. If the intensity of the incident X-rays is $I_0$, the theoretical expression for the reflection intensity is

$$I(\alpha; \theta) = I_0 R(\alpha; \theta). \tag{8}$$

Since the count data follow a Poisson distribution and Eq. (8) corresponds to its expectation, the probability distribution of the reflection intensity $y$ at an angle of incidence $\alpha$ is

$$p(y \mid \alpha, \theta) = \frac{I(\alpha; \theta)^{y} \exp\left(-I(\alpha; \theta)\right)}{y!}. \tag{9}$$

In this study, artificial data were generated by sampling from Eq. (9). The likelihood given a dataset $\mathcal{D} = \{(\alpha_i, y_i)\}_{i=1}^{n}$ is expressed using Eq. (9) as follows:

$$p(\mathcal{D} \mid \theta) = \prod_{i=1}^{n} p(y_i \mid \alpha_i, \theta). \tag{10}$$
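As an illustration of the data-generation process described in this section, the following is a minimal Python sketch, not the authors' implementation. Because the layer-to-layer relation is not reproduced in this text, the sketch assumes the standard Parratt recursion with Névot-Croce roughness factors. The function name `reflectivity`, the angular grid, and the optical constants of W and Si (rough values at the Cu Kα wavelength) are illustrative assumptions.

```python
import numpy as np

WAVELENGTH = 1.54                  # Cu K-alpha wavelength in angstrom
K0 = 2.0 * np.pi / WAVELENGTH      # wave number in vacuum

def reflectivity(alpha_deg, n, t, sigma):
    """Specular reflectivity of a multilayer (assumed Parratt recursion).

    n     : complex refractive indices of layers 0..N (vacuum first, substrate last)
    t     : thicknesses of layers 1..N-1 in angstrom
    sigma : roughnesses of interfaces 1..N in angstrom
            (interface j lies between layer j-1 and layer j)
    """
    alpha = np.deg2rad(np.asarray(alpha_deg, dtype=float))
    n = np.asarray(n, dtype=complex)
    # Normal component of the wave vector in each layer, shape (N+1, n_angles).
    kz = K0 * np.sqrt(n[:, None] ** 2 - np.cos(alpha)[None, :] ** 2)

    def fresnel(j):
        # Fresnel coefficient of interface j, damped by a Nevot-Croce factor.
        r = (kz[j - 1] - kz[j]) / (kz[j - 1] + kz[j])
        return r * np.exp(-2.0 * kz[j - 1] * kz[j] * sigma[j - 1] ** 2)

    N = len(n) - 1
    amp = fresnel(N)                   # no wave returns from inside the substrate
    for j in range(N - 1, 0, -1):      # walk upward through layers N-1, ..., 1
        phase = np.exp(2j * kz[j] * t[j - 1])
        r = fresnel(j)
        amp = (r + amp * phase) / (1.0 + r * amp * phase)
    return np.abs(amp) ** 2

# Approximate optical constants (illustrative assumptions, not from the source).
N_W = 1 - 4.6e-5 + 3.9e-6j
N_SI = 1 - 7.6e-6 + 1.7e-7j

# A W film (100 angstrom) on a Si substrate with 5 angstrom roughness at both interfaces.
alpha = np.linspace(0.1, 4.0, 400)     # incidence angles in degrees
R = reflectivity(alpha, [1.0, N_W, N_SI], t=[100.0], sigma=[5.0, 5.0])

I0 = 1e7                               # incident intensity in counts
rng = np.random.default_rng(0)
counts = rng.poisson(I0 * R)           # Poisson counts with expectation I0 * R
```

Lowering `I0` in this sketch produces the low-count data sets of the kind examined in the noise experiments.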

Bayesian inference
In Bayesian inference, the prior distribution $p(\theta)$ and the likelihood $p(\mathcal{D} \mid \theta)$ of the parameters $\theta$ are used to obtain the posterior distribution $p(\theta \mid \mathcal{D})$. The posterior distribution is obtained from Bayes' theorem as follows:

$$p(\theta \mid \mathcal{D}) = \frac{p(\mathcal{D} \mid \theta)\, p(\theta)}{p(\mathcal{D})},$$

where $p(\mathcal{D})$ is independent of the parameter $\theta$ and can be treated as a constant. However, it is difficult to obtain the posterior distribution analytically when the parameters are multidimensional. Therefore, in this study, the posterior distribution is approximated by sampling as follows:

$$p(\theta \mid \mathcal{D}) \propto \exp\left(-E(\theta \mid \mathcal{D})\right) p(\theta).$$

Here, the error function $E(\theta \mid \mathcal{D})$ is defined as

$$E(\theta \mid \mathcal{D}) = -\log p(\mathcal{D} \mid \theta).$$

This sampling yields the posterior distribution and also enables point estimation through the maximum a posteriori (MAP) estimate, i.e. the parameter set that maximizes the posterior probability. An exchange Monte Carlo method was employed as the sampling method to eliminate the dependence on initial values and to perform global sampling. 18) In the exchange Monte Carlo method, multiple Markov chains are run simultaneously, and the states of these chains are exchanged so that they can escape from local solutions. For the Markov chains, we introduced an inverse temperature parameter $\beta$ and sampled each chain from

$$q(\theta; \beta) \propto \exp\left(-\beta E(\theta \mid \mathcal{D})\right) p(\theta),$$

where it can be seen that $q(\theta; \beta = 1) = p(\theta \mid \mathcal{D})$ and $q(\theta; \beta = 0) = p(\theta)$. The inverse temperatures of the replicas are specified by $L$ and $\gamma$, whose values were determined so that the exchange rate between replicas would be appropriate.
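The exchange Monte Carlo procedure described in this section can be sketched on a toy problem. The following Python example is only an illustration under stated assumptions, not the authors' code. The data model (counts with mean `theta * x`), the prior hyperparameters, the step sizes, and the inverse-temperature ladder (0 for the first replica and `GAMMA ** (l - L)` for the others, a common choice in the exchange Monte Carlo literature) are all hypothetical. The error function is taken to be the negative Poisson log-likelihood with the parameter-independent term dropped.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (hypothetical): counts y_i ~ Poisson(theta * x_i) with true theta = 3.
x = np.linspace(1.0, 5.0, 20)
y = rng.poisson(3.0 * x)

A_SHAPE, B_SCALE = 2.0, 2.0        # gamma prior hyperparameters (illustrative)

def energy(theta):
    """Negative Poisson log-likelihood, up to a theta-independent constant."""
    lam = theta * x
    return np.sum(lam - y * np.log(lam))

def log_prior(theta):
    """Log of the unnormalized gamma prior (shape A_SHAPE, scale B_SCALE)."""
    return (A_SHAPE - 1.0) * np.log(theta) - theta / B_SCALE

# Assumed ladder: beta_1 = 0 and beta_l = GAMMA**(l - L) for l = 2, ..., L.
L, GAMMA = 8, 3.0
betas = np.array([0.0] + [GAMMA ** (l - L) for l in range(2, L + 1)])
steps = 0.3 / np.sqrt(np.maximum(betas, 1e-2))   # wider moves at high temperature

def exchange_mc(n_iter=6000, burn_in=3000):
    theta = np.full(L, 1.0)
    E = np.array([energy(th) for th in theta])
    samples = []
    for it in range(n_iter):
        # Metropolis update within each replica.
        for l in range(L):
            prop = theta[l] + steps[l] * rng.normal()
            if prop <= 0.0:
                continue                          # outside the prior support: reject
            E_prop = energy(prop)
            log_acc = (-betas[l] * (E_prop - E[l])
                       + log_prior(prop) - log_prior(theta[l]))
            if np.log(rng.random()) < log_acc:
                theta[l], E[l] = prop, E_prop
        # Exchange attempt between each pair of adjacent replicas.
        for l in range(L - 1):
            log_acc = (betas[l + 1] - betas[l]) * (E[l + 1] - E[l])
            if np.log(rng.random()) < log_acc:
                theta[[l, l + 1]] = theta[[l + 1, l]]
                E[[l, l + 1]] = E[[l + 1, l]]
        if it >= burn_in:
            samples.append(theta[-1])             # replica with beta = 1
    return np.array(samples)

samples = exchange_mc()
print("posterior mean:", samples.mean(), "posterior std:", samples.std())
```

For this toy model the gamma prior is conjugate, so the sampled posterior mean can be checked against the closed-form value `(A_SHAPE + y.sum()) / (1 / B_SCALE + x.sum())`. No such check is available for the XRR model itself.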

Setup
In this research, multilayers are defined as shown in Fig. 1. Let the vacuum layer be layer 0, the top surface layer be layer 1, and the substrate layer be layer $N$. Let the thickness and refractive index of layer $j$ be $t_j$ and $n_j$, respectively. The roughness at the interface between layer $j - 1$ and layer $j$ is $\sigma_j$. In this study, the film thickness and roughness are estimated as unknown parameters, whereas the other parameters, such as density, are treated as known constants. Thus, the parameter set is $\theta = \{t_1, \cdots, t_{N-1}, \sigma_1, \cdots, \sigma_N\}$. Since all parameters are non-negative, gamma distributions are used as the prior distributions as follows:

$$p(t_i) = \mathrm{Gamma}(t_i; a_i, b_i), \qquad p(\sigma_i) = \mathrm{Gamma}(\sigma_i; c_i, d_i).$$

Here, $a_i$, $b_i$, $c_i$ and $d_i$ are hyperparameters of the prior distribution, which can reflect subjective information in the analysis. For the incident X-rays, the wavelength was set to λ = 1.54 Å, which is the wavelength of the Cu Kα line and is often used for XRR measurements in laboratory environments. The Metropolis method was used for sampling at each replica, with adaptive tuning of the step size. 19) The number of replicas was set at $L = 40$, and each replica was sampled 20 000 times, the first 10 000 samples of which were discarded as burn-in. On average, it took about 30 min to make inferences from a single data set with the above settings.
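The gamma prior can be written down explicitly as a short Python sketch. The parameterization is an assumption here: the sketch uses the shape-scale form, with the first hyperparameter as the shape and the second as the scale. Under that assumption, the hyperparameters quoted for sample A (6 and 20 for the thickness, 6 and 1 for each roughness) place the prior mode, (shape - 1) × scale, at 100 Å for the thickness and 5 Å for the roughness. The function names are hypothetical.

```python
import math

def log_gamma_pdf(x, shape, scale):
    """Log density of a gamma distribution in shape-scale form; -inf for x <= 0."""
    if x <= 0.0:
        return -math.inf
    return ((shape - 1.0) * math.log(x) - x / scale
            - math.lgamma(shape) - shape * math.log(scale))

def log_prior(thickness, roughness, hyper_t, hyper_s):
    """Sum of independent gamma log priors over thicknesses and roughnesses."""
    lp = sum(log_gamma_pdf(t, a, b) for t, (a, b) in zip(thickness, hyper_t))
    lp += sum(log_gamma_pdf(s, c, d) for s, (c, d) in zip(roughness, hyper_s))
    return lp

# Hyperparameters quoted for sample A: (a1, b1) for t1, (c1, d1) and (c2, d2) for the roughnesses.
hyper_t = [(6.0, 20.0)]
hyper_s = [(6.0, 1.0), (6.0, 1.0)]

prior_mode_t = (hyper_t[0][0] - 1.0) * hyper_t[0][1]   # (shape - 1) * scale
prior_mean_t = hyper_t[0][0] * hyper_t[0][1]           # shape * scale
print(prior_mode_t, prior_mean_t)
print(log_prior([100.0], [5.0, 5.0], hyper_t, hyper_s))
```

Returning `-inf` for non-positive values means that a Metropolis proposal outside the prior support is always rejected, which keeps the sampled thicknesses and roughnesses positive.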

Experiments and discussion
In this study, three experiments were conducted. The first is an experiment to confirm the usefulness of the proposed method.
The second and third are experiments to test the robustness of the proposed method. Robust estimation by Bayesian inference has been reported by Nagata et al., 15) and we expect to obtain similar results in our research. Therefore, we considered two situations in which the signal-to-noise ratio of the XRR data is small, i.e. the noise is high. The first is noise arising from the Poisson nature of the count data. In general, the signal-to-noise ratio of data following a Poisson distribution is higher for data with higher counting intensity. Thus, when the incident X-ray intensity is reduced, the estimation accuracy decreases, and in some cases, estimation may become impossible. Assuming this situation, we investigated the estimation limits when the incidence intensity is varied. As a result, we found that regions of high intensity contain a large amount of information. Consequently, in the last experiment, we examined how the estimation accuracy changes when we restrict the wide-angle region, where the intensity decreases due to spectral attenuation. If estimation proves possible even with data limited to the small-angle side, the measurement efficiency is expected to improve.
The results for sample A are given in Table I and Figs. 2 and 3; Fig. 2 shows that the fitting to the data is successful. Figure 4 shows that the fitting is also successful for sample B, and Table II and Fig. 5 show that the estimated values are close to the true values and that the sampling is concentrated around the true values for both parameters, indicating that the estimation is correct.
These results show that Bayesian inference methods are effective in the analysis of XRR data.
© 2024 The Author(s). Published on behalf of The Japan Society of Applied Physics by IOP Publishing Ltd

Estimation experiment assuming high measurement noise
The two samples A and B described in Sect. 3.1 were used again in this experiment, at three different incidence intensities of 1e7, 1e5 and 1e3 [counts]. First, the results for sample A are presented. The settings for the hyperparameters of the prior distribution and the inverse temperature are the same as those in Sect. 3.1. Table III and Fig. 6 show that both estimation and fitting were successful for the incidence intensity of 1e7, with errors of less than 0.08% relative to the true values, and for that of 1e5, which is close to the intensity of laboratory equipment, 20) with errors of less than 1.54%. On the other hand, for the incidence intensity of 1e3, although the fitting itself seems to be successful, the estimated values deviate slightly from the true values, particularly for the roughness of the second layer, $\sigma_2$, which has a relative error of more than 6%. Consequently, it is difficult to estimate with sufficient accuracy at such a low intensity. This can also be seen in Fig. 7. The posterior distributions of each parameter for the incidence intensities of 1e7, 1e5 and 1e3 are shown in order from the top; the distributions widen and the accuracy deteriorates as the intensity decreases. Particularly for the incidence intensity of 1e3, the distribution is considerably broadened and the accuracy of the estimation is insufficient. Therefore, there is a boundary between the incidence intensities of 1e5 and 1e3 below which the estimation fails. Bayesian inference can also reveal these estimation limits from the shape and statistics of the posterior distribution. 15)
The results for sample B are presented next. The settings for the hyperparameters of the prior distribution and the inverse temperature are the same as in Sect. 3.1. Similarly, for sample B, Table IV and Fig. 8 show that both estimation and fitting were successful for the incidence intensities of 1e7 and 1e5, but for the incidence intensity of 1e3, the estimated values deviated slightly from the true values. In particular, the relative errors for the roughnesses of the second and third layers, $\sigma_2$ and $\sigma_3$, respectively, exceed 8% of the true values. Thus, it is difficult to estimate with sufficient accuracy at such low intensities. This can be confirmed from Fig. 9, which shows that the accuracy worsens as the intensity decreases and the width of the distribution increases. In particular, for the incidence intensity of 1e3, the distribution is considerably broadened, and the accuracy of the estimation is insufficient. Therefore, there is a boundary between the incidence intensities of 1e5 and 1e3 below which the estimation fails. Bayesian inference can also reveal these estimation limits from the shape and statistics of the posterior distribution. 15)
The above results indicate that Bayesian inference is robust enough that an incidence intensity of 1e5 is sufficient for the accurate estimation of samples A and B. This intensity is close to that of laboratory measurements and is sufficient for practical use. Furthermore, it can be inferred that the limit of sufficient estimation accuracy lies between the incidence intensities of 1e5 and 1e3.

Fig. 9. Marginal posterior distributions of some parameters in the estimation of sample B at each incidence intensity. (a) to (c) are the results when the incidence intensity is 1e7, (d) to (f) are those when the incidence intensity is 1e5, and (g) to (i) are those when the incidence intensity is 1e3.
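The dependence on incidence intensity in this section follows from a basic property of Poisson counts: a count with expectation μ has standard deviation √μ, so its signal-to-noise ratio is √μ. The short Python sketch below checks this numerically for the three incidence intensities. The reflectivity value of 1e-4 is a hypothetical stand-in for a wide-angle data point, and the function name is illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
REFLECTIVITY = 1e-4        # hypothetical reflectivity at one wide-angle point

def poisson_snr(i0, reflectivity=REFLECTIVITY, n_draws=200_000):
    """Return the expected count, the theoretical SNR and a simulated SNR."""
    mu = i0 * reflectivity
    draws = rng.poisson(mu, size=n_draws)
    return mu, np.sqrt(mu), draws.mean() / draws.std()

results = [poisson_snr(i0) for i0 in (1e7, 1e5, 1e3)]
for i0, (mu, snr_theory, snr_sim) in zip((1e7, 1e5, 1e3), results):
    print(f"I0 = {i0:.0e}: expected count = {mu:g}, "
          f"SNR (theory) = {snr_theory:.3f}, SNR (simulated) = {snr_sim:.3f}")
```

With this assumed reflectivity, the expected count falls from 1000 to 0.1 as the incidence intensity drops from 1e7 to 1e3, so at the lowest intensity most wide-angle points record no counts at all.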

Estimation experiment with restricted wide-angle range
Finally, we performed an estimation experiment with a restricted wide-angle range using data with an incident X-ray intensity of 1e5, which is close to that of laboratory equipment, and maximum incidence angles of 0.5°, 2.0° and 4.0° for samples A and B. The restriction on the maximum incidence angle corresponds to a change in the noise intensity of the data, and the experiments in this section were performed with the expectation that they would contribute to the efficiency of XRR measurements. The settings for the hyperparameters of the prior distribution and the inverse temperature are the same as those in Sect. 3.1. For sample A, Table V and Fig. 10 show that both estimation and fitting were successful for the maximum incidence angles of 4.0° and 2.0°. On the other hand, for the maximum incidence angle of 0.5°, the fitting itself does not seem to be a problem, but the estimated values deviate from the true values, particularly the roughness of the second layer, $\sigma_2$, which has a relative error of 20%; this is not sufficiently accurate. This can also be confirmed from Fig. 11: the posterior distributions for the maximum incidence angle of 0.5°, Figs. 11(g)-(i), are considerably spread, and the estimation accuracy is insufficient.
The results for sample B are presented next. Similarly, for sample B, Table VI and Fig. 12 show that both estimation and fitting were successful for the maximum incidence angles of 4.0° and 2.0°. On the other hand, for the maximum incidence angle of 0.5°, the estimated values deviate from the true values, particularly for the roughness of the second layer, $\sigma_2$, with a relative error of 90% above the true value, indicating that the estimation fails. This is also confirmed by Fig. 13: the posterior distributions for the maximum incidence angle of 0.5°, Figs. 13(g)-(i), are considerably spread, and the estimation accuracy is insufficient.
In particular, comparing the results for the maximum incidence angles of 4.0° and 2.0°, it can be seen that there is no significant change in either the estimated values or the width of the posterior distribution. For the maximum incidence angle of 0.5°, the estimation fails for both samples A and B. The above results suggest that the condition under which the estimation fails lies between the maximum incidence angles of 2.0° and 0.5°. Bayesian inference can also reveal these estimation limits from the shape and statistics of the posterior distribution, depending on the measurement conditions.
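The quantities used in this section to judge the estimation can be sketched in a few lines of Python: restricting a data set to a maximum incidence angle, the relative error of an estimate, and the width of a posterior credible interval. The function names and all numbers below are hypothetical, and the normally distributed arrays only stand in for posterior samples that a real analysis would take from the sampler.

```python
import numpy as np

def restrict_angle(alpha_deg, counts, alpha_max_deg):
    """Keep only the data points measured at or below the maximum angle."""
    alpha_deg = np.asarray(alpha_deg)
    counts = np.asarray(counts)
    mask = alpha_deg <= alpha_max_deg
    return alpha_deg[mask], counts[mask]

def relative_error(estimate, true_value):
    """Relative deviation of an estimate from the true value."""
    return abs(estimate - true_value) / abs(true_value)

def credible_width(samples, level=0.95):
    """Width of the central credible interval of a set of posterior samples."""
    lo, hi = np.percentile(samples, [50.0 * (1.0 - level), 50.0 * (1.0 + level)])
    return hi - lo

# Placeholder data set on an angular grid up to 4.0 degrees.
alpha = np.linspace(0.1, 4.0, 400)
counts = np.arange(alpha.size)
alpha_small, counts_small = restrict_angle(alpha, counts, alpha_max_deg=0.5)

# Stand-ins for a narrow and a broad marginal posterior (hypothetical).
rng = np.random.default_rng(2)
narrow = rng.normal(5.0, 0.05, size=10_000)
broad = rng.normal(5.0, 1.0, size=10_000)

print(alpha_small.size, "of", alpha.size, "points kept")
print("interval widths:", credible_width(narrow), credible_width(broad))
print("relative error:", relative_error(6.0, 5.0))
```

A broad credible interval flags an unreliable estimate even when the fitted curve looks acceptable, which is the situation reported for the maximum incidence angle of 0.5°.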

Conclusion
In this study, we proposed a new estimation method based on Bayesian inference for the analysis of XRR data. The proposed method can evaluate the uncertainty of the estimated results by obtaining the posterior distribution of the parameters. In addition, to confirm the usefulness of the proposed method, estimation experiments were conducted under conditions where the measurement noise is high. The results show that the Bayesian inference method is robust against noise. Experiments with varying intensity showed that the estimation accuracy is sufficiently high when the intensity is close to that of measurement data obtained in the laboratory. Experiments with a restricted wide-angle range also showed high robustness. These features contribute to the efficiency of XRR data analysis. However, in this research, some assumptions were made in the estimation. For example, the number of layers and the refractive index of each layer were assumed to be known in advance. In the future, it will be necessary to remove these assumptions and develop a more practical analysis method.

3.1. Confirmation of usefulness
First, to confirm the usefulness of the proposed method, two samples were considered and estimation experiments were conducted for each sample. One sample consists of two layers: the first layer is W and the second layer, the substrate layer, is Si. The film thickness was set at $t_1 = 100$ Å, and the roughnesses were set at $\sigma_1 = 5$ Å and $\sigma_2 = 5$ Å. The other sample consists of four layers: the first layer is W, the second layer is Si, the third layer is W and the fourth layer, the substrate layer, is Si. The film thicknesses were $t_1 = 100$ Å, $t_2 = 40$ Å and $t_3 = 100$ Å, and the roughnesses were $\sigma_1 = 5$ Å, $\sigma_2 = 3$ Å, $\sigma_3 = 5$ Å and $\sigma_4 = 5$ Å. We refer to the former as sample A and the latter as sample B. For the estimation of sample A, the incidence intensity was set to $I_0$ = 1e7 [counts], the hyperparameters of the prior distribution were $a_1 = 6$, $b_1 = 20$, $c_1 = 6$, $d_1 = 1$, $c_2 = 6$ and $d_2 = 1$, and the inverse temperature parameter was set at γ = 1.6. The MAP estimates, fits and posterior distributions are shown in Table I and Figs. 2 and 3.
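For reference, the two samples can be encoded as a small data structure. The Python sketch below is a hypothetical helper, not part of the original analysis. It stores the true values quoted in this section and checks that each sample has one thickness fewer than it has layers and one roughness per interface.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class Sample:
    materials: Tuple[str, ...]     # layers 1..N, substrate last
    thickness: Tuple[float, ...]   # t_1..t_{N-1} in angstrom
    roughness: Tuple[float, ...]   # sigma_1..sigma_N in angstrom

    def __post_init__(self):
        n_layers = len(self.materials)
        if len(self.thickness) != n_layers - 1 or len(self.roughness) != n_layers:
            raise ValueError("expected N-1 thicknesses and N roughnesses")

    @property
    def n_parameters(self) -> int:
        """Number of unknown parameters (thicknesses plus roughnesses)."""
        return len(self.thickness) + len(self.roughness)

SAMPLE_A = Sample(materials=("W", "Si"),
                  thickness=(100.0,),
                  roughness=(5.0, 5.0))
SAMPLE_B = Sample(materials=("W", "Si", "W", "Si"),
                  thickness=(100.0, 40.0, 100.0),
                  roughness=(5.0, 3.0, 5.0, 5.0))

print(SAMPLE_A.n_parameters, SAMPLE_B.n_parameters)
```

Sample A therefore has three unknown parameters and sample B has seven.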

Fig. 2. Fitting by MAP estimates in the estimation of sample A. Blue dots are the artificial data used for estimation and the orange line is the fitting by MAP estimates.

Fig. 4. Fitting by MAP estimates in the estimation of sample B.

Fig. 5. Marginal posterior distributions of each parameter in the estimation of sample B.

Fig. 7. Marginal posterior distributions of the parameters in the estimation of sample A at each incidence intensity. (a) to (c) are the results when the incidence intensity is 1e7, (d) to (f) are those when the incidence intensity is 1e5, and (g) to (i) are those when the incidence intensity is 1e3.

Fig. 8. Fitting by MAP estimates in the estimation of sample B at each incidence intensity.

Fig. 10. Fitting results for sample A. In each experiment, the data points in the blue region were used as the observed data in the estimation. The gray regions were not used for estimation since they were not observed.

Fig. 11. Marginal posterior distributions of some parameters at each maximum incidence angle in the estimation of sample A. (a) to (c) are the results at the maximum incidence angle of 4.0°, (d) to (f) are those at the maximum incidence angle of 2.0°, and (g) to (i) are those at the maximum incidence angle of 0.5°.

Fig. 12. Fitting results for sample B. Blue data points are used in the analysis as measured data.

Fig. 13. Marginal posterior distributions of some parameters at each maximum incidence angle in the estimation of sample B. (a) to (c) are the results at the maximum incidence angle of 4.0°, (d) to (f) are those at the maximum incidence angle of 2.0°, and (g) to (i) are those at the maximum incidence angle of 0.5°.

Table I. MAP estimates of the parameters for sample A.

Table II. MAP estimates of the parameters for sample B.

Table III. Comparison of MAP estimates and true values in the estimation of sample A at each incidence intensity.

Table IV. Comparison of MAP estimates and true values in the estimation of sample B at each incidence intensity.

Table V. Comparison of MAP estimates and true values in the estimation of sample A at each maximum incidence angle.

Table VI. Comparison of MAP estimates and true values at each maximum incidence angle.