Depth profile determination with confidence intervals from Rutherford backscattering data

A new simulation program for Rutherford backscattering spectroscopy (RBS) together with an adaptive kernel method in the Bayesian probability theory framework was applied to the analysis of RBS data. Reconstructed depth profiles are free from noise-induced ringing, even for strongly overlapping RBS peaks. This has been achieved by the use of the adaptive kernel method, which generates the least informative depth profile according to the data and in addition allows one to calculate the uncertainty of the obtained depth profiles. The method is applied to erosion measurements of carbon samples.


Introduction
Rutherford backscattering is one of the most important and commonly applied techniques in quantitative surface analysis. Its main advantage is that it is fully quantitative. Moreover, in favourable cases, the elemental composition of the sample may be determined to precisions as high as 1% [1]. Often the depth-dependent elemental composition of the sample, the depth profile is at the centre of interest. However, the majority of spectra are complex and do not allow for a simple interpretation.
During the last decade several computer programs for the forward simulation of RBS spectra (direct calculation) assuming a depth profile such as RUMP [2] or SIMNRA [3] have been developed. These programs are well suited to compare the calculated and measured spectra from a sample of known composition. But the usual problem is to infer the depth profile from the measured spectra (inverse calculation). The determination of the depth distribution of sample constituents remains a matter of trial and error with these programs. The user has to prescribe depth profiles of all constituents and has to compare the simulated spectrum calculated from the input profiles with the data. Using repeated adaption the depth profiles are adjusted until a reasonable agreement of simulated and measured data is obtained. Obviously this evaluation procedure has several shortcomings. It is a time consuming and cumbersome task, the accuracy 11.2 of the achieved depth profile is unknown and, in many cases, there is an ambiguity between various depth profiles which fit the data equally well.
Bayes' theorem enables us to overcome this trial and error procedure and to determine the most probable depth profile underlying a measured RBS spectrum. The combination of the adaptive kernel method in the Bayesian framework [4] with an RBS simulation program allows us for the first time to carry out this backward calculation. Together with the ability of calculating the uncertainty of a determined depth profile, this extends the potential of RBS as a tool for quantitative surface analysis.
In the following section the basic ideas of the Bayesian probability theory are outlined. The RBS simulation program is described in section 3 and the experimental details are given in section 4. Subsequently, the results obtained with the presented approach are discussed in section 5, and we conclude in section 6.

Structure of the problem
Depth profile determination from RBS data is an inverse problem which is ill-posed due to energy straggling, noise and limited energy resolution. The general inverse problem can be described by: where D represents the measured data d i , i = 1...N d , n the noise and O a nonlinear operator which transforms the unknown quantity f with its N degrees of freedom into the data space. The operator O includes the model of the Rutherford backscattering process and the apparatus transfer function. In our case it is convenient to split O into The linear operator A represents the convolution of the unblurred spectrum with the apparatus function and the nonlinear operator B the Rutherford scattering process. The aim is to infer f from the knowledge of D, O and the noise statistics.

Bayes inference
Bayesian probability theory provides a self-consistent mathematical tool for such inversion problems. It allows us to exploit any type of testable information, such as experimental data disturbed by noise, or additional knowledge such as expectation values or positivity constraints. In the following we will merely outline the key ideas of the applied adaptive kernel method. The mathematical and numerical details are explained in more detail in [4,5]. Statistical inference in the Bayesian formulation is a calculus based on two axioms: the sum rule and the product rule

11.3
We consider hypotheses H, which we might have reason to formulate in the light of some background information I. The sum rule states that the probability of a hypothesis H being true plus the probability of H, the negation of H, being true add up to 1. The product rule states that the probability for H and D being true, given the background information I, may be expressed as either the probability for H being true conditional on I times the probability for D given that H is true or vice versa. Solving equation (4) for P (H | D, I) we arrive at Bayes' theorem Bayes' theorem describes a type of learning: how the probability for a hypothesis H should be modified on obtaining new information, D. The probability P (H | I) is called the prior probability for H, i.e. the probability that we assign to H being true before new data D become available. P (D | H, I) is the probability for the data given that H is true and is normally called the likelihood. P (D | I) is called evidence and is in our case a normalization constant. Finally P (H | D, I) is the posterior probability for H being true in the light of the newly acquired data D.
A simple consequence of the sum and product rule but nevertheless important ingredient of Bayesian inference is marginalization, which allows us to eliminate a parameter from our calculation that is essential but whose particular value is uninteresting by integrating it out

The likelihood function
In our case, the posterior probability P (f | D, σ, I) for a certain depth profile f depends on the measured RBS data D, the respective error σ, and on further prior knowledge I. Examples for I are the positivity of the concentration values and the known bulk concentrations. The probability that a certain depth profile is true given the data, is connected by Bayes' theorem to the term we are able to calculate and which describes the probability that we would have observed the measured spectrum given a certain depth profile P (D | f, σ, I). The likelihood function P (D | f, σ, I) encodes all the information the data provide about the depth profile. In the present case of a counting experiment with a large number of counts, the Poisson statistics can be replaced by a Gaussian likelihood function, where An estimation for the variance is usually given by σ 2 i = d i , but for an experimentally determined apparatus function a modified σ must be used [6]: A is the best estimate of the apparatus function and ∆ the matrix of the pointwise uncertainties of A. The likelihood variance from the data measurement is increased due to the contribution from the finite precision of the measurement of the apparatus matrix A. A crucial part of solving an ill-posed inversion problem is the choice of the appropriate number of free parameters (N opt ), which is usually unknown in advance. If, on the one hand, the layers the sample is divided in are chosen too small, the reconstruction is done with a resolution which is at least locally too high(N > N opt ). This leads to noise fitting and therefore to oscillating depth profiles. If, on the other hand, the layer thicknesses are too large(N < N opt ), the reconstruction fails in resolving structures which are supported by the measured data.
The adaptive kernel method is a powerful technique for adaptively reducing the effective number of degrees of freedom (eDOF) of a form-free reconstruction. This multi-resolution technique provides a local smoothness level depending on the amount of information in the data.
Smoothness of the depth profile f is imposed through a convolution of a hidden ('pseudo') profile h with a smoothing kernel K [8,9,10]: with a local kernel width b(y) depending on y. We use a Gaussian kernel, If we choose a positive h a positive depth profile f is insured, since K > 0. In the limit b → 0, the kernel K approaches a δ-function, resulting in f = h, which allows us to reconstruct arbitrarily sharp structures. We obtain the conventional result where the depth profile is reconstructed with N degrees of freedom. If we omit the convolution (b → 0) and maximize equation (7) with respect to h we do a usual χ 2 -fit. A measure for the effective number of degrees of freedom remaining after convolution with K is given by with eDOF ∈ [1, N].

Posterior probability
For the calculation of the posterior probability we need, besides the likelihood, the prior distribution and the evidence. The prior distribution P (f | I) quantifies our knowledge about the solution f prior to the measurement of the data D. The prior knowledge we include is that f (the concentrations in the individual layers) is a positive additive distribution function for which the adequate prior is the entropic one P (f | I) ∝ exp(αS) [7], with S being the relative entropy of the depth profile f relative to a default model m. We use a constant m. S penalizes structure relative to the default model, depending on the hyperparameter α. We marginalize the parameters α,h and b using equation (6), since we are only interested in f .

11.5
The posterior probability encodes our complete knowledge about the problem. Therefore the knowledge of P (f | D, σ, I) allows to determine the expectation value f : (14) and the confidence intervals via the variance var The boundary conditions for the concentrations j c ij = 1 and c ij ≥ 0 lead to an asymmetric posterior probability density function. While the mean of the posterior can still be regarded as giving the best estimate, the concept of an error-bar does not seem appropriate in this case, as it implicitly entails the idea of symmetry. So we give as confidence interval the shortest interval [f L , f U ] that encloses the correct value with a probability of 67% Finally we evaluate the multi-dimensional integral with a Markov Chain Monte Carlo integration.

RBS simulation
For RBS the operator B includes the relationship between a given depth profile f and the corresponding ideal RBS spectrum D 0 .

Sample Description
The first step in spectrum synthesis is to subdivide the sample normal to the surface. The sample is divided into layers L i , i = 1...N L , with thicknesses ∆x i . The ∆x i have to be sufficiently small to achieve the resolution supported by the data. For each layer i the concentrations c ij , j = 1...N c , (with N c the number of constituents in the sample) on the layer boundaries have to be assigned. Since the rear side of a layer coincides with the front side of the next layer, one obtains for N L layers (N L + 1) · N c concentration values. This corresponds to N = (N L + 1) · (N c − 1) degrees of freedom, because the concentrations in each layer add up to 1. Inside a layer the concentration profile is linearly interpolated. The spectrum D 0 is composed of the contributions of scattering processes of the constituents in all layers of the sample (figure 1).

Energy loss
The ion beam traverses the layers of the sample before and after the backscattering process. The two dominant processes of beam ion energy loss are the interaction with bound or free electrons in the target ('electronic stopping'), and the interactions of the beam ions with the screened or unscreened nuclei of the target atoms. The electronic stopping power data are taken from Ziegler et al [11]. The nuclear stopping power for helium is calculated with the formula given in [12].
With the stopping power dE(x) dx ij of constituent j inside layer i and concentration c ij , the total stopping power i in layer i can be approximated by Bragg's rule: , is a linear inhomogeneous differential equation with non-constant coefficients, which is solved analytically. The energy loss inside a layer is calculated by a self-consistent iterative algorithm, because the stopping power at the back side of the layer depends on the energy of the ions there and vice versa. First the stopping power f is calculated with the known composition of the layer and the given energy E f at the front of the layer. With this f the energy E b and the stopping power b is then calculated at the back of the layer. Now we make use of equation (17) and calculate successively better approximations for E b and b until a self-consistent value for E b is achieved. The calculated energy E b at the back side is the new entrance energy E f for the next layer. In most cases only 3 to 4 steps are needed for a relative accuracy of 10 −8 . This procedure replaces the usually used stepwise approximation of concentration profiles with a continuous one. Consequently, fewer layers are needed for an accurate description of a smooth profile. Since the computational cost for calculating a depth profile is proportional to N 2 L N c the corresponding reduction of computation time allows to implement an interactive simulation on contemporary hardware. Another advantage is the simple use of tabulated stopping-power data because derivatives of the stopping power are not needed. In contrast, the well established algorithm of Doolittle [2] used in many computer codes requires the second derivative. Since different codes take different ways (spline approximation, second difference etc) to calculate the second derivative, they derive different stopping powers from the same data, leading to difficulties in comparing the respective results.

Kinematics
The energy of the ions at the detector depends in addition to the energy loss due to the stoppingpower on the scattering kinematics. Ions undergoing elastic Coulomb collisions with sample 11.7 atoms are recorded in a detector which is positioned at a fixed angle θ relative to the incident ion beam. The energy E of the backscattered ions depends on the initial energy E, on the mass of the incident ions m 0 , on the mass of their colliding partner M j and the deflection angle θ given in the laboratory system and is given by: According to this equation, ions undergoing a collision with a heavy target atom lose less energy than ions colliding with a target atom of lower atomic mass.

Cross section data
The actual cross section σ deviates from the Rutherford cross section with x = m 0 M j for both low and high energies as well as at small scattering angles. Z 0 and Z j are the nuclear charges of the projectile and the target atom, respectively. The low-energy discrepancy is caused by partial screening of the nuclear charges through the electronic shells [13]. This screening is taken into account by a correction factor from L'Ecuyer [14] C(E): E CM is the energy in the center of mass system. At high energies the cross sections deviate from the Rutherford cross section due to the influence of the nuclear force [15], which is, however, of minor importance in the present case. Deviations from the angular dependence of the Rutherford cross section are negligible for Rutherford backscattering spectroscopy, because of the large scattering angles.

Energy loss straggling
The energy loss of charged particles penetrating the material is accompanied by a spread of the beam energy due to statistical fluctuations of the energy transfer in the various energy loss channels. Since the number of interactions is high, the energy broadening can be approximated by a Gaussian. The program uses the correction of Lindhard and Scharff [16] in conjunction with Bohr's theory of electronic straggling [17]. This correction gives electronic straggling values κ e which are considerably lower than those given by Bohr's theory. For high precision Chu's corrections [18] may be included. For the nuclear energy loss straggling of constituent j inside layer i κ ijn Bohr's theory of nuclear straggling is used: Usually the nuclear energy loss straggling is much smaller than the electronic energy loss straggling. Since we are dealing with Gaussian distributions, the total straggling of a constituent j inside layer i is given by quadratically adding the two independent contributions of electronic and nuclear straggling: For compounds, a simple addition rule, similar to Bragg's rule, for energy loss straggling is used [19]. The straggling in the composite layer i is the quadratic sum of the straggling of each constituent weighted with the concentration c ij : The energy dependence of the stopping power inside a layer results further in a non-stochastic broadening (or squeezing) of the energy distribution of the ion beam. This effect is calculated analytically under the used linear approximation and leads to a correction factor g i . The standard deviation κ i of an ion beam with the initial standard deviation of κ i−1 after passing layer i is given by The contribution of geometrical straggling due to finite beam size and width of the detector aperture is negligible in the RBS geometry, unlike in ERDA experiments [20].

Plural and multiple scattering
A principal assumption used in the simulation of RBS spectra is that the incoming particles suffer only one significant angular deflection, i.e. the Rutherford backscattering event. A particular fraction of the beam, however, undergoes further significant deflections along the incoming or outgoing trajectories. This is called plural scattering. Furthermore, the particles are multiply scattered at small scattering angles. It is possible to calculate plural scattering effects, which is, however, extremely time consuming. Hence the calculation of dual scattering is usually only used to check if plural scattering has a significant influence on the calculated spectrum. Multiple scattering and the resulting energy spread has been recently reviewed by Szilàgy [21], but is not included in the program.

Experiment
The interpretation of RBS data is required for the analysis of erosion measurements of plasma facing materials in the ASDEX Upgrade fusion experiment [22]. The solid inner walls surrounding the plasma are exposed to an intense bombardment by plasma particles because the confinement of the plasma by the magnetic field is not perfect. The surfaces of the inner walls are mainly modified by ion implantation, erosion and by deposition of material from other wall areas. The importance of this problem for planned fusion power plants is emphasized by an erosion analysis for ITER [23]. The modeled gross erosion yield of a carbon divertor could reach a maximum of 5m/burning-year, which is reduced by redeposition to about 0.5m/burningyear according to the simulation. The modeling, however, faces exceptional difficulties due to complex hydrocarbon transport phenomena and the lack of input data (e.g. for low energy sputtering). Therefore, experimental determination of erosion and redeposition rates is necessary to validate the modeling and to improve the quantitative knowledge of the fundamental erosion processes.
To determine carbon erosion rates in the divertor of ASDEX Upgrade, graphite probes covered with a 150 nm layer of 13 C were exposed to single plasma discharges. 13 C was used because chemical erosion is unaffected by isotope substitution and to allow the measurement of redeposited 12 C eroded at other plasma facing components. Furthermore, the electronic stopping power in 13 C and 12 C is the same and so the limited accuracy of the stopping power in the simulation is cancelled. The sample was exposed in the outer divertor of ASDEX Upgrade at the strike point region, which is the point where the outermost closed magnetic flux line touches the plate surface with a corresponding maximum of the power load.
The samples were analysed using RBS with 2.0 MeV 4 He ions before and after plasma exposure with a total plasma exposure time of approximately 4 seconds. The backscattered particles were detected at a scattering angle of Θ = 165 • . The apparatus transfer function A in equation (2) is given pointwise by the RBS spectrum of a thin (0.7 nm) Co film on top of a Si sample (see figure 2). The statistical uncertainty of the measured apparatus transfer function modifies σ i of equations (7) and (8) via equation (9). Notice the deviation from the Gaussian shape at the high energy side (channel no. > 0). The apparatus function is energy dependent, but its shape varies only slowly with the energy. For simplicity we use the same apparatus function for all energies. Figure 3 shows typical spectra before and after plasma exposure. Before plasma exposure the signal from the 13 C layer at 430-580 keV is separated by a gap from the part of the spectrum corresponding to the 12 C bulk material beneath the 13 C layer. After plasma exposure the high energy edge of the 13 C signal has shifted towards lower energies (500 keV). This indicates that there is no longer 13 C at the surface of the sample. The peak at 430 keV is caused by 12 C at the sample surface and by the 13 C fraction below the surface. The difference of the RBS spectra before and after exposure contains the information about the erosion and redeposition yields.

Results
For determining the depth profiles from the measured RBS data a simple χ 2 -fit is insufficient and results in depth profiles with negative concentration values. Even a constrained χ 2 -fit with the boundary conditions c > 0 respectively h > 0 and j c ij = 1 leads to pointwise oscillating depth profiles as shown in figure 4. This is due to the ill-conditioned nature of the inversion problem which results from the spectral broadening by energy straggling, the finite apparatus energy resolution and the counting statistics. Under these circumstances, the data don't contain  Figure 5. 12 C and 13 C distribution before (a) and after (b) plasma exposure with asymmetric confidence intervals using the data shown in figure 3. The sample was divided in 23 layers with a thickness of 150 × 10 15 atoms/cm 2 .
enough information for a stable inversion, even in the present case N = 48 < N d = 221.
For this kind of problems the adaptive kernel method is well suited. The concept of adaptive kernels provides local smoothness which makes the result robust against noise corruption. The statistically significant information content in the data is optimally preserved by the local varying kernel widths. Figure 5(a) shows the reconstructed 12 C and 13 C depth profiles of a sample before plasma exposure with the depth expressed in terms of the areal atom density. The underlying RBS data are given in figure 3 as a solid line. The concentrations in each of the 23 layers add up to one. The surface concentration of 13 C (on the left-hand side) is above 90% and decreases slightly to a depth corresponding to 2000 × 10 15 atoms/cm 2 . The remaining 10%-20% fraction of 12 C is caused by impurities in the coating process. The broad transition between the 13 C layer and the 12 C bulk can be explained by the interface roughness of the virgin sample.
After 4 seconds of plasma exposure the depth profiles have changed dramatically, as shown in figure 5(b), calculated with the RBS data given as a dashed line in figure 3. There is a 12 C layer with a thickness of approximately 250 × 10 15 atoms/cm 2 on top of the 13 C. The maximum concentration of 13 C has decreased, however, the thickness of the 13 C layer is nearly unchanged corresponding to approximately 2500 × 10 15 atoms/cm 2 . Furthermore, there is a continuous fraction of 12 C in the whole sample with a minimum concentration of 20%. The soft transition of the 12 C concentration at the surface to the 12 C concentration inside the 13 C layer can be interpreted by 12 C atoms deposited into the pores of the EK98 graphite used as base material. This conjecture is supported by the small (about 5%) but significant amount (note the very small confidence intervals of 16 O in figure 5(b) of 16 O inside the whole 13 C layer after exposure, because oxygen forms, apart from carbon, the second largest impurity fraction of the ASDEX Upgrade plasma. Inside the deposited layer of 12 C at the surface of the sample the 16 O concentration decreases from 20% at the surface to 5%. A detailed report of the erosion and redeposition in fusion experiments will be presented elsewhere [24]. Notice the smaller size of the confidence 11.12 intervals in figure 5(b) compared to the initial measurement ( figure 5(a)). This is due to the better counting statistics (about 10 times higher ion beam fluence) which is also visible by the lower noise in the underlying spectrum (figure 3, dashed line). Figure 6 shows the RBS data as black dots and the calculated RBS spectrum (solid line) based on the depth profile shown in figure 5(b). The agreement is within the counting statistics. The effective number of degrees of freedom of the depth profiles before and after exposure are 7.2 (N = 24) and 15.3 (N = 48), respectively. This indicates a strong correlation of the concentrations between adjacent layers.

Conclusions
By combining an RBS simulation program with the adaptive kernel method the potential of depth profile reconstruction from RBS spectra has been considerably extended. It is a powerful method for obtaining information from measurements with limited resolution and/or overlapping RBS peaks. The unique property of depth profile reconstructions and simultaneous determination of confidence intervals allows a reliable estimation of the quality of achieved depth profiles.