Artificial neural network assisted spectral scatterometry for grating quality control

Spectral scatterometry is a technique that allows rapid measurements of diffraction efficiencies of diffractive optical elements (DOEs). The analysis of such diffraction efficiencies has traditionally been laborious and time consuming. However, machine learning can be employed to aid in the analysis of measured diffraction efficiencies. In this paper we describe a novel system for providing measurements of multiple measurands rapidly and concurrently using a spectral scatterometer and an artificial neural network (ANN) which is trained utilising transfer learning. The ANN provides values for the pitch, height, and line widths of the DOEs. In addition, an uncertainty evaluation was performed. In the majority of the studied cases, the discrepancies between the values obtained using a scanning electron microscope (SEM) and artificial neural network assisted spectral scatterometer (ANNASS) for the grating parameters were below 5 nm. Furthermore, independent reference samples were used to perform a metrological validation. An expanded uncertainty (k = 2) of 5.3 nm was obtained from the uncertainty evaluation for the measurand height. The height value measurements performed employing ANNASS and SEM are demonstrated to be in agreement within this uncertainty.


Introduction
The photonics industry is steadily improving the performance of innovative projectors, displays, waveguides and augmented reality (AR) headsets. AR relies on diffractive optical element (DOE) components such as waveguide couplers to create a simulated environment that annotates or enhances the real environment so that the user can experience them as one environment [1]. Fast and reliable quality control is needed for the mass manufacturing of DOEs. The measurands of interest in DOEs include the realised period, line width, height, side wall angles, and edge roundings of the produced gratings. Additionally, the edge roughness and undercut can be of interest for quality control. Conventional characterisation techniques, such as atomic force microscopy (AFM) and scanning electron microscopy (SEM), are too slow, cumbersome, or otherwise unsuitable for use in industrial mass production environments.
Scatterometry is a non-destructive technique that analyses the scattering of light from periodic structures to extract information about the physical and dimensional properties of the sample. Scatterometers measure the properties of light that has interacted with a diffractive structure. The most common categories of scatterometers are angular and spectral scatterometers [2].
Angular scatterometers measure the angle and intensity of incoming and/or outgoing light. The angles of the diffraction orders can be used to calculate the period of the grating [3,4]. Spectral or spectroscopic scatterometers measure the spectral intensity of light that is reflected from or transmitted through a sample [5]. Spectral scatterometry is very fast, withstands vibrations, and can measure even moving samples [6]. Its downside is that the analysis of the data can be difficult and laborious, as complex inverse calculations or comparisons to library data are required.
Spectral scatterometry is a fast method but does not measure the parameters of the sample directly. Usually, the measured spectrum is compared to a theoretical signature with known grating parameters [2]. This is often done using a library search or non-linear regression. Both methods have their own issues: the library search requires a large set of data, and non-linear regression can be slow. An alternative to these more traditional methods for solving the inverse problem is to use machine learning [7][8][9], which can be a more efficient way of solving problems that would otherwise require complex and time-consuming computation.
In this work, an artificial neural network (ANN) was developed and trained to determine the grating parameters from the diffraction efficiencies measured using a spectral scatterometer. Studies where ANNs have been used in the context of scatterometry include [7][8][9][10][11][12][13][14][15]. These span a time period from 1999 up to 2022, and the increase in computational capacity during the period is clearly visible. The earlier studies had small ANNs with a single hidden layer, whereas [7] has a deeper network structure with three hidden layers, and the training set size has increased from about 1000 to 400 000. In [9] the ANN structures are more complex, with about 20 layers, and the work utilises angle-resolved spectroscopy. The use of transfer learning described in this study is a continuation of this research branch.
A scatterometer similar to a system described by Madsen et al [16] was constructed to demonstrate the capabilities of an ANN in measuring the grating parameters of DOEs. The ANN training was mostly performed using simulated data, and transfer learning was utilised to fine-tune the model. Finally, a comparison to reference samples and a preliminary uncertainty evaluation were performed for the period, step height, and line width.

Samples used for training and validation
A set of 100 diffraction gratings was manufactured for the development and validation of the ANN assisted scatterometry. Within the set, several parameters were varied to provide a large enough range for the ANN. Four wafers of fused silica (SiO2) were produced. 25 square-shaped DOEs were etched on each wafer, as figure 1 shows. They are arranged in a square pattern with 5 gratings on each axis. The period is varied along one axis, and the line width along the other. The step height is varied between the wafers. The different values of the period are 660 nm, 670 nm, 675 nm, 678 nm, and 680 nm. The widths of the grooves are 300 nm, 310 nm, 320 nm, 330 nm, and 340 nm. The step heights are about 220 nm, 280 nm and 300 nm. The size of a single grating is 2 mm by 2 mm. Unfortunately, one of the sample wafers was broken, so only 75 different DOEs were used in this study.
In the manufacturing process of the samples, the SiO2 wafers were first coated with a thin (30 nm) chromium layer, after which AR-P 6200 resist by Allresist GmbH was spin-coated on top. Then the gratings were exposed using electron beam lithography. The resist was then developed using ethyl-3 ethoxypropionate (EEP) and rinsed with isopropyl alcohol and water. Following this, the chromium layer was etched using an ICP-RIE process with a chlorine-based recipe. Subsequently the SiO2 was also etched using the ICP-RIE process (CHF3-based recipe). Finally, the remaining resist and chromium residues were removed using dry etching. The geometry of the DOEs on the wafers was designed so that it was possible to cut the wafer in two pieces, where the larger piece contains the DOEs and the smaller piece contains the same grating geometries for the purpose of SEM measurements of the parameters.

Spectral scatterometer
The method utilised in this work was spectral reflectance scatterometry. Different wavelengths of light are diffracted differently from a grating depending on the parameters of the grating. The spectral diffraction efficiency, $\eta_T$, in transmission mode can be calculated from the spectral intensities measured by the scatterometer using the formula

$$\eta_T = \frac{I_S - I_{BG}}{I_R - I_{BG}} \qquad (1)$$

where $I_x$ are the different spectral intensities. $I_S$ is measured from the grating, $I_{BG}$ is the black background, and $I_R$ is the intensity measured from a blank region of the substrate [5]. In reflectance mode, the reflectance $R$ of the substrate needs to be taken into account, and the equation becomes

$$\eta_R = R \, \frac{I_S - I_{BG}}{I_R - I_{BG}} \qquad (2)$$

The spectral diffraction efficiencies can then be used to analyse the samples.

Figure 2 depicts the scatterometry hardware and a schematic of the components. The scatterometer uses a Prizmatix FC-LED-BBW50 fibre-coupled white LED light source. Figure 3 illustrates the spectrum of this light source. The chosen optical fibre has a diameter of 1 mm and a length of 1.5 m. The light is collimated using a condenser lens and directed through a pair of apertures to an adjustable polariser. Then the polarised light passes through a beam splitter to an Olympus PLN 4X microscope objective which focuses the beam onto the sample, where it is reflected back through the objective into a fibre coupler and a fibre to an Ocean Optics Flame-T-VIS-NIR spectrometer. The optical fibre for the spectrometer is 2 m long and has a diameter of 200 µm. The design of the scatterometer allows it to be used for both reflected and transmitted beam measurements. The measurements were performed employing a Python program.
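The calculation of the diffraction efficiency from the three measured intensity spectra can be sketched in a few lines of Python. This is a minimal illustration, not the authors' measurement program. It assumes that the efficiency is the background-corrected ratio of the sample intensity to the reference intensity, scaled by the substrate reflectance R in reflectance mode. The function name and the example numbers are hypothetical.

```python
def diffraction_efficiency(i_sample, i_reference, i_background, reflectance=1.0):
    """Return the spectral diffraction efficiency for each wavelength bin.

    reflectance=1.0 gives the transmission-mode form; pass the substrate
    reflectance R to obtain the reflectance-mode form (assumed scaling).
    """
    if not (len(i_sample) == len(i_reference) == len(i_background)):
        raise ValueError("all spectra must have the same number of bins")
    efficiency = []
    for i_s, i_r, i_bg in zip(i_sample, i_reference, i_background):
        denominator = i_r - i_bg
        if denominator <= 0:
            # Reference is not above the background: the ratio is undefined.
            efficiency.append(float("nan"))
        else:
            efficiency.append(reflectance * (i_s - i_bg) / denominator)
    return efficiency


# Hypothetical detector counts for two wavelength bins.
eta = diffraction_efficiency([600.0, 1100.0], [2100.0, 2100.0], [100.0, 100.0])
```

Bins in which the reference does not exceed the background are returned as NaN rather than raising an error, so that a single dark pixel does not invalidate the whole spectrum.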
This scatterometer should only capture 0th order diffractions, so the collection angle of the objective should be chosen according to the period of the measured grating and the cutoff wavelength. The constraint on the period $d$ can be approximated using the formula given in [16], where $\lambda_{\min}$ is the minimum wavelength, $n$ is the refractive index of the incident medium (air), and $\theta_{NA}$ is the collection angle of the objective. The numerical aperture in air is $NA = \sin(\theta_{NA})$. In our system the numerical aperture is 0.1, so the period of a grating should be smaller than about 1000 nm.
The different components of the scatterometer need to be adjusted for optimal results. The position of the collimation lens, alignment of the polariser, alignment of the fibre mounts, focus distance of the sample, and the placement of the sample all have an effect on the measured spectra. The positioning of the sample and the alignment of the fibres were chosen so that the measured total intensity is the highest. The collimation lens was placed according to the properties of the lens. The light was polarised to TE polarisation relative to the sample.
The changes in the measured diffraction efficiencies were investigated by changing the positions and angles of the components. The changes were a few millimetres for the position of the collimation lens, aperture size, and focus distance. For the polarisation angle, sample tilt, and fibre angles the changes were a few degrees. The largest effects were caused by moving the collimation lens, which caused the diffraction efficiency curve to shift up or down in the wavelength range between 500 nm and 550 nm. Changing the alignment of the fibres mostly changes the intensity evenly across the spectrum. Variations in the other components did not create noticeable effects, at least in such small ranges of variation. The results of these investigations were used for determining the sensitivity coefficients for the uncertainty evaluation.
Once the adjustments have been set, performing the measurements is very fast. The integration time of the spectrometer is set to 8 ms. Even with such a short integration time, the beam width needs to be restricted using the apertures to prevent the spectrometer from reaching saturation. The sample holder alignment is the only thing that needs to be adjusted during measurements, and everything else should stay fixed. The transfer learning is performed using data measured with these adjustments, so if there are changes to the device, the accuracy of the ANN will suffer. Transfer learning takes into account the possible flaws of the device. As long as the errors depend only on the instrument and not on the samples, the ANN should perform well.

Scatterometer measurements
The spectral intensities were measured once from a blank section of the wafer for the reference intensity, and once from a beam dump for the background intensity. Then all the DOEs were measured in order without redoing the reference or background measurements. The results were saved in HDF5 format, and the files contain the wavelengths, the intensities for the reference, the background, and the DOEs, and the diffraction efficiencies calculated for each grating. Figure 4 shows examples of the intensity spectra required to calculate the diffraction efficiency using equation (2).
The measurements were conducted in reflectance mode, so only the light reflected from the sample was recorded. When the beam of light hits the wafer, a portion of it passes through it. Some light is reflected back from the bottom surface of the wafer, causing unwanted interference effects that make the measured spectrum unusable. To prevent the internal reflections, the wafers were attached to a black backplate using index matching gel before measurements.
The samples were positioned manually during the measurements. To ensure the correct rotation angle for TE polarisation, the reflected higher diffraction orders were aligned with a reference mark in the system, as shown in figure 5. The correct lateral position was determined using the measurement software. The measured reflection-mode diffraction efficiency is at its smallest when the beam of light lies entirely within the edges of the grating. The spot size is smaller than the gratings, so when the sample is moved the diffraction efficiency spectrum should not change while the spot is within the area of the grating. The measured data was filtered using a boxcar filter with a width of about 2 nm to reduce the noise.
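A boxcar filter of the kind mentioned above is a plain moving average. The following sketch shows one way to apply a window of about 2 nm to a spectrum. It assumes a roughly uniform wavelength spacing and a window that shrinks at the ends of the spectrum. The function name and the edge handling are illustrative choices, not details taken from the measurement software.

```python
def boxcar_filter(wavelengths_nm, values, width_nm=2.0):
    """Smooth `values` with a centred moving average about `width_nm` wide."""
    if len(wavelengths_nm) != len(values) or len(values) < 2:
        raise ValueError("need two equally long sequences with at least 2 bins")
    # Mean spacing between neighbouring wavelength bins.
    step = (wavelengths_nm[-1] - wavelengths_nm[0]) / (len(wavelengths_nm) - 1)
    # Number of bins in the window, forced to be odd so that it is centred.
    n_bins = max(1, round(width_nm / step))
    if n_bins % 2 == 0:
        n_bins += 1
    half = n_bins // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        window = values[lo:hi]  # the window shrinks at the spectrum edges
        smoothed.append(sum(window) / len(window))
    return smoothed


# Hypothetical spectrum with 1 nm spacing and a single noisy spike.
smoothed = boxcar_filter([400.0, 401.0, 402.0, 403.0, 404.0],
                         [0.0, 0.0, 3.0, 0.0, 0.0])
```

With a 1 nm spacing, the 2 nm request is rounded up to a three-bin window so that the average stays centred on each wavelength.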
The measurements were repeated in order to provide more data for the training of the neural network. There had been minor changes in the adjustments of the scatterometer, so the two sets of spectra were slightly different. In addition, some variation is introduced by noise and the positioning done by hand.

Creation and training of the ANN
To quickly analyse scatterometric data and provide grating characteristics, an ANN was developed. The training was performed in two phases. In the first phase, only simulated data was used. As figure 6 shows, the match between simulations and experiments is fairly good but not perfect, so an ANN trained using only simulated data can produce significant deviations in the results.
Transfer learning can be used to tune the ANN to a new instrument, sample, or variable [17]. If the major parts of the system, excluding the sample tray, were changed or moved, the measurements and the ANN transfer learning would have to be performed again in order to get the best performance from the ANN. As the transfer learning tunes the ANN for the specific instrument, all the deviations caused by the positioning of the fixed components are taken into account and compensated for.

Simulated training data
Preliminary training data was created using Rigorous Coupled Wave Analysis (RCWA). It is a powerful computational tool for analysing the interaction between light and periodic structures by solving Maxwell's equations [18]. RCWA approximates the shape of the structures using stacked rectangular shapes, as figure 7 illustrates, taking into account the sidewall angles, roundness of the top and bottom corners, and other features that the structures might have.
About 200 000 spectral diffraction efficiencies were simulated with grating parameters of a similar magnitude to the parameters of the reference samples. The ranges of the chosen parameters are described in table 1. In addition to these, a smaller set of simulations was done with 2 nm increments for the period and line width, with the side wall angles and corner radii kept constant at 90° and 15 nm respectively.
Each simulated spectrum contained the diffraction efficiency for 256 wavelengths evenly distributed between 400 nm and 750 nm. The simulated gratings consist of 15 segments: 6 taller segments for the sidewalls, and 9 segments to model the rounding of the bottom corners. The simulations assume a plane wave light source in transverse electric (TE) mode and an angle of incidence of 0° from vacuum to SiO2.

Transfer learning reference data
The gratings were characterised using a Zeiss LEO Gemini 1550 scanning electron microscope (SEM). Test gratings with identical parameters to the reference gratings were etched along one edge of the samples for the purposes of SEM measurements. The wafers were cut so that the test gratings were separated from the main wafer, with the cut bisecting the test gratings. The smaller parts containing the test gratings were then prepared for SEM imaging.
For the SEM characterisation the test gratings were coated in a layer of chromium with a nominal thickness of 7 nm. This causes the line width of the SEM images to be larger than it is in the reference samples. Figure 8 shows an example SEM image of the cross section of a grating structure from which the dimensions were measured.
The characterisations were performed by fitting parallel lines to match the edge locations in magnified images to reach subpixel resolution. The standard uncertainties of the SEM characterisation for the period, line width, and step height were estimated to be 2.5 nm. The magnification of the SEM was separately examined by traceable AFM measurements and found to agree at a level of 0.013%.

Structure of the ANN
The ANN was created using the TensorFlow [19] and Keras [20] packages for Python. The model was constructed in two phases. In the first phase the multi-layer perceptron model was created and trained using the simulated data. The second phase comprised the transfer learning, where the model was fine-tuned using experimental data.
In the preliminary training phase an array of training data was generated by first randomly selecting a simulated spectrum. Then a small amount of random noise was added to it, so that the spectrum would better match the real-world data. Finally, the spectrum was added to the training data set. This was repeated until the set contained 1 000 000 spectra. This way there was no recognisable structure in the order of the spectra, and more data was available for the training as some spectra were included multiple times with different noise. The data set was then separated into training and test data arrays which were used in the training. The spectra were also scaled using the scaler function from the scikit-learn [21] package. An early stopping condition with a patience value of 50 was employed to prevent overfitting. The learning rate of the ANN was fairly low, and the network was quite large, so the training required about 600 epochs until the early stopping condition was fulfilled. At this point the ANN contains only one hidden layer. The hyperparameters of this layer seem to have little effect on the final results after the transfer learning is applied. A more detailed hyperparameter optimisation was done for the transfer learning phase.
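The data generation loop described above can be sketched as follows. This is a minimal illustration under stated assumptions: the noise is taken to be zero-mean Gaussian, and the noise level, the test fraction, and all function names are hypothetical, since the text does not specify them.

```python
import random


def build_noisy_training_set(simulated, n_samples, noise_std=0.005, seed=0):
    """Draw simulated spectra at random and add noise to each copy.

    `simulated` is a list of (spectrum, parameters) pairs. Gaussian noise
    with standard deviation `noise_std` is an assumed noise model.
    """
    rng = random.Random(seed)
    spectra, targets = [], []
    for _ in range(n_samples):
        spectrum, parameters = rng.choice(simulated)
        noisy = [value + rng.gauss(0.0, noise_std) for value in spectrum]
        spectra.append(noisy)
        targets.append(parameters)
    return spectra, targets


def split_train_test(spectra, targets, test_fraction=0.2):
    """Split the already randomly ordered set into training and test parts."""
    n_test = int(len(spectra) * test_fraction)
    train = (spectra[n_test:], targets[n_test:])
    test = (spectra[:n_test], targets[:n_test])
    return train, test


# Two hypothetical simulated spectra with (period, line width, height) in nm.
simulated = [([0.10, 0.20], (660, 300, 220)), ([0.30, 0.40], (680, 340, 300))]
spectra, targets = build_noisy_training_set(simulated, n_samples=10)
train, test = split_train_test(spectra, targets)
```

Because the same simulated spectrum can be drawn several times, noisy copies of one spectrum may land in both the training and the test part, so the test loss in such a scheme measures robustness to noise rather than generalisation to unseen grating parameters.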
For the transfer learning another hidden layer was added to the model. The first hidden layer was frozen, and the model was then trained using experimental data measured using the scatterometer and SEM imaging. As the resolution of the measured spectra is greater than the resolution of the simulations, the measured wavelengths closest to the simulated wavelengths, and their intensities, were chosen for the training. An early stopping condition with a patience of 100 was added to prevent overfitting. The value is so large because the learning rate is again very low. Training of the transfer learning layer can take over 2000 epochs.
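The selection of the measured points closest to the simulated wavelengths can be sketched as a nearest-neighbour lookup. The sketch assumes that the 256 simulated wavelengths include both end points of the 400 nm to 750 nm range and that the measured wavelengths are sorted in ascending order. The function names are hypothetical.

```python
from bisect import bisect_left


def simulation_grid(n_points=256, start_nm=400.0, stop_nm=750.0):
    """Evenly spaced wavelengths, end points included (an assumption)."""
    step = (stop_nm - start_nm) / (n_points - 1)
    return [start_nm + i * step for i in range(n_points)]


def resample_nearest(measured_wl, measured_values, target_wl):
    """For each target wavelength, take the value at the nearest measured one.

    `measured_wl` must be sorted in ascending order.
    """
    picked = []
    for wl in target_wl:
        j = bisect_left(measured_wl, wl)
        if j == 0:
            nearest = 0
        elif j == len(measured_wl):
            nearest = len(measured_wl) - 1
        else:
            # Choose the closer of the two neighbouring measured wavelengths.
            right_gap = measured_wl[j] - wl
            left_gap = wl - measured_wl[j - 1]
            nearest = j if right_gap < left_gap else j - 1
        picked.append(measured_values[nearest])
    return picked


grid = simulation_grid()
picked = resample_nearest([400.0, 400.4, 400.8, 401.2], [1, 2, 3, 4],
                          [400.0, 400.5, 401.3])
```

Nearest-neighbour selection keeps the measured values unaltered. Interpolating onto the simulated grid would be an alternative, but the text describes choosing the closest matches.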
An in-depth hyperparameter optimisation was conducted only for the transfer learning phase. As figure 9 shows, the final ANN consists of an input layer with 256 inputs, two hidden layers with 512 neurons each, and an output layer with 7 outputs. It was then integrated into the measurement software, so that it can analyse the diffraction efficiencies as soon as they are measured.
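The two-phase architecture described above can be sketched with Keras as follows. This is not the authors' code. The layer sizes and patience value follow the text, whereas the ReLU activation, the Adam optimiser, the learning rate, the mean squared error loss, the new output layer in the second phase, and all function and layer names are assumptions made for illustration.

```python
from tensorflow import keras


def build_base_model(n_inputs=256, n_hidden=512, n_outputs=7):
    """Phase 1: one hidden layer, to be trained on simulated spectra."""
    model = keras.Sequential([
        keras.Input(shape=(n_inputs,)),
        keras.layers.Dense(n_hidden, activation="relu", name="hidden_1"),
        keras.layers.Dense(n_outputs, name="output"),
    ])
    # Optimiser, learning rate, and loss are assumed values.
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-4),
                  loss="mse")
    return model


def add_transfer_layer(base_model, n_inputs=256, n_hidden=512, n_outputs=7):
    """Phase 2: freeze the first hidden layer and add a second one."""
    hidden_1 = base_model.get_layer("hidden_1")
    # The layer object is shared, so it is frozen in the base model as well.
    hidden_1.trainable = False
    model = keras.Sequential([
        keras.Input(shape=(n_inputs,)),
        hidden_1,
        keras.layers.Dense(n_hidden, activation="relu", name="hidden_2"),
        keras.layers.Dense(n_outputs, name="output_tl"),
    ])
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-4),
                  loss="mse")
    return model


# Early stopping for the transfer learning phase, patience as in the text.
early_stop = keras.callbacks.EarlyStopping(monitor="val_loss", patience=100,
                                           restore_best_weights=True)

base_model = build_base_model()
transfer_model = add_transfer_layer(base_model)
```

In use, `base_model.fit` would be called on the simulated data first, and `transfer_model.fit` afterwards on the measured spectra and SEM values with `callbacks=[early_stop]`. Only the second hidden layer and the new output layer are updated in that second step.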

Results
The deviations in the results between SEM characterisation and the artificial neural network assisted spectral scatterometer (ANNASS) are shown in figure 10. The values are for every measurement done, and each plot contains 138 points. The figure describes the deviation between the ANNASS and SEM data for the measurands after transfer learning. The residual standard deviations for pitch, line width, and step height are 1.3 nm, 4.3 nm, and 2.6 nm respectively. Before transfer learning the results deviated from the SEM measurements by 1.75 nm for period, −23.47 nm for line width, and 13.73 nm for step height. The corresponding residual standard deviations were 4.37 nm, 13.45 nm, and 8.55 nm for the period, line width, and step height respectively.
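Summary statistics of this kind can be computed as in the following sketch. It assumes that a deviation is the ANNASS value minus the SEM value and that the residual standard deviation is the sample standard deviation of these differences. The function name and the example values are hypothetical and are not the data of this study.

```python
from statistics import mean, stdev


def deviation_summary(annass_values, sem_values):
    """Return the mean and sample standard deviation of ANNASS minus SEM."""
    if len(annass_values) != len(sem_values):
        raise ValueError("both series must have the same length")
    differences = [a - s for a, s in zip(annass_values, sem_values)]
    return mean(differences), stdev(differences)


# Hypothetical step heights in nm for three gratings.
mean_dev, std_dev = deviation_summary([221.0, 279.0, 302.0],
                                      [220.0, 280.0, 300.0])
```

A mean deviation far from zero, such as the line width value before transfer learning, indicates a systematic offset, while the standard deviation describes the scatter around that offset.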
The values measured from the 6 reference sample gratings that were left out from the ANN transfer learning data are shown in table 2.

Uncertainty
The following presents a simplified uncertainty estimate for the height parameter measurement of ANNASS. The uncertainty components of the ANNASS can be divided into two main groups. The step height $h$ given by the ANN is treated as a function of the input quantities, as expressed in equation (4). The input quantities are described in table 3. If the standard uncertainties $u(x_i)$ and the sensitivity coefficients of the input quantities are known, the uncertainty of $h$ can be estimated using the following formula [22]:

$$u_c^2(h) = \sum_i \left(\frac{\partial h}{\partial x_i}\right)^2 u^2(x_i)$$

The partial derivative represents the sensitivity coefficient $c_i$ of the quantity $x_i$. The input quantities, respective sensitivity coefficients, and uncertainty contributions to the total uncertainty for evaluation of the step height measurement uncertainty are detailed in table 3.
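The combination of the uncertainty contributions can be illustrated with a short calculation. The sketch assumes uncorrelated input quantities, so that the squared contributions simply add, and it uses hypothetical sensitivity coefficients and standard uncertainties rather than the values of table 3.

```python
import math


def combined_standard_uncertainty(components):
    """Root sum of squares of c_i * u(x_i) for uncorrelated input quantities.

    `components` is an iterable of (sensitivity_coefficient,
    standard_uncertainty) pairs.
    """
    return math.sqrt(sum((c * u) ** 2 for c, u in components))


def expanded_uncertainty(components, k=2.0):
    """Expanded uncertainty with coverage factor k."""
    return k * combined_standard_uncertainty(components)


# Hypothetical contributions: (c_i, u(x_i)) pairs giving 3 nm and 4 nm.
example = [(1.0, 3.0), (2.0, 2.0)]
u_c = combined_standard_uncertainty(example)  # sqrt(3**2 + 4**2) = 5
u_expanded = expanded_uncertainty(example)    # 2 * 5 = 10
```

Read in the other direction, an expanded uncertainty of 5.3 nm at k = 2 corresponds to a combined standard uncertainty of 2.65 nm.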
The uncertainty of the wavelength scale, $S_\lambda$, was estimated based on the calibration data of the spectrometer. The data has a largest error of approximately 0.1 nm, which was chosen as the standard uncertainty of the quantity. The sensitivity coefficient of the wavelength scale is naturally 1. The uncertainty of the intensity scale, $S_I$, was considered negligible based on the nonlinearity error of 0.1%.
The uncertainties of the collimation lens position, $c$, focus distance, $f$, aperture size, $a$, sample angular and rotational alignments, $\alpha_s$ and $\beta$, and fibre alignment, $\alpha_f$, were evaluated by making small changes to the adjustments of the respective parameters of the scatterometer. The effects that these changes produced on the ANN results were examined to determine the sensitivity coefficients. In addition, the standard uncertainty of each of the variables was evaluated. These values are listed in table 3.
The contribution from noise, $n$, was evaluated by repeated measurements of a grating and observing the changes in the ANNASS results. The uncertainties of the background and reference intensities, $I_{bg}$ and $I_{ref}$, were evaluated by measuring the same grating but remeasuring the background intensity or the reference intensity between each measurement and observing how the ANNASS results are affected.
The uncertainty of the transfer learning data, $D_{tl}$, was evaluated by considering the accuracy of the detection of the grating parameter $h$ from the SEM characterisation, taking into account calibration uncertainty and edge detection accuracy. This uncertainty has a sensitivity coefficient of 1. The model uncertainty, or the uncertainty introduced by neural network variations, $\sigma_{ANN}$, was evaluated by repeating the creation and training of the ANN multiple times with the same data in a different order. The uncertainty was chosen as the standard deviation of the difference between SEM and ANNASS measurements for $h$.
The expanded uncertainty (k = 2) is 5.94 nm for the period, 5.30 nm for the step height, and 7.28 nm for the line width. For the period and line width, the uncertainty evaluation was performed in a similar manner to that for the step height. The results depend greatly on the data used for the transfer learning. The largest source of uncertainty is clearly the accuracy of the transfer learning data, i.e. the uncertainty of the grating parameters from SEM measurements.

Conclusion
This paper describes a spectral scatterometer for quality control purposes in the photonics industry. The instrument utilises machine learning for spectral analysis and is capable of providing results for several measurands simultaneously and very quickly. The most important measurands in typical samples are height, pitch, line width, and side wall angles. A novel concept was demonstrated by employing transfer learning in the training of an artificial neural network. The benefits of using an ANN for inverse solving are that it is faster than traditional methods, and that if the simulations do not exactly match the measurements, transfer learning will increase the accuracy of the inverse solving. Additionally, once the ANN is trained, there is no need for large libraries of simulated diffraction efficiencies.
The deviation between the results for the grating parameters obtained by SEM and ANNASS was under 5 nm in most cases, with the largest deviations for line width. In addition, a metrological validation was made using independent reference samples. The uncertainty evaluation for the measurand height, h, resulted in a value of 5.3 nm for the expanded uncertainty (k = 2). The h values obtained using machine learning and SEM measurements are shown to be in agreement within this uncertainty.

Figure 1. One of the sample wafers.

Figure 2. Scatterometer hardware and a diagram of its components. Component marked with A is the fibre-coupled spectrometer, B marks the polariser, and C shows the sample holder. The constructed scatterometer has an additional beam splitter for a viewfinder, so the sample position can be inspected visually.

Figure 3. Spectrum of the light source.

Figure 4. Left: measured spectra from the DOE, plain substrate, and the background. Right: the calculated diffraction efficiency.

Figure 5. A sample DOE being measured in the scatterometer. The reflected light is aligned with the reference line for correct polarisation angle. The diameter of the wafer is 50 mm.

Figure 6. Comparison of simulated and measured spectra of two samples A and B. The simulated curves are created using parameters measured from the SEM images. Sample A period is 680 nm, line width is 309 nm, and height is 210 nm. Sample B period 675 line width nm, and height is 283 nm.

Figure 8. SEM image of a reference sample.

Figure 9. Diagram of the architecture of the ANN. The input parameters are the diffraction efficiencies for each wavelength, and outputs are the grating parameters.

Figure 10. Differences between the period, height, and line width measured by SEM and ANNASS.

Table 1. The minimum and maximum values, and increment sizes of the grating parameters used in simulations.

Table 2. Comparison between the results for the three measurands acquired using SEM and ANNASS for the reference samples.

$$h = \mathrm{ANN}(S_\lambda, S_I, c, f, a, \alpha_s, \beta, \alpha_f, n, \sigma_{ANN}, D_{tl}, I_{ref}, I_{bg}) \qquad (4)$$

Table 3. Uncertainty evaluation for the step height.