Intercomparison of flatness measurements of an optical flat at apertures of up to 150 mm in diameter

Recently, a scientific comparison of flatness measuring instruments at European National Metrology Institutes (NMIs) was performed in the framework of EURAMET. The specimen was a well-polished optical surface with a maximum measurement aperture of 150 mm in diameter. Here, we present an evaluation concept, which allows the determination of a mean flatness map taking into account different lateral resolutions of the instruments and different orientations of the specimen during measurement. We found that all measurements are in agreement with the mean flatness map within the uncertainty intervals stated by the participants. The aim of this scientific comparison is to specify an appropriate operation and evaluation procedure for future comparisons.


Introduction
The measurement and calibration of nominally flat surfaces is important for optical systems and for reference surfaces in optics and precision engineering. This is the reason why many National Metrology Institutes (NMIs) offer corresponding calibrations. Typically, optical measuring techniques like interferometry [1], small angle deflectometry [2] or other scanning techniques [3] are applied, which reach uncertainties down to a few nanometers. Previous investigations [4] have made it clear that the measurement protocol and adjustment instructions are important for the comparability of the measurements.
Comparison measurements are an important means to test or demonstrate the agreement of measurement results within claimed uncertainties. In the following, the results of a first scientific comparison of flatness measurements, which was performed in the framework of EURAMET (project 672), the European Association of National Metrology Institutes [5], will be reported. In this comparison, 12 European and 2 non-European NMIs participated using in total 17 different instruments. The Physikalisch-Technische Bundesanstalt (PTB), Germany acted as the pilot laboratory and coordinator.
In section 2, the specimen and the comparison protocol will be described, in section 3 the analysis principles will be presented, in section 4 the evaluation of the participants' measurements will be shown and a conclusion will follow in section 5.

Measurement quantity, specimen and protocol
The quantity to be measured was the local flatness deviation [6] of a well-polished optical flat with very small contributions from waviness and roughness. The resulting height values are given in a Cartesian coordinate system with the lateral axes in the best fit plane. For simplicity, the term topography will be used throughout the text to describe the shape of the surface of the flat, i.e. the 'flatness surface' according to [6].
The essential property of the specimen is the stability of its surface. It was decided to use a Zerodur specimen to reduce temperature influences. As it is assumed that the aging effects decrease with time [7,8], a specimen manufactured several years ago was selected. It has an overall diameter of 205 mm and a thickness of 34 mm.
The back side was slightly frosted to avoid disturbing reflections. The flat was inserted into a metal housing with an inner diameter of 209 mm for protection. Cork pads with a thickness of 1.8 mm were inserted for the specimen to rest on in the horizontal as well as in the vertical orientation (see figure 1).
During transport the flat had to be fixed in the housing by a plastic screw which had to be loosened when the specimen was inserted into the measuring setup. Previous studies have shown that a deformation of the flat caused by tightening the screw is reversible if the screw is loosened.
For transport, the housing was closed with a metal plate and inserted into a foam cut-out in a rigid transport case, which also contained a recording instrument measuring shock, air temperature, pressure and humidity. The aperture of the housing was a few millimeters larger than the intended measurement area to prevent possible influences from diffraction at the edge in case of imperfect focusing. A set of masks was provided together with the specimen; a mask was to be used in an advance measurement and then carefully removed without shifting the specimen. The masks, which are 150 mm or 100 mm in diameter, set the measuring range. They additionally carry arrow-like structures to identify the angular orientation during measurement, and they can only be mounted in a defined way (figure 1). An additional mask with a regular grid of holes was supplied to identify any lateral distortion of the measurement system. For all instruments, the lateral distortion was found to be smaller than one pixel, and therefore corrections were not necessary.
The procedure for positioning and measuring the specimen was fixed in a measurement protocol. The intercomparison was organized in such a way that the specimen was sent back to the pilot lab after each measurement by a participant (starlike comparison). A control measurement was then performed by the pilot lab to ensure that the specimen had not been damaged. Those measurements were always performed with the same instrument, a Fizeau interferometer measuring on a (1000 × 1000) pixel grid.
Of the 17 instruments involved in the comparison, 14 were Fizeau interferometers with camera array sizes in the range of (169 × 169) pixels to (1000 × 1000) pixels, one was an interferometer with a photographic analysis system, and two were scanning instruments. Eleven of them measured the full area of 150 mm in diameter and three measured slightly smaller apertures (>90% in diameter with respect to the full aperture of 150 mm in diameter). The remaining three measured the reduced area of 100 mm in diameter.

Method of comparing the results
Originally, it was intended to compare the results in terms of flatness tolerances according to the ISO standards [6,9]. An evaluation in terms of a peak-to-valley (PV) departure is not possible, as the value depends on the spatial resolution of the instruments (see figure 2). A robust amplitude parameter (PVr) [10], which combines the PV value of the low-frequency topography and the rms value (see figure 2) of the high-frequency topography, was also discussed. The high-frequency topography is the residual after removing the Zernike topography as described below. Both the flatness tolerance and the PVr value proved difficult to apply, because the uncertainty intervals given by the participants relate to all measured points of the topography.
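The resolution dependence of the PV value and the idea behind PVr can be illustrated with a short sketch. It assumes the commonly quoted form PVr = PV(low-frequency part) + 3 · rms(residual); the factor 3, the synthetic profile and all function names are assumptions made for illustration and are not taken from the comparison data.

```python
import math

def peak_to_valley(values):
    """PV: difference between the highest and the lowest height value."""
    return max(values) - min(values)

def rms(values):
    """Root mean square of the height values (residual assumed mean-free)."""
    return math.sqrt(sum(v * v for v in values) / len(values))

def pvr(low_frequency, residual, factor=3.0):
    """Assumed form: PV of the low-frequency part plus factor * rms of the residual."""
    return peak_to_valley(low_frequency) + factor * rms(residual)

# Synthetic profile in nm: slow sine plus a pixel-to-pixel ripple.
low = [10.0 * math.sin(2.0 * math.pi * i / 100) for i in range(100)]
high = [2.0 if i % 2 == 0 else -2.0 for i in range(100)]
fine = [a + b for a, b in zip(low, high)]
# A coarser instrument is mimicked by averaging neighbouring pixel pairs.
coarse = [(fine[i] + fine[i + 1]) / 2.0 for i in range(0, 100, 2)]

print(peak_to_valley(fine), peak_to_valley(coarse), pvr(low, high))
```

In this toy profile the coarse sampling averages the ripple away, so its PV value is smaller than that of the fine sampling although the surface is the same.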
It is more reasonable to compare the individual 2D measurements to a mean topography generated from the results of the participants. As the lateral resolution of the results differs significantly, we decided to restrict the comparison to the low-frequency components of the 2D topography. A low-frequency topography deviation map of the individual results with respect to the mean topography was provided for each participant. Here, we only show the maximum deviation from the mean topography compared to the stated uncertainty for each participant.
We evaluated the low-frequency topography in terms of Zernike polynomials, which are an orthogonal basis on the unit circle. In order to take all results into account, the radial degree of the Zernike polynomials was restricted to a value of ten, given by the result with the lowest point density.
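The set of Zernike polynomials A_{nm}(r, ϕ) in polar coordinates (r, ϕ) was applied using the following mathematical formulation:
$$A_{nm}(r,\phi) = \begin{cases} R_n^{|m|}(r)\,\cos(m\phi), & m \ge 0 \\ R_n^{|m|}(r)\,\sin(|m|\phi), & m < 0 \end{cases} \qquad (1)$$
with
$$R_n^{|m|}(r) = \sum_{k=0}^{(n-|m|)/2} \frac{(-1)^k\,(n-k)!}{k!\,\left(\frac{n+|m|}{2}-k\right)!\,\left(\frac{n-|m|}{2}-k\right)!}\; r^{\,n-2k}$$
if n − |m| is even and 0 otherwise.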
We restricted the fit procedure to the following 36 Zernike polynomials (table 1, see [11]).
The results were split into low-spatial-frequency components, represented by the 36 Zernike polynomials defined in table 1, and high-frequency components; the mean topography was composed from the low-frequency components. For the evaluation, the values for the piston and tilt were set to zero.
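A minimal sketch of how such polynomials can be evaluated is given below. It assumes the unnormalized convention with cos(mϕ) for m ≥ 0 and sin(|m|ϕ) for m < 0, and it assumes that the 36 terms are the index pairs (n, m) with n + |m| ≤ 10, which yields exactly 36 terms with a maximum radial degree of ten. The ordering and normalization used in table 1 may differ, and all function names are hypothetical.

```python
import math

def radial(n, m, r):
    """Radial Zernike polynomial R_n^|m|(r); zero if n - |m| is odd."""
    m = abs(m)
    if m > n or (n - m) % 2:
        return 0.0
    total = 0.0
    for k in range((n - m) // 2 + 1):
        numerator = (-1) ** k * math.factorial(n - k)
        denominator = (math.factorial(k)
                       * math.factorial((n + m) // 2 - k)
                       * math.factorial((n - m) // 2 - k))
        total += numerator / denominator * r ** (n - 2 * k)
    return total

def zernike(n, m, r, phi):
    """Assumed convention: cosine term for m >= 0, sine term for m < 0."""
    if m >= 0:
        return radial(n, m, r) * math.cos(m * phi)
    return radial(n, m, r) * math.sin(-m * phi)

def index_pairs(max_sum=10):
    """Assumed selection: all (n, m) with n - |m| even and n + |m| <= max_sum."""
    return [(n, m) for n in range(max_sum + 1) for m in range(-n, n + 1)
            if (n - abs(m)) % 2 == 0 and n + abs(m) <= max_sum]

def low_frequency_indices():
    """Drop piston (0, 0) and tilt (1, +-1), which are set to zero."""
    return [(n, m) for n, m in index_pairs() if n > 1]

print(len(index_pairs()), len(low_frequency_indices()))
```

The coefficients themselves would be obtained by a least-squares fit of these functions to the measured height values, which is omitted here.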
For those participants measuring on a regular grid, it was intended to evaluate the measurement results with an enhanced lateral resolution. In principle, this might be possible with an appropriate combination of the pixels from the individual grid to a projection on a common grid. However, this procedure was not applied here as the high-frequency topography changed during the comparison (figure 3).
Orientation
Originally, it was intended to use the mask with the arrow-like structures to identify the angular orientation of the specimen and to ensure a uniform alignment regarding rotation for all results (figures 4(a) and (b)). However, as not all participants were able to perform the alignment measurement, we alternatively aligned the results via a correlation function (figure 4(c)). Based on the high-frequency topography, we calculated the correlation as a function of the rotation angle between one result chosen as the rotation reference and each individual result. For this procedure, both measurements had to be extrapolated to the same grid. Where possible, both alignment methods were applied, i.e. the arrow structure and the correlation procedure; their results were found to agree to within 0.5°. While the correlation method is independent of a possible rotation of the specimen within the housing, it is sensitive to changes of topography.
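The principle of the correlation-based alignment can be sketched in a strongly simplified form. Instead of full 2D maps, the sketch assumes that the high-frequency topography has been sampled at equally spaced angles on a single ring around the center; the rotation is then the circular shift that maximizes the correlation. This reduction to one ring, the synthetic data and the function name are assumptions for illustration only.

```python
import random

def best_rotation(reference, measurement):
    """Angle in degrees by which `measurement` is rotated relative to `reference`.

    Both inputs are height samples at equally spaced angles on the same ring.
    """
    n = len(reference)
    mean_ref = sum(reference) / n
    mean_meas = sum(measurement) / n
    ref = [v - mean_ref for v in reference]
    meas = [v - mean_meas for v in measurement]

    def correlation(shift):
        return sum(ref[i] * meas[(i + shift) % n] for i in range(n))

    best_shift = max(range(n), key=correlation)
    return best_shift * 360.0 / n

# Synthetic high-frequency ring profile with 0.5 degree sampling.
rng = random.Random(0)
reference = [rng.gauss(0.0, 1.0) for _ in range(720)]
# The same profile rotated by 25 samples, i.e. 12.5 degrees.
rotated = [reference[(j - 25) % 720] for j in range(720)]
print(best_rotation(reference, rotated))
```

The angular resolution of such a search is limited by the sampling step, here 0.5°.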
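For the angular correction of the low-frequency topography, the Zernike coefficients can be recalculated in pairs by rotation matrices. Each pair consists of the two Zernike coefficients with an identical radial degree (n) and azimuthal degrees of the same magnitude but opposite signs (m and −m). With a_{n,m} belonging to the cos(mϕ) term and a_{n,−m} to the sin(mϕ) term (m > 0), the pair can be transformed to a coordinate system rotated by the angle α as follows:
$$\begin{pmatrix} a'_{n,m} \\ a'_{n,-m} \end{pmatrix} = \begin{pmatrix} \cos(m\alpha) & \sin(m\alpha) \\ -\sin(m\alpha) & \cos(m\alpha) \end{pmatrix} \begin{pmatrix} a_{n,m} \\ a_{n,-m} \end{pmatrix} \qquad (2)$$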
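The pairwise recalculation can be sketched as follows. The sketch assumes that the coefficient with m > 0 belongs to the cos(mϕ) term and the one with m < 0 to the sin(|m|ϕ) term, and that the new coordinate system is rotated by the angle α so that ϕ' = ϕ − α; the sign convention of α and the function names are assumptions.

```python
import math

def rotate_pair(a_cos, a_sin, m, alpha_deg):
    """Rotate one (cos, sin) coefficient pair of azimuthal degree m > 0."""
    angle = m * math.radians(alpha_deg)
    c, s = math.cos(angle), math.sin(angle)
    return a_cos * c + a_sin * s, -a_cos * s + a_sin * c

def rotate_coefficients(coeffs, alpha_deg):
    """Rotate a dict {(n, m): value}; terms with m == 0 are unchanged."""
    rotated = {}
    for (n, m), value in coeffs.items():
        if m == 0:
            rotated[(n, 0)] = value
        elif m > 0:
            partner = coeffs.get((n, -m), 0.0)
            rotated[(n, m)], rotated[(n, -m)] = rotate_pair(value, partner, m, alpha_deg)
        elif (n, -m) not in coeffs:
            # Sine term without a stored cosine partner.
            rotated[(n, -m)], rotated[(n, m)] = rotate_pair(0.0, value, -m, alpha_deg)
    return rotated

# Illustrative values in nm (not measured data).
example = {(2, 0): 5.0, (3, 3): 0.0, (3, -3): 4.0}
print(rotate_coefficients(example, 30.0))
```

A trefoil pair (n = 3, m = ±3) is unchanged by a rotation of 120°, which reflects the threefold symmetry of this term.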

Influence of gravity
Of the total of 17 results, seven were obtained with the specimen horizontally aligned and ten with the specimen vertically aligned. The results from both groups cannot be directly compared due to the different influences of gravity on the topography. However, since the support points are fixed, the influence of gravity can be taken into account. To minimize this influence, calculations based on a finite element method (FEM; SOLIDWORKS, Dassault Systèmes) were performed beforehand to determine the position of the support points that minimizes the bending due to gravity. It was found that three support pads symmetrically arranged at a distance of 65 mm from the center lead in the best case to a contribution to the topography of 6.7 nm (PV) for the 150 mm diameter area, or of 3.3 nm for 100 mm in diameter, in the case of a horizontal alignment of the specimen (figure 5(a)). For the specimen vertically aligned, gravity leads to a deformation of the topography of 0.5 nm PV for 150 mm, or of 0.3 nm PV for a 100 mm measuring area (figure 5(b)).
The mean values of the (3, −3) Zernike coefficient, which has the threefold symmetry of the support pad arrangement, differ by 4.5 nm between horizontal and vertical mountings, which is in accordance with the result from the FEM calculations (3.9 nm). Therefore it was decided to perform a joint evaluation of the topographies corrected for zero gravity, i.e. to correct the individual measurements depending on their orientation through the results of the corresponding FEM calculations.
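The correction to zero gravity can be sketched as a subtraction of the FEM-predicted contribution from the measured Zernike coefficients, selected by the mounting orientation. The FEM coefficient values in the table below are placeholders chosen only so that the (3, −3) terms differ by 3.9 nm between the two mountings; they are hypothetical and are not the values used in the comparison, and the function name is an assumption.

```python
# Hypothetical FEM contributions in nm per Zernike index (n, m).
FEM_GRAVITY = {
    "horizontal": {(3, -3): 3.9, (2, 0): 1.0},
    "vertical": {(3, -3): 0.0, (2, 0): 0.1},
}

def correct_for_gravity(coeffs, orientation, fem=FEM_GRAVITY):
    """Subtract the modelled gravity contribution for the given mounting."""
    sag = fem[orientation]
    return {key: value - sag.get(key, 0.0) for key, value in coeffs.items()}

# Illustrative measured coefficients in nm (not measured data).
measured = {(3, -3): 5.0, (4, 0): 2.0}
print(correct_for_gravity(measured, "horizontal"))
```

Coefficients without an entry in the FEM table are left unchanged, which corresponds to assuming a negligible gravity contribution for those terms.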

Control measurements by the pilot laboratory
In all, PTB performed 15 control measurements at non-regular time intervals during a period of 48 months. These control measurements were performed with the specimen aligned vertically using the same Fizeau interferometer (ZYGO Verifire™ MST) with a beam expander and transmission flat (TF) with a 300 mm clear aperture. During the time period of the comparison, one change of location of the instrument took place, where the TF had to be removed and, in all, three recalibrations of the TF were performed. The individual measurements have been evaluated in terms of Zernike polynomials and the mean low-frequency topography has been calculated (figure 7), to which every point of the individual low-frequency topography has been compared. The spatially resolved standard deviation is shown in figure 7 (right). For the calculation of the standard deviation map, the low-frequency topography of each measurement was adjusted by an offset to have the same absolute magnitude for the maximum and minimum value.
The differences between the individual topographies and the mean topography did not exceed 5.5 nm and are all within the attributed coverage interval of 11 nm. The specimen can thus be regarded as stable.
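The offset adjustment and the spatially resolved standard deviation can be sketched as follows, with each topography represented as a flat list of height values on a common grid. The use of the sample standard deviation (divisor N − 1) and the function names are assumptions.

```python
import math

def center_offset(values):
    """Shift the map so that the maximum and minimum have equal magnitude."""
    offset = (max(values) + min(values)) / 2.0
    return [v - offset for v in values]

def pointwise_mean_and_std(maps):
    """Mean and sample standard deviation for every grid point."""
    n = len(maps)
    columns = list(zip(*maps))
    means = [sum(column) / n for column in columns]
    stds = [math.sqrt(sum((v - mu) ** 2 for v in column) / (n - 1))
            for column, mu in zip(columns, means)]
    return means, stds

# Two illustrative control measurements in nm (not measured data).
controls = [center_offset([1.0, 3.0, 9.0]), center_offset([0.0, 3.0, 8.0])]
print(pointwise_mean_and_std(controls))
```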

Evaluation of the results for an aperture of 150 mm
The aim of the evaluation is to determine a reference topography (corrected for zero gravity) used for a comparison with the individual results. For this purpose, the mean topography y, restricted to the low-frequency components, has been calculated as a weighted mean of the low-frequency topography measurements x_i with standard uncertainties u(x_i) according to:
$$y = \frac{\sum_i x_i / u^2(x_i)}{\sum_i 1 / u^2(x_i)} \qquad (3)$$
Only those measurements obtained for the full aperture of 150 mm were taken into account because the evaluation has shown that an extrapolation of data taken with an aperture below 150 mm to the full aperture of 150 mm is critical. The weighting factors are the inverse squared standard uncertainties 1/u²(x_i) stated by the participants. In cases where an uncertainty map was given, the highest value was taken into account. The standard uncertainty of the reference topography has been calculated from:
$$u^2(y) = \frac{1}{\sum_i 1 / u^2(x_i)} \qquad (4)$$
Figure 8 shows the reference topography for the diameter of 150 mm. The associated expanded uncertainty is 4.9 nm (k = 2). Additionally, the standard deviation map with a coverage factor of k = 2 is shown.
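A minimal sketch of an inverse-variance weighted mean, applied point by point to topographies on a common grid, is given below. It assumes one scalar standard uncertainty per participant (for an uncertainty map, its highest value) and uncorrelated results; the function name and the numerical values are illustrative assumptions.

```python
import math

def weighted_mean_map(maps, uncertainties):
    """Point-wise weighted mean with weights 1 / u_i**2.

    maps: list of topographies (flat lists of equal length, in nm)
    uncertainties: one standard uncertainty per topography (in nm)
    Returns the reference topography and its standard uncertainty.
    """
    weights = [1.0 / u ** 2 for u in uncertainties]
    total = sum(weights)
    reference = [sum(w * x for w, x in zip(weights, pixel)) / total
                 for pixel in zip(*maps)]
    u_reference = math.sqrt(1.0 / total)
    return reference, u_reference

# Two illustrative results with standard uncertainties of 1 nm and 2 nm.
reference, u_reference = weighted_mean_map([[10.0, 0.0], [20.0, 4.0]], [1.0, 2.0])
print(reference, u_reference)
```

The result with the smaller stated uncertainty dominates the mean, and the uncertainty of the mean is smaller than that of any single contribution.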
To compare the individual results to the reference topography, the following steps were carried out:
• Calculation of the Zernike coefficients of the individual results.
• Determination of the angular orientation with respect to a reference measurement, either from correlations of the high-frequency components of the topography or from a measurement using a mask.
• If necessary, a correction of the Zernike coefficients due to a rotational misalignment according to equation (2).
• Correction of gravitational effects depending on the mounting during the measurement.
• Calculation of a low-frequency topography from the corrected Zernike coefficients.
The maximum deviations of the individual low-frequency topography from the reference topography are shown in figure 9 together with the uncertainty (coverage factor k = 2). The participants are named by the international country code. For those participants who performed their measurements on a slightly reduced area, the comparison was performed on the equivalent area of the reference topography. The corresponding values are marked with (*). The measurements CZ and DE1 were performed with scanning instruments, all other measurements with Fizeau interferometers. Participants marked with (#) were not considered for the calculation of the mean value. All measurements are in agreement with the reference topography within the uncertainty intervals stated by the participants.
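The agreement criterion can be sketched as a comparison of the maximum absolute deviation from the reference topography with the expanded uncertainty stated by the participant. The sketch assumes that only the participant's uncertainty enters the criterion, as worded above; the function names and numbers are illustrative.

```python
def max_deviation(individual, reference):
    """Largest absolute point-wise deviation between two topographies (nm)."""
    return max(abs(x - y) for x, y in zip(individual, reference))

def agrees(individual, reference, expanded_uncertainty):
    """True if the maximum deviation lies within the stated uncertainty (k = 2)."""
    return max_deviation(individual, reference) <= expanded_uncertainty

# Illustrative low-frequency topographies in nm (not measured data).
individual = [1.0, 5.0, -3.0]
reference = [0.0, 2.0, -1.0]
print(max_deviation(individual, reference), agrees(individual, reference, 4.0))
```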

Evaluation of the results for an aperture of 100 mm
For the calculation of the reference topography for a circle with a diameter of 100 mm, the results of 16 participants could be taken into account. The reference topography obtained as well as the corresponding standard deviation map are shown in figure 10; the associated uncertainty of the reference topography is 2.1 nm (k = 2). The comparison of the individual results with the reference topography shows that all measurements are in agreement with the reference topography within the uncertainty intervals stated by the participants (figure 11).

Remarks
The results from one participant (LV) were evaluated manually and were not available as a topography map. For this reason, these measurement results could not contribute to the reference topography. The results were given in terms of PV values of the cross sections (35 nm and 28 nm) with an expanded uncertainty (k = 2) of 30 nm. The PV values of the corresponding cross sections of the reference topography are 34.4 nm and 33.7 nm for the measurement area of 150 mm in diameter. One participant repeated the measurement as presumably the transport screw was not completely released during the first measurement.
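As a worked check of the LV values, assuming that the two stated PV values correspond to the two reference cross sections in the order given, the differences can be compared with the expanded uncertainty:

$$|35\,\text{nm} - 34.4\,\text{nm}| = 0.6\,\text{nm}, \qquad |28\,\text{nm} - 33.7\,\text{nm}| = 5.7\,\text{nm}, \qquad \text{both} < 30\,\text{nm} \; (k = 2)$$

Under this assumption, both cross-section values are compatible with the reference topography within the stated uncertainty.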

Conclusion and outlook
The concept for the execution of a flatness comparison has been successfully tested. It was intended as a scientific comparison to gain knowledge about the procedure and the evaluation of flatness measurement results from different kinds of measurement devices. Moreover, the participants have gained an insight into the capabilities of their measuring instruments in comparison with those from other National Metrology Institutes.
Conclusions which can be drawn from the comparison are detailed below.
• Although time-consuming, an intercomparison in a starlike manner is a good concept because it allows control measurements to check the stability of the specimen.
• The protection of the specimen during transport seems to be sufficient, as it was not damaged.
• A joint evaluation of measurements in the horizontal and vertical orientation of the specimen is possible. For this, corrections of the deformation due to gravity have to be estimated, typically done by FEM calculations. When applying a Zernike analysis, a comparison of those Zernike coefficient values corresponding to the symmetry of the support pad positions indicates whether the required level of accuracy can be reached.
Some improvements should be considered for a future comparison.
• An appropriate improved cleaning procedure should be developed and applied only by the pilot lab.
• The specimen housing should be improved to prevent the specimen from unwanted rotation.
• Providing two flats, one with a good flatness (<20 nm (PV), resolving effects from the calibration) and one with a larger waviness (>100 nm, resolving effects from nonlinearities and lateral resolution), could be useful.
The experiences gained in this project will help with the planning of a future comparison under the auspices of the Bureau International des Poids et Mesures (BIPM).