Comparison of material measures for areal surface topography measuring instrument calibration

The calibration of areal surface topography measuring instruments is a topic that is currently under discussion in international standard committees and a specification standard that defines the so-called ‘metrological characteristics’ has been published. For the broad industrial adoption of the metrological characteristics for calibration, however, clear and easy-to-apply calibration guidelines are required. Thus, a single-sample calibration artefact has been developed which allows the determination of the standardised metrological characteristics. In order to promote the adoption of the metrological characteristics calibration framework, measurements of the material measures with different instruments have been conducted and the results are presented. We report early work on the uncertainty of the comparison results and discuss systematic deviations between the various surface topography measuring instruments.


Introduction and state of the art
In the 1990s, there were two key European projects on areal surface texture measurement and the results were published in the so-called 'Blue Book' [1] and 'Green Book' [2]. Following these publications, Working Group 16 'Areal and profile surface texture' of the ISO Technical Committee 213 was formed. Since then, the work on the ISO 25178 series of standards 'Geometrical Product Specification: Surface Texture: Areal' has resulted in the standardization of the areal surface texture parameters in ISO 25178 part 2 in 2012 [3]. This shows that the industrial application of areal surface texture measurement is relatively young and there have not been many comparisons of various areal surface topography measuring instruments.
In contrast, there have been several comparisons for surface texture parameter measurements using profile material measures. In 2002, a comparison of a profile sinusoidal material measure and a lapped ceramic sample with optical and tactile measuring instruments showed significantly different results [4]. Further comparisons of ceramic samples with optical measuring instruments and of periodic gratings with different measuring principles, have led again to significant deviations [4]. One reason for these deviations was identified as the missing calibration and standards infrastructure for optical measuring instruments [4]. Koenders et al published a comparison of step height measurements with different measuring principles and observed a 'good agreement' of most results [5].
Results of further comparisons have been published for the profile material measures standardized in ISO 5436-1 and the results often show significant differences, highlighting the need for a common framework for instrument calibration and comparison [6][7][8][9][10][11]: For stylus instruments, Doytchinov et al compared the measurements of step height material measures, profile roughness material measures and a sinusoidal profile and observed good agreement between instruments [6]. Koenders et al compared atomic force microscope (AFM) measurements of step heights and found that some measurements featured relatively large uncertainties [7]. In another AFM comparison using step height and grating material measures, it was concluded that even the results from high-precision measuring instruments need 'to be very carefully examined' when being applied for calibration measurements [8]. Baker et al measured multiple types of profile material measures with varying measuring principles and concluded there was a 'mixed' performance [9]. Thalmann et al examined various types of material measures described in ISO 5436-1 that were measured mostly with stylus instruments and concluded that there was 'a relatively large number of inconsistent results' [10]. Barsic et al in contrast examined only the case of groove depth measurement and observed a good agreement even when comparing different measuring principles [11].
For areal surface texture measurement, the calibration processes to be applied are still under discussion, and are summarised in depth elsewhere [12]. The ISO standardization will be based on three different parts. The first part concerns the material measures to allow determination of the properties to be calibrated, which are standardized in ISO 25178-70 [13]. The second part concerns the metrological characteristics that need to be considered for the calibration. These are described in the recently published standard ISO 25178-600 [14]. The calibration itself is the third part which will be described in the ISO 25178-700 which is currently a draft international standard [15]. Figure 1 summarizes the current state of standardization.
Comparisons have been made for the evaluation of areal surface topography. Tosello et al performed a comparison of areal surface texture parameters with optical measuring instruments by measuring profile material measures type C and D of ISO 5436-1 that were replicated in a polymeric material [16]. The results showed large deviations from an AFM reference measurement [16]. Townsend et al described a comparison of different computed tomography (CT) devices where areal surface texture parameters were evaluated and a good comparability to optical reference measurements was observed [17]. In preliminary work of some of the authors, the acquisition of functional surface texture parameters for practical surfaces was compared between optical and stylus instruments and locally large deviations were observed [18]. Pawlus et al carried out a similar comparison of tactile and optical measurements of engineering surfaces and concluded that 'large discrepancies' can occur for certain types of surfaces [19]. When the transfer behaviour of various instruments was compared with a linear approximation by some of the authors of this publication, similar results throughout different measuring principles were obtained, but the different physics can locally lead to strong deviations in the transmission of certain features [20]. Despite these sometimes worrying discrepancies between instruments, when careful analyses are carried out and adjustments for spatial frequency are taken into account, the difference between optical and stylus measurements can be as low as a few nanometres [21].
As described by the numerous comparisons presented above, the differences in the measurement results of various measuring instruments can be due to different effects: different physical measuring principles, different instrument parameters, e.g. the applied light wavelength or different measuring parameters such as the lateral point spacing result in different transfer behaviours. Thus, in order to achieve comparable results, a rigorous calibration of all instruments is required to estimate measurement uncertainty and to ensure the traceability of the obtained results [13]. Due to the third instrument axis (y-axis) and the use of many different measuring principles, the calibration of areal surface texture measuring instruments is more complex than that of profile-based instruments and requires the determination of a relatively large number of metrological characteristics. The application of the metrological characteristics and the comparability of their acquisition have not been widely examined yet. There is a range of calibration processes for areal surface topography measuring instruments which have not yet been defined in the current standardization. The ISO 25178-70 contains 24 different types of material measures in which the various characteristics to be calibrated are mapped [13]. Depending on the geometry and size of the structures to be manufactured, a wide range of different manufacturing principles has been utilized for their production. An overview on previous studies on the manufacturing of material measures has already been presented by some of the authors [12,22,23]. The methods range from ultraprecision cutting processes, such as micro-milling [24][25][26][27] to classical processes from microtechnology, such as focused ion beam [28] or lithographic methods [29].
Calibration procedures for the determination of the metrological characteristics have been described by Giusca et al [30][31][32]. In order to enable a comprehensive calibration with one set of material measures, the National Physical Laboratory (NPL) subsequently proposed the so-called 'Bento Box' [33] which was then further developed into an areal calibration standard that features various geometries for calibration on one sample in 2019 [34]. In previous research by some of the authors, direct laser writing (DLW) was used for the manufacturing of material measures. A feasibility study was carried out which showed that almost all geometries of ISO 25178-70 can be fabricated with DLW [22]. Additionally, the resolution limit of the process was determined with a calibrated AFM [35] and the chemical stability of the manufactured samples and their scaling were investigated [36]. In this study, it was shown that the chemical stability of the samples can be achieved with a UV-post processing and that iridium is a suitable coating material that features a high reflectivity for optical measurement and a high degree of homogeneity of the coating [35,36]. As a result of these investigations, a 'Universal Calibration Artefact' was proposed in 2018 that can map all basic metrological characteristics of the ISO 25178-600 on one sample [36]. The automated evaluation of the various material measures is discussed elsewhere [37].
In this paper we report on a comparison measurement of the structures of the Universal Calibration Artefact described in [36] with various measuring instruments and principles to evaluate the influence of different measuring equipment on the measurement results and the ISO metrological characteristics. The aim of the comparison is to investigate the capabilities of the calibration artefact for optical surface topography measuring instruments and to 'road-test' the new ISO drafts for the calibration of surface topography measuring instruments. The sample consists of features that can map the relevant metrological characteristics defined by the current ISO 25178-600 [14]. The samples and applied measuring instruments are described before the results are presented, the measurement uncertainty is estimated (where possible) and the metrological characteristics are correlated with parameters of the measuring instruments.

Applied samples
The Universal Calibration Artefact is designed to enable a comprehensive calibration of areal surface topography measuring instruments. The material measures included on the sample allow a determination of the metrological characteristics as described in ISO 25178-600. As shown in figure 2, six different geometries are available to map these basic metrological characteristics. To enable a calibration of varying microscope magnifications between 5× and 100×, all six material measures are present in different sizes ranging from 100 μm×100 μm to 800 μm×800 μm in four steps which results in twenty-four material measures on the sample (figure 2, see [36] for a detailed description). For this study, a total of twelve material measures with the sizes 100 μm×100 μm and 200 μm×200 μm were selected for the comparison.
The Universal Calibration Artefact is manufactured with DLW and the manufacturing process of each material measure was optimized in several iterations in order to meet the target geometry. In doing so, deviations caused, e.g. by the proximity effect, were determined after every correction step and the target dataset was adapted in order to reduce these deviations. The manufactured artefact is capable of performing a comprehensive calibration for all the metrological characteristics as defined in ISO 25178-600 [14]. The following material measures are included:
(1) Star material measure (type ASG according to ISO 25178-70): allows the determination of the topographic resolution of measuring instruments. The spatial frequency transmitted with an amplitude of 50% of the original amplitude is usually calculated (the spatial period limit) [32].
(2) Chirp material measure (type CIN): some aspects of the topographic fidelity can be determined, which will also be related to the topographic resolution in ISO 25178-600 [38]. The material measure with a size of 100 μm×100 μm exhibits twenty different wavelengths between 9.46 μm and 0.47 μm (that are scaled for the 200 μm × 200 μm material measure).
(3) Flatness material measure (type AFL according to ISO 25178-70): the instrument noise and flatness deviation can be determined. This is achieved by determining the areal surface texture parameters as described in the current draft of ISO 25178-700 [15]. The parameter S q,noise is determined based on the subtraction of two repetitive measurements [30].
(4) Radial sine wave (type ARS according to ISO 25178-70): A general calibration of all three axes of the measuring instrument is possible. The measurands S a and S q can be determined as stated in ISO 25178-70.
(5) Cross-grating (type ACG according to ISO 25178-70): the cross-grating material measure is used for the calibration of the lateral axes. The metrological characteristic, as defined in ISO 25178-600, is the local x-y mapping deviation. The linearity deviations, amplification coefficients and the perpendicularity of the lateral axes can also be determined [31]. The position of all dales is evaluated and the mean difference to all nominal positions is calculated to provide a measure for the mapping deviation [37].
(6) Irregular rough surface (type AIR according to ISO 25178-70): the irregular rough surface is applied for the calibration of the acquisition of areal amplitude-based surface texture parameters. The surface is based on a real engineering surface and features a defined distribution of height values [39]. It enables the calibration of the linearity deviation and the amplification coefficient of the height axis [40]. In doing so, the response function of the height axis is determined by imaging all measured height values as a function of the certified height values [37].
The Universal Calibration Artefact allows a calibration at different microscope magnifications without changing the sample. For this purpose, the described material measures are manufactured with different lateral scales. The resulting measurands are summarized in table 1. The nominal values in the table were calculated based on the target geometry for manufacturing.
The design leads to twelve material measures per sample. The twelve material measures are referenced based on their types and sizes as ASG100, ASG200, CIN100, CIN200, AFL100, AFL200, ARS100, ARS200, ACG100, ACG200, AIR100 and AIR200. Three identical samples S1, S2 and S3 were manufactured and circulated to the participants of the comparison.

Participating laboratories
There were nine participants from five countries including research groups, national metrology institutes and manufacturers of measuring instruments. Table 2 lists the participants of the comparison.

Applied measuring instruments
Traceable reference measurements of all three samples were performed at the Physikalisch-Technische Bundesanstalt, Germany. For the first sample S1, all material measures with a size of 100 μm×100 μm were measured three times and for the other two samples (S2, S3), two selected material measures (types CIN and AIR) were measured in order to evaluate the reproducibility of the manufacturing process and to distinguish between the manufacturing uncertainty and the measurement uncertainty. The reference measuring instrument is a metrological large-range AFM (Met. LR-AFM), which has a measurement volume of 25 mm×25 mm×5 mm [41]. All reference measurements were performed with a scan size of 89.9 μm×89.9 μm with a lateral spacing of 0.1 μm. The motion of the Met. LR-AFM axes is traceable through laser interferometers [41]. A cantilever tip with a tip radius of about 10 nm to 20 nm and a tip height of 10-15 μm was applied for the measurements.

Table 1. Nominal measurands and metrological characteristics of the material measures [36].
Type | Measurand or characteristic | 100 μm×100 μm | 200 μm×200 μm
ASG | spatial period limit of the instrument | |
CIN | topography fidelity limit (see ISO 25178-700) | |
AFL | S a /μm | 0.000 | 0.000
AFL | S q /μm | 0.000 | 0.000
AFL | S z /μm | 0.000 | 0.000
ACG | linearity deviations l x, l y | |
ACG | amplification coefficients α x, α y | |
ACG | x-y mapping deviation (see ISO 25178-700) | |
In addition, a number of optical surface topography measuring instruments were applied for the comparison. The instruments are anonymously assigned the numbers 2 to 29 (with 1 used for the Met. LR-AFM). Different objectives or adjustment settings of the same instrument are also distinguished by different instrument configuration numbers. The measuring parameters of all instruments are summarized in table 3. For the comparison, with the exception of the reference measurements by the Met. LR-AFM, coherence scanning interferometers (CSIs), confocal microscopes (CMs), focus variation microscopes (FVs) and phase-shifting interferometers (PSIs) were used. The instruments are classified by the parameters magnification, numerical aperture (NA, a measure of the largest specular acceptance angle of an optical system) and lateral spacing (Δx, Δy).

Measuring strategy
The measured sample was aligned in the measuring instruments by ensuring that the edges of the field of view and the edges of the material measures were as parallel as possible. The set of material measures (either 100 μm×100 μm or 200 μm×200 μm) was measured with the highest possible magnification, provided that one individual material measure fitted fully into the field of view.
Two rounds of measurements were carried out in the comparison. In the first measurement round, one set of measurements with the measuring instrument configurations 2 to 21 of table 3 was conducted, comprising one measurement per material measure and a repeated measurement of the type AFL material measure. This leads to a total of seven measurements per measurement setup and scaling of the material measures. The reference measurements using the Met. LR-AFM were introduced in section 2.3. For the sample S1, all material measures with a size of 100 μm×100 μm were measured. For the other two samples, the material measures CIN and AIR were measured. All reference measurements were repeated three times to quantify the repeatability, leading to a total of thirty reference measurements. The results of the first measurement round are detailed in section 3. The first round of measurements is evaluated to observe systematic influences caused by different measuring instruments.
In the second measurement round, five repetitions of all material measures were performed using the different instruments, i.e. thirty measurements in total per setup and scaling. The repetitive measurements of the second round of measurements were taken with instrument setups 4, 6, 7, 9, 10, 12 and 18 to 29 of table 3. Some participants of the first round did not participate in the second round of measurements and some had changed their instrument configurations in the meantime so that a new instrument configuration number was assigned. Also, some participants used additional instruments in the second round. The results of the second measurement round are detailed in sections 4-5. With the results of the repetitive measurements, statistical influences on the results can be characterized.

Data evaluation
All evaluations were carried out using a single set of algorithms: the raw data was collected from the participants and all data was evaluated by the same operator with the same software. Raw data was specified as topography data which is: 3. Not aligned.
The data was evaluated with the routine shown in figure 3. The pre-processing was conducted in three steps a) cropping, b) F-operator and c) S-filtering. Whereas the first and second pre-processing steps were applied for all types of material measure, step c) was only used for the material measures AFL, ARS and AIR. Thus, the order of the pre-processing steps that were applied before the actual evaluation was chosen to be different from those suggested in ISO 25178-2 to allow separation of the mandatory and optional preprocessing steps (see figure 3).
In the first step, the evaluation area was cropped from the measured dataset. For all material measures except the type CIN, the inner 80% of the functional surface was selected for evaluation. For the chirped standard (type CIN), the entire functional surface (featuring a size of 100 μm×100 μm or 200 μm×200 μm each) was extracted from each topography. The evaluation area was localized based on the centre of the material measures ARS, ASG and ACG. For the material measures CIN, AFL and AIR, the central evaluation area was localized based on the edges of the material measures. Subsequently, the pre-processing was applied by an F-operator (plane-fit) for all material measures and an S-filter (Gaussian filter of ISO 16610-61 with a nesting index of 3.2 μm) for the material measures types AFL, ARS and AIR. Bandwidth matching is an essential requirement for comparison measurements [42]. The S-filter was applied according to ISO 16610-61 and ensures that a comparable bandwidth is present in the different datasets. In doing so, the nesting index was chosen based on the given bandwidth limitations resulting from the various instruments. No interpolation and L-filter were applied. For the material measures that evaluate metrological characteristics that are associated with the resolution limit or the lateral transmission characteristics (type ASG, CIN and ACG), no bandwidth-limitation was considered.
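The pre-processing chain described above (cropping, F-operator, S-filter) can be illustrated with a short Python sketch. It is a minimal sketch under stated assumptions, not the routine implemented in the evaluation software: all function names are hypothetical, the 80% crop is assumed to apply per side, the height map is assumed to be a gap-free rectangular grid, and the Gaussian weights are simply renormalised at the borders instead of using a dedicated end-effect correction.

```python
import math

# Gaussian filter constant; gives 50 % transmission at the nesting index.
ALPHA = math.sqrt(math.log(2.0) / math.pi)


def crop_central(z, fraction=0.8):
    """Keep the central `fraction` of rows and columns (assumed per side)."""
    rows, cols = len(z), len(z[0])
    r_keep = max(1, round(rows * fraction))
    c_keep = max(1, round(cols * fraction))
    r0 = (rows - r_keep) // 2
    c0 = (cols - c_keep) // 2
    return [row[c0:c0 + c_keep] for row in z[r0:r0 + r_keep]]


def remove_plane(z):
    """F-operator: subtract the least-squares plane from a gap-free grid.

    With centred grid coordinates the normal equations decouple, so the
    two slopes and the mean can be computed independently.
    """
    rows, cols = len(z), len(z[0])
    xs = [j - (cols - 1) / 2.0 for j in range(cols)]
    ys = [i - (rows - 1) / 2.0 for i in range(rows)]
    mean = sum(sum(row) for row in z) / (rows * cols)
    sxx = rows * sum(x * x for x in xs)
    syy = cols * sum(y * y for y in ys)
    sxz = sum(xs[j] * z[i][j] for i in range(rows) for j in range(cols))
    syz = sum(ys[i] * z[i][j] for i in range(rows) for j in range(cols))
    a = sxz / sxx if sxx else 0.0
    b = syz / syy if syy else 0.0
    return [[z[i][j] - (mean + a * xs[j] + b * ys[i]) for j in range(cols)]
            for i in range(rows)]


def gaussian_kernel(nesting_index, spacing):
    """Sampled Gaussian weights, truncated at +/- one nesting index."""
    half = max(1, math.ceil(nesting_index / spacing))
    scale = ALPHA * nesting_index
    return [math.exp(-math.pi * (k * spacing / scale) ** 2)
            for k in range(-half, half + 1)]


def smooth_1d(values, weights):
    """Weighted moving average; weights are renormalised near the borders."""
    half = len(weights) // 2
    out = []
    for i in range(len(values)):
        num = den = 0.0
        for k, w in enumerate(weights):
            j = i + k - half
            if 0 <= j < len(values):
                num += w * values[j]
                den += w
        out.append(num / den)
    return out


def s_filter(z, nesting_index, spacing):
    """Separable areal Gaussian low-pass: filter rows, then columns."""
    w = gaussian_kernel(nesting_index, spacing)
    rows_done = [smooth_1d(row, w) for row in z]
    cols_done = [smooth_1d(list(col), w) for col in zip(*rows_done)]
    return [list(row) for row in zip(*cols_done)]
```

For the material measures AFL, ARS and AIR the three steps would be chained as `s_filter(remove_plane(crop_central(z)), 3.2, spacing)` with `spacing` in micrometres; for ASG, CIN and ACG the last step is omitted, and for CIN the crop is replaced by extraction of the entire functional surface.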
Subsequently, the individual measurands of each material measure were determined. This was done by using the software Opti-Check (Opti-Cal GmbH) that was specifically designed for the evaluation of the six types of material measures. The applied evaluation algorithms for the determination of the metrological characteristics are described in detail elsewhere [37].

Systematic influences of different measuring instruments
The systematic influences of the various measuring instruments on the metrological characteristics were examined in the first round of measurements. In doing so, only one measurement of every material measure was acquired by the optical instruments (except for the type AFL material measure, which had two repeated measurements). Their results were compared to the reference measurements taken by the Met. LR-AFM, where three repeated measurements were taken on each material measure. Overall, twenty different instrument configurations were used. The results of the different material measures are summarized in figures 4-12 and figures A.1-A.18 in appendix A. Note that in these figures only the reference value is given with an error bar, which indicates a 95% confidence interval based on the Student's t-distribution of its repeat measurements (type A analysis as described by the Guide to the Expression of Uncertainty in Measurement (GUM)). Repeated measurements with the other measuring instruments, used to analyze the random element of the measurement uncertainty, are presented in section 4.
For the comparison, the mean values of the Met. LR-AFM measurements (instrument configuration 1 or 'Reference') are indicated as reference values if applicable; otherwise the arithmetic mean value of all the measuring instruments is indicated as a reference value (dotted lines). For the resolution characteristics no comparison to the reference is intended as the results are instrument characteristics, i.e. there is no reference value. The three individual artefacts (labelled as samples 1, 2, 3) were manufactured with identical configurations and describe the workpieces that were measured. However, to be able to distinguish between the manufacturing uncertainty and the measurement uncertainty, the three samples are evaluated individually, indicated with the different colours blue, red and green. The numbers 2-29 on the x-axis indicate the applied instrument configuration for the measurement (see table 3). In the following, the results for the six different types of material measures are described for the first round of comparisons.
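The error bar of the reference value can be reproduced with a short type A evaluation. The sketch below is illustrative only: the function name is hypothetical, and it assumes that the error bar is the half-width of the 95% confidence interval of the mean of the repeats (t quantile multiplied by the standard deviation of the mean). The tabulated two-sided 95% quantiles are hard-coded for up to five degrees of freedom.

```python
import math
import statistics

# Two-sided 95 % Student's t quantiles, indexed by degrees of freedom.
T_975 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571}


def confidence_interval_95(repeats):
    """Return (mean, half-width) of the 95 % confidence interval of the mean."""
    n = len(repeats)
    mean = statistics.mean(repeats)
    s = statistics.stdev(repeats)  # experimental standard deviation
    half_width = T_975[n - 1] * s / math.sqrt(n)
    return mean, half_width
```

With only three repeats the quantile is 4.303, so the interval is more than twice as wide as the familiar coverage factor of about 2 would suggest.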

Type ASG material measures (star)
For the ASG100 material measure, the lateral period limit values are similar for all instruments used. Except for one outlier, all values are between 3 μm and 5 μm (see figure 4), with most values featuring a lateral period limit of approximately 4 μm. As the resolution is an instrument-specific criterion, it cannot be expected that all examined instruments feature identical values. Thus, no reference line is indicated. The corresponding results of the 200 μm material measure can be found in appendix A (figure A.1). It is worth pointing out here that the lateral resolution values that would be calculated from the Sparrow resolution criterion (often used in manufacturers' specifications) would typically be less than 1 μm for these instruments; this highlights the need for a more realistic specification of the areal topographic resolution [12]. The definition of the instrument resolution should be based on the measurement of an actual topographic structure rather than on the possibility to distinguish two topography points. The lateral period limit is a suitable measure to describe an actual structure resolution.
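To put the gap between the two resolution definitions into numbers, the commonly quoted form of the Sparrow criterion, 0.47 λ/NA, can be evaluated for a typical objective. The wavelength and numerical aperture used in the test values below are illustrative assumptions and are not taken from table 3; the function name is hypothetical.

```python
def sparrow_limit(wavelength_um, numerical_aperture):
    """Sparrow two-point resolution limit in micrometres: 0.47 * wavelength / NA."""
    return 0.47 * wavelength_um / numerical_aperture
```

For an assumed wavelength of 0.55 μm and an assumed NA of 0.55 the result is about 0.47 μm, which is several times smaller than the lateral period limit of approximately 4 μm observed on the star material measure.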

Type CIN material measures (chirp standard)
The chirp standard is evaluated as a second material measure for a resolution criterion. For the 100 μm material measure, the results scatter more than in the previous evaluation of the star-shaped grooves material measure. The reason is that the various measuring instruments react differently to the steep angles on the material measure and feature a broad variety of different transfer behaviours. Due to the discrete series of wavelengths, the results of repeated measurements also scatter more strongly, as a small change in amplitude can shift the 50% transmission limit to another wavelength [20]. However, because a sinusoidal surface topography is used to determine the resolution limit, the CIN material measure also provides a practical way to characterize the lateral resolution capabilities of surface topography measuring instruments. As the lateral resolution is an instrument-specific property, no target value is defined and it is clear that the resulting values differ systematically for the different measuring instruments. The results of the small scale fidelity limit determined by the CIN100 material measure measurements are summarized in figure 5; the results of the CIN200 material measure are given in appendix A (figure A.2). The small scale fidelity limit represents the shortest wavelength that can be transferred with an absolute relative amplitude deviation of less than 50%.
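The definition of the small scale fidelity limit, and its sensitivity to the discrete wavelength series, can be sketched as follows. This is a hedged illustration, not the algorithm of [37]: the function name is hypothetical, and it is assumed that the wavelengths are checked from long to short and that the search stops at the first wavelength whose relative amplitude deviation reaches 50%.

```python
def small_scale_fidelity_limit(wavelengths, measured, nominal, tolerance=0.5):
    """Shortest wavelength reached, going from long to short wavelengths,
    before the relative amplitude deviation first reaches `tolerance`.

    Returns None if even the longest wavelength fails the criterion.
    """
    limit = None
    triples = sorted(zip(wavelengths, measured, nominal), reverse=True)
    for wavelength, a_meas, a_nom in triples:
        if abs(a_meas - a_nom) / a_nom >= tolerance:
            break
        limit = wavelength
    return limit
```

Because only the wavelengths present on the material measure can be returned, an amplitude that moves from just below to just above the 50% threshold shifts the result by a whole wavelength step, which is consistent with the larger scatter described above.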

Type AFL material measures (flat surface)
Many optical measuring instruments feature noise values in the sub-nanometre range. Here too, the results are reproducible for many measuring devices (see the results for the parameter S q,noise in figure 6). The results are independent of the size of the material measure (see appendix A, figure A.3). The measurement noise is an instrument characteristic that is specific to each instrument; thus, the values are highly scattered between instruments. It is also possible that some instruments that obtain a noise value in the picometre range perform internal averaging during the acquisition time. As such implementations are usually not revealed by the instrument manufacturers, it cannot be completely ruled out that the raw data of the instrument was determined based on such operations.
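The subtraction approach for the noise parameter can be written compactly. The sketch below assumes the usual form of the subtraction method, in which the root mean square height of the difference topography is divided by the square root of two because the difference contains the noise of both measurements; the function names are hypothetical and both height maps are assumed to be gap-free and of equal size.

```python
import math


def sq(z):
    """Root mean square height of a height map about its mean."""
    values = [v for row in z for v in row]
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))


def sq_noise_by_subtraction(z1, z2):
    """Noise estimate from two repeated measurements of the same area."""
    diff = [[a - b for a, b in zip(r1, r2)] for r1, r2 in zip(z1, z2)]
    return sq(diff) / math.sqrt(2.0)
```

Any form that is identical in both measurements, such as residual flatness deviation of the material measure, cancels in the difference, so only the non-repeatable part contributes to the result.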

Type ARS material measures (radial sinusoidal surface)
When the areal surface texture parameters of the radial sinusoidal geometry are compared, most values scatter about ±10% around the nominal value. In order to explain the differences in the transfer behaviour, some examples of measurement results are compared qualitatively. Figure 8 illustrates different surface topographies of sample 1 measured with the instruments 1, 4, 6, 14, 19 and 21. A good qualitative agreement can be observed; only instruments 4 and 6 feature areas of non-measured points at the locations where steep angles are present on the material measure. It can be observed that the differences in the surface texture parameters result primarily from the areas that feature steep angles.

Type ACG material measures (crossed grating)
When the amplification coefficients and the linearity deviations of the lateral axes are evaluated, a good agreement can be observed. Most linearity deviations differ less than 1% from the nominal value, as shown in figure 9. Most linearity deviations feature a value of approximately 200 nm. The linearity deviation indicates the largest deviation of an individual position of a dale with regard to its nominal position. The lateral transmission behaviour is in good agreement with the desired properties. This is also indicated by the mean absolute deviation that describes the average difference between the nominal and actual centre of each dale. This parameter is calculated by considering both the deviations in x and y directions and determining the absolute distance between the actual and nominal positions [37]. Most measuring instruments feature a mean absolute deviation of 300 nm or less, indicating an overall reliable lateral transfer behaviour. The remaining results can be found in appendix A (figures A.7-A.13).
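The two lateral quantities discussed here can be sketched from a list of dale centres. This is an assumption-laden illustration rather than the evaluation of [37]: the function name is hypothetical, the detected and nominal positions are assumed to be already matched pairwise and aligned (offset, rotation and scale removed), and the per-axis linearity deviation is taken as the largest absolute deviation along that axis.

```python
import math


def lateral_mapping_deviation(actual, nominal):
    """Summarise deviations between detected and nominal dale centres.

    `actual` and `nominal` are equally long lists of (x, y) tuples in the
    same unit; entry i of both lists refers to the same dale.
    """
    dx = [xa - xn for (xa, _), (xn, _) in zip(actual, nominal)]
    dy = [ya - yn for (_, ya), (_, yn) in zip(actual, nominal)]
    distances = [math.hypot(u, v) for u, v in zip(dx, dy)]
    return {
        "mean_absolute_deviation": sum(distances) / len(distances),
        "l_x": max(abs(u) for u in dx),
        "l_y": max(abs(v) for v in dy),
    }
```

The mean absolute deviation combines both axes into one distance per dale, whereas `l_x` and `l_y` report the worst single dale per axis, which is why the latter react more strongly to one poorly detected dale.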

Type AIR material measures (irregular surface)
The transmission behaviour of the height axis is examined with the AIR material measure based on the measured height distribution which is imaged as a function of the target height distribution for the determination of the response curve of the height axis. The amplification coefficients for the inner 80% values of the height distribution are shown in figure 10. For some instruments, large deviations from the mean values and the nominal value of 1 can be observed. The reason is again the occurrence of steep angles that are transferred differently by the various instruments. The AIR material measure has steep angles-this results in not all examined measuring instruments transfering the entire surface topography reliably. This results in values of the amplification coefficient between 0.4 and 1.3.
The impact of steep angles on some instruments can also be observed when the linearity deviation values are compared as shown in figure 11. Whereas many instruments feature linear transfer behaviour, significant deviations of some instruments can be observed that are caused by the deviations in steep areas-leading to nonmeasured points or imaging artefacts. The impact of the slope on the response function of the various instruments can be observed by comparing the measured surface topographies. Examples are presented in figure 12, where the topography results of the instrument configurations 1, 4 and 14 are given. The measured surface topographies feature a different number of non-measured points-depending on the transfer behaviour of the specific instrument. This can result into a strong deflection of the measured height distribution. When the measured height values are plotted as a function of the target height values, the response function is determined, which also deviates significantly from the nominal behaviour in some cases. This influences the determined values of the amplification coefficient and the linearity deviation. In the bottom figure, the response function of instrument 14 with the target values being defined as the measured height values of a reference measurement of instrument 1 is shown. The reference data could in this case be applied for a calibration of the material measure. Again, the corresponding results for the 200 μm material measure are provided in appendix A (figures A.14-A.18).
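The derivation of the amplification coefficient and the linearity deviation from the response function can be sketched as below. The sketch is not the implementation of [37]: the function name is hypothetical, and it is assumed that the two height distributions contain the same number of values, are paired by rank after sorting, and are trimmed symmetrically to the inner 80% before a least-squares line is fitted. The slope of that line is taken as the amplification coefficient and the largest residual as the linearity deviation.

```python
def height_response(target_heights, measured_heights, keep=0.8):
    """Return (amplification coefficient, linearity deviation) of the height axis."""
    t = sorted(target_heights)
    m = sorted(measured_heights)
    if len(t) != len(m):
        raise ValueError("both height distributions need the same length")
    n = len(t)
    k = round(n * (1.0 - keep) / 2.0)  # values trimmed at each end
    t, m = t[k:n - k], m[k:n - k]
    t_mean = sum(t) / len(t)
    m_mean = sum(m) / len(m)
    # Least-squares line: measured = slope * target + offset
    slope = (sum((a - t_mean) * (b - m_mean) for a, b in zip(t, m))
             / sum((a - t_mean) ** 2 for a in t))
    offset = m_mean - slope * t_mean
    linearity = max(abs(b - (slope * a + offset)) for a, b in zip(t, m))
    return slope, linearity
```

Non-measured points in steep regions remove values from the measured distribution in a height-dependent way, so the sorted pairs no longer follow a straight line; in this sketch that shows up as a slope different from 1 and an increased largest residual.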
The results of the first set of measurements demonstrated that some geometries were measured with high conformity across the broad variety of measuring instruments, while others led to a large variation of the results and to significant deviations from the nominal values of the metrological characteristics. Comparing all results suggests that geometries from material measures which do not feature large slopes (types ASG, AFL and ACG) are in better agreement. When large slopes are present, which is especially observable for the type AIR and CIN material measures, the results scatter much more due to the occurrence of non-measured points. The affected areas lead to the most significant deviations of the transfer behaviour. In order to place the observed tendencies on a sounder statistical basis, a second round of comparisons was performed with repetitive measurements. The second set of comparison measurements was used to analyze the stochastic influences in addition to the systematic deviations already described.

Repeatability analysis
The first round of comparisons highlighted systematic differences for the measurements of some metrological characteristics. Subsequently, a second circulation of the samples was performed with five repetitive measurements of each material measure in order to allow a quantification of the measurement uncertainty. Only the reference measurements were again evaluated from three measurements of each material measure. In a previous examination, the uncertainty of twelve repetitive measurements of each type of material measure was determined [37] and the evaluation uncertainty of the different algorithms was compared to the uncertainty of the measurement repetition. It was observed that for some material measures, the evaluation uncertainty was significantly larger than the statistical uncertainty introduced by the measuring instrument. Thus, in the following, the results of a second round of comparisons including repetitive measurements are described, which were performed with eighteen different instrument setups. In total, more than 800 areal topography measurements were evaluated in the set of repetitive measurements. All error bars in figures 13-18 and figures B.1-B.18 in appendix B again indicate a 95% confidence interval based on Student's t-distribution.
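As an illustration of the error bars, the following sketch computes a two-sided 95% confidence interval from a small set of repetitive measurements using Student's t-distribution. It assumes that the interval refers to the mean of the repetitions (half-width t · s / √n); whether the authors used exactly this form is not stated here. The input values are hypothetical, and only the critical values for the sample sizes mentioned in the text (three, five and twelve repetitions) are tabulated.

```python
import math
import statistics

# Two-sided 95% critical values t(0.975, dof) of Student's t-distribution,
# tabulated for dof = n - 1 with n = 3, 5 and 12 repetitions.
T_975 = {2: 4.303, 4: 2.776, 11: 2.201}


def confidence_interval_95(values):
    """Return (mean, half_width) of the 95% confidence interval of the mean."""
    n = len(values)
    dof = n - 1
    if dof not in T_975:
        raise ValueError("no tabulated t-value for %d repetitions" % n)
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # empirical (n - 1) standard deviation
    half_width = T_975[dof] * s / math.sqrt(n)
    return mean, half_width


# Hypothetical lateral period limits in micrometres from five repetitions.
repetitions = [4.0, 4.2, 3.9, 4.1, 4.3]
mean, half_width = confidence_interval_95(repetitions)
print("mean = %.3f um, 95%% interval = +/- %.3f um" % (mean, half_width))
```

With only five repetitions the t-factor of 2.776 is considerably larger than the factor of about 1.96 for a normal distribution, which is one reason why the intervals are wide for instruments whose results scatter.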

Type ASG material measures (star)
The material measures with the star-shaped gratings (type ASG) were evaluated and figure 13 shows the results determined for the lateral period limit. Similar to the first round of comparisons, the results do not show large systematic differences. Most mean values range between 3.5 μm and 5.5 μm. However, the statistical uncertainty varies strongly between the instruments. This is also caused by the evaluation uncertainty: because profiles are extracted from the measured areal topography, even a small change in the measured topography may lead to a very different threshold value of the transmission characteristics. This has been observed previously [37]. Thus, a relevant uncertainty can result from the profile-based evaluation itself. In future work, the evaluation method should be enhanced to use a larger share of the information in the measured areal surface topography. Similar to section 3, the additional evaluations of the ASG200 material measure are given in appendix B (figure B.1).

Type CIN material measures (chirp standard)
Similar results are observed with the chirp standard when the small scale fidelity limit is examined, as shown in figure 14. The mean values are in good agreement, whereas statistical scattering due to the evaluation is visible. The chirp evaluation also applies a threshold to the transmission, so that a small change in the surface topography can result in a significant change of the small scale fidelity limit. This is because typically only one profile of the areal surface topography is considered for the chirp evaluation. A slight change in the evaluated profile can shift the 50% transmission threshold to another sine wave, which results in a relatively large uncertainty for some devices. The results for the CIN200 material measure are provided in appendix B (figure B.2). For this material measure, too, the evaluation routine should be examined further in future work to allow a reduction of the evaluation uncertainty.
Figure 13. Results of the ASG100 material measure, repetitive measurements: lateral period limit.
Figure 14. Results of the CIN100 material measure, repetitive measurements: small scale fidelity limit.
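The sensitivity of a threshold-based evaluation can be illustrated with a simplified sketch. It assumes that a transmission ratio (measured amplitude divided by nominal amplitude) is available for each sine wave of the chirp and that the limit is reported as the shortest wavelength down to which all sine waves reach the 50% threshold. The function name, this definition and all numbers are hypothetical and only serve to show the effect; they do not reproduce the evaluation routine of the comparison.

```python
def threshold_limit(wavelengths_um, transmission, threshold=0.5):
    """Return the shortest wavelength down to which every sine wave of the
    chirp reaches the transmission threshold, or None if even the longest
    wavelength fails. Simplified, hypothetical evaluation."""
    pairs = sorted(zip(wavelengths_um, transmission), reverse=True)
    limit = None
    for wavelength, ratio in pairs:
        if ratio < threshold:
            break
        limit = wavelength
    return limit


# Hypothetical chirp with six sine waves (wavelengths in micrometres).
wavelengths = [10.0, 8.0, 6.0, 5.0, 4.0, 3.0]

# Two repetitions that differ by only a few percent at the 5 um sine wave.
profile_a = [0.98, 0.95, 0.80, 0.52, 0.49, 0.20]
profile_b = [0.98, 0.95, 0.80, 0.49, 0.47, 0.20]

print(threshold_limit(wavelengths, profile_a))  # 5.0
print(threshold_limit(wavelengths, profile_b))  # 6.0
```

Because the result can only take the discrete wavelengths present in the chirp, a change of a few percent in the transmission close to the threshold moves the reported limit by a full step. This is consistent with the large scatter observed for some devices.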

Type AFL material measures (flat surface)
The areal flatness material measure is evaluated with regard to the instrument noise. Some of the determined values of Sq, noise scatter significantly. Most values are in the nanometre range and are comparable to each other; in general, the noise levels are of typical magnitude. Some instruments again show a noise level in the picometre range, which can be caused by averaging during the sampling process. The results are illustrated in figure 15 and appendix B (figure B.3).

Type ARS material measures (radial sinusoidal surface)
Similar to the first round of measurements, the results for the radial sinusoidal structure show that it is relatively simple to measure because it does not feature many steep angles. The most relevant differences between the instruments were observed in areas with large angles, as demonstrated by the examples of measured topographies from the first round of comparison measurements (figures 8 and 12). Thus, in the second round of comparison measurements as well, apart from some outliers, comparable results for the areal surface texture parameters and a small statistical scattering are observed (figure 16 and appendix B, figures B.4-B.6).

Type ACG material measures (crossed grating)
Similar to the first round of measurements, the metrological characteristics associated with the transfer behaviour of the lateral axes agree well with the target values for all measuring instruments. For example, the systematic deviation of the amplification coefficients of the x-axis is smaller than 1% for all instruments. The results for the amplification coefficient and the linearity deviation of the x-axis and for the mean absolute deviation of the 100 μm material measure are shown in figure 17; the remaining results can be found in appendix B (figures B.7-B.13). Again, the actual and target locations of the dales mostly differ by less than 300 nm. The amplification coefficients feature stable values, indicating a reliable transfer behaviour of the lateral axes.

Type AIR material measures (irregular surface)
The type AIR material measures feature steep angles in some areas. Thus, the results differ significantly between the various instruments. Figure 18 summarizes the results for the z-amplification coefficients, the linearity deviations and the measured Sa values for the 100 μm material measure. Depending on the capabilities of the various instruments to resolve large angles, the results differ significantly and, for some instruments, also scatter within the individual set of repetitive measurements. The additional results can be found in appendix B (figures B.14-B.18).

Correlation analysis of repetitive measurements
In both rounds of comparisons, the degree of deviation depended on the type of material measure used. The material measures with large angles led to larger scattering of the results. Thus, it could be assumed that, for example, some metrological characteristics correlate with the NA of the instruments. In order to quantitatively determine the connection between the parameters of the instruments and the metrological characteristics, Pearson correlation coefficients were determined. For two parameters $x_1$ and $x_2$ with $n$ paired values, mean values $\bar{x}_1$, $\bar{x}_2$ and empirical standard deviations $s_1$, $s_2$, the correlation coefficient is
$$ r_{x_1 x_2} = \frac{1}{n-1} \sum_{i=1}^{n} \frac{(x_{1,i} - \bar{x}_1)(x_{2,i} - \bar{x}_2)}{s_1 s_2} . $$
The correlation analysis was performed with the data from the second round of measurements. The results in table 4 show that for many parameter combinations there is pronounced correlation behaviour owing to the high statistical significance of the data. The resolution limits (lateral period limit and small scale fidelity limit) feature a positive correlation with the lateral spacing and a negative correlation with the magnification and the NA. When the lateral parameters of the ACG material measure are considered, the correlation behaviour differs between the x- and y-axes. Thus, no clear trend can be assumed. The irregular surface (type AIR) shows a negative correlation between linearity deviation and NA and a positive correlation with the lateral spacing. The increasing number of non-measured points with decreasing NA can be observed in the correlation of Sa and Sq with the NA. This is in accordance with the previous results.
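The Pearson correlation coefficient used for this analysis can be computed in a few lines. The sketch below follows the standard definition with empirical (n − 1) standard deviations. The function name and the example values for the numerical aperture and the linearity deviation are hypothetical and are not taken from table 4.

```python
import statistics


def pearson(x1, x2):
    """Pearson correlation coefficient of two equally long sequences,
    based on the empirical (n - 1) covariance and standard deviations."""
    n = len(x1)
    if n != len(x2) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean1, mean2 = statistics.mean(x1), statistics.mean(x2)
    s1, s2 = statistics.stdev(x1), statistics.stdev(x2)
    covariance = sum((a - mean1) * (b - mean2) for a, b in zip(x1, x2)) / (n - 1)
    return covariance / (s1 * s2)


# Hypothetical instrument setups: numerical aperture and linearity deviation.
numerical_aperture = [0.30, 0.40, 0.55, 0.80, 0.95]
linearity_deviation_um = [0.9, 0.7, 0.4, 0.3, 0.1]

r = pearson(numerical_aperture, linearity_deviation_um)
print("r = %.3f" % r)
```

A value of r close to −1, as in this hypothetical example, would correspond to the negative correlation between linearity deviation and NA reported for the irregular surface. Values near zero indicate no linear relationship.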

Conclusions
The metrological characteristics have been standardized for the calibration of areal surface topography measuring instruments; however, there is still little experience with their application, comparability and repeatability. As the determination of the metrological characteristics is required for a rigorous calibration of surface texture measuring instruments, it is important to perform this task reliably in order to ensure the traceability of measurement results and to enable the uncertainty estimation. In order to systematically investigate the influence of various measuring instruments on the determination of the metrological characteristics, we have described an international comparison measurement of a Universal Calibration Artefact that can map all basic metrological characteristics of ISO 25178-600. Different areal surface topography measuring instruments were used to determine the corresponding metrological characteristics of twelve material measures that were imaged onto one sample. Two different sets of comparison measurements were conducted. Three identical samples were manufactured with direct laser writing and circulated. The reference measurements of the samples were performed by the PTB using a traceable AFM.
The comparability of the measurement results across the various instruments depends strongly on the type of material measure and the evaluated characteristic. When small surface slopes are present, good agreement between the different instruments was observed. Material measures with large angles, however, may lead to strong deviations, and it can be observed that the NA correlates with the corresponding properties. Thus it can be concluded that the comparability of different optical topography measuring instruments is still challenging, as the measurement results depend on many instrument properties, even when the bandwidth limitation of the various instruments is adapted in order to make the results comparable. The metrological characteristics, however, can illustrate and quantify the observed deviations. Whereas it was expected that instrument characteristics such as the resolution criteria and the instrument noise are specific to each measuring instrument, many other parameters also scattered significantly due to the occurrence of non-measured points. The most significant deviations were observed in areas that feature large angles, and non-measured points already occurred at much smaller slopes than would be expected based on the NA limitations. This led to deviations of the transfer behaviour of the height axis. In contrast, the lateral axes were in good agreement for all applied measuring instruments.
Table 4. Correlation coefficients between the discretization, the magnification and the numerical aperture for the second round of measurements. Green highlights for correlations above 0.3, red highlights for correlations below −0.3.
The investigations showed that the metrological characteristics of the ISO standardization are versatile and useful tools to evaluate the measurement uncertainty of areal surface topography measurements and can describe their capabilities, limitations and comparability. However, some evaluation routines still require further analysis, which should be considered for future standardization: it was shown that the evaluation routines of both resolution material measures (star-shaped material measure and chirp standard) can lead to a high evaluation uncertainty, as a threshold value is involved and only a small fraction of the areal surface topography is actually considered. In future work, methods for the reduction of the evaluation uncertainty will be examined. Also, additional evaluations of the comparison data with regard to height distributions or other methods from ISO 25178-2 will be performed in order to make use of the sound statistical database for comparison criteria beyond the metrological characteristics.
Figure A1. Results for the lateral period limit, material measure ASG200. An outlier value of 11.47 μm for instrument 5 is not imaged.
Figure A2. Results for the small scale fidelity limit, material measure CIN200.
Figure A3. Results for Sq, noise, material measure AFL200. An outlier value of 30.4 nm for instrument 5 is not imaged.
Figure A4. Results for Sq, material measure ARS100.
Figure A5. Results for Sa, material measure ARS200.
Figure A6. Results for Sq, material measure ARS200.
Figure A7. Results for y-amplification coefficient, material measure ACG100.
Figure A8. Results for y-linearity deviation, material measure ACG100.
Figure A9. Results for x-amplification coefficient, material measure ACG200.
Figure A10. Results for y-amplification coefficient, material measure ACG200.
Figure A13. Results for absolute mean deviation, material measure ACG200.
Figure A14. Results for Sq, material measure AIR100.
Figure A15. Results for the z-amplification coefficient, material measure AIR200.
Figure A16. Results for the z-linearity deviation, material measure AIR200.
Figure A17. Results for Sa, material measure AIR200.
Figure A18. Results for Sq, material measure AIR200.