Results of round robin form measurements of optical aspheres and freeform surfaces

High-quality aspherical and freeform surfaces are in high demand, and the high-accuracy form measurement of such surfaces is a challenging task. To explore the current status of form measurement systems for complex surfaces such as aspheres and freeforms, interlaboratory comparison measurements are performed. This study presents the pseudonymized results obtained using three different surfaces (metal asphere, glass asphere, toroidal surface) in a total of six different round robins. These results were taken from a total of 13 different measurement instruments based on 9 different measurement principles and operated at 12 different laboratories. They were analyzed using a sophisticated procedure that was first developed in 2018 and then refined and tested on simulated data in 2022 to address the challenges of such a comparison at this level of accuracy. In the current study, we applied these refined methods to data acquired from tactile and optical point measurements as well as from optical areal measurements. As there are no absolutely measured and very well characterized reference standard aspherical and freeform surfaces available at the accuracy level of a few tens of nanometers root-mean-square, the approximated true forms of the surfaces were derived from the measurements and indicate the manufacturing accuracy of the surface forms. Then, the differences between the measurements and the approximated true forms were analyzed; these differences directly indicate the systematic measurement errors of the instruments. By also comparing the approximated true forms from the two different round robins for each surface, additional insights into the reliability and stability of these so-called virtual reference topographies were gained.


Introduction
Comparison studies are important for ensuring the quality and functionality of measurement devices. They are intended to provide an independent and unbiased comparison of all available measurements while offering insights into the performance of the form measurement systems currently available. This is a challenging task, since the results of different instruments and laboratories have different 3D measurement point distributions [1], leading to different grid patterns and grid point densities. Another challenge associated with such comparison studies is that the true form of the surfaces is not known, as there are no absolutely measured and very well characterized reference standard surfaces available at the accuracy level of a few tens of nanometers root-mean-square (RMS).
In recent years, form measurement comparisons of optical aspheres and freeform surfaces have been performed regularly within the international association CC UPOB e.V., with contributions made by international partners from research institutes and industry. The CC UPOB e.V. is a non-profit association that aims to increase scientific knowledge in the area of ultraprecise surface figuring and its related measurement challenges. To increase knowledge transfer between the partners and lower the hurdle for participation, the comparison campaigns are evaluated anonymously. In 2018, the first round robin comparison study performed within this association was published [2]. Since then, several further comparison studies have been organized by CC UPOB e.V. and the evaluation methods used in [2] have been continuously improved. These refined evaluation methods have been published and were tested using simulated test data [1].
In this paper, new round robin measurements performed within CC UPOB e.V. between 2019 and 2022 are analyzed using the optimized procedures and presented to explore the current status of a large variety of current form measurement systems and principles. All measurements are analyzed using the same version of the comparison software.
The paper not only provides insight into the current status of complex form measurement results by applying the data evaluation procedures to a large variety of measurement data, but also discusses the concept of using virtual reference topographies (VRTs) for comparison studies and points out its benefits and limitations. A VRT is defined as the pointwise median of the available measurements after removal of the design form and spherical form errors. It is therefore an approximation of the true surface form deviation from the design topography (neglecting spherical form deviations) [2] and indicates the manufacturing error of the surface. Assuming that it approximates the true form deviation, its removal from the measurement results directly indicates the systematic measurement errors of the instruments. This method is used in this paper to compare the measurement results of the three samples.
The novelty of this paper is twofold. First, the form measurement results of new surfaces by a large variety of different measurement instruments are compared by applying a refined comparison method that had been previously tested on virtual test data [1]. Overall, 13 different instruments based on 9 different measurement methods were used at 12 different laboratories, and a total of 41 different measurements are analyzed here. Compared to the study of different measurement data presented in 2018 [2], the list of contributing partners has changed, returning partners have updated their measurement systems, new surfaces were investigated, and the evaluation procedures have been refined.
Second, each surface was part of two different round robins, and the resulting VRTs calculated from the data are compared to one another as well. Such a comparison provides insights into the reliability and stability of the VRTs and highlights the influence of the number and quality of the contributing measurements. This demonstrates the value of the concept of referencing to a VRT.
The paper is organized as follows. Section 2 presents the surfaces that were measured and section 3 introduces the measurement principles applied during the round robins. In section 4, the comparison procedure used in this study is described in detail. Section 5 presents the results of the comparison together with a discussion. Finally, a summary of the results and conclusions is offered in section 6.

Specimens
In this study, measurements of the form of three different surfaces were performed, the first two of which are of the same design but of different material. The first aspherical surface is made of metal (figure 1(a)) and was manufactured by LT Ultra GmbH. The second aspherical surface is made of glass and was manufactured by Asphericon GmbH (figures 1(b) and (c)). The third specimen investigated has a toroidal surface made of glass (figure 1(d)) and was manufactured by the company Optik+. The glass aspherical surface was measured in two different settings, once while mounted on a holder (figure 1(b)) and once as an unmounted asphere with no holder (figure 1(c)). The other two specimens were not changed between the two measurement comparisons and were both measured mounted on a holder. Both were used in two different interlaboratory comparisons involving different numbers and types of measuring systems.
The design function of the two convex aspherical surfaces is based on the definition of a standard asphere [3]:
$$z(r) = \frac{r^2}{R\left(1 + \sqrt{1 - (1+\kappa)\,r^2/R^2}\right)} + \sum_i A_i\, r^i,$$
where $r = \sqrt{x^2 + y^2}$ is the distance from the center of the surface, $R$ is the vertex radius of curvature, $\kappa$ is the conic constant, and $A_i$ are further coefficients describing the asphericity. The aspherical surfaces used here have the following parameters: aspherical coefficients ($A_4, A_6, \ldots$) […]
The convex toroidal surface has two different radii along the two orthogonal axes. These design radii are $r_v = 40$ mm and $r_h = 42$ mm; the specimen has a diameter (clear aperture) of 50 mm. The mathematical description of the toroidal surface is given by […]
Depending on the manufacturing technology and accuracy of the surfaces, the surfaces have non-rotationally symmetric manufacturing errors that can be used for the alignment of the measurements to each other. Sometimes, there are only very small non-rotationally symmetric errors, which cannot even be measured by every instrument. In these cases, the alignment might cause errors in the comparison results. To improve and facilitate alignment, all three specimens have three markers on the surfaces. For the metal asphere, the markers were manufactured by LT Ultra GmbH within the diamond-turning manufacturing process. For the glass samples, the markers were produced by NTG Neue Technologien GmbH & Co. KG using ion beam figuring. The markers are Gaussian peak markers with depths in the range of a few hundred nanometers and a full width at half maximum in the range of 0.5 mm to 1 mm. The markers are located approximately 3 mm to 4 mm from the edge of the surfaces at 0°, 30°, and 90° (depending on the z-rotation of the sample; this angle may vary in the results shown below, but their relative position to each other remains the same). With these markers, the alignment of the specimen is unambiguous. Therefore, the markers facilitated the alignment of the different measurements to one another during the comparison.
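The standard asphere definition [3] used for the two aspherical specimens can be evaluated numerically as in the following minimal Python sketch. The function name `asphere_sag`, the dictionary layout of the coefficients, and all numerical values are assumptions chosen for illustration; they are not the design parameters of the specimens, and the sign convention of the sag may differ from the one used in the design data.

```python
import numpy as np

def asphere_sag(x, y, R, kappa, coeffs):
    """Sag z(r) of a standard asphere: conic term plus polynomial terms A_i * r**i.

    coeffs maps the polynomial order i to the coefficient A_i, e.g. {4: A4, 6: A6}.
    """
    r2 = np.asarray(x, dtype=float) ** 2 + np.asarray(y, dtype=float) ** 2
    conic = r2 / (R * (1.0 + np.sqrt(1.0 - (1.0 + kappa) * r2 / R**2)))
    poly = sum(a * r2 ** (i / 2.0) for i, a in coeffs.items())
    return conic + poly

# Placeholder values for illustration only (NOT the parameters of the specimens).
R, kappa = 40.0, -1.0                # vertex radius in mm, conic constant
coeffs = {4: 1.0e-6, 6: -2.0e-9}     # hypothetical A_4 (mm^-3) and A_6 (mm^-5)
z_edge = asphere_sag(12.4, 0.0, R, kappa, coeffs)
```

With `kappa = 0` and no polynomial coefficients the function reduces to the sag of a sphere of radius `R`, which is a convenient sanity check.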

Measurement systems
In this study, 13 different instruments based on 9 different measurement methods were used at 12 different laboratories. Note that not all instruments contributed to each of the six measurement campaigns. Overall, 41 measurements were analyzed. Table 1 gives an overview of the number of measurements analyzed in each round robin.
The different measurement methods applied in this study for the form measurement of optical asphere or freeform surfaces can be divided into three groups: tactile measurements [4], optical point measurements [5-10], and optical areal measurements [11-17]. Tactile measurement systems are tactile coordinate measuring machines (CMMs). They use a tactile probe to measure the surface by touching it with a small contact force at many positions. For form measurements with a tactile CMM, the surface of the specimen is scanned with a chosen point density and grid pattern along the surface. Although this is a pointwise measurement, the point densities of such devices can be very high, but measurement time increases with increasing point density, in particular if a high areal point density is desired. One tactile CMM used in this study is the UA3P 3D Profilometer [4]. This system is a contact-type 3D profilometer with an extremely low contact force (using an 'Atomic Force Probe') that uses a frequency-stabilized He-Ne laser for coordinate measurement and position verification. The contact-type measurement ensures the detection of a surface point cloud independent of light reflection or absorption effects.
Optical point measurement systems are non-contact CMMs. They use an optical sensor to scan the surface at many positions. Such systems are usually faster than tactile CMMs and the procedure is contactless, reducing the risk of potential scratches or digs on the surface to be measured. The following optical point measurement systems that directly scan the surface form were used in this study. The LuphoScan system [5-7] uses an interferometric point probe that is based on the physical principle of multi-wavelength interferometry, together with a movement system, a reference frame and three reference probes that determine the position of the sensor within the reference frame. The NMF600 S [8] has a cylindrical CMM setup with a differential confocal sensor as an optical probe. A high-stability separate metrology system relates the probe position to the surface position. The MarForm MFU200 Aspheric 3D system [9,10] is also a cylindrical coordinate measuring instrument equipped with an optical point sensor based on white light interferometry. For the measurements, a probe tip with high numerical aperture is applied, which allows the samples to be measured without changing the angle of the probe. The system is equipped with an internal compensation system consisting of a set of capacitive sensors measuring distance changes to a reference frame. Also contributing to this study were two optical point measurement systems that are based on deflectometry and derive form measurement from indirect measurements. The V-Spot technology is based on a pointwise measurement of the gradients, which are then used to calculate the surface form [18]. The second method is called experimental ray tracing and involves scanning the surface with a narrow laser beam, detecting the beam direction behind the surface under test, and reconstructing the surface form using the slopes calculated from the data measured by a model-based approach [19].
Optical areal measurement systems are fast, provide high lateral resolutions since the measurement data is usually acquired by a camera, and are among the non-contact measurement techniques. Different methods of this type were used in this study, exhibiting various advantages and disadvantages. One such method uses an interferometer with a transmission sphere in combination with a computer generated hologram (CGH) [11], or alternatively, with a specific CGH with integrated reference surface [12]. Such measurements are very fast because only a single shot is needed. But since a CGH has to be manufactured for each lens design, the non-recurring engineering costs are high [13]. Furthermore, except for the CGH with integrated reference surface [12], aligning a CGH is time-consuming and alignment errors can lead to additional measurement errors [11-13].
When not using a CGH, stitching methods or elaborate computational methods are required to perform areal optical surface measurements of aspheres or freeform surfaces. In this study, a Zygo Verifire Asphere [14] was used that measures small areas of the surface step by step by moving the specimen relative to the interferometer in such a way that the area's deviation from a spherical test wavefront is small. A large number of such area measurements are then used to reconstruct and stitch together the topography. This procedure is more time consuming than single-shot methods. Another areal optical measurement technique used in this study is tilted-wave interferometry [15,16]. Here, a microlens array is used in the illumination arm of the interferometer, producing several differently tilted wavefronts that illuminate the surface to be measured. No movement of the specimen is needed to reconstruct the form; instead, a model-based evaluation procedure is used that requires solving several high-dimensional inverse problems [17].
For more details on the different measurement principles and setups, we refer the reader to the literature cited above.
Note that for each method, different instruments or realizations may have been used at different laboratories or in different round robins, and that different settings, hardware, and/or software may have resulted in performance variances. Moreover, the results may depend on the operator and the settings chosen by the operator.

Comparison procedure
The comparison procedure is based on the refined procedure tested and discussed in [1]. For the reader's convenience, a description of the procedure is provided again here. The comparison procedure can be divided into two steps. First, the individual data sets are pre-processed to align them in the same manner and to calculate the characteristic data for each measurement. Second, these characteristic data are compared.
The data are delivered as absolute form measurement data in Cartesian coordinates, i.e. including the design surface. Data evaluation is performed using the MATLAB® [20] software. Large outliers are first removed so that they will not influence the alignment steps. This is done by fitting the data to Zernike polynomial functions of up to order n = 18 (throughout this paper, the ANSI Standard Zernike definition is used [21]). These Zernike polynomials are then subtracted from the data and all data points of the residual that are larger than a threshold value are removed from the data. The threshold value depends on the data and is calculated by multiplying the median absolute deviation (MAD) of the residual of the Zernike polynomial fit by a chosen factor. In this study, a factor of 30 is used.
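The MAD-based outlier threshold described above can be sketched in Python as follows. This is a minimal illustration, not the MATLAB implementation used in the study: it assumes that the Zernike fit has already been subtracted, that the MAD is the unscaled median absolute deviation, and that the threshold is compared against the absolute value of the fit residual. The function names and the synthetic data are hypothetical.

```python
import numpy as np

def mad(values):
    """Unscaled median absolute deviation of the finite entries."""
    v = values[np.isfinite(values)]
    return np.median(np.abs(v - np.median(v)))

def outlier_mask(fit_residual, factor=30.0):
    """True where a point is kept, False where it is flagged as an outlier.

    NaN entries compare as False and are therefore dropped as well.
    """
    threshold = factor * mad(fit_residual)
    return np.abs(fit_residual) <= threshold

# Synthetic residual of a polynomial fit in nm (assumed data, for illustration).
rng = np.random.default_rng(0)
residual = rng.normal(0.0, 5.0, size=10_000)
residual[[10, 500]] = [4000.0, -2500.0]   # two gross outliers
keep = outlier_mask(residual)
cleaned = residual[keep]
```

Because the MAD is hardly affected by a few extreme values, the threshold stays tied to the bulk of the data, so a generous factor such as 30 removes only gross outliers.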
To mutually align the data and to transform all individual data into the same coordinate system, each point cloud is aligned with the design topography by minimizing the RMS value of the differences between the measured data points and the design topography (describing the design deviation in the z-direction) in a least squares sense, allowing shifts and rotations of the measured point cloud along the three Cartesian coordinate axes. The allowable degrees of freedom of this procedure depend on the symmetry of the design topography. To reduce the influence of the different 3D measurement point distributions on the alignment, data interpolated to the same equidistant 1000 pixel × 1000 pixel grid are used for this step [1]. The resulting necessary shifts and tilts are then applied to the original measured point cloud. After this alignment, the design topography is removed from the data sets to make the differences between the measurements more visible. The resulting data are referred to as residual data in the following.
The residual data are often dominated by spherical form deviations [2,22]. To compare the spherical form deviations as well as the non-spherical differences between the measurements, the individual best-fit spheres (BFSes) of the residual data are calculated and removed from the residual data. Again, it is important to use data interpolated to the same common 1000 pixel × 1000 pixel grid to calculate the individual BFSes in order to reduce the influence of the different 3D measurement point distributions of the data sets [22]. The BFSes are determined using the Levenberg-Marquardt algorithm [23, section 5.2] as implemented in MATLAB® [20]. The BFS calculated on the interpolated data is then removed from the original residual data. The resulting data are called non-spherical residual data.
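To illustrate what a best-fit sphere calculation involves, the following Python sketch fits a sphere to a gridded point cloud. Note that it deliberately swaps in a simpler technique: an algebraic linear least-squares fit instead of the Levenberg-Marquardt iteration used in the study, and it is applied to a synthetic spherical cap rather than to residual data. All names and values are assumptions for illustration; for residual data with very large BFS radii, an iterative fit as described above is likely to be better conditioned.

```python
import numpy as np

def fit_sphere(x, y, z):
    """Algebraic least-squares sphere fit; returns (center, radius)."""
    x, y, z = (np.ravel(np.asarray(a, dtype=float)) for a in (x, y, z))
    # Sphere equation rewritten to be linear in the unknowns cx, cy, cz, d:
    # x^2 + y^2 + z^2 = 2*cx*x + 2*cy*y + 2*cz*z + d,  with d = R^2 - |c|^2
    A = np.column_stack([2 * x, 2 * y, 2 * z, np.ones_like(x)])
    b = x**2 + y**2 + z**2
    (cx, cy, cz, d), *_ = np.linalg.lstsq(A, b, rcond=None)
    center = np.array([cx, cy, cz])
    radius = np.sqrt(d + center @ center)
    return center, radius

# Synthetic convex cap with a 40 mm radius on a 24.8 mm aperture (assumed data).
R_true = 40.0
g = np.linspace(-12.4, 12.4, 101)
X, Y = np.meshgrid(g, g)
inside = X**2 + Y**2 <= 12.4**2
Z = np.sqrt(R_true**2 - X**2 - Y**2) - R_true   # apex at z = 0
center, radius = fit_sphere(X[inside], Y[inside], Z[inside])

# PV of the fitted spherical cap over the evaluated aperture diameter.
pv = radius - np.sqrt(radius**2 - 12.4**2)
```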
Depending on whether the design surface is symmetrical, the data sets may need to be aligned with each other by rotating them around the z-axis. To this end, an arbitrary non-spherical residual data set is chosen as a reference, and the maximum correlation to this reference is determined for each non-spherical residual data set as a function of its rotation about the z-axis. The step width of the angle is 0.1 degrees. For this step, the frequency components with the most significant structures should be used. This depends on which low- and mid-spatial frequencies are present on the specimen's surface. In this paper, the interpolated mid-spatial frequency structures of the non-spherical residual data (after subtracting Zernike polynomial functions of up to order n = 18) are used. The resulting rotated data are referred to as reduced data in the following.
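The search for the rotation angle with maximum correlation can be sketched as follows. This Python example assumes, for simplicity, that both data sets have been resampled onto a common polar grid with an angular step of 0.1 degrees, so that a rotation about the z-axis becomes a circular shift along the angle axis; the study itself works on interpolated Cartesian grids, so the polar resampling and the function name `best_rotation` are assumptions for illustration.

```python
import numpy as np

def best_rotation(ref_polar, dat_polar, step_deg=0.1):
    """Return (shift index, angle in degrees) that maximizes the correlation.

    Both inputs have shape (n_radii, n_angles) with n_angles * step_deg == 360,
    so a rotation about the z-axis is a circular shift along axis 1.
    """
    ref = ref_polar - ref_polar.mean()
    dat = dat_polar - dat_polar.mean()
    # Circular cross-correlation along the angle axis via FFT, summed over radii.
    spec = np.conj(np.fft.fft(ref, axis=1)) * np.fft.fft(dat, axis=1)
    corr = np.fft.ifft(spec, axis=1).real.sum(axis=0)
    k = int(np.argmax(corr))
    return k, k * step_deg

# Synthetic test: the second data set is the reference rotated by 12.3 degrees.
rng = np.random.default_rng(1)
ref = rng.normal(size=(16, 3600))
dat = np.roll(ref, 123, axis=1)
k, angle = best_rotation(ref, dat)
aligned = np.roll(dat, -k, axis=1)   # rotate back onto the reference
```

Since a circular shift does not change the norm of the data, the shift that maximizes this unnormalized cross-correlation also maximizes the normalized correlation coefficient.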
Following these pre-processing steps, the data are compared to each other. To do so, the reduced data are interpolated to the same common 1000 pixel × 1000 pixel grid. Since calibrated reference standards with uncertainties in the range of some nanometers RMS do not exist, a reference topography is derived from the measurements. For this purpose, based on these interpolated data, the pointwise median topography is calculated. This means that for each grid point, the median value of all available reduced data is calculated. This pointwise median topography serves as a reference for the comparison and is referred to in the following as the VRT. It is an approximation of the true form deviation of the surface from its design topography, neglecting the spherical form deviations and taking into account all available information, and it indicates the manufacturing accuracy of the surface form. In the future, when the uncertainties for every measurement become available, the VRT can be replaced by a value that also takes the uncertainties of the measurements into account, e.g. a weighted mean value. Moreover, if future measurements differ less in their spherical contributions, the VRT could be calculated without previously removing the individual BFSes.
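The pointwise median that defines the VRT can be sketched in a few lines of Python. The sketch assumes that all reduced data sets are already interpolated to the same grid and that grid points not covered by a measurement are marked with NaN; the function name and the tiny 2 × 2 example maps are hypothetical.

```python
import numpy as np

def virtual_reference_topography(reduced_maps):
    """Pointwise median over a stack of maps of shape (n_meas, ny, nx).

    NaN marks grid points that a measurement does not cover; they are ignored.
    """
    return np.nanmedian(np.asarray(reduced_maps, dtype=float), axis=0)

# Three tiny 2 x 2 "measurements" in nm (assumed values, for illustration).
maps = np.array([
    [[1.0, 2.0], [3.0, np.nan]],
    [[2.0, 2.0], [9.0, 4.0]],
    [[40.0, 2.0], [5.0, 6.0]],
])
vrt = virtual_reference_topography(maps)
```

In the upper-left grid point, the single large value of 40 nm does not shift the median, which illustrates why the median is preferred over the mean when a few measurements deviate strongly.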
The pointwise difference of each interpolated reduced data set to the VRT is calculated and compared. Assuming that the VRT approximates the true form deviations from its design form, these pointwise differences directly indicate the systematic measurement errors of the instruments. Additionally, the RMS values and the more robust MAD of these differences and of the VRT itself are calculated and compared. While the values of the VRT indicate the quality of production, the values of the differences correspond to the statistics of the systematic measurement errors of the instruments.
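The difference statistics described above can be sketched as follows in Python. The sketch assumes an RMS computed directly from the values without subtracting their mean and an unscaled MAD; whether the comparison software uses exactly these conventions is not stated here, and the function names and example values are hypothetical.

```python
import numpy as np

def rms(values):
    """Root-mean-square of the finite entries (mean is not subtracted)."""
    v = values[np.isfinite(values)]
    return np.sqrt(np.mean(v**2))

def mad(values):
    """Unscaled median absolute deviation of the finite entries."""
    v = values[np.isfinite(values)]
    return np.median(np.abs(v - np.median(v)))

def compare_to_vrt(reduced_map, vrt):
    """Pointwise difference to the VRT together with its RMS and MAD."""
    diff = reduced_map - vrt
    return diff, rms(diff), mad(diff)

# Tiny example in nm (assumed values, for illustration).
vrt = np.array([[0.0, 10.0], [-10.0, 0.0]])
meas = vrt + np.array([[3.0, -3.0], [3.0, -3.0]])
diff, diff_rms, diff_mad = compare_to_vrt(meas, vrt)

# The measurement is closer to the approximated true form than the design
# is whenever the RMS of its difference is below the RMS of the VRT itself.
closer_than_design = diff_rms < rms(vrt)
```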
Finally, the VRTs of the two different round robin studies of the same surface are also compared for each surface to gain insight into how reliably and stably the VRT approximates the true form. Their comparison requires aligning them to each other. To this end, one of them is chosen as a reference and the maximum correlation is calculated between the reference and the second one as a function of the rotation angle around the z-axis. The angle is varied in steps of 0.1 degrees. The same frequency structures as mentioned above are used for the alignment. After this, the rotated VRT is interpolated to the same common 1000 pixel × 1000 pixel grid and the pointwise difference between the VRTs is then calculated.

Results
In this section, we present the comparison results for the six different round robins. The data sets are pseudonymized by numerical identifiers. It should be noted that the numbers were chosen independently for each specimen and each round robin. This means, for example, that the number 1 used in the analysis of the metal asphere in 2021 is not necessarily the same measurement system denoted by the number 1 in the analysis of another specimen or year. In addition, all measurement results are presented based on interpolated data only in order to reduce the possibility that the measurement principle can be identified based on the selected 3D measurement point distribution.
The measurements in each round robin are evaluated on an aperture with a certain diameter. Note that not all measurement systems were able to measure the complete aperture of the surfaces. Therefore, the evaluated aperture is chosen as the largest diameter common to all measurements contributing to the specific round robin.
For all six measurement campaigns, the following results are presented. First, the radii of the individual BFSes and the corresponding peak-to-valley (PV) values resulting from the BFSes on the evaluated aperture are shown as bar plots for the measurements of the round robin, together with the median values. This information allows the differences between the measurements caused by the differently measured spherical form errors to be compared, and the approximated spherical manufacturing error can be derived from the median values.
Then, for each measurement campaign, the VRT calculated from the measurements contributing to the round robin is shown. This is the pointwise median topography of all measurements after removing the design and the individual BFSes. It is an approximation of the true form deviation of the surface from its design topography, neglecting the spherical form error and taking into account all available information. The VRT is therefore an indication of the production quality of the surface form, and the RMS and MAD values indicate its manufacturing accuracy on the given surface diameter.
The VRT is then used as a reference and is removed from the reduced data in order to identify the differences between the measurements of each measurement campaign. For each measurement campaign, the resulting differences of the reduced data to the VRT are presented. Assuming that the VRTs approximate the true form deviations (neglecting spherical form errors), these differences directly display the systematic measurement errors of the instruments. Next, the RMS values and the more robust MADs of the differences to the VRT for each measurement are displayed together with the values of the VRT as bar plots for each interlaboratory comparison. In cases where the individual RMS and MAD values are larger than those of the VRT, the measurement did not serve to get closer than the design to the approximated true form. Some numerical results of all six measurement campaigns are summarized in table 2.
Finally, the pointwise difference between the VRTs of the two measurement campaigns is shown and discussed for each sample, providing insights into their reliability and stability.
It should be noted that we decided not to include figures of the original measurement data, of the residuals after removal of the design form, and of the reduced data after removal of the individual BFS, to reduce the number of figures presented in this manuscript. The original and all interim results can be generated by combining the VRT and the differences to the VRT together with the BFS (radii presented as bar plots for each measurement) and the design. Therefore, all results are contained in the manuscript.
In the following, the measurement results for each of the three samples are presented in a separate section.

Asphere made of metal
This section presents the results of the form measurements of the aspherical surface made of metal (figure 1(a)) obtained during the two round robins performed in 2021 and 2022. The specimen underwent no change during this time, so the only differences between the two round robins were in the number of participants and in the measurement systems used. While only five measurements were included in the round robin of 2021, nine different measurements were analyzed in 2022. In 2022, new measurements were performed. In cases where new data were collected with the same measurement system as in 2021, or where data were supplied from a measurement system that was not included in 2021, the new data are analyzed here. Two data sets were not updated and are the same as in 2021. In the two measurement campaigns, not all measurement instruments were able to measure the complete aperture of the surface. The largest common aperture that could be used for data evaluation had a diameter of 24.8 mm in both measurement campaigns.
Figure 2 shows the radii of the individual BFSes (a), (c) and the corresponding PV values (b), (d) resulting from the BFSes on the evaluated diameter for the measurements of the round robins performed in 2021 (a), (b) and 2022 (c), (d), together with the median values. As can be seen here, the difference between the spherical form errors measured by the instruments is quite large, and the resulting PV differences due to these different BFSes are in the range of about 600 nm in 2021 and 700 nm in 2022. The median PV value of the spherical form error is 204 nm in […]
Due to the diamond turning manufacturing process, mainly rotationally symmetric errors are visible. Figure 4 shows the differences between the measurements and the VRTs for the two measurement campaigns of 2021 (a) and 2022 (b). In 2021, measurement number 2 shows larger rotationally symmetric deviations. Measurement number 5 is primarily characterized by a very large astigmatism. A remeasurement of the sample by that instrument showed that the sample had been deformed by an excessive force on the hollow holder during the measurement. Therefore, the measurement comparison performed helped to identify such handling problems at this high level of accuracy. When providing samples for future comparisons, care should be taken to prepare the samples with a holder of greater stiffness or to adjust the mechanical design of the holder. In addition, all partners should be informed that an excessive force on the holder can cause deformation of the surface form. Note that because of the small number of measurements in 2021, the VRT might also be affected by these large deviations. The astigmatism in particular appears to be slightly visible in measurements 1, 3 and 4. Therefore, a new comparison with updated data will likely result in an improved VRT, but we have left the data here as measured to show and examine the effects on the VRT, and also to demonstrate that such comparison studies are helpful in identifying problems at this high level of accuracy. Figure 4(b) shows the results of 2022. While measurement numbers 4, 6, and 8 show some larger deviations, other measurements are very close to the VRT. This is especially true for measurement numbers 1, 5, and 7. Note that the color scale ranges only between plus and minus 40 nm, demonstrating the high accuracy of some of the measurements.
Figure 5 shows the RMS values and the more robust MADs of the differences to the VRT for each measurement together with the values of the VRT for 2021 in (a) and for 2022 in (b). As mentioned, the values of the VRT itself describe the deviation of the approximated true form from the design, neglecting spherical form errors, and are therefore an indication of the production quality of the surface form. The RMS value is 26 nm in 2021 (a) and 27 nm in 2022 (b), while the MAD value is 19 nm in 2021 (a) and 21 nm in 2022 (b), showing that the asphere made of metal was manufactured with high accuracy and that the results of the two years agree well. The individual RMS values of the differences to the VRTs range between just 8 nm and 100 nm in 2021 (a) and between 5 nm and 36 nm in 2022 (b); the MADs are between 3 nm and 54 nm in 2021 (a) and between 3 nm and 26 nm in 2022 (b). In cases where the individual values are larger than those of the VRT, the measurement did not serve to get closer than the design to the approximated true form. This is the case for measurement numbers 2 and 5 in 2021 and for measurement numbers 4, 6, and 8 in 2022. Comparing the individual results of the two years shows that in both measurement campaigns several instruments exhibit RMS deviations from the approximated true forms in the single nanometer range, neglecting spherical form errors, while some show larger deviations, either due to systematic measurement errors or to issues such as handling problems during the measurement procedure. Systematic measurement errors depend on the measurement device, but could be caused e.g. by sensor calibration errors, axis position errors, scaling errors, system calibration errors, handling errors, model errors, or coordinate transformation errors, to name just a few. Therefore, the results of these comparisons are important for the participants to identify such errors. A summary of the numerical results is presented in table 2.
Finally, the two VRTs of the measurements of the metal asphere are compared. For this purpose, the VRTs are aligned to each other before the VRT of 2021 is subtracted from the VRT of 2022. Figure 6 shows the pointwise difference of the VRTs (c) and the aligned VRTs of the measurements performed in 2022 (a) and 2021 (b). The difference has a systematic structure with an RMS value of 6 nm and an MAD of 3 nm. The reason for the systematic structure seems to be the large astigmatism of one of the five measurements contributing to the VRT in 2021. This leads to the conclusion that the VRT of 2021 was affected by this measurement, as was already assumed when discussing the results of 2021. Therefore, the VRT is more reliable when more data sets of the same high quality contribute to it. Nevertheless, the RMS value of the differences of the VRT is more than a factor of 4 smaller than the RMS value of the individual VRTs. Therefore, the VRT is still a reliable approximation of the true form, taking into account all available information.
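As a quick check of the stated factor, the rounded RMS values quoted for the metal asphere (26 nm for the VRT of 2021, 27 nm for the VRT of 2022, and 6 nm for their difference) give the following ratios. Because the inputs are rounded, the ratios are only approximate.

```latex
\[
\frac{26\ \mathrm{nm}}{6\ \mathrm{nm}} \approx 4.3,
\qquad
\frac{27\ \mathrm{nm}}{6\ \mathrm{nm}} = 4.5
\]
```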

Asphere made of glass
This section presents the results of the form measurements of the aspherical surface made of glass (figures 1(b) and (c)) obtained during the two round robins performed in 2019 and 2021. Whereas in 2019 the specimen was measured mounted on a holder, in 2021 it was measured without a mount, so it might also be interesting to see if the effect of the mounting is reflected in the results when comparing the VRT of 2019 with that of 2021. Seven measurements were included in the round robin of the mounted asphere in 2019, while eight different measurements were analyzed for the unmounted asphere in 2021. In the two measurement campaigns, not all measurement instruments were able to measure the complete aperture of the surface. The largest common aperture of the campaign that could be used for data evaluation had a diameter of 22.0 mm in 2019 and 24.8 mm in 2021. Note that the numerical results can therefore not be compared directly. However, to show the results on the largest common aperture for both campaigns, we decided not to reduce the aperture diameter for 2021, also because the marker structures of the sample are clearly visible in 2021.
Figure 7 shows the radii of the individual BFSes (a), (c) and the corresponding PV values (b), (d) resulting from the BFSes on the evaluated diameter for the measurements of the round robins performed in 2019 (a), (b) for the asphere mounted on a holder and 2021 (c), (d) for the unmounted asphere, together with the median values. As can be seen here, the difference between the spherical form errors measured by the instruments is quite large, and the resulting PV differences due to these different BFSes are in the range of about 250 nm in 2019 and 5300 nm in 2021. In 2021, measurement number 8 measured a much larger spherical form deviation (which corresponds to a small radius of the BFS leading to a large PV resulting from the spherical form) than all other instruments. Due to this, the difference between the measurements is large. The median PV value of the spherical form error is 105 nm in 2019 and 148 nm in 2021, indicating the manufacturing error of the spherical form. The results show, as for the metal asphere, that the measurement of the absolute form, including spherical form errors, differs significantly between the measurements.
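The relation between a BFS radius and the PV value it produces on the evaluated diameter can be sketched with the sag of a sphere. This is a hedged illustration: whether the PV values in figure 7 were computed exactly this way is an assumption, and the function name `sphere_pv_nm` and the radius of 576 m are hypothetical, the latter chosen only because it gives a PV close to the 2019 median of 105 nm on the 22.0 mm aperture.

```python
import math

def sphere_pv_nm(radius_m, aperture_diameter_m):
    """PV (sag) of a sphere of the given radius over a centred circular aperture, in nm.

    Uses the numerically stable form h^2 / (R + sqrt(R^2 - h^2)), which equals
    R - sqrt(R^2 - h^2) but avoids cancellation for very large radii.
    """
    half = aperture_diameter_m / 2.0
    r = abs(radius_m)  # the sign only distinguishes convex from concave
    if r < half:
        raise ValueError("radius must be at least half the aperture diameter")
    sag_m = half ** 2 / (r + math.sqrt(r * r - half * half))
    return sag_m * 1e9

# Hypothetical BFS radius of 576 m on the 22.0 mm aperture used in 2019.
print(f"{sphere_pv_nm(576.0, 22.0e-3):.1f} nm")
```

The sketch makes the inverse relation explicit: the smaller the BFS radius of the residual, the larger the PV contribution of the spherical form error, which matches the description of measurement number 8.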
In figure 8, the VRT of the seven measurements of 2019 is shown in (a) and that of the eight measurements of 2021 in (b). In 2019, the three marker structures are hardly visible because the diameter of the evaluated aperture is slightly too small. Hints of the markers can be seen at the edge of the surface at six, seven and nine o'clock. The overall deviations are rather small with an RMS value of 23 nm. In 2021, the diameter of the evaluated aperture of the surface is larger than in 2019, and the three marker structures are clearly visible at the edge of the surface at six, seven and nine o'clock. Again, the overall deviations are rather small with an RMS value of 31 nm. Note that in 2021 the value is affected by the markers, leading to a larger value than in 2019. Nevertheless, the manufacturing quality of the surface form is high and comparable to that of the asphere made of metal. However, in contrast to the metal asphere, the dominant manufacturing errors of the glass asphere are in the mid-spatial frequency range, which may be attributed to the different manufacturing technology of the glass asphere.
Figure 9 shows the differences between the measurements and the VRTs for the two measurement campaigns of 2019 (a) and 2021 (b). While measurement numbers 2 and 6 in 2019 and 2, 6, and 8 in 2021 show some larger deviations, most measurements are very close to the VRT. Note that the color scale ranges only between plus and minus 40 nm, again demonstrating the high accuracy of most of the measurements.
Figure 10 shows the RMS values and the more robust MADs of the differences to the VRT for each measurement together with the values of the VRT for the measurement campaign of 2019 in (a) and of 2021 in (b). The RMS value of the VRT is 23 nm in 2019 and 31 nm in 2021, while the MAD value is 16 nm in 2019 and 19 nm in 2021, showing that the asphere made of glass is a high-quality manufactured asphere. Note that the numerical values of both years cannot directly be compared, since the data evaluation was performed on apertures with different diameters in the two campaigns and, in 2021 especially, the markers also contribute to the deviation from the design. The individual RMS values range between just 6 nm and 16 nm in 2019 and between 6 nm and 48 nm in 2021; the MADs are between 3 nm and 11 nm in 2019 and between 3 nm and 25 nm in 2021. For the round robin of 2019, the individual values are in all cases smaller than those of the VRT, showing that each measurement helps to get closer than the known design to the approximated true form. Moreover, there are some measurements with RMS values in the range of just 5 nm, showing the high quality of the contributions. In 2021, the individual values of measurement numbers 2 and 8 are larger than those of the VRT. These measurements did not serve to get closer than the design to the approximated true form.
Nevertheless, there are measurements with only a few nanometers RMS deviation to the VRT, evidencing the high quality of the contributions. Also for this sample, a summary of the numerical results is presented in table 2.
Finally, the VRTs of the two round robins for the glass asphere are compared. Since the asphere was measured while mounted on a holder in 2019, and without any holder in 2021, it is interesting to investigate whether any systematic difference in the approximated true form is visible. To this end, the measurements of 2021 were evaluated again on a smaller aperture with a diameter of 22.0 mm so that the VRTs could be compared. After this, the two VRTs were aligned to each other before the VRT of 2019 was subtracted from the VRT of 2021. Figure 11 shows the pointwise difference of the VRTs (c) and the aligned VRTs of the measurements performed in 2021 (a) and 2019 (b). The difference has a systematic structure with an RMS value of 6 nm and an MAD of 4 nm. The RMS value of the differences of the VRT is approximately a factor of four smaller than the RMS values of the individual VRTs. The systematic structure seems to be the effect of mounting, since a ring structure is visible in the pointwise difference map in (c), and the specimen was glued to a holder on a ring with a structure similar to that seen in the pointwise difference map of the two VRTs. This leads to the conclusion that the two comparison studies with approximately the same amount of high-quality measurement data yield reliable VRTs that help in the comparison of different measurements and of different specimen measurement conditions. The results show that even small changes in the surfaces can be made visible by comparing the VRTs. Furthermore, the results indicate that the mount of the specimens leads to small deviations in the surface form. Therefore, form measurements should always be performed under the mounting condition in which the specimen will be used. For measurement comparisons, the specimens should be measured by all participants under the same condition, either with or without holders, in order to achieve comparable results. Alternatively, holders should be developed that do not lead to any changes in the surface form [24].
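Restricting a topography to a smaller common aperture before the comparison can be sketched as a circular mask. This is a minimal illustration under stated assumptions: the aperture is centred on the coordinate origin, points on the rim are kept, and excluded points are marked with NaN. The function name `crop_to_aperture` and the sample coordinates are hypothetical.

```python
import numpy as np

def crop_to_aperture(x_mm, y_mm, z_nm, diameter_mm):
    """Return a copy of z_nm with all points outside a centred circular
    aperture of the given diameter set to NaN."""
    x = np.asarray(x_mm, dtype=float)
    y = np.asarray(y_mm, dtype=float)
    z = np.array(z_nm, dtype=float)  # np.array copies, so the input is untouched
    outside = np.hypot(x, y) > diameter_mm / 2.0
    z[outside] = np.nan
    return z

# Three illustrative points at radii 0 mm, 11.0 mm and 12.4 mm.
x = np.array([0.0, 11.0, 12.4])
y = np.zeros(3)
z = np.array([1.0, 1.0, 1.0])

# Reduce data evaluated on 24.8 mm to the 22.0 mm diameter used in 2019.
z_cropped = crop_to_aperture(x, y, z, diameter_mm=22.0)
```

Note that the study re-evaluated the 2021 measurements on the smaller aperture rather than cropping the finished VRT, so a mask of this kind would be applied before the alignment and VRT calculation.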

Toroidal surface with large design deviations
This section presents the results of the form measurements of the toroidal surface (figure 1(d)) performed in 2021 and 2022. Between the two round robins of 2021 and 2022, no changes were applied to the specimen, so the only differences between the two round robins were in the number of participants and in the measurement systems used. While five measurements were included in the round robin of 2021, seven different measurements were analyzed in 2022. In cases where new data were collected with the same measurement system as in 2021, or where data were supplied from a measurement system that was not included in 2021, the new data are analyzed here. One data set was not updated and is the same as in 2021. In the two measurement campaigns, not all measurement instruments were able to measure the complete aperture of the surface. The largest common aperture of the campaign that could be used for data evaluation had a diameter of 40.0 mm in 2021 and 44.0 mm in 2022. Note that the numerical results can therefore not be compared directly. However, to show the results on the largest common aperture of each campaign, we decided not to reduce the aperture diameter for 2022.
Figure 12 shows the radii of the individual BFSes (a), (c) and the corresponding PV values (b), (d) resulting from the BFSes on the evaluated diameter for the measurements of the round robins performed in 2021 (a), (b) and 2022 (c), (d), together with the median values. The resulting PV differences due to these different BFSes are in the range of about 2300 nm in 2021 and 8300 nm in 2022. The reason for the larger difference value in 2022 is that measurement number 7 measured a very low spherical form error leading to a small PV value, while all others measured a much larger spherical form error. The results again show that the measurement of the absolute form, including spherical form errors, differs significantly between the measurements. The median PV value of the spherical form error of all analyzed measurements is 6655 nm in 2021 and 9098 nm in 2022. The large values suggest that the surface form has a large spherical manufacturing error. The median PV value of 2022 is larger than that of 2021 because an aperture with a smaller diameter was evaluated in 2021. The values can therefore not be compared directly.
In figure 13, the VRTs of the five measurements of 2021 (a) and of the seven measurements of 2022 (b) are shown. The overall form deviations from the design topography are very large, resulting in RMS values of 778 nm in 2021 and of 1218 nm in 2022. These values show that the non-spherical part of the manufactured form also deviates strongly from the design, and that the manufacturing of the form is not of the same high quality as it was for the aspheres. Due to these large deviations from the design, the three marker structures are not visible in the figures. However, the mid-spatial frequency structures were used for the alignment, and in this frequency range the markers facilitated the alignment in 2022. In 2021, the diameter of the aperture evaluated was too small, so that the markers could not be used for the alignment. Nevertheless, the surface also has many mid-spatial frequency manufacturing errors that facilitated the alignment.
Figure 14 shows the differences between the measurements and the VRTs for the two measurement campaigns of 2021 (a) and 2022 (b). In 2021, measurement numbers 2 and 4 show larger deviations compared to the other three measurements. In 2022, larger deviations are visible for measurement numbers 2, 6, and 7. The systematic differences of these measurements are of similar size in both measurement campaigns. One reason might be that certain systematic alignment, calibration or reconstruction errors of such a complex toroidal surface during the measurement might lead to such systematic effects. Note that the color scale ranges between plus and minus 250 nm, which is a factor of 10 smaller than the color scale of the VRT. Therefore, the individual differences to the VRT are small compared to those of the VRT to the design.
Figure 15 shows the RMS values and the more robust MADs of the differences to the VRT for each measurement of 2021 (a) and 2022 (b) together with the values of the VRT.
The RMS value of the VRT is 778 nm in 2021 and 1218 nm in 2022, while the MAD value is 300 nm in 2021 and 515 nm in 2022, showing that the toroidal surface has very large design deviations. The individual RMS values range between just 11 nm and 227 nm in 2021 and between 23 nm and 300 nm in 2022; the MADs are between 3 nm and 146 nm in 2021 and between 8 nm and 196 nm in 2022. Therefore, all measurements have RMS values smaller than that of the VRT itself, so all measurement results are closer than the design topography to the approximated true form. Nevertheless, measurement numbers 2 and 4 in 2021 and measurement numbers 2, 6, and 7 in 2022 have much larger values than the other measurements, showing that for the complex freeform surface with large design deviations, some measurement systems seem to exhibit larger measurement errors, possibly stemming from alignment, calibration, or reconstruction errors during the measurement due to the large design deviation or the lack of rotational symmetry. Also for this sample, a summary of the numerical results is presented in table 2.
Finally, the VRTs of the two round robins of the toroidal surface are compared. To this end, the measurements of 2022 were evaluated again on a smaller diameter of only 40.0 mm to allow the VRTs to be compared. After this, the two VRTs were aligned to each other before the VRT of 2021 was subtracted from the VRT of 2022. Figure 16 shows the pointwise difference of the VRTs (c) and the aligned VRTs of the measurements of 2022 (a) and 2021 (b). The difference has an RMS value of 19 nm and an MAD of only 6 nm. Compared to the large RMS values of the VRTs themselves, these values are very small and show that the VRT is a very stable approximation of the true form. Nevertheless, it should be noted that the RMS value of the pointwise difference of the VRTs for the toroidal surface is about three times larger than for the aspheres. The reason is that some measurements of both campaigns showed larger systematic measurement errors (see figure 14), possibly stemming from the fact that the measurement task is more challenging due to the loss of symmetry and the much larger deviations from the design of the toroidal surface. However, the RMS value of the differences of the VRT is more than a factor of 40 smaller than the RMS value of the individual VRTs, demonstrating the stability and reliability of the VRT as an approximation of the true form. Especially in cases where the surface has much larger deviations from the design than the differences between the measurements (like this toroidal surface), the VRT is very helpful in detecting the systematic differences between the measurements. Otherwise, when only comparing the form deviations from the design, the differences between the measurements would not be visible at all in these cases.
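The two ratios quoted above can be checked with the values stated in this section. As an assumption of this sketch, the 2021 VRT RMS of 778 nm is taken as the reference, because it was evaluated on the 40.0 mm diameter used for the comparison; the RMS of the 2022 VRT re-evaluated on that diameter is not stated here. The value of 6 nm is the RMS of the VRT difference reported for both aspheres.

```latex
\[
  \frac{\mathrm{RMS}_{\mathrm{VRT},\,2021}}{\mathrm{RMS}_{\Delta\mathrm{VRT}}}
  = \frac{778\,\mathrm{nm}}{19\,\mathrm{nm}} \approx 41 ,
  \qquad
  \frac{\mathrm{RMS}_{\Delta\mathrm{VRT},\,\mathrm{toroid}}}{\mathrm{RMS}_{\Delta\mathrm{VRT},\,\mathrm{asphere}}}
  = \frac{19\,\mathrm{nm}}{6\,\mathrm{nm}} \approx 3.2 .
\]
```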

Summary and conclusions
In this paper, the results of six round robin comparison studies for the form measurement of two aspheres and a toroidal surface are presented. The basic evaluation procedure was first published in 2018 [2] and then refined and tested using virtually generated test data in 2022 [1]. The current study evaluated all data using the same software version, applying the refined methods to the measurement data obtained from a large variety of instruments. Overall, 13 different instruments based on 9 different measurement methods were used at 12 different laboratories to contribute to the results, and a total of 41 measurements were analyzed. Besides gaining knowledge of the current status of the level of agreement between a large variety of different instruments, insights were gained with regard to the stability and limits of the evaluation method. For this purpose, an approximated true topography called the VRT was calculated from the data, and the VRTs of two different round robins for the same surface were compared. A VRT approximates the deviation of the true form of the surface from its design form (best estimate) and displays the manufacturing accuracy of the surface form (neglecting spherical form deviations). When comparing measurements after removal of the VRT, the results directly indicate the systematic measurement errors of the instruments. When comparing the VRTs of two different round robins of the same surface, insights into their reliability and stability are gained.
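How a reference topography can be derived from the measurements themselves is sketched below. This is an illustration only: it assumes that the aligned residual maps (design and individual BFS already removed) share one grid and that a pointwise median is used as the robust estimator. The actual definition of the VRT is given in [1, 2] and may differ; the function names and the sample values are hypothetical.

```python
import numpy as np

def virtual_reference_topography(aligned_maps):
    """Pointwise median over a stack of aligned residual maps (NaN = no data).

    Assumption: the median is used as the robust estimator of the true form.
    """
    stack = np.stack([np.asarray(m, dtype=float) for m in aligned_maps])
    return np.nanmedian(stack, axis=0)

def differences_to_vrt(aligned_maps):
    """Subtract the reference topography from every measurement."""
    vrt = virtual_reference_topography(aligned_maps)
    return [np.asarray(m, dtype=float) - vrt for m in aligned_maps]

# Three illustrative residual maps in nm (not measurement data); the third
# one carries a large systematic error at the first point.
maps = [np.array([1.0, 2.0]), np.array([3.0, 2.0]), np.array([100.0, 5.0])]
vrt = virtual_reference_topography(maps)
diffs = differences_to_vrt(maps)
```

In this toy example the outlying measurement barely moves the reference, but it shows up clearly in its own difference map. With only a few contributing measurements a single systematic error can still leak into the reference, which is consistent with the observation for the metal asphere in 2021.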
The results show that the manufacturing and measurement results of aspheres made of glass and metal are on a high professional level, with approximated manufacturing deviations from design of between 23 nm (RMS) and 31 nm (RMS), neglecting spherical form errors. Several instruments exhibit RMS deviations from the approximated true forms in the single nanometer range, neglecting spherical form errors, while some show larger deviations, either due to systematic measurement errors or to issues such as handling problems during the measurement procedure. In the latter case, these kinds of comparisons help to detect such deviations at this high accuracy level and therefore to improve the methods. Comparing these results to the comparison measurements of aspheres carried out by CC UPOB e.V. and published in 2018 [2], not only have the comparison methods evolved but the measurement results and the differences between the measurements have also improved.
The results of the calculated individual BFSes and their corresponding PV values reveal that the residuals, after the removal of the design form, are often dominated by spherical form errors. Such form errors lead to PV differences between the measurements of the aspheres of several hundred or even thousands of nanometers, enabling the conclusion that the correct measurement of the spherical form error is one of the greatest challenges in asphere (and freeform) metrology.
The toroidal surface has a large design deviation with RMS values of 778 nm and 1218 nm (depending on the diameter evaluated), neglecting the spherical form error. This is a daunting measurement task but the results show that all of the measurement systems performed well in measuring these large design deviations. A few measurements with larger deviations from the VRT have RMS deviations in the range of up to 227 nm, which is still a factor of three smaller than the RMS value of the VRT. Other measurement results have RMS deviations from the VRT in the range of just a few tens of nanometers, which is a factor of 50 smaller than the RMS value of the VRT. This shows that reliable results across different measurement systems can be achieved even for such a challenging measurement task.
One hurdle in conducting such a comparison study is that there are no absolutely measured and very well characterized reference standard surfaces available at the accuracy level of a few tens of nanometers RMS, meaning that the true form of the surface being measured is not known. This is why the concept of referencing to a VRT was proposed in 2018 [2] and refined in 2022 [1]. Applying this concept to two different round robins for each of the three surfaces investigated not only provided insights into the current level of agreement between the measurements of a large variety of different instruments and their systematic measurement errors, but also into the reliability and limitations of the concept. Furthermore, it allowed the effect of mounting on the form stability of the glass asphere to be investigated. The VRT of the glass asphere showed systematic differences probably caused by the mounting, since the specimen was glued to a holder on a ring with a structure similar to that seen in the pointwise difference map for the two VRTs. Despite this, the systematic difference between the two VRTs is only very small, demonstrating the good reliability and stability of the VRT as well as the usefulness of the concept when the design deviations of the surface to be measured are larger than the differences between the measurements.
No change was made to either the metal asphere or the toroidal surface between the two round robins. For both surfaces, the only things that changed were the number and type of measurement systems that contributed to the VRT. For the asphere made of metal, a small systematic deviation between the two VRTs was visible. The reason is that in the first round robin, only five measurement systems contributed, and of these one had a large systematic measurement error. This error affected the VRT to a small extent. This shows that the reliability of the VRT as an approximation of the true form increases with the number of contributing measurements of the same high quality. Nevertheless, the VRT was also in this case a helpful tool in comparing all data in a better manner and in visualizing the differences between the five measurements.
For the toroidal surface, the VRTs of the two round robins were very stable: the RMS value of the differences was more than a factor of 40 smaller than the RMS values of the VRTs themselves. This proves that the concept of referencing to a VRT is very valuable.
Going forward, it would be interesting to calculate the VRT without removing the individual BFSes to get an approximation of the true form including the spherical form error. This would require greater consistency in the absolute measurement data. Additionally, when measurement uncertainties are available for all measurements, the VRT should be calculated by including the corresponding uncertainties as a weighting factor to further improve the reliability of the approximated true form.
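One possible form of such an uncertainty weighting is sketched below. It is not a method proposed in the study: it assumes one scalar standard uncertainty per measurement and uses an inverse-variance weighted pointwise mean, which is only one of several conceivable weighting schemes (a weighted median would be another). The function name `weighted_vrt` and all numbers are hypothetical.

```python
import numpy as np

def weighted_vrt(aligned_maps, uncertainties):
    """Inverse-variance weighted pointwise mean of aligned residual maps.

    aligned_maps:  sequence of equally shaped arrays (NaN = no data)
    uncertainties: one standard uncertainty per map (assumed scalar)
    """
    stack = np.stack([np.asarray(m, dtype=float) for m in aligned_maps])
    u = np.asarray(uncertainties, dtype=float)
    # One weight per map, reshaped so that it broadcasts over the map axes.
    weights = (1.0 / u ** 2).reshape((-1,) + (1,) * (stack.ndim - 1))
    valid = np.isfinite(stack)
    w = np.where(valid, weights, 0.0)  # points without data get zero weight
    numerator = np.sum(np.where(valid, stack, 0.0) * w, axis=0)
    denominator = np.sum(w, axis=0)
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(denominator > 0, numerator / denominator, np.nan)

# Two illustrative maps in nm with standard uncertainties of 1 nm and 2 nm.
maps = [np.array([0.0, 10.0]), np.array([4.0, np.nan])]
result = weighted_vrt(maps, uncertainties=[1.0, 2.0])
```

A weighted mean is less robust against outliers than a median, so in practice the choice of estimator would have to be weighed against how trustworthy the stated uncertainties are.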
Future work within the association CC UPOB e.V. will also address the measurement and comparison of additional information about an asphere, for example, the inner centration of double-sided aspheres.

Data availability statement
The measurement data analysed in the manuscript were gathered within round robin comparisons organised by the Competence Centre for Ultraprecise Surface Figuring (CC UPOB e.V.). The CC UPOB e.V. (www.upob.de/) is a nonprofit association that aims to increase scientific knowledge in the area of ultraprecise surface figuring and its related measurement challenges. In the round robins, suitable aspheric and free-form test artefacts are circulated amongst the participants for characterization. The different resulting measurement data are individually communicated to the PTB. The PTB (corresponding author) is tasked with analysing and comparing the results anonymously and with compiling them in a form suitable for submission as a scientific publication. To preserve the anonymity of the participants, the data cannot be made publicly available upon publication. The data that support the findings of this study are available upon reasonable request from the authors.

Figure 1. Images of the different specimens: metal asphere mounted on a holder (a), glass asphere mounted on a holder (b), unmounted glass asphere (c), and specimen with a toroidal surface mounted on a holder (d).

Figure 2. Metal asphere: radii of the individual BFSes (a), (c) and the corresponding PV values resulting from the BFSes on the evaluated diameter (b), (d) for the measurements of the metal asphere in 2021 (a), (b) and 2022 (c), (d), together with the median values.

Figure 4. Metal asphere: pointwise differences between reduced data and VRT of measurements in 2021 (a) and 2022 (b).

Figure 5. Metal asphere: RMS values and MADs of differences between reduced data and VRT together with RMS value and MAD of VRT of measurements in 2021 (a) and 2022 (b).

Figure 7. Glass asphere: radii of the individual BFSes (a), (c) and the corresponding PV values resulting from the BFSes on the evaluated diameter (b), (d) for the measurements of the mounted glass asphere in 2019 (a), (b) and the unmounted glass asphere in 2021 (c), (d), together with the median values.

Figure 8. Glass asphere: VRT of measurements of the mounted asphere done in 2019 (a) and of the unmounted asphere done in 2021 (b). Note that the VRT of 2021 was evaluated on an aperture with a larger diameter than in 2019. Therefore, the marker structures are also clearly visible in 2021.

Figure 9. Glass asphere: pointwise differences between reduced data and VRT of measurements of the mounted asphere in 2019 (a) and of the unmounted asphere in 2021 (b).

Figure 10. Glass asphere: RMS values and MADs of differences between reduced data and VRT together with RMS value and MAD of VRT of measurements of the mounted asphere in 2019 (a) and of the unmounted asphere in 2021 (b).

Figure 11. Glass asphere: difference of VRTs (c) of measurements of unmounted asphere in 2021 (a) and mounted asphere in 2019 (b) evaluated on the same diameter.

Figure 12. Toroidal surface: radii of the individual BFSes (a), (c) and the corresponding PV values resulting from the BFSes on the evaluated diameter (b), (d) for the measurements of the toroidal surface in 2021 (a), (b) and 2022 (c), (d), together with the median values.

Figure 13. Toroidal surface: VRT of measurements of 2021 (a) and 2022 (b). Note that the VRT of 2022 was evaluated on an aperture with a larger diameter than in 2021.

Figure 14. Toroidal surface: pointwise differences between reduced data and VRT of measurements in 2021 (a) and 2022 (b).

Figure 15. Toroidal surface: RMS values and MADs of differences between reduced data and VRT together with RMS value and MAD of VRT of measurements in 2021 (a) and 2022 (b).

Figure 16. Toroidal surface: difference of VRTs (c) of measurements of the toroidal surface in 2022 (a) and 2021 (b) evaluated on the same diameter.

Table 1. Overview of the number of measurements analyzed in this study.

Table 2. Summarized numerical results of all samples and measurement campaigns.