Assessment of 2D-based tests for the qualification of simulation software for dXCT

The interest in using computer simulations of dimensional x-ray computed tomography (dXCT) for various metrological purposes—such as measurement planning, performance prediction, performance optimisation and, finally, measurement uncertainty estimation—is increasing along with the ever-growing demand for more reliable measurements with dXCT. However, before a piece of simulation software can be used for tasks related to coordinate metrology, it has to be ensured that it is able to simulate physical laws, characteristic effects and basic CT system functionalities correctly and with sufficient accuracy. In short, the software must be qualified for dimensional metrology tasks. As one part of such a qualification process, a method is presented here for determining conformity intervals of 2D tests (projection-based tests) based on 3D tests (testing based on dimensional evaluations in a reconstructed volume) for the assessment of dXCT simulation software. The method consists of varying relevant parameter values in order to verify their influence on 3D measurement results. The results of the 3D tests with varied parameter values are then transferred to the quantities tested in the 2D tests and used as the basis for determining conformity intervals. Two approaches are applied for determining whether or not a variation of a parameter value is significant: (a) statistical and (b) heuristic. Two examples are presented, each based on simulated images, which show the application of the two different approaches for determining conformity intervals for the results of the 2D tests.

Original content from this work may be used under the terms of the Creative Commons Attribution 4.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.

Introduction
X-ray computed tomography (CT) systems have been used as coordinate measurement systems (CMSs) for almost two decades now [1]. Thanks to its holistic approach (acquisition of the inner and outer structures of an object with a single scan), CT enables measurement tasks in coordinate metrology that were previously not possible. Nonetheless, CT-based CMSs have yet to fully attain the same level of confidence and reliability enjoyed by conventional CMSs, i.e. tactile and optical systems. The inferior confidence still placed in CT for many applications can be attributed to the high complexity of the measurement process, which hampers the estimation of measurement uncertainty and, consequently, traceability to the SI unit of length, the metre.
Conventional methods applied in coordinate metrology for estimating measurement uncertainty often require either a comprehensive understanding of the measurement process (as described in the Guide to the Expression of Uncertainty in Measurement, GUM [2]) or a great deal of effort due to the need for repeated measurements of a calibrated standard (as described for tactile CMSs in ISO 15530-3 [3] and VDI/VDE 2617-8 [4], and specifically for CT in VDI/VDE 2630-2.1 [5]). Therefore, industry, research and calibration/measurement laboratories share a strong interest in acquiring time-, cost- and resource-efficient methods for estimating CT measurement uncertainty. Methods based on simulation seem to be a promising solution as they potentially yield more information with less effort and lower costs.
Several software packages are available for the simulation of CT measurements using radiographic modelling (e.g. [6][7][8][9][10]). These have already been used in research and industry for various applications as described, for example, in [11][12][13][14]. However, one important obstacle to using simulations for tasks related to coordinate metrology is the lack of standardised procedures (guidelines, standards, etc) for qualifying simulation software for the CT measurement process.
The general concept of simulation-based uncertainty estimation using the Monte Carlo method is described in GUM Supplement 1 [15]. Furthermore, ISO/TS 15530-4 specifies requirements for the application of uncertainty evaluating software to CMS measurements. It also describes testing methods for verifying uncertainty evaluating software and various test procedures for the evaluation of task-specific measurement uncertainty. However, neither GUM Supplement 1 nor ISO/TS 15530-4 yet considers the specifics of CT.
To overcome this limitation, the CTSimU project series was initiated with the aim of developing a standardised procedure to estimate measurement uncertainty using simulation. Due to the complexity of the matter, the approach was divided into three sequential research topics: (a) the CTSimU project dealt with the correct simulation of basic physical laws and functionalities (basic qualification) of CT simulation software [1], so that in step (b) methods to build and verify digital twins (i.e. digital replicas of real CT systems) using qualified CT simulation software can be developed (CTSimU2), thus enabling (c) the development of standardised methods and procedures, using digital twins, to estimate the uncertainty of CT measurements. The outcomes of the CTSimU project were presented to and are being discussed by VDI/VDE (VDI = Association of German Engineers; VDE = Association for Electrical, Electronic and Information Technologies) technical committee 4.33 (until 2022: 3.33) for the development of a guideline for the basic qualification of CT simulation software for dimensional metrology using CT-based CMSs.
Before a CT simulation software tool can be used for tasks related to the estimation of measurement uncertainty, it has to be ensured that the relevant physical laws, characteristic effects and basic functionalities of the CT measurement process are correctly reproduced. In other words, the software needs to be validated for specific tasks related to dimensional measurements. This basic qualification process aims at qualifying simulation software based on a test framework (i.e. a set of tests) in which several physical laws, characteristic effects and software functionalities are tested separately. These physical laws and characteristic effects are translated into requirements that a simulation software package must fulfil to be considered 'basically qualified'. To test simulation software efficiently, the test framework is divided into 2D and 3D tests, see figure 1. The 2D tests are carried out based on a single or a few projections of specific test scenarios to evaluate one or more physical effects or functionalities. Data evaluation of the 2D tests is based on 2D image processing techniques developed within the CTSimU project [16][17][18]. The 3D test verifies the ability of the simulation software to perform complete CT scans. Data evaluation and testing are based on dimensional measurements performed on the reconstructed volume.
How does one decide if a simulation software program reproduces a physical law and/or a characteristic effect and/or a functionality with sufficient accuracy? To answer this question, a method to determine conformity intervals for the 2D tests based on simulations of complete CT scans was developed and is presented in this contribution. The method consists of varying specific scan parameters (to be tested in the 2D test scenarios) in the simulation of a meaningful, complete CT scan to verify their influence on dimensional measurements. The conformity interval is, in this context, a range of values of a 2D metric in which the results produced by simulation software tools, while being tested, must lie in order to be considered 'basically qualified'.
Next, the approaches for CT simulation that were applied in the project are briefly presented, followed by a detailed explanation of the proposed method for determining conformity intervals. Finally, two examples of how to determine conformity intervals for 2D tests are described, one for the simulation of projection noise and another for the simulation of a geometric offset of the rotary table.

CT simulation
The two main approaches to the simulation of radiographic images are: (a) deterministic ray-casting simulations and (b) stochastic Monte Carlo-based particle physics simulations. Approach (a) essentially reproduces the Beer-Lambert law of radiation attenuation along rays in the virtual scene, and (b) follows the trajectories of individual photons and their interactions with matter by taking into account the phenomena of radiation extinction (i.e. absorption and scattering processes).
In both approaches, a virtual CT with its main components (i.e. x-ray source, rotary table/measurement object and x-ray detector) and their geometrical relations and radiation interactions must be implemented with adequate precision. For example, the CT geometry (meaning the relative positions and orientations of the x-ray source, rotary table/object and detector) and information on the object geometry have to be defined in the software environment. Simulation software programs additionally require information on the chemical composition and density of the materials that make up the components (e.g. the object, simulation environment) as well as the parameters for the generation of the x-rays (e.g. acceleration voltage, anode material) and the detection of the radiation (e.g. scintillator material, size of the detection elements). The object geometry can usually be imported into the simulation environment as surface meshes (e.g. STL files) or CAD geometry files.

Approach for determining conformity intervals for 2D tests
The main task of the test framework is to verify the ability of a software program to simulate physical effects and the CT geometry with sufficient accuracy. Insufficient simulation accuracy of the effects and geometry will have an impact on the reconstructed volume and potentially on the measurement result, causing measurement deviations. As a consequence, any software which is not able to reproduce effects and system geometry with sufficient accuracy can most likely not be used for metrological tasks. A simulation software tool should therefore be able to simulate physical quantities with sufficient accuracy such that no significant deviation from the expected measurement result is observed (e.g. manifested as a measurement error).
But how does one decide if a simulation software tool reproduces physical quantities with sufficient accuracy? To answer this question, this contribution proposes a method for determining conformity intervals for the assessment of 2D tests based on the dimensional results of simulations of complete CT scans (i.e. 3D simulations). The method consists of a stepwise variation of parameter values from their nominal value. The goal is to determine whether a significant 3D measurement deviation is caused by the varied parameter value. Parameter value variation in this context means that the value of a single parameter, such as noise, is varied in the positive and negative directions from a predefined nominal value (see figure 2).
3D simulations are carried out with the multi-geometry cuboid (section 3.2) and a specially designed ideal simulation scenario (section 3.3). Complete scans of the nominal (p_n in figure 2) and varied values of a parameter (p_i in figure 2) are simulated and dimensionally evaluated. The measurement results obtained with varied parameter values (R_{p_i} in figure 2) are then compared with the results achieved with nominal conditions (R_{p_n} in figure 2). If the variation value of a parameter significantly influences the measurement result (i.e. the result differs substantially from the result obtained with the nominal parameter value), the variation behaviour is used for the determination of the conformity interval. The tolerable amount of parameter variation (for which the measurement result remains within limits) is used for defining the conformity interval. The effects are tested separately to achieve maximum sensitivity, thus ensuring that even a small simulation error will be caught by the test framework.
To determine whether the variation has a significant influence on the measurement result compared to the result obtained with the nominal parameter value, two approaches are applied: (a) a statistical method and (b) a heuristic threshold based on the literature and expert knowledge, see section 3.1. Figure 2 schematically illustrates the concept for determining the conformity interval. Here, p is considered to be a parameter of a physical quantity to be tested based on a radiograph. The conformity interval for p is established by simulations of complete CT scans followed by dimensional measurements, where the parameter value p_i is varied stepwise (with sign) from its nominal value (p_n). If the measurement deviation generated by a varied parameter value p_i is significantly larger than the result obtained with the nominal value (i.e. |R_{p_i}| > |R_{p_n}|), then this variation value cannot be used for the determination of the conformity interval, and the last tolerable parameter value is used to set the conformity interval.
Some quantities of the test framework, such as noise, require no further steps for the definition of the conformity intervals because the 2D test is carried out with the same parameters as the 3D test. As soon as the upper and lower limits in 3D are found, they can be directly used as the boundaries of the conformity interval. However, there are some test quantities for which the parameters measured in 2D differ from the parameters measured in 3D, for instance geometrical parameters. These require an additional step for transferring the limits from the measurands in 3D to the measurands in 2D. In such cases, the last varied value for which the 3D result is still similar (within limits) to the result obtained with nominal conditions is simulated in 2D to find the conformity interval for the 2D measurand of that test.

Definition of thresholds
How does one determine if the measurement deviation caused by the variation of a parameter is significant? Two methods are suggested and were applied for determining whether the 3D measurement deviation caused by the variation of a parameter value is significant. The approaches are based on (a) a statistical hypothesis test and (b) a heuristic threshold.
In the statistical method, both measurement results (varied R_{p_i} vs. nominal R_{p_n}) are compared statistically with one another based on a hypothesis test. The result of the statistical test indicates with a certain level of confidence whether both results (R_{p_i} vs. R_{p_n}, cf. figure 2) are considered different. If a value of a parameter variation p_i is found to cause a statistically significant difference from the result obtained using the nominal parameter value p_n, this variation value p_i will most likely (with 95% probability for a statistical significance of α/2 = 0.05) cause a significant measurement deviation. This approach requires a series of repeated simulations. To improve the sensitivity of the test for different parameter variations, the test scenarios are simulated using (almost) ideal scenarios (see section 3.3), i.e. all influencing parameters, apart from those under test, were switched off (including noise). Consequently, repeated simulations can lead to identical results if the parameter under investigation does not itself introduce an element of randomness to the simulation (as image noise does). Such fully deterministic cases (like specific geometric misalignments) preclude the application of a statistical test, since repeated simulations of a varied parameter would lead to identical results. Instead, a heuristic threshold must be applied here.
The definition of the heuristic threshold is based on the literature and expert experience [19,20]. The heuristic threshold method compares the dimensional measurement results of the simulations (difference of varied vs. nominal) with a predefined threshold that is a fixed fraction of the voxel size of the reconstructed volume (e.g. 10% × voxel size).
Not all measurands react sensitively to all varied parameter values, but as soon as a single measurand (of the measurands described in section 3.2) exceeds the threshold or shows a statistical difference, the parameter variation p_i will be considered to have a significant influence on the measurement result. Consequently, the boundary of the conformity interval is determined from a 2D test of a simulation using the last parameter variations, p_{i-1} and p_{-i+1}, that were still considered 'insignificant'.

Reference standard used in the 3D simulations
The 3D simulations used for the determination of the conformity intervals of the 2D tests were carried out with the multi-geometry cuboid [21]. The multi-geometry cuboid has a scalable geometry (it was used here with dimensions of 22.2 mm × 21.8 mm × 15.0 mm) and features 37 inner and outer geometrical elements (i.e. cylinders, cones, planes, half-spheres/calottes, tori), see figure 3. Based on the geometrical elements, 54 measurands of different complexity (including distances, diameter, cylindricity, concentricity, etc) were measured, see table 1. To identify the geometrical elements in the reference standard, the labels of all 37 geometrical elements are presented in figure 3. The set of measurands intentionally included both simple and relatively complex measurement tasks, thus guaranteeing that a minimum number of measurands would sensitively react to the variation of the respective parameter values and detect potential undesired effects and hidden errors in the simulation software. Further applications of this standard can be seen in [22,23]. For the current contribution, reconstruction (standard filtered back projection), local gradient-based surface determination and data evaluation were carried out automatically (using scripts) in the software VG Studio Max 3.5 by Volume Graphics GmbH, Heidelberg, Germany.

3D basis scenario
A 3D test scenario was designed to simulate the variation of the parameter values with the multi-geometry cuboid. The scenario is designed to react sensitively to variations in different factors influencing the measurement results while still representing a realistic scan, see figure 4. To guarantee maximum sensitivity, effects were considered separately with simulations carried out under (almost) ideal conditions. This means that, apart from the parameter under test, all other influencing factors were set to exactly match the reconstruction parameters (e.g. for the CT geometry parameters) and any image quality degrading effects were switched off (e.g. detector unsharpness, projection noise). The simulations were performed with an ideal energy-integrating detector, a polychromatic x-ray spectrum (150 kV) and the multi-geometry cuboid made of aluminium. The environment was assumed to be air.
Table 1 shows the explanation and allocation of the features to the specific measurement tasks.
Table 1. 54 measurands measured in the multi-geometry cuboid. Measurand numbers (in italic) match the presentation of results in section 4. Columns: measurand type; designation; geometrical elements used (bold face entries indicate reference datums).
This scenario was designed to have a magnification factor of 16.6667, a beam opening angle of ∼9.1°, and a detector pixel size of 680 µm, which results in a voxel size of (∼40.8 µm)³. All 3D simulations were carried out with 1500 projections (in equidistant angular steps over a complete 360° rotation). All simulations were performed using the analytical simulation tool aRTist 2.10 from BAM, Berlin, Germany [6], which follows approach (a) mentioned in section 2. The software also contains analytical models for the x-ray source and the detector. aRTist can additionally simulate x-ray scattering processes with a Monte Carlo method, but this option was not used in this study.
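The geometry figures quoted above are linked by simple arithmetic, which the following minimal sketch checks. The variable names are illustrative only and are not part of aRTist or the CTSimU Toolbox.

```python
# Illustrative consistency check of the stated scan geometry.
# All variable names are hypothetical; the numbers are those quoted in the text.
magnification = 16.6667   # stated magnification factor
pixel_size_um = 680.0     # stated detector pixel size in µm
n_projections = 1500      # projections over one full rotation

# Effective voxel size: detector pixel size scaled back into the object plane.
voxel_size_um = pixel_size_um / magnification

# Equidistant angular increment between consecutive projections.
angular_step_deg = 360.0 / n_projections

print(f"voxel size   = {voxel_size_um:.1f} µm")
print(f"angular step = {angular_step_deg:.2f} degrees")
```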

Projection noise
How can projection noise be tested in 2D? The simulation of noise can be tested based on a single free-beam projection image (i.e. a radiographic image with no object between source and detector). One possible way of doing this is to compare the nominal signal-to-noise ratio (SNR) with the SNR measured in the simulated image. The SNR can be calculated as defined in equation (1). Data evaluation is carried out using the CTSimU Toolbox [16,17].
\[ \mathrm{SNR} = \frac{\text{mean signal amplitude}}{\text{mean noise amplitude}} = \frac{\mu}{\sigma} \tag{1} \]
In equation (2), N is the number of pixels in a certain evaluated region of interest (ROI), i is a sequential pixel index within the ROI, and x_i is the grey value of pixel i. The mean grey value within the ROI is denoted as µ.
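The SNR evaluation of a free-beam ROI can be sketched as follows. This is not the CTSimU Toolbox implementation: the function name, the ROI convention and the synthetic Gaussian test image are assumptions made for illustration, and σ is computed here as the root mean square deviation with division by N, which may differ from the normalisation used by the authors.

```python
import math
import random

def roi_snr(image, row_range, col_range):
    """Return (mu, sigma, SNR) for the grey values inside a rectangular ROI.

    image is a list of rows; row_range and col_range are (start, stop) tuples.
    sigma is the root mean square deviation from the ROI mean (division by N),
    which is an assumption of this sketch.
    """
    pixels = [image[r][c] for r in range(*row_range) for c in range(*col_range)]
    n = len(pixels)
    mu = math.fsum(pixels) / n
    sigma = math.sqrt(math.fsum((x - mu) ** 2 for x in pixels) / n)
    return mu, sigma, mu / sigma

# Synthetic free-beam projection: constant signal plus Gaussian noise
# (hypothetical grey level, chosen so that the nominal SNR is 100).
rng = random.Random(0)
signal, nominal_snr = 50000.0, 100.0
image = [[rng.gauss(signal, signal / nominal_snr) for _ in range(200)]
         for _ in range(200)]

mu, sigma, snr = roi_snr(image, (50, 150), (50, 150))
print(f"mean = {mu:.1f}, sigma = {sigma:.1f}, SNR = {snr:.1f}")
```

The measured SNR scatters around the nominal value because the ROI contains a finite number of pixels; a larger ROI narrows that scatter.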
How does one define conformity intervals for the projection noise test? The definition of conformity intervals for the projection noise test is based on the proposed method presented in section 3.1 and on the proposed statistical test method. The generic test scenario presented in section 3.3 was used for this test.
The SNR values were varied stepwise and compared to the nominal value. For every new variation of the SNR value, five repeated 3D simulations were carried out and the results compared statistically with the results of the nominal SNR value, for which 20 repeated simulations were performed. When the result of a single measurand reveals a statistical difference, this means that the varied SNR value already influences the measurement results significantly and that the magnitude of its influence cannot be accepted as tolerable. In such cases, the last tolerable value is used for the conformity interval.

Statistical approach
Considering that different SNRs deliver results with unequal variances, the comparison was based on Welch's statistical hypothesis test (t-test for unequal variances) [24].
Welch's test is based on the acceptance or rejection of two hypotheses: the null hypothesis (H_0) assumes that the nominal (\bar{X}_N) and varied (\bar{X}_V) averages of the samples are equal, while the complementary hypothesis (H_1) assumes that they are different, see equations (3) and (4):
\[ H_0: \bar{X}_N = \bar{X}_V \tag{3} \]
\[ H_1: \bar{X}_N \neq \bar{X}_V \tag{4} \]
The sample averages were calculated from 20 repeated simulations of the nominal SNR (a number suited to obtain a more robust statistic given that the nominal SNR is the reference) and from five repeated simulations of the varied SNR values. The different sample sizes are considered in the calculation of Welch's statistic and in the estimation of the degrees of freedom (equations (6) and (7)). The comparison is carried out for each measurand individually.
Considering a two-sided rejection region for H_0, H_0 can be rejected and H_1 accepted if the statistic t_w is larger or smaller than the tabulated Student's t value for v degrees of freedom, α/2 (0.05) significance level and 95% probability:
\[ t_w > t_{\alpha/2,\,v} \quad \text{or} \quad t_w < -t_{\alpha/2,\,v} \tag{5} \]
Welch's statistic (t_w) is calculated as a function of both sample averages (\bar{X}_N and \bar{X}_V), the standard deviations (s_N and s_V) and the numbers of repetitions (N_N and N_V) according to equation (6):
\[ t_w = \frac{\bar{X}_N - \bar{X}_V}{\sqrt{\dfrac{s_N^2}{N_N} + \dfrac{s_V^2}{N_V}}} \tag{6} \]
The number of degrees of freedom, v, is calculated based on equation (7), where v_N and v_V are the degrees of freedom associated, respectively, with the standard deviations of the nominal sample and the samples with varied SNR values:
\[ v = \frac{\left(\dfrac{s_N^2}{N_N} + \dfrac{s_V^2}{N_V}\right)^2}{\dfrac{s_N^4}{N_N^2\, v_N} + \dfrac{s_V^4}{N_V^2\, v_V}} \tag{7} \]
One advantage of Welch's test over the standard t-test is that it delivers a more robust statistic with respect to type 1 errors (i.e. false positives), given that it estimates the statistical degrees of freedom for both samples.
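Welch's statistic and the associated degrees of freedom can be computed with a few lines of standard-library Python. The sketch below is illustrative and not the authors' evaluation script: the function names and sample values are invented, v_N = N_N − 1 and v_V = N_V − 1 are assumed as usual for sample standard deviations, and the critical value must be supplied from a t-table or a statistics library.

```python
import math
import statistics

def welch_statistic(nominal, varied):
    """Return (t_w, v) for two samples with possibly unequal variances."""
    n_n, n_v = len(nominal), len(varied)
    mean_n, mean_v = statistics.fmean(nominal), statistics.fmean(varied)
    # Sample variances s^2 (division by N - 1).
    var_n, var_v = statistics.variance(nominal), statistics.variance(varied)
    se_n, se_v = var_n / n_n, var_v / n_v            # s^2 / N for each sample
    t_w = (mean_n - mean_v) / math.sqrt(se_n + se_v)
    # Welch-Satterthwaite degrees of freedom with v_N = N_N - 1, v_V = N_V - 1.
    v = (se_n + se_v) ** 2 / (se_n ** 2 / (n_n - 1) + se_v ** 2 / (n_v - 1))
    return t_w, v

def is_significant(t_w, t_critical):
    """Two-sided decision: reject H_0 if |t_w| / t_critical exceeds 1."""
    return abs(t_w) / t_critical > 1.0

# Invented example values for a single measurand (arbitrary units).
nominal = [10.0, 10.2, 9.8, 10.1, 9.9]
varied = [10.5, 10.7, 10.6]
t_w, v = welch_statistic(nominal, varied)

# Illustrative critical value only; look up the value matching v and the
# chosen significance level before drawing conclusions.
t_critical = 2.45
print(f"t_w = {t_w:.3f}, v = {v:.2f}, significant: {is_significant(t_w, t_critical)}")
```

Normalising |t_w| by the critical value allows all measurands to be compared against the common limit of 1, even though each has its own degrees of freedom.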

How does one determine the conformity interval for the '2D projection noise' test for a nominal SNR of 100?
The goal is to find both a higher and a lower varied SNR value (around the nominal SNR of 100), p_i and p_{-i}, that significantly influence the measurement results. To do so, five repeated simulations of each varied SNR value (i.e. SNR …, 98, 99, 101, 102, …) were carried out and compared statistically with 20 repeated simulations of the nominal SNR of 100. When the normalised result t_w/t_{α/2,v} (derived from equation (5)) was larger than 1, H_0 (\bar{X}_N = \bar{X}_V) could be rejected and H_1 (\bar{X}_N ≠ \bar{X}_V) accepted, meaning that the samples could be considered statistically different. This statistical comparison was carried out individually for each of the 54 measurands. The proposed method suggests that as soon as a single measurand of a varied SNR value presents a significant reaction (statistical difference), this varied SNR value cannot be accepted as tolerable, and the last tolerable value should be used for determining the conformity interval.
The results of the statistical comparison based on the Welch's test for the 54 measurands are presented in figure 5.
For the sake of simplicity and graph legibility, figure 5 only presents the results of the statistical comparison of SNRs 95, 96, 102 and 103. The graph shows that all results (t_w/t_{α/2,v}) obtained with SNR 96 and 102 lie below 1, while the results of three measurands of SNR 95 (4. Con_Cyl61-1 c-Con61-0, 29. Dia_Sph42 and 53. SFo_Cal12) and two measurands of SNR 103 (11. Cyl_Cyl32-c and 36. Fla_Cub52-1d) were statistically different when compared to SNR 100. As SNRs 95 and 103 influence the measurement result significantly, they cannot be accepted as tolerable, so the conformity interval of SNR 100 is [96, 102] (for the given step size of one in the presented test series). This means that when an SNR of 100 is requested, the simulation software may produce projection images with SNRs between 96 and 102 without a significant influence on the 3D measurement result (see figure 6). We note that this simplified test was carried out with Gaussian noise distributions. Other distributions are expected to have a different influence and could therefore result in different conformity intervals.
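The selection of the interval limits from the normalised test results can be sketched as an outward search from the nominal value. The function name and the ratio values below are invented placeholders (three measurands per SNR value instead of 54) chosen only to mirror the outcome described above; they are not the data shown in figure 5.

```python
def conformity_interval(nominal, ratios_by_value):
    """Return (lower, upper) limits of the tolerable parameter range.

    ratios_by_value maps each tested parameter value to the normalised
    results |t_w| / t_critical of all measurands. A value is tolerable only
    if every ratio is at most 1. The search walks outward from the nominal
    value and stops at the first non-tolerable value on each side.
    """
    def tolerable(value):
        return all(ratio <= 1.0 for ratio in ratios_by_value[value])

    lower = upper = nominal
    for value in sorted((v for v in ratios_by_value if v < nominal), reverse=True):
        if not tolerable(value):
            break
        lower = value
    for value in sorted(v for v in ratios_by_value if v > nominal):
        if not tolerable(value):
            break
        upper = value
    return lower, upper

# Placeholder ratios, not measured data.
ratios = {
    95: [0.4, 1.2, 0.7],
    96: [0.3, 0.8, 0.5],
    97: [0.2, 0.6, 0.4],
    98: [0.2, 0.4, 0.3],
    99: [0.1, 0.2, 0.1],
    101: [0.1, 0.3, 0.2],
    102: [0.3, 0.9, 0.6],
    103: [0.5, 1.1, 0.8],
}
interval = conformity_interval(100, ratios)
print(interval)
```

Stopping at the first significant value on each side means that a tolerable value lying beyond a significant one is never included in the interval.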

How can the position of the rotary table be tested in 2D?
The conical hole sheet standard [21] can be used (see figure 7) to test, based on a single radiographic image, if a simulation tool correctly positions and orients the x-ray source, rotary table and detector relative to one another. The standard features 10 holes which have a distorted conical shape and whose surfaces are oriented towards the focal spot of the x-ray source. The holes are designed in this way so as to improve the precision of data evaluation (segmentation) by creating sharp projected hole edges on the detector. To verify the correct positioning and orientation of the rotary table based on a simulated radiograph, the relative distances and vector orientations between all possible hole pairs of the hole sheet are used along with the absolute positions of all holes in the pixel coordinate system. Specifically, an incorrect positioning of the rotary table can be identified by determining the distance between holes as well as the absolute positions of the holes. The reference data evaluation of the geometrical parameters test is implemented in the CTSimU Toolbox [16,17].
The definition of conformity intervals for this test is based on the approach presented in section 3.1 and on the heuristic threshold method. The generic test scenario with the multi-geometry cuboid presented in section 3 was used for this test. The position of the rotary table was varied stepwise (in 1 µm steps) and compared to the nominal position (source-object distance, SOD = 127.5 mm), separately in the direction of the magnification (x-) axis and perpendicular to the rotation axis (i.e. in the y-direction), see figure 4 for the coordinate system. For every new position, a new 3D simulation was carried out and the results compared with the measurement results for the nominal position (difference between varied and nominal measurement result). If this difference (for a single measurand) exceeded the heuristic threshold of 10% × voxel size (i.e. 4.08 µm), this indicated that the associated varied position already exerted a significant influence on the measurement results and hence that this position could not be accepted as tolerable. The last tolerable value is therefore used for the conformity interval.
Now that it is clear which 3D positional variations are tolerable, this 3D conformity interval must be transferred to the actual 2D test because the 2D test cannot measure the 3D geometry parameters directly. Instead, it uses the position and vector orientation of the projected holes on the detector plane to conclude whether or not the hole sheet is positioned correctly. The 2D test does not measure the hole sheet's position in 3D space, but uses very different, derived parameters: the mean distance between the holes to check for scale deviation; the mean vector rotation angle to check for rotational misalignments; and a mean translation vector to check for positional misalignments. Because all three of these parameters are mean values for an ensemble of ten hole coordinates (and, more specifically, their possible combinations), they all come with a root mean square deviation (here called standard deviation). Those three standard deviations are important parameters for identifying if something went wrong in the 2D test. In total, this results in six distinct measurands for the 2D test.
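The heuristic decision for one varied rotary table position can be sketched as below. The function name and the measurand values are hypothetical; only the 10% fraction and the voxel size of about 40.8 µm are taken from the text.

```python
def exceeds_threshold(nominal_results, varied_results, voxel_size_um, fraction=0.10):
    """Return True if any measurand deviates from its nominal result by more
    than fraction × voxel size (all values in µm)."""
    threshold = fraction * voxel_size_um
    return any(abs(varied - nominal) > threshold
               for nominal, varied in zip(nominal_results, varied_results))

voxel_size_um = 40.8  # from the 3D basis scenario; threshold = 4.08 µm

# Invented measurement results for two measurands, in µm.
nominal = [10000.0, 5000.0]
small_offset = [10003.0, 5001.0]   # largest deviation 3.0 µm
large_offset = [10004.5, 5001.0]   # largest deviation 4.5 µm

print(exceeds_threshold(nominal, small_offset, voxel_size_um))
print(exceeds_threshold(nominal, large_offset, voxel_size_um))
```

A single measurand above the threshold is enough to mark the varied position as not tolerable, which matches the rule stated for the statistical approach.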

Conformity interval for the position of the rotary table
To obtain a conformity interval for the 2D test, the geometry variations that were still deemed tolerable in the 3D test are used as deliberate misalignments of the hole sheet in simulations of the 2D test. Preliminary simulations have shown that the tested effects present a linear behaviour in 3D and 2D. The resulting projection images are run through the evaluation process, and the evaluation results from the Image Processing Toolbox [16,17] then constitute the 2D conformity intervals for each of the six measurands of the 2D test. Each 2D measurand can react with a different sensitivity to the simulation of the still tolerable variations (found from the results of the 3D simulations). The question now is how to select a conformity interval for the 2D measurands based on a single projection of the still tolerable variation. The criterion used must take into account that the observed error should be large enough to be considered significant but at the same time small enough that a slightly imprecise reproduction of the effects is 'caught' in the 2D test. Thus, the smallest variation whose absolute mean result minus three times its standard deviation is still larger than the absolute mean of the nominal condition plus three times its standard deviation is taken as the conformity interval (see equation (8)):
\[ \left|\mathrm{Var}_i\right| - 3\,\sigma_{\mathrm{Var}_i} > \left|\mathrm{Nom}\right| + 3\,\sigma_{\mathrm{Nom}} \tag{8} \]
Nom and σ_Nom are the respective mean and standard deviation acquired with nominal conditions. Var_i and σ_{Var_i} are the respective mean and standard deviation acquired with the varied parameter values.
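The criterion of equation (8) and the search for the smallest variation that fulfils it can be sketched as follows. The function names and all numerical values are invented for illustration and do not reproduce the results in table 2.

```python
def fulfils_criterion(mean_var, sigma_var, mean_nom, sigma_nom, k=3.0):
    """Equation (8): |Var_i| - k*sigma_Var_i > |Nom| + k*sigma_Nom, with k = 3."""
    return abs(mean_var) - k * sigma_var > abs(mean_nom) + k * sigma_nom

def first_significant(nominal, variations, k=3.0):
    """Return the offset of the first variation that fulfils equation (8).

    nominal is (mean, sigma); variations is a list of (offset, mean, sigma)
    ordered by increasing offset magnitude. Returns None if no variation
    fulfils the criterion.
    """
    mean_nom, sigma_nom = nominal
    for offset, mean_var, sigma_var in variations:
        if fulfils_criterion(mean_var, sigma_var, mean_nom, sigma_nom, k):
            return offset
    return None

# Invented 2D results for one measurand: (offset in µm, mean, sigma).
nominal = (0.001, 0.002)
variations = [
    (5, 0.004, 0.002),
    (10, 0.010, 0.002),
    (15, 0.020, 0.002),
]
offset = first_significant(nominal, variations)
print(offset)
```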
The results of the 2D measurands of the conical hole sheet are presented in table 2. In bold are the first results found to fulfil the criterion presented in equation (8) for the 2D test of the rotary table position, see table 3. The graphical representation of the 2D test results is presented in figure 9. The graphs permit the visual identification of the lowest variation mean that satisfies the criterion presented in equation (8); in other words, the smallest varied mean value that causes a significant change in the results in 2D. The criterion compares 'only' the means of the results. There are, however, parameter variations that cause a stronger change in the standard deviation than in the mean of the results. For this reason, the variances obtained with the varied parameter values are also compared with those obtained with the nominal values. To verify whether the variances of the nominal and varied values are significantly different, the ratio between the standard deviations (σ_2/σ_1 for σ_2 > σ_1) of the nominal and varied parameter values is used, with σ_2 representing the greater of the standard deviations obtained from the nominal or varied values. The smallest standard deviation for which the ratio σ_2/σ_1 is greater than 10 is used for the conformity interval of the rotary table position test. A graphical representation of these example results is presented in figure 10. The graphs in figure 9 make clear that for the mean of the 2D measurands (scale deviation, rotation angle and translation vector), the smallest mean that fulfils the condition expressed by equation (8) is obtained when the position of the rotary table is mistakenly offset by −21 µm and 25 µm in the x-direction and −21 µm in the y-direction. No significant difference was observed for the standard deviations from the tested variations.
Now that the influence of the effects causing a significant deviation in 3D on the 2D measurands is known, the conformity interval is taken based on the smallest results to guarantee that wrongly simulated parameters (even small errors) will be detected.

Summary and discussion
The correct simulation of a measurement process is essential to ensure the ability to later estimate the measurement uncertainty using simulations of a digital twin.
A method for determining conformity intervals for the 2D testing of radiographic simulation software for physical laws, characteristic effects and basic functionalities was presented. The method is based on simulations of complete scans of the multi-geometry cuboid. Parameter values associated with physical laws, characteristic effects or basic functionalities are varied, simulated and evaluated in 3D. Varied parameter values that are found to lead to a significant change of the measurement results serve as a basis for determining the conformity intervals. Two approaches were proposed in this contribution to determine whether a measurement change caused by a variation of a parameter value is considered significant: (a) a statistical approach and (b) a heuristic threshold. In (a) the results of the 3D simulations are compared statistically by means of a hypothesis test. In (b) a heuristic threshold depending on a fixed fraction of the voxel size is applied. If the measurement deviation resulting from the varied parameter value was above this predefined threshold, the varied value was considered to influence the measurement result significantly. Based on this decision, the conformity intervals were determined.
Figure 11. Official logo of the WIPANO funding programme and the Federal Ministry of Economic Affairs and Climate Action.
Two application examples were presented for the determination of the conformity interval, one for the test of projection noise and one for the test of the rotary table's position. To verify the significance of the results, the statistical approach was applied to the projection noise example and the heuristic approach was applied to the rotary table position example.
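The heuristic approach (b) amounts to a simple threshold comparison, sketched below. The fraction of the voxel size is not stated in this section, so the value 0.1 used here is purely an assumption, as are the function names and all example numbers.

```python
def is_significant(deviation, voxel_size, fraction=0.1):
    """Heuristic test: deviation exceeds a fixed fraction of the voxel size.

    `fraction` = 0.1 is an assumed placeholder, not the value used in the study.
    `deviation` and `voxel_size` must share the same length unit.
    """
    return abs(deviation) > fraction * voxel_size


def split_variations(results, voxel_size, fraction=0.1):
    """Split (parameter_value, deviation) pairs into tolerable and significant."""
    tolerable, significant = [], []
    for parameter_value, deviation in results:
        if is_significant(deviation, voxel_size, fraction):
            significant.append(parameter_value)
        else:
            tolerable.append(parameter_value)
    return tolerable, significant


# Hypothetical 3D results: (parameter offset, measurement deviation), both in µm.
results_3d = [(2, 0.5), (4, 1.5), (8, 6.0), (16, 12.0)]
voxel_size_um = 50.0  # hypothetical voxel size in µm

tolerable, significant = split_variations(results_3d, voxel_size_um)
print(tolerable)    # [2, 4]
print(significant)  # [8, 16]
```

Only one simulation per variation is needed for this check, which reflects why the heuristic approach is described as fast. The threshold does not adapt to the variance of the effect, as noted in the discussion.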
As concerns the methods for testing the significance of the measurement result, the conclusion is that both approaches are promising and that they complement one another. One noteworthy advantage of the statistical approach is that the threshold depends on the variance of the effect being tested and on the measurand. However, the method is time-consuming as it requires a large number of simulations (per variation). Further, it is applicable only to non-deterministic effects since it considers the variance of the effect to create the threshold, and it involves relatively complex data evaluation. As for the heuristic approach, it is fast because it requires only one simulation (per variation), it involves simple and straightforward data evaluation, and it is generally applicable to distinct effects. On the other hand, the method works with a static threshold (depending only on the voxel size of the scan) that does not consider the effect variance, and this threshold was chosen empirically.
Besides the presented examples, the methods described in this contribution were applied to determine the conformity intervals of further effects, e.g. CT geometry, detector pixel sampling and unsharpness, and focal spot size. Beyond that, both methods will serve as the basis for the definition of conformity intervals in the CTSimU2 project for the evaluation of the CT system digital twins to be built during the project.

Credits
Markus Bartscher (MB), Florian Wohlgemuth (FW), Carsten Bellon (CB) and Stefan Kasperl (SK) contributed to funding acquisition and project administration. Fabrício Borges de Oliveira (FB), Tamara Reuter (TR), David Plotzki (DP), MB, FW, CB and SK contributed to the review and finalization of the full paper. FB, TR, DP and FW contributed to the conceptualization, methodology, investigation and formal analysis. FB, DP and TR wrote the original draft.

Data availability statement
The data cannot be made publicly available upon publication because the cost of preparing, depositing and hosting the data would be prohibitive within the terms of this research project. The data that support the findings of this study are available upon reasonable request from the authors.