Analysis of a regional metrology organization key comparison: model-based unilateral degrees of equivalence

A statistical testing method is developed to analyze a Regional Metrology Organization (RMO) key comparison (KC) with linking to the corresponding KC conducted by the International Committee for Weights and Measures (CIPM). We establish a statistical model to which a generalized least-squares method is applied to determine a linking invariant, ensuring that the CIPM KC reference value is unchanged by the analysis. The proposed method generally gives different unilateral and bilateral degrees of equivalence (DoEs) from those assessed by other methods. The approach accounts for uncertainty information provided by all participating laboratories and correlation information provided by the linking laboratories. Since decisions are made on the basis of calculated DoEs, we emphasize that it is valuable to have adequate knowledge of the properties of methods used for analyzing the results from an RMO KC.


Introduction
The International Committee for Weights and Measures (CIPM) Mutual Recognition Arrangement (MRA) [1] is technically supported by key comparisons (KCs), for which purpose the CIPM Consultative Committees (CCs) provide KCs in many metrology disciplines. The KCs are implemented to establish the degree of equivalence of national measurement standards maintained by national metrology institutes (NMIs) and to provide for the mutual recognition of calibration and measurement certificates issued by NMIs. The calibration and measurement capabilities (CMCs) of participating NMIs must be consistent with the results derived from the KCs.
Although there is no official guideline for CIPM KC data evaluation, many analyses are conducted based on peer-reviewed publications such as those given in the references to this paper to determine the unilateral and bilateral degrees of equivalence (DoEs) for the laboratories participating in a KC. It is noted that DoEs consist of both value and uncertainty components, while the value components are often misleadingly referred to as the unilateral or bilateral DoEs.
These DoEs are measures indicating the extent to which the conducted measurements are consistent with a consensus value, the KC reference value (KCRV), and among themselves.
The following is noted in the MRA document: 'Participation in a CIPM key comparison is open to laboratories having the highest technical competence and experience, normally the member laboratories of the appropriate Consultative Committee.' [1, clause 6.1] To establish global metrological traceability, frameworks in addition to the CIPM KCs are needed. The MRA document [1] explicitly requires Regional Metrology Organization (RMO) KCs and RMO supplementary comparisons to take place. These RMO comparisons must be linked to the results in the CIPM KCs to show the degree of consistency of the measurements among laboratories participating in the RMO and CIPM comparisons. In this regard, the MRA document states 'RMO key comparisons must be linked to the corresponding CIPM key comparisons by means of joint participants. The degree of equivalence derived from an RMO key comparison has the same status as that derived from a CIPM key comparison.' [1, clause 3.2] 'The results of the RMO key comparisons are linked to key comparison reference values established by CIPM key comparisons by the common participation of some institutes in both CIPM and RMO comparisons.' [1, annex T.4]
An approach is therefore needed to assess the unilateral DoE that is directly related to the KCRV in the CIPM KC, which is referred to as the CIPM KCRV in the present study.
Similarly to the CIPM KCs, there is no official guideline for the analysis of the RMO KCs, but procedures given in papers such as [2–6] can be applied. A concern is that, in our understanding, no statistical (hypothesis) testing approach has been suggested in these studies. Generally, testing and estimation are the two main approaches to decision making from data.
There are some studies where no statistical model is specified, so that it is difficult to say which of the two approaches is employed. For example, Elster et al [3] describe such a method, for which we can naturally derive the same procedure as theirs by using an estimation approach as shown in appendix C. Further, Kharitonov and Chunovkina [4] report two methods to give unilateral DoEs without adopting a statistical model. One of the two methods, called procedure C in their paper, is essentially incorporated in the method proposed by Decker et al [2] to develop a practical procedure.
Thus, as far as we know, there has been no method in which a statistical testing approach based on a specific statistical model is explicitly given. One possible drawback of the simple application of such a model may be that the measurand estimated by the CIPM KCRV would be re-estimated through the RMO KC analysis if special treatment were not given, as explained in section 3.
That the CIPM KCRV is not to be changed by the results of an RMO comparison is supported by [7, chapter 4] and [8] and especially by the statements '... RMO key comparisons extend the metrological equivalence established by the CIPM key comparisons to a greater number of national metrology institutes ...' [1, annex T.9] and 'The linkage does not modify the value and the uncertainty of the master CIPM key comparison reference value, which remains unique and non-altered for the whole family [the CIPM KC and the corresponding RMO KCs]. It simply extends the matrix of equivalence and the graph of equivalence in order to give evidence on the comparability between institutes that have participated ...' [9].
In this study, we develop a statistical data analysis of RMO KCs in which the measurand in the CIPM KC is not re-estimated, and consequently obtain unilateral and bilateral DoEs as statistics in a standard testing approach. Although re-estimation is largely inconsistent with the objectives of the MRA, it possesses some technical advantages in that, with a large set of RMO participants, it would be expected that the resulting consensus value would often be better defined.
Moreover, to show explicit linking between the unilateral and bilateral DoEs and the CIPM KCRV, we introduce a 'linking invariant', which is also considered in the study by Decker et al [2]. To determine the linking invariant, a generalized least-squares (GLS) method is developed in which the estimate, the KCRV, of the measurand in the CIPM KC is fixed. Then, using the linking invariant, we propose unilateral and bilateral DoEs that serve explicitly as statistics in statistical testing. These unilateral and bilateral DoEs are generally different from those in previous studies. We characterize those differences in this paper through the application to an actual example and simulations.
Literature searches indicate that quantitative information on the correlation coefficients, which is important information in RMO KC analyses, is often not available. There are exceptions. In the fluid flow example in section 5, the correlation coefficients were assessed as 0.8 [10]. In a KC of lamp spectral irradiance [11] for wavelengths from 250 nm to 2500 nm, participating laboratories were requested to separate uncertainties due to correlated and uncorrelated effects. These effects depend on the lamp being compared and the wavelength used. Based on that information, for low wavelengths, correlation coefficients for one laboratory were ≈ 0.8, whereas for high wavelengths they were ≈ 0.4. On the other hand, for a vibration accelerometer comparison [12], it is stated that, since no information about correlations is available, the data are treated as being uncorrelated. The last case appears to be quite common despite the frequent availability of uncertainty budgets from which the correlation coefficients could potentially be determined [13]. In the present study, we handle only cases where the correlation coefficients between the reported values from identical laboratories are quantitatively reported or assessed.
This paper is organized as follows. Section 2 gives assumptions on the analysis, shows mathematical forms for the value components of unilateral and bilateral DoEs, and discusses general principles of the statistical testing approach we use as an extension of the analysis of a CIPM KC. Section 3 provides a GLS approach to estimate the linking invariant and makes comparison with other available studies. Section 4 provides the uncertainty evaluation of the DoEs. Section 5 applies the analysis to an actual example from the fluid flow area. We discuss the features of our proposal in section 6 through simulations. Section 7 gives a brief summary of this study.
In this work, as in the GUM [14], for economy of notation the same symbol is used for the random variable that represents a quantity and a realization of the quantity. For improved visualization, zero elements in some matrices are replaced by blanks, such as in expression (26).
All calculations given in this paper were made using MATLAB R2022a General Release (9.12.0.1884302) on an Intel x64-based processor with an i5-5300U core, a 2.30 GHz CPU and a 64-bit operating system.

Available information
Let $I_L = \{1, \ldots, L\}$ denote the indices of the linking laboratories in the CIPM and RMO KCs, $I_x = \{L + 1, \ldots, M\}$ those of the non-linking laboratories in the CIPM KC, and, finally, $I_y = \{L + 1, \ldots, N\}$ those of the non-linking laboratories in the RMO KC.
The following information is assumed to be available:
1. Data $x_i$, $u(x_i)$, $i = 1, \ldots, M$, representing values and associated standard uncertainties, respectively, provided by the $M$ participants in the CIPM KC;
2. Analogously, $y_i$, $u(y_i)$, $i = 1, \ldots, N$, provided by the $N$ participants in the corresponding RMO KC;
3. Correlation coefficients $\rho_i$, $i = 1, \ldots, L$, provided by the $L$ laboratories that participated in both comparisons, the linking laboratories;
4. The CIPM KCRV $x_{\mathrm{ref}}$, taken here as the weighted mean (WM) of the data in point 1.
Regarding point 4, we consider only cases where the CIPM KCRV $x_{\mathrm{ref}}$ is obtained as the WM of the $x_i$, $i \in I_L \cup I_x$, as given later in expression (4). Some of the reported data might not be used in actual CIPM KCs in the computation of the WM. We will give some remarks on that possibility in appendix B. Further, although our approach can potentially be adapted to choices other than the WM, we do not consider that option here.

Linking invariant and DoEs
In our analysis, the linking of the data in an RMO KC to a CIPM KC is expressed explicitly. For this purpose, we introduce a linking invariant $h_{\mathrm{link}}$, as do Decker et al [2]. The value $y_i + h_{\mathrm{link}}$ may be interpreted as what would have been reported by laboratory $i$ in the RMO KC had it actually participated in the CIPM KC.
In this subsection, we show the computation and discuss the interpretation of unilateral and bilateral DoEs using the linking invariant. Here, the value component of the unilateral DoE is a statistic that expresses the difference between the reported value and the CIPM KCRV after adjustment by a linking invariant. Similarly, the value component of the bilateral DoE expresses the difference between the reported values from two laboratories after adjustment by a linking invariant.
The value component of the unilateral DoE for non-linking laboratory $j \in I_y$ in an RMO KC using the linking invariant $h_{\mathrm{link}}$ is
$$d^{(\mathrm{RMO})}_j = y_j + h_{\mathrm{link}} - x_{\mathrm{ref}}. \qquad (1)$$
The bilateral DoE is the difference between the unilateral DoEs after the adjustment. When values from the two KCs are to be compared, the value component of the bilateral DoE between laboratory $j \in I_y$ in the RMO KC and laboratory $\ell \in I_L \cup I_x$ in the CIPM KC is
$$d^{(\mathrm{RMO,CIPM})}_{j,\ell} = y_j + h_{\mathrm{link}} - x_\ell. \qquad (2)$$
Because the CIPM KCRV remains unchanged, the unilateral DoE of a linking laboratory is also not adjusted based on the results of an RMO KC, and expression (2) is employed for the relation between a non-linking laboratory in the RMO KC and a linking laboratory. The value components of the bilateral DoEs between non-linking laboratories $j \in I_y$ and $\ell \in I_y$ in the RMO KC are similarly given by
$$d^{(\mathrm{RMO,RMO})}_{j,\ell} = y_j - y_\ell. \qquad (3)$$
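The three value components can be sketched in a few lines. The sketch assumes the forms $d^{(\mathrm{RMO})}_j = y_j + h_{\mathrm{link}} - x_{\mathrm{ref}}$, $d^{(\mathrm{RMO,CIPM})}_{j,\ell} = y_j + h_{\mathrm{link}} - x_\ell$ and $d^{(\mathrm{RMO,RMO})}_{j,\ell} = y_j - y_\ell$, which follow from the description of the adjustment by the linking invariant; the function names and all numerical values are hypothetical and serve only as illustration.

```python
def unilateral_doe_rmo(y_j, h_link, x_ref):
    """Value component of the unilateral DoE of a non-linking RMO laboratory."""
    return y_j + h_link - x_ref


def bilateral_doe_rmo_cipm(y_j, h_link, x_l):
    """Value component of the bilateral DoE between an RMO and a CIPM laboratory."""
    return y_j + h_link - x_l


def bilateral_doe_rmo_rmo(y_j, y_l):
    """Value component of the bilateral DoE between two non-linking RMO laboratories."""
    return y_j - y_l


# Hypothetical inputs (not taken from any comparison report).
x_ref, h_link = 10.00, 0.05
y_j, y_l, x_l = 10.10, 9.90, 10.02

d_j = unilateral_doe_rmo(y_j, h_link, x_ref)
d_l_rmo = unilateral_doe_rmo(y_l, h_link, x_ref)
d_l_cipm = x_l - x_ref  # unilateral DoE of laboratory l in the CIPM KC

print(d_j, bilateral_doe_rmo_cipm(y_j, h_link, x_l), bilateral_doe_rmo_rmo(y_j, y_l))
```

In both bilateral cases the result equals the difference of the two unilateral values, so $x_{\mathrm{ref}}$ cancels; for two RMO laboratories $h_{\mathrm{link}}$ cancels as well.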

Interpretation of the unilateral DoE in a CIPM KC analysis
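The value components of the DoEs in section 2.2 are regarded as test statistics. This interpretation can be considered to be a natural extension of the practice of a typical CIPM KC analysis. How we can interpret the unilateral DoEs in a CIPM KC is thus explained in this subsection before showing the statistical approach for RMO KC analyses.
The CIPM KCRV $x_{\mathrm{ref}}$ and its associated standard uncertainty $u(x_{\mathrm{ref}})$ are given by Cox [15]:
$$x_{\mathrm{ref}} = \frac{\sum_{i \in I_L \cup I_x} x_i / u^2(x_i)}{\sum_{i \in I_L \cup I_x} 1 / u^2(x_i)}, \qquad u^2(x_{\mathrm{ref}}) = \left( \sum_{i \in I_L \cup I_x} \frac{1}{u^2(x_i)} \right)^{-1}, \qquad (4)$$
and the unilateral DoE $(d^{(\mathrm{CIPM})}_j, U(d^{(\mathrm{CIPM})}_j))$ for $j \in I_L \cup I_x$ by
$$d^{(\mathrm{CIPM})}_j = x_j - x_{\mathrm{ref}}, \qquad U(d^{(\mathrm{CIPM})}_j) = k\, u(d^{(\mathrm{CIPM})}_j), \qquad (5)$$
where
$$u^2(d^{(\mathrm{CIPM})}_j) = u^2(x_j) - u^2(x_{\mathrm{ref}}) \qquad (6)$$
and $k$ is the coverage factor, which is equal to 1.96 under the assumption of normality. Candidates for a statistical model for the unilateral DoEs (5) in a CIPM KC include
$$x_i \sim N(\mu_x, u^2(x_i)), \quad i \in I_L \cup I_x, \qquad (7)$$
with which we can derive the model
$$d^{(\mathrm{CIPM})}_j \sim N\!\left(0,\; u^2(x_j) - u^2(x_{\mathrm{ref}})\right).$$
Here $\mu_x$ denotes a value of the measurand and $N(\mu, \sigma^2)$ the normal distribution with mean $\mu$ and standard deviation $\sigma$. It is noted that while $x_{\mathrm{ref}}$ is an estimate of the measurand, $\mu_x$ is its true value. Bayesian hypothesis testing for model (7) has been suggested by Kacker et al [16], in which the unilateral DoEs (5) are given as test statistics.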
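As a minimal sketch of the weighted-mean analysis described above, the following computes $x_{\mathrm{ref}}$, $u(x_{\mathrm{ref}})$ and the unilateral DoEs, assuming mutually uncorrelated CIPM KC data and a WM that includes every reported value. The function names and the two-laboratory data set are hypothetical.

```python
import math


def weighted_mean(values, uncertainties):
    """Inverse-variance weighted mean and its standard uncertainty."""
    weights = [1.0 / u**2 for u in uncertainties]
    total = sum(weights)
    x_ref = sum(w * x for w, x in zip(weights, values)) / total
    return x_ref, math.sqrt(1.0 / total)


def unilateral_does(values, uncertainties, k=1.96):
    """Return x_ref, u(x_ref) and a list of (d_j, U(d_j)) pairs.

    Uses u^2(d_j) = u^2(x_j) - u^2(x_ref), which holds when x_j
    contributes to the weighted mean.
    """
    x_ref, u_ref = weighted_mean(values, uncertainties)
    does = []
    for x, u in zip(values, uncertainties):
        u_d = math.sqrt(max(u**2 - u_ref**2, 0.0))  # guard against rounding
        does.append((x - x_ref, k * u_d))
    return x_ref, u_ref, does


# Hypothetical data: two laboratories with equal uncertainties.
x = [1.0, 3.0]
u = [1.0, 1.0]
x_ref, u_ref, does = unilateral_does(x, u)
print(x_ref, u_ref, does)
```

The minus sign in $u^2(d_j)$ reflects the positive covariance between $x_j$ and a weighted mean that contains it.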
For statistical testing, the following score, called the $E_n$ score, for the $j$th participant is computed from the unilateral DoE:
$$E^{(\mathrm{CIPM})}_j = \frac{d^{(\mathrm{CIPM})}_j}{U(d^{(\mathrm{CIPM})}_j)}.$$
The $E_n$ score suggests 'satisfactory' performance when its absolute value is no greater than unity and 'unsatisfactory' otherwise. The probability of 'unsatisfactory' performance for a specific laboratory when model (7) holds is 5 %, that is, on average once in 20 KCs. Performance evaluation using the $E_n$ score can hence be interpreted as a statistical test with a significance level of 5 %. Note that the statistical test is not for the consistency of the values in aggregate, like the $\chi^2$ test in [15], but for that of the value provided by a specific laboratory. In other words, $E^{(\mathrm{CIPM})}_j$ is not used as a component of multiple tests for model (7) but as a sole statistic for it, with high power of test against a bias from $\mu_x$ in the distribution of $x_j$. When the main interest of a CC Working Group is whether a participant has a bias or not, the hypothesis testing approach using this model can be used.
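The 5 % significance level can be checked numerically. The sketch below assumes that, under the model, $d_j / u(d_j)$ is standard normal, so that $|E_n| > 1$ is equivalent to $|Z| > k$; the function names are hypothetical.

```python
import math


def en_score(d, expanded_uncertainty):
    """E_n score: value component divided by the expanded uncertainty."""
    return d / expanded_uncertainty


def prob_unsatisfactory(k=1.96):
    """P(|Z| > k) for a standard normal Z, i.e. P(|E_n| > 1) under the model."""
    return math.erfc(k / math.sqrt(2.0))


print(prob_unsatisfactory())   # close to 0.05 for k = 1.96
print(en_score(0.5, 1.0))      # |E_n| <= 1: 'satisfactory'
```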
Our interpretation of unilateral DoEs as test statistics is explained here. Performance is satisfactory when the value component of the unilateral DoE is not significantly different from zero. When performance is unsatisfactory, it is expected that investigation to find causes of the inconsistency is conducted. It is not generally expected that the computed value parts of unilateral DoEs are used for compensation purposes.
Alternative interpretations are possible. For instance, there are some studies where the value components of unilateral DoEs are interpreted as estimators of hidden biases. Details are given in appendix C.1. It is worth noting that the bias estimation approach gives the physical meaning of the DoEs as hidden biases and their associated uncertainties. Based on this idea, once the non-zero hidden biases are discovered in the KCs, they may need to be corrected regardless of the performance evaluations of the $E_n$ scores (because they are physically meaningful biases). Our impression is that since such corrections are not often implemented in many metrology fields, most CIPM CCs regard the DoEs as non-physical but statistical.
However, since both approaches can give the same unilateral DoEs in a CIPM KC, as shown in the present subsection and appendix C.1, the choice does not influence the computation. Further, in an RMO KC analysis, there could be differences in the computed DoEs provided by these two approaches.

Statistical model to be validated in an RMO KC analysis
The value components of the DoEs specified in section 2.2 are considered as test statistics for statistical testing. Such testing is implemented in this study to check the validity of statistical models. Define
$$z_i = [x_i, y_i]^\top, \qquad V_i = \begin{bmatrix} u^2(x_i) & \rho_i u(x_i) u(y_i) \\ \rho_i u(x_i) u(y_i) & u^2(y_i) \end{bmatrix}, \quad i \in I_L,$$
where $V_i$ is the covariance matrix associated with $z_i$. We assume the following statistical model for the generation of the data in the CIPM KC and those reported by the linking laboratories in the RMO KC:
$$z_i \sim N(\mu, V_i), \quad i \in I_L, \qquad x_i \sim N(\mu_x, u^2(x_i)), \quad i \in I_x, \qquad (9)$$
where $\mu = [\mu_x, \mu_y]^\top$, $N(\mu, V)$ denotes the bivariate normal distribution with mean $\mu$ and covariance $V$, and $\mu_y$ denotes a value of the measurand in the RMO KC. Defining the difference between the values of the CIPM- and RMO-KC measurands as $\eta = \mu_x - \mu_y$, the estimate of $\eta$ is the linking invariant $h_{\mathrm{link}}$. The definition of $\eta$ is given so that $h_{\mathrm{link}}$ is consistent with the linking invariant defined by Decker et al [2].
It should be noted that model (9) is not checked by the value components of the DoEs specified in section 2.2 because that is confirmed by the interpretation in section 2.3. Although the correlation information is not validated in the CIPM KC, we assume it is reliable, being based on information provided in carefully checked uncertainty budgets, for example.
The statistical model to be validated using the DoEs specified in section 2.2 is
$$y_j \sim N(\mu_x - \eta, u^2(y_j)), \quad j \in I_y. \qquad (10)$$
Then, the $E_n$ scores are used for statistical testing.

Linking invariant as a GLS solution
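We consider first the best linear unbiased estimators (BLUEs) of $\mu_x$ and $\eta$ having the GLS solutions $x_{\mathrm{blue}}$ and $h_{\mathrm{blue}}$, respectively, using the data in model (9).
The solution vector for the GLS problem would be
$$[x_{\mathrm{blue}}, h_{\mathrm{blue}}]^\top = \arg\min_{x, h} f(x, h),$$
where
$$f(x, h) = \sum_{i \in I_L} \left( z_i - [x, x - h]^\top \right)^\top V_i^{-1} \left( z_i - [x, x - h]^\top \right) + \sum_{i \in I_x} \frac{(x_i - x)^2}{u^2(x_i)}. \qquad (11)$$
We do not include in expression (11) a sum involving $y_j$, $j \in I_y$, because the linking invariant must be determined using only the most reliable information, namely, that from the laboratories in the CIPM KC (including the linking laboratories, of course). By checking the validity of the statistical model for $y_j$ with $j \in I_y$ after determining the linking invariant, we can avoid the discussion of possible outliers in $y_j$ with $j \in I_y$.
In the solution so provided, $h_{\mathrm{blue}}$ would be the linking invariant, as required, and $x_{\mathrm{blue}}$ would be a new CIPM KCRV. More information would have been used by the approach in determining this new CIPM KCRV (compared with the data originally used in the CIPM comparison only), which implies that it could be an improved value. However, despite this potential benefit, any change to the KCRV would be inconsistent with the MRA as emphasized in section 1: the results from the RMO comparison must be expressed in terms of the KCRV provided by the CIPM KC.
Accordingly, a solution that preserves the CIPM KCRV $x_{\mathrm{ref}}$ can simply be obtained after setting $x = x_{\mathrm{ref}}$ in expression (11). By defining $h_{\mathrm{link}}$ as the linking invariant for this purpose, the problem simplifies as follows:
$$h_{\mathrm{link}} = \arg\min_{h} f(x_{\mathrm{ref}}, h) = \arg\min_{h} g(h),$$
where
$$g(h) = \sum_{i \in I_L} \left( z_i - [x_{\mathrm{ref}}, x_{\mathrm{ref}} - h]^\top \right)^\top V_i^{-1} \left( z_i - [x_{\mathrm{ref}}, x_{\mathrm{ref}} - h]^\top \right),$$
because the difference between $f(x_{\mathrm{ref}}, h)$ and $g(h)$, namely, the second sum in expression (11) with $x = x_{\mathrm{ref}}$, is independent of $h$. It should be noted that although $x$ in expression (11) is essentially fixed to $x_{\mathrm{ref}}$ in this formulation, $\mu_x$ in model (9) is not fixed. Using the symmetry of $V_i^{-1}$, the derivative of $g$ is
$$\frac{\mathrm{d}g}{\mathrm{d}h} = 2 \sum_{i \in I_L} [0, 1]\, V_i^{-1} \left( z_i - [x_{\mathrm{ref}}, x_{\mathrm{ref}} - h]^\top \right).$$
Setting it to zero, after some algebra, we obtain the following expression:
$$h_{\mathrm{link}} = \frac{1}{Q} \sum_{i \in I_L} q_i, \qquad (15)$$
where
$$q_i = \frac{1}{(1 - \rho_i^2)\, u^2(y_i)} \left[ x_{\mathrm{ref}} - y_i + \rho_i \frac{u(y_i)}{u(x_i)} (x_i - x_{\mathrm{ref}}) \right] \qquad (16)$$
and (also defining $P$, which is used later)
$$P = \sum_{i \in I_L} \frac{\rho_i}{(1 - \rho_i^2)\, u(x_i) u(y_i)}, \qquad Q = \sum_{i \in I_L} \frac{1}{(1 - \rho_i^2)\, u^2(y_i)}. \qquad (17)$$
The linking invariant $h_{\mathrm{link}}$ can be expressed as a linear combination of the uncertain inputs $x_i$ and $y_i$ as shown in expressions (15) and (16). In the uncertainty evaluation of $h_{\mathrm{link}}$, we can hence apply the law of propagation of uncertainty [14, 17], which would be exact in these circumstances.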
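The constrained minimization can be sketched numerically. The code below assumes that the objective is the GLS quadratic form over the linking laboratories with $x$ fixed at $x_{\mathrm{ref}}$ and with $\mu_y = \mu_x - \eta$, and it uses a closed form obtained by setting $\mathrm{d}g/\mathrm{d}h = 0$ (valid for $|\rho_i| < 1$). This closed form is our own working from the stated model rather than a transcription of the published expressions, and the function names and data are hypothetical.

```python
def gls_objective(h, x_ref, links):
    """g(h): sum of bivariate quadratic forms over the linking laboratories.

    links: list of tuples (x_i, u(x_i), y_i, u(y_i), rho_i), |rho_i| < 1.
    """
    total = 0.0
    for x, ux, y, uy, rho in links:
        r1 = x - x_ref          # residual of the CIPM value
        r2 = y - (x_ref - h)    # residual of the RMO value
        quad = r1**2 / ux**2 - 2.0 * rho * r1 * r2 / (ux * uy) + r2**2 / uy**2
        total += quad / (1.0 - rho**2)
    return total


def linking_invariant(x_ref, links):
    """Minimizer of g(h) obtained from dg/dh = 0; returns (h_link, Q)."""
    Q = 0.0
    numerator = 0.0
    for x, ux, y, uy, rho in links:
        w = 1.0 / ((1.0 - rho**2) * uy**2)
        Q += w
        numerator += w * ((x_ref - y) + rho * (uy / ux) * (x - x_ref))
    return numerator / Q, Q


# Hypothetical linking data (two laboratories, rho = 0.8 each).
x_ref = 10.00
links = [(10.03, 0.05, 9.96, 0.06, 0.8), (9.98, 0.04, 9.93, 0.05, 0.8)]
h_link, Q = linking_invariant(x_ref, links)
print(h_link, Q)
```

Because $g$ is quadratic in $h$ with curvature $2Q$, any displacement $\delta$ from the minimizer raises the objective by $Q\delta^2$, which gives a simple numerical check of the closed form.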

Unilateral DoEs
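The standard uncertainty $u(d^{(\mathrm{RMO})}_j)$ associated with the value component $d^{(\mathrm{RMO})}_j$ of the unilateral DoE for non-linking laboratory $j \in I_y$ in the RMO KC is considered. Since the KCRV $x_{\mathrm{ref}}$ and the linking invariant $h_{\mathrm{link}}$ are independent of the reported value $y_j$, expression (1) and the uncertainty propagation law [14, clause 5.2] give
$$u^2(d^{(\mathrm{RMO})}_j) = u^2(y_j) + u^2(h_{\mathrm{link}}) + u^2(x_{\mathrm{ref}}) - 2 u(h_{\mathrm{link}}, x_{\mathrm{ref}}). \qquad (18)$$
We consider only unilateral DoEs for non-linking laboratories in the RMO KC, since the unilateral DoEs for the linking laboratories have already been evaluated in the CIPM KC.
The variance $u^2(h_{\mathrm{link}})$ and covariance $u(h_{\mathrm{link}}, x_{\mathrm{ref}})$ are
$$u^2(h_{\mathrm{link}}) = \frac{1}{Q} + \left( 1 - \frac{P}{Q} \right)^2 u^2(x_{\mathrm{ref}}), \qquad (19)$$
$$u(h_{\mathrm{link}}, x_{\mathrm{ref}}) = \left( 1 - \frac{P}{Q} \right) u^2(x_{\mathrm{ref}}), \qquad (20)$$
with $P = \sum_{i \in I_L} \rho_i / [(1 - \rho_i^2) u(x_i) u(y_i)]$ and $Q = \sum_{i \in I_L} 1 / [(1 - \rho_i^2) u^2(y_i)]$ the sums of expression (17), the derivations of which are given in appendix A. Hence, expressions (18) to (20) give
$$u^2(d^{(\mathrm{RMO})}_j) = u^2(y_j) + \frac{1}{Q} + \left( \frac{P}{Q} \right)^2 u^2(x_{\mathrm{ref}}). \qquad (21)$$
The uncertainty component of the unilateral DoE is
$$U(d^{(\mathrm{RMO})}_j) = k\, u(d^{(\mathrm{RMO})}_j),$$
where $k$ is the coverage factor, equal to 1.96 in this study as a consequence of the assumed normality and the coverage probability of 95 %. The $E_n$ score for laboratory $j$ is computed using
$$E^{(\mathrm{RMO})}_j = \frac{d^{(\mathrm{RMO})}_j}{U(d^{(\mathrm{RMO})}_j)}.$$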
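Since $d^{(\mathrm{RMO})}_j$ is linear in the inputs, its uncertainty can also be obtained directly from sensitivity coefficients, which offers a cross-check of any closed form. The sketch below rests on our own working: $h_{\mathrm{link}}$ is taken as the minimizer of the GLS objective with $x$ fixed at the weighted mean $x_{\mathrm{ref}}$, the only correlations are those between $x_i$ and $y_i$ of each linking laboratory, and the linking laboratories are listed first. The function name and the data are hypothetical.

```python
import math


def unilateral_doe_uncertainty(ux, uy_link, rho, u_yj):
    """Standard uncertainty of d_j = y_j + h_link - x_ref by propagation.

    ux      : u(x_i) for all CIPM participants, linking laboratories first
    uy_link : u(y_i) for the linking laboratories
    rho     : correlation coefficients of the linking laboratories (|rho| < 1)
    u_yj    : u(y_j) of the non-linking RMO laboratory j
    Returns (u(d_j), Q, S, u(x_ref)), where S is the ratio P/Q.
    """
    n_link = len(uy_link)
    W = sum(1.0 / u**2 for u in ux)
    a = [(1.0 / u**2) / W for u in ux]  # weights of the weighted mean
    w = [1.0 / ((1.0 - r**2) * uy**2) for r, uy in zip(rho, uy_link)]
    Q = sum(w)
    c = [rho[i] * uy_link[i] / ux[i] for i in range(n_link)]
    S = sum(wi * ci for wi, ci in zip(w, c)) / Q
    # Sensitivity coefficients of d_j with respect to x_k and linking y_i.
    cx = [-S * a[k] + (w[k] * c[k] / Q if k < n_link else 0.0)
          for k in range(len(ux))]
    cy = [-w[i] / Q for i in range(n_link)]
    var = u_yj**2
    var += sum(cx[k]**2 * ux[k]**2 for k in range(len(ux)))
    var += sum(cy[i]**2 * uy_link[i]**2 for i in range(n_link))
    var += 2.0 * sum(cx[i] * cy[i] * rho[i] * ux[i] * uy_link[i]
                     for i in range(n_link))
    return math.sqrt(var), Q, S, math.sqrt(1.0 / W)


# Hypothetical uncertainties: 4 CIPM participants, the first 2 are linking.
ux = [0.05, 0.04, 0.06, 0.08]
uy_link = [0.06, 0.05]
rho = [0.8, 0.8]
u_d, Q, S, u_ref = unilateral_doe_uncertainty(ux, uy_link, rho, u_yj=0.10)
print(u_d)
```

Under these assumptions the propagated variance agrees with $u^2(y_j) + 1/Q + S^2 u^2(x_{\mathrm{ref}})$, where $S$ is the ratio returned by the function.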

Bilateral DoEs
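The standard uncertainty $u(d^{(\mathrm{RMO,CIPM})}_{j,\ell})$ associated with the value component $d^{(\mathrm{RMO,CIPM})}_{j,\ell}$ of the bilateral DoE between the non-linking laboratory $j$ in an RMO KC and laboratory $\ell$ in a CIPM KC, that is, $j \in I_y$ and $\ell \in I_L \cup I_x$, is considered. Expression (2) and the uncertainty propagation rule [14, clause 5.2] give
$$u^2(d^{(\mathrm{RMO,CIPM})}_{j,\ell}) = u^2(y_j) + u^2(h_{\mathrm{link}}) + u^2(x_\ell) - 2 u(h_{\mathrm{link}}, x_\ell).$$
The standard uncertainty $u(h_{\mathrm{link}})$ of the linking invariant is given by expression (19) and the covariance as follows:
$$u(h_{\mathrm{link}}, x_\ell) = \left( 1 - \frac{P}{Q} \right) u^2(x_{\mathrm{ref}}),$$
with $P$ and $Q$ as defined in expression (17), the derivation of which is given in appendix A.
For the standard uncertainty associated with the value component $d^{(\mathrm{RMO,CIPM})}_{j,\ell}$ of the bilateral DoE between the non-linking laboratories $j \in I_y$ in an RMO KC and the laboratories $\ell \in I_L \cup I_x$ in the CIPM KC, the use of expressions (6) and (21) yields
$$u^2(d^{(\mathrm{RMO,CIPM})}_{j,\ell}) = u^2(d^{(\mathrm{RMO})}_j) + u^2(d^{(\mathrm{CIPM})}_\ell).$$
It should be noted that this computation is valid only when the CIPM KCRV is computed as the WM of the reported values including $x_\ell$. For a case where $x_\ell$ is not used in the computation, the uncertainty evaluation is given in appendix B.
The standard uncertainty $u(d^{(\mathrm{RMO,RMO})}_{j,\ell})$ associated with the value component $d^{(\mathrm{RMO,RMO})}_{j,\ell}$ of the bilateral DoE between the non-linking laboratories $j$ and $\ell$ in an RMO KC, that is, for $j, \ell \in I_y$, is discussed. From expression (3), since no correlation between $y_j$ and $y_\ell$ is assumed, the uncertainty propagation rule [14, clause 5.1] yields
$$u^2(d^{(\mathrm{RMO,RMO})}_{j,\ell}) = u^2(y_j) + u^2(y_\ell).$$
The uncertainty components of the bilateral DoEs in both cases are
$$U(d^{(\mathrm{RMO,CIPM})}_{j,\ell}) = k\, u(d^{(\mathrm{RMO,CIPM})}_{j,\ell}), \qquad U(d^{(\mathrm{RMO,RMO})}_{j,\ell}) = k\, u(d^{(\mathrm{RMO,RMO})}_{j,\ell}),$$
where $k$ is as before. The $E_n$ scores based on these bilateral DoEs are formed using
$$E^{(\mathrm{RMO,CIPM})}_{j,\ell} = \frac{d^{(\mathrm{RMO,CIPM})}_{j,\ell}}{U(d^{(\mathrm{RMO,CIPM})}_{j,\ell})}, \qquad E^{(\mathrm{RMO,RMO})}_{j,\ell} = \frac{d^{(\mathrm{RMO,RMO})}_{j,\ell}}{U(d^{(\mathrm{RMO,RMO})}_{j,\ell})}. \qquad (25)$$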

Computation of linking invariant
We use the data reported in APMP.M.FF-K4 [10], which is an RMO KC in the area of fluid flow. Two artefacts were circulated in this comparison with nominal volumes of 20 l and 100 ml. The data for the artefact with nominal volume of 20 l are employed as the actual example in the present study.
The results of APMP.M.FF-K4 were linked to those of CCM.FF-K4 [18]. Two laboratories participating in the comparisons had been nominated as the linking laboratories.
In CCM.FF-K4, three artefacts with nominal volume 20 l identified as TS 710-04, TS 710-05 and TS 710-06 were circulated. The WM in expression (4) was used only for the data for TS 710-06. For TS 710-04 and TS 710-05, another computational method was employed to avoid influence from possible outliers. The CIPM KCRV was computed as the mean of the three values. Here, only the data for TS 710-06 are used. Therefore, it should be noted that the actual analysis cannot be directly compared to our results. In APMP.M.FF-K4, only one artefact was circulated, and the data on it are used in our analysis.
The reported values and their associated standard uncertainties in both KCs are shown in table 1 and illustrated in figure 1. For CCM.FF-K4, only the data for TS 710-06 are shown in the table. Laboratories 1 and 2 are the linking laboratories: $I_L = \{1, 2\}$. The values reported by laboratory 1 for the CCM and the APMP comparisons are correlated due to a common reference standard and the use of the same instrument. A similar statement can be made for laboratory 2. Both correlation coefficients $\rho_1$ and $\rho_2$ were assessed as 0.8 in [10]. The laboratory identifiers are not identical to those in the final report of those KCs.
Eight laboratories participated in CCM.FF-K4. All reported values were used in the computation of the WM of the data for TS 710-06, so $I_x = \{3, \ldots, 8\}$. The WM for the CIPM KC data and its associated standard uncertainty are computed as

where $m_0 = 20\,000$ ml is the nominal volume.
Figure 1. Reported data in (a) CCM.FF-K4 (the CIPM KC) [18] and (b) APMP.M.FF-K4 [10] (the RMO KC), as given in table 1. Vertical bars show the expanded uncertainty for $k = 1.96$. The legend in (a) also applies to (b). The y-axis ranges are different but the scale intervals are identical.
Thus, using expression (16),

and $Q = 86.3$ ml$^{-2}$, using the right-hand expression (17). Thus, expression (15) yields the linking invariant

We can compare this result with those of previous studies. Decker et al [2] suggested a linking invariant by using the statistical method proposed by Kharitonov and Chunovkina [4].
In their method, the linking invariant is obtained through the following computation:

Moreover, while the linking invariant is not given in the method proposed by Elster et al [3], it is naturally derived by comparing expression (1) in the present study with expression (27) in their paper. For a matrix $\Lambda$ with $(i, j)$ element $\Lambda_{i,j} = u(d^{(\mathrm{CIPM})}_j - y_j)$ for $i, j \in I_L$, the linking invariant is given as

This approach could be interpreted as an extension of the study given by Sutton [5]. More is shown in appendix C.2. The differences among these values are insignificant compared to possible statistical errors in this example. The standard uncertainty $u(h_{\mathrm{link}})$ associated with $h_{\mathrm{link}}$ is 0.108 ml using expression (19). However, it is shown that there are differences between the analysis methods. Section 6 will offer a synthetic data set to show that these differences can have a serious influence on the results.

Computation of DoEs
To exemplify the computation of the DoEs, we again apply the linking of APMP.M.FF-K4 with a part of the data in CCM.FF-K4. Specifically, we take the unilateral DoE of laboratory 10 in APMP.M.FF-K4:

Using the computations in section 5.1 and

The $E_n$ score for the performance of laboratory $j$ is computed as

$|E^{(\mathrm{RMO})}_7|$ exceeds unity, implying that laboratory 7 reported an extreme value, an extreme uncertainty or both. The proposed analysis for an RMO KC, however, is unaffected by the existence of possible outliers from non-linking laboratories, because the linking invariant is determined only using the data in a CIPM KC and the data from linking laboratories.
For comparison, we show the results from the other methods mentioned in section 5.1. It is found that almost the same results are reported by the three approaches. In this example, strong correlations are considered between the two values of the linking laboratories, and |d (CIPM) , respectively. These facts result in the small difference between the methods employed, as explained in section 6, and a synthetic case where we can find significant differences is shown there.
As an instance of a bilateral DoE, we focus on the relationship between laboratory 10 in APMP.M.FF-K4 and laboratory 4 in CCM.FF-K4. The value component $d^{(\mathrm{RMO,CIPM})}_{10,4}$

with $k = 1.96$ is also considered. We see an 'unsatisfactory' result in this case. Thus, an 'unsatisfactory' performance in a bilateral DoE can arise even when performances using the unilateral DoEs of the two laboratories concerned are evaluated to be 'satisfactory'. Table 3 shows the bilateral DoEs computed for laboratory 10 in the RMO KC. We find some 'unsatisfactory' performances where $|E^{(\mathrm{RMO,CIPM})}_{10,\ell}| > 1$. Actions to be taken when such results occur depend on decisions taken by the CC Working Group. It should be noted that the data in table 1 are taken as a numerical example, and a different approach was implemented in the actual analyses [10].
Table 3. Assessed bilateral DoEs involving laboratory 10 in the RMO KC based on the data in table 1, where (R,C) ≡ (RMO,CIPM) and (R,R) ≡ (RMO,RMO).

Demonstration of the proposed method using synthetic data
No serious difference is found in the linking invariants and unilateral DoEs by the three approaches for APMP.M.FF-K4, as shown in section 5. In general, these approaches may differ appreciably depending on the data. In this section, the difference between methods is shown using a synthetic data set. Consequently, it can be said that an essential difference can arise only when the correlations between two values from the linking laboratories are weak. To explain clearly the reasons for the difference, a deliberately simple synthetic data set is used in which the correlation coefficient of the single linking laboratory is varied.
The synthetic example is given in table 4 and figure 2. It is assumed that five laboratories participate in the CIPM KC. Laboratory 1 is the only linking laboratory. The correlation coefficient is varied from 0 to 1. The assumed CIPM KC data are shown in figure 2

It should be noted that when there is only a single linking laboratory, the linking invariant given by Decker's method [2] and that derived through Elster's method [3] are identical. We cannot find any differences either in the set of assessed DoEs with these two methods. We refer to this identical approach as the reference method in this section.
We focus on only a single non-linking laboratory in an RMO KC for simplicity of discussion. When assessing the unilateral DoE of a laboratory, data from non-linking laboratories other than the concerned laboratory have no effect. Therefore, there is no need to consider multiple non-linking laboratories. The linking invariant $h_{\mathrm{link}}$ is a function of $\rho_1$ in our proposal. Figure 2(c) shows the variation of $h_{\mathrm{link}}$. It is found that in the reference method, $h_{\mathrm{link}}$ is a constant, because $h_{\mathrm{link}} = x_1 - y_1$ is given independently of the magnitude of $\rho_1$. We can hence find the difference between these two linking invariants, with the difference at a maximum when $\rho_1 = 0$ in the interval $[0, 1]$.
Using our proposal when $\rho_1 = 0$, expression (15) simply gives
$$h_{\mathrm{link}} = x_{\mathrm{ref}} - y_1.$$
It is thus found that
$$d^{(\mathrm{RMO})}_2 = y_2 - y_1.$$
In the case of using the reference method when $\rho_1 = 0$,
$$d^{(\mathrm{RMO})}_2 = y_2 - \left[ y_1 - (x_1 - x_{\mathrm{ref}}) \right].$$
In other words, with our proposal, the amount to be compared with $y_2$ is given as $y_1$. With the reference method, the value $y_1$ in our proposal is compensated by the difference between $x_1$ and $x_{\mathrm{ref}}$. This difference is evaluated as insignificant in the CIPM KC in accordance with section 2.3, because $0.65/0.69 = 0.9 < 1$.
In our proposal, the difference $x_1 - x_{\mathrm{ref}}$ is neglected because no common bias in $x_1$ and $y_1$ is suggested through the information $\rho_1 = 0$. In the statistical testing concept shown in section 2.3, the purpose of the CIPM KC is to validate that the hidden bias is zero. When no significant bias is found for laboratory 1 in the CIPM KC and no correlation is suggested, the zero bias may naturally reflect the qualitative conclusion obtained through the CIPM KC. In the reference method, the common bias is suggested even when no correlation is assumed. In other words, the distribution of the quantity for which $y_1$ is a realization has a mean to be estimated by the insignificant bias. Thus, we can find the conceptual and statistical difference between the two methods. This difference in the linking invariant can have practical effect in performance evaluation. For $\rho_1 = 0$, the proposed method gives

The reference method gives

That $|E^{(\mathrm{RMO})}_2|$ is less than unity for the proposed method and greater than unity for the reference method may have significance in terms of any decisions made. Figure 2(d) shows the variation of $E^{(\mathrm{RMO})}_2$ as a function of $\rho_1$. Not only at $\rho_1 = 0$ but for $\rho_1 \leqslant 0.4$, the above discrepancy between the analysis methods arises.
The larger $\rho_1$ is, the smaller are the differences in the linking invariant and $E^{(\mathrm{RMO})}_2$ between the two methods. When $\rho_1 = 1$, the unilateral DoEs given by the two methods are mutually consistent. The reason for the consistency is that expression (15) yields

This result means that the compensation given for the reference method is similarly applied for our proposal. The difference from the case of $\rho_1 = 0$ is that the common bias in $x_1$ and $y_1$ is implied through the correlation information provided by the linking laboratory. It can be said that our proposal gives minimal compensations based on the reliability of the uncertainty and correlation information provided by the linking laboratories. In contrast, the reference method may compensate biases as much as possible whenever the data support it.
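The behaviour between the two limits can be sketched for a single linking laboratory. The code assumes that the GLS solution with fixed $x_{\mathrm{ref}}$ reduces, for one linking laboratory, to $h_{\mathrm{link}} = x_{\mathrm{ref}} - y_1 + \rho_1 [u(y_1)/u(x_1)] (x_1 - x_{\mathrm{ref}})$ (our own working from the model), and that the reference method gives $h_{\mathrm{link}} = x_1 - y_1$. The numbers are hypothetical and are not those of table 4; equal uncertainties $u(x_1) = u(y_1)$ are assumed so that the two methods coincide at $\rho_1 = 1$.

```python
def h_proposed(x1, ux1, y1, uy1, rho1, x_ref):
    """Linking invariant with fixed x_ref for a single linking laboratory."""
    return (x_ref - y1) + rho1 * (uy1 / ux1) * (x1 - x_ref)


def h_reference(x1, y1):
    """Linking invariant of the reference method for a single linking laboratory."""
    return x1 - y1


# Hypothetical values with u(x1) = u(y1).
x_ref = 10.00
x1, ux1 = 10.06, 0.10
y1, uy1 = 10.02, 0.10

for rho1 in (0.0, 0.4, 0.8, 1.0):
    gap = h_reference(x1, y1) - h_proposed(x1, ux1, y1, uy1, rho1, x_ref)
    print(f"rho1 = {rho1:.1f}: difference between linking invariants = {gap:.3f}")
```

The gap shrinks linearly in $\rho_1$ in this sketch: the compensation $x_1 - x_{\mathrm{ref}}$ is applied only to the extent that the reported correlation implies a common bias.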
In both methods, since h link has no uncertainty when ρ 1 = 1, the unilateral DoE is The uncertainty component of the unilateral DoE is determined to be smaller than that for the case of no correlation.In general, since using the correlation information can make the analysis more precise, the correlation information must be specified reliably.
The advantages of the proposed method and the reference method can be summarized as follows:
1. The presently proposed method compensates insignificant biases in a CIPM KC only when the possible biases are implied in the correlation information.
2. The reference method compensates insignificant biases in a CIPM KC even when the possible biases are not implied in the given information.
Since the methods have different features, the CC should choose an analysis method in accordance with its intention to implement an RMO KC.When linking laboratories report smaller uncertainties than non-linking laboratories in an RMO or the number of linking laboratories is large, the difference between these two methods can be marginal.

Summary
A statistical testing procedure is developed to analyze an RMO KC with linking to the corresponding CIPM KC. A statistical model in which biases are not considered is employed. This model can be implemented to check whether a bias exists statistically in a reported value. To determine a parameter of the linking invariant, we use the GLS method under the condition that the CIPM KCRV is fixed.
The proposed method generally gives different unilateral and bilateral DoEs from those assessed by other available methods. Compared to the other methods, our proposal has the advantage that insignificant biases found in a CIPM KC are compensated only when the possible biases are implied in the correlation information. In other words, our proposal is based on the reliability of the uncertainty and correlation information given by the linking laboratories. Since decisions are made on the basis of calculated DoEs, a conclusion is that it is valuable to have adequate knowledge of the properties of available linking methods for analyzing the results from an RMO KC.

C.2. Extension of the bias estimating approach to an RMO KC
The model of expression (33) may be extended to the analysis of an RMO KC as expression (35), with constraint (34). The term ∆ (RMO) i is the bias for laboratory i in the RMO KC, while ∆ i is that for laboratory i in the CIPM KC, which can be a linking laboratory. To handle the linking invariant explicitly, µ x − η is employed in expression (10) instead of µ y in expression (35). As far as we are aware, there has been no publication in which model (35) with constraint (34) is straightforwardly employed. However, there are some studies in which an equivalent model is employed.
Sutton [5] proposed a statistical model with explicit biases. The model is given in general terms, and its details are not specified in his paper. We developed the details of the model on the assumptions used in this study and derived a model that is equivalent to the covariances u(x i , x ref ) and u(y i , x ref ) being given by expression (27). In [5], a constant of similar magnitude to the CIPM KCRV is defined as K, and the distributions of x i − K and K − x ref are considered rather than those of x i and x ref (if our understanding is correct). Specifically, K − x ref ∼ N(K − µ x , u 2 (x ref )) is considered. Giving the distributions of x i and x ref is, however, statistically equivalent to giving those of x i − K and K − x ref .
Consequently, the analysis with this statistical model can be identical to that with model (35) under constraint (34) when the WM is employed as the CIPM KCRV. If we apply model (33) and take the WM as the CIPM KCRV x ref , the population mean of the distribution of x ref is given as µ x when expression (34) is used as a constraint. Thus, expression (34) is implicitly applied in this method.
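The role of the constraint can be checked numerically with a short Python sketch. Since expressions (33) and (34) are not reproduced in this part of the text, the sketch assumes that model (33) has the form x i ∼ N(µ x + ∆ i , u 2 (x i )) and that constraint (34) reads Σ w i ∆ i = 0 with the WM weights w i ; both forms, the function names, and all numbers are assumptions made for illustration.

```python
def wm_weights(u):
    """Normalized inverse-variance weights of the weighted mean (WM)."""
    inv_var = [1.0 / ui**2 for ui in u]
    total = sum(inv_var)
    return [v / total for v in inv_var]


def wm_population_mean(mu_x, biases, u):
    """Population mean of the WM when x_i has mean mu_x + bias_i (assumed model)."""
    w = wm_weights(u)
    return sum(wi * (mu_x + bi) for wi, bi in zip(w, biases))


u = [1.0, 1.0, 2.0]                # hypothetical standard uncertainties
mu_x = 10.0                        # hypothetical population mean

constrained = [0.5, -0.25, -1.0]   # satisfies sum(w_i * bias_i) = 0 for these weights
unconstrained = [0.5, 0.0, 0.0]    # violates the assumed constraint

print(wm_population_mean(mu_x, constrained, u))
print(wm_population_mean(mu_x, unconstrained, u))
```

Under these assumptions the population mean of the WM equals µ x only when the weighted biases sum to zero; otherwise it is shifted by the weighted sum of the biases.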
Moreover, the unilateral DoE d (RMO) i derived as the GLS estimate of ∆ (RMO) i based on the model of expression (35) and the bilateral DoEs obtained through expressions (2) and (3) are identical to those suggested by Elster et al [3] when the CIPM KCRV is given as the WM in expression (4). Although those authors developed the procedure in the absence of a specific model, we interpret their study as corresponding to that developed by Sutton [5] and as clarifying some mathematical expressions in it.
It should be noted that through the bias estimation method explained in this appendix, the re-estimation of µ x is implemented, which can be interpreted as the re-determination of the KCRV purposely avoided in our proposal in the main manuscript.

The E n score suggests 'satisfactory' when |E(RMO)

4 = 1 . 2 =
05 ml. The standard uncertainty associated with d (RMO,CIPM) 0.51 ml. For the performance evaluation, the E n score, (a).

The CIPM KCRV x ref is given as the WM, −0.65, of x 1 , ..., x 5 with u(x ref ) = 0.35. The data for the synthetic example in table 4 relate to (a) the CIPM KC and (b) the RMO KC in figure 2. The results obtained are portrayed in figure 2 as (c) the linking invariant h link and (d) the E n score E (RMO) 2 .

Figure 2. Data for the synthetic example in table 4 for (a) the CIPM KC and (b) the RMO KC, and results obtained for (c) the linking invariant h link and (d) the E n score E (RMO) 2 . The legends in (a) and (c) apply also to (b) and (d), respectively.
is regarded as the statistical model suggested by laboratory i under the assumption of normality. Based on this statistical model, we investigate the distributions of the d(RMO)

Table 1. [18].FF-K4 [10] and CCM.FF-K4 [18] as an example of linking. The linking laboratories are laboratories 1 and 2. For simplicity of expression, the reported values are offset by the nominal value m 0 = 20 000 ml.

Table 2. Unilateral degrees of equivalence for non-linking laboratories and E n scores computed using the data in table 1.

Table 2 shows all the computed unilateral DoEs for the data in table 1. Only |E(RMO)7

Table 4. Data employed in the simulation in section 6. The only linking laboratory is laboratory 1.