The least informative distribution and correlation coefficient of measurement results

Correlations play a significant role in data analysis and in the evaluation and expression of uncertainty, yet estimating them is often difficult. This paper provides examples of how to infer the measurand value given only the uncertainties and correlation ranges of the measurement results. The least informative data distribution is not Gaussian, but its marginal distributions are. Explicit results are given in the case of a data pair, where the inferred correlation coefficient is the midpoint of the given range.


Introduction
Correlations play an important role in data analysis and in evaluating and expressing uncertainty [1-7]. For instance, when averaging correlated data with unequal uncertainties, contrary to what might be expected, both large positive and large negative correlations reduce the uncertainty [8]. Moreover, when the correlation coefficient tends to one, the weighted mean lies outside the data interval and the associated uncertainty tends to zero.
Estimating the correlation coefficients of literature data can be difficult or impossible [4,5]. The authors do not always give them, nor the information needed for an estimate. When data are subject to accounting identities, which express one datum as the sum of the others, or when they have specified marginal distributions, correlations are considered in [9,10].
Here, we consider the evaluation of the least-informative distribution and correlation coefficient of a data pair, where no more information is available than the standard uncertainties and the range of the correlation coefficient. We determine the sought distribution by ensuring that, subject to any contextual information, it is minimally informative. Only in this way can we be sure that the distribution and correlation coefficient take all the available information into account and that no uncontrolled assumptions have been introduced.
In the discrete case, the maximum entropy principle, which minimizes the Shannon information encoded in a distribution, solves the problem. For continuous distributions, the Shannon information is not well defined. In this case, we minimize the Kullback-Leibler divergence, which measures the difference between the sought distribution and a distribution assumed to encode the absence of testable information [11,12]. A concise introduction to classical (and quantum) information theory is given in [13].
In section 2, we set out the problem for two variables and derive the distribution and correlation coefficient consistent with the contextual information. The following section gives the posterior distribution of the measurand. Next, in sections 4 and 5, we present the posterior inference of the measurand and its standard uncertainty and discuss where the constraints on the correlation arise from. The general case of more than two variables is considered in section 6. Section 7 provides some application examples.

Problem statement
Let x₁ and x₂ be two measurement results having variances σ₁² and σ₂², but unknown correlation coefficient ρ. The interval of the possible ρ values might be known to be smaller than the default [−1, 1], as will be illustrated in section 5 and in the examples. Since shifted and scaled variables have the same ρ value, we can assume a zero mean and unit variance without loss of generality. The challenge is to find the least informative sampling distribution and ρ value consistent with ρ ∈ [ρ₁, ρ₂].

Solution
Since, by marginalization, the sought sampling distribution of the data is

p(x₁, x₂) = ∫ p(x₁, x₂, ρ) dρ,     (1)

the problem is to find the joint distribution of the data and correlation coefficient, which can be written in terms of conditional distributions as

p(x₁, x₂, ρ) = L(x₁, x₂|ρ) π(ρ),     (2)

where L(x₁, x₂|ρ) is the data likelihood (given the mean, variance, and correlation coefficient) and π(ρ) is the coefficient distribution, given ρ ∈ [ρ₁, ρ₂].
The (x₁, x₂) distribution having the minimum Kullback-Leibler divergence from the uniform one and zero mean, unit variance, and specified ρ value is the binormal

L(x₁, x₂|ρ) = exp[−(x₁² − 2ρx₁x₂ + x₂²)/(2(1 − ρ²))] / (2π√(1 − ρ²)).     (3)

The distribution of ρ ∈ [ρ₁, ρ₂],

π(ρ) = if(ρ₁ ≤ ρ ≤ ρ₂)/(ρ₂ − ρ₁),     (4)

where if(.) is one if its argument is true and zero otherwise, follows trivially from minimizing the divergence from a uniform distribution. Eventually, by using (2)-(4) in (1), the sought distribution is

p(x₁, x₂) = ∫_{ρ₁}^{ρ₂} L(x₁, x₂|ρ) dρ/(ρ₂ − ρ₁).     (5)

Figure 1 shows its contour plot when ρ₁ = 0 and ρ₂ = 1. The least-informative correlation coefficient of the data,

ρ₁₂ = ⟨x₁x₂⟩ = (ρ₁ + ρ₂)/2,     (6)

is obtained from (2) by carrying out the relevant integrations. Also, the marginal distributions of the data are normal, with zero mean and unit variance. Equation (6) can be generalized to the case where more detailed measurable information on ρ is available and, consequently, π(ρ) is any distribution. By proceeding as before, we still have ⟨x₁⟩ = ⟨x₂⟩ = 0 and σ₁² = σ₂² = 1. Furthermore, ρ₁₂ = ⟨ρ⟩_π, where the average ⟨.⟩_π is taken with respect to π(ρ).
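The midpoint result (6) and the normal marginals can be checked by simulation; a minimal Monte Carlo sketch, assuming the illustrative range ρ ∈ [0, 1] used in figure 1:

```python
import numpy as np

rng = np.random.default_rng(0)
rho1, rho2 = 0.0, 1.0                 # assumed correlation range, as in figure 1
n = 200_000

# Sample the least-informative joint distribution: draw rho uniformly on
# [rho1, rho2], then a binormal pair with zero mean, unit variance, and that rho.
rho = rng.uniform(rho1, rho2, n)
x1 = rng.standard_normal(n)
x2 = rho * x1 + np.sqrt(1 - rho**2) * rng.standard_normal(n)

print(np.corrcoef(x1, x2)[0, 1])      # ≈ (rho1 + rho2)/2 = 0.5, as per (6)
print(x2.mean(), x2.var())            # ≈ 0, 1: the marginal is standard normal
```

Even though every conditional and both marginals are Gaussian, the ρ-mixture itself is not binormal unless ρ₁ = ρ₂.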

Posterior distribution of the measurand
Let us consider a data pair having the same mean, the measurand μ, and variance-covariance matrix

Σ = σ₂² [a²  aρ; aρ  1],     (7)

where 0 ≤ a² ≤ 1 is the variances' ratio. Without loss of generality, we can set x₁ = 0 and x₂ = 1, which can be done by shifting and rescaling the data. Therefore, σ₁ = aσ₂ and σ₂ are the fractional standard uncertainties of the most and least accurate datum, respectively, relative to their difference. Also, if σ₂ ≪ 1, the data are inconsistent.

Fixed correlation
If ρ is known, the least-informative distribution and likelihood of the reduced data x₁ = 0 and x₂ = 1 is the binormal (3) with mean μ and covariance matrix (7),

L(x|μ, ρ) = exp(−xᵀΣ⁻¹x/2)/√(det(2πΣ)),     (8)

where x = (−μ, 1 − μ)ᵀ. To take informed decisions, we need the posterior distribution of the measurand values. By mapping the prior information on μ into a uniform distribution, as its support tends to the reals, the sought posterior converges to a normal one. Hence,

μ ∼ N(μ₀, u₀²),     (9)

where the posterior mean μ₀ and variance u₀² are given in section 4.

Unknown correlation
If ρ ∈ [ρ₁, ρ₂], by the same reasoning that led us to (3), the least-informative distribution and likelihood of the reduced data x₁ = 0 and x₂ = 1 is the ρ-average of the fixed-ρ likelihood over [ρ₁, ρ₂]. If we give the μ values a sequence of uniform prior distributions having increasingly large supports, the posterior converges to μ ∼ L(x|μ)/Z, where the normalization constant Z is given in the appendix. Figure 2 shows the posterior distribution of the measurand for increasing uncertainty ratio, when σ₂ = 1 and ρ ∈ [−1, 1] (left), ρ ∈ [0, 1] (center), and ρ ∈ [−1, 0] (right). If the data have different uncertainties, both positive and negative correlations shrink the mean toward the most accurate datum, while reducing the uncertainty. Also, unless ρ < 0, the posterior mean approaches the most accurate datum from outside the data range.
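This posterior can be evaluated on a grid; a numerical sketch, assuming σ₂ = 1, a = 1/2, and the illustrative range ρ ∈ [0, 0.95] (the upper limit is kept below one only to ease the quadrature near the integrable spike of the binormal density at ρ → 1):

```python
import numpy as np

a, s2 = 0.5, 1.0                       # assumed uncertainty ratio and sigma2
s1 = a * s2
x1, x2 = 0.0, 1.0                      # reduced data; x1 is the most accurate
rho = np.linspace(0.0, 0.95, 381)      # assumed correlation range
mu = np.linspace(-4.0, 5.0, 1801)
dmu = mu[1] - mu[0]

# Binormal likelihood of the reduced data for every (mu, rho) pair.
d1, d2 = x1 - mu[:, None], x2 - mu[:, None]
det = (s1 * s2)**2 * (1 - rho**2)
q = (d1**2 * s2**2 - 2*rho*s1*s2*d1*d2 + d2**2 * s1**2) / det
L = (np.exp(-q/2) / (2*np.pi*np.sqrt(det))).mean(axis=1)   # average over rho

post = L / (L.sum() * dmu)             # normalized posterior of mu
mean = (mu * post).sum() * dmu
u = np.sqrt(((mu - mean)**2 * post).sum() * dmu)
print(mean, u)
```

With these settings the posterior mean falls slightly below the most accurate datum x₁ = 0, i.e. outside the data interval, as stated above.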

Known correlation
If ρ is known in advance, the posterior mean and variance of the measurand equal the maximum likelihood estimates [8],

μ₀ = [(σ₂² − ρσ₁σ₂)x₁ + (σ₁² − ρσ₁σ₂)x₂]/(σ₁² + σ₂² − 2ρσ₁σ₂),     (10)

u₀² = σ₁²σ₂²(1 − ρ²)/(σ₁² + σ₂² − 2ρσ₁σ₂).     (11)
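These estimates, (10) and (11), make the behaviour recalled in the introduction explicit; a small sketch with assumed uncertainties σ₁ = 0.5 and σ₂ = 1:

```python
import numpy as np

x1, x2 = 0.0, 1.0          # reduced data
s1, s2 = 0.5, 1.0          # assumed uncertainties; x1 is the most accurate

def posterior_moments(rho):
    """Posterior mean (10) and standard uncertainty from (11) for known rho."""
    den = s1**2 + s2**2 - 2*rho*s1*s2
    mean = ((s2**2 - rho*s1*s2)*x1 + (s1**2 - rho*s1*s2)*x2) / den
    return mean, np.sqrt(s1**2 * s2**2 * (1 - rho**2) / den)

for rho in (0.0, 0.9, 0.999, -0.999):
    print(rho, posterior_moments(rho))
```

As ρ → 1 the mean leaves the data interval [0, 1] and the uncertainty tends to zero; a large negative ρ also shrinks the uncertainty, as noted in the introduction.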

Unknown correlation
When ρ is unknown, the analytic expressions of the posterior mean and variance of the measurand (given in the appendix) are too complex to give useful insights. They differ from (10) and (11) and from the maximum likelihood estimates. This difference is not surprising: the maximum likelihood estimates are the optimal compressions of the measurement results; the posterior mean and variance are the updated expectation and uncertainty of the measurand. We give some asymptotic behaviors that might help to check the correctness of (A.2) and (A.3). If the uncertainty ratio tends to one, the posterior mean is the data midpoint. If it tends to zero, the posterior mean (12) approaches the most accurate datum from above or below, according to the midpoint of the interval of the possible ρ values, and the posterior variance approaches (13). It is worth noting that, when ρ₁ = ρ₂ = ρ, (12) and (13) are the same limits as those of (10) and (11). Eventually, if σ₂ ≪ 1, the posterior mean takes a simpler form (given in the appendix) which, when a → 0, repeats (12). Figure 3 compares the posterior mean and standard uncertainty (A.2) and (A.3) with the (10) and (11) ones, when σ₂ = 1. Positive correlations bias the mean toward the most accurate datum more than negative ones. Unless ρ ∈ [−1, 0], (A.2) shifts more quickly than (10).

Correlation range
Correlations arise because, for example, the same standards or input data are used. Other sources are noise transients in time-series data and constraints imposed, for instance, to match an aggregate datum.
To see how contextual information constrains the ρ range, let us suppose, without loss of generality, that the data model is

x₁ = μ + η₁ + η₀,  x₂ = μ + η₂ + hη₀,     (14)

where η₁, η₂, and η₀ are zero-mean uncorrelated errors having variances u₁², u₂², and u₀², respectively, and η₀ and h take into account the uncertainty of the common references and input data (or the noise memory) and the (possibly) different sensitivities.
If h = 1, (14) describes data corrected for the same quantity, where the quantity magnitude is uncertain. If h ≠ 1, the data are corrected for different values of the same quantity, where their responsiveness is uncertain; for example, different thermal expansions, where the expansion coefficient is uncertain. In this case, u₀² is the variance of the expansion coefficient and h the ratio of the differences between the measurement temperatures and a reference one.
The model (14) implies σ₁² = u₁² + u₀² and σ₂² = u₂² + h²u₀², so that

ρ = hu₀²/(σ₁σ₂).     (15)

Since u₀² ≤ σ₁² and h²u₀² ≤ σ₂²,

|ρ| ≤ min(|h|a, 1/(|h|a)).     (16)

This inequality defines the range of the possible ρ values when there is no other information than the total standard deviations σ₁ and σ₂. The data can be maximally correlated only if |h|a = 1. If h = 1, the maximum ρ value is equal to the uncertainty ratio. If a = 1, the magnitude of the sensitivity ratio or its inverse bounds the correlation.
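The bound (16) can be verified directly from the model (14): the shared error term η₀ induces a correlation that the component variances cap. A sketch over randomly drawn (hypothetical) error budgets:

```python
import numpy as np

rng = np.random.default_rng(1)
u1, u2, u0 = rng.uniform(0.01, 2.0, (3, 10_000))   # component uncertainties
h = rng.uniform(0.05, 3.0, 10_000) * rng.choice([-1, 1], 10_000)

sigma1 = np.sqrt(u1**2 + u0**2)                    # total uncertainty of x1
sigma2 = np.sqrt(u2**2 + (h * u0)**2)              # total uncertainty of x2
rho = h * u0**2 / (sigma1 * sigma2)                # correlation induced by eta0

ha = np.abs(h) * sigma1 / sigma2                   # |h|a, with a = sigma1/sigma2
bound = np.minimum(ha, 1 / ha)
print(np.max(np.abs(rho) - bound))                 # non-positive up to round-off
```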

Extension to multiple data
To extend our results to more than two data, it is necessary to take the positive definiteness of the covariance matrix into account. A criterion, named after James Joseph Sylvester [15], tests the matrix positivity by requiring that all its leading principal minors are positive. For instance, in the case of three data, the first two leading principal minors are always positive and Sylvester's criterion reduces to 1 − ρ₁₂² − ρ₂₃² − ρ₁₃² + 2ρ₁₂ρ₂₃ρ₁₃ > 0. Figure 4 shows the three-dimensional region in which this inequality is true.
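The determinant condition can be cross-checked against a direct eigenvalue test; a sketch (the guard skips triples that land within round-off of the boundary):

```python
import numpy as np

def corr3_positive_definite(r12, r13, r23):
    # Sylvester's criterion for a 3x3 correlation matrix: the leading principal
    # minors are 1 and 1 - r12**2, so for |rij| < 1 only the determinant binds.
    return (max(abs(r12), abs(r13), abs(r23)) < 1
            and 1 - r12**2 - r13**2 - r23**2 + 2*r12*r13*r23 > 0)

rng = np.random.default_rng(2)
for r12, r13, r23 in rng.uniform(-1, 1, (1000, 3)):
    d = 1 - r12**2 - r13**2 - r23**2 + 2*r12*r13*r23
    if abs(d) < 1e-9:
        continue                        # skip near-singular round-off cases
    C = np.array([[1, r12, r13], [r12, 1, r23], [r13, r23, 1]])
    assert corr3_positive_definite(r12, r13, r23) == bool(
        np.all(np.linalg.eigvalsh(C) > 0))
print("determinant test agrees with the eigenvalue test")
```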
We assume that the results were corrected for the same systematic effects, so that the data model is

x_i = μ + η_i + h_iη₀,     (17)

where η_i and η₀ are zero-mean uncorrelated errors having variances u_i² and u₀². In (17), η₀ and the h_i take into account the common contributions to the error budgets and the different sensitivities of the x_i to them. For the sake of simplicity, we consider only the case h_i > 0 for all i and assume that every common contribution to the error budgets is not greater than the total standard uncertainty of the most accurate datum, say σ₁. Hence, when there is no other information than the total standard deviations σ_i, the ranges of the ρ_ij values are constrained by

0 ≤ ρ_ij ≤ σ₁²/(σ_iσ_j).     (18)

By proceeding as in section 3.2, the posterior distribution of the μ value is

p(μ|x) ∝ ∫_{Σ₀} exp(−xᵀΣ⁻¹x/2)/√(det(2πΣ)) ∏_{i<j} dρ_ij,     (19)

where x = (x₁ − μ, x₂ − μ, . . ., x_n − μ)ᵀ and Σ₀ limits the integration region to where Σ is positive definite.

Example 1
The two repeated measurements of the Avogadro constant, N_A = 6.022 140 99(18) × 10²³ mol⁻¹ and N_A = 6.022 140 76(12) × 10²³ mol⁻¹, given in [16], are correlated by a number of (nearly) equal corrections made for the effect of the same influence quantities. To make these N_A values usable for the estimation of a self-consistent set of values of the constants of physics by the Committee on Data for Science and Technology (CODATA), one of the authors estimated their correlation coefficient as equal to 0.17 [4].
Taking this value into account, the posterior inference and maximum likelihood estimate of the Avogadro constant are both equal to [4]

N_A = 6.022 140 82(11) × 10²³ mol⁻¹.     (20)
If the analyst does not know the ρ value, by assuming that the data model is (14) with h = 1, it is still possible to infer the N_A value as follows. Since the uncertainty ratio is a = 2/3, according to (16), the ρ value must be in the [0, 2/3] interval and its least informative value is ρ = 1/3, not so far from the estimated 0.17.
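These numbers can be reproduced with the standard weighted mean of a correlated pair (the explicit estimator of [8]); a sketch, with the two N_A values entered in units of 10²³ mol⁻¹:

```python
import numpy as np

# Avogadro-constant values from [16], in units of 10^23 / mol.
x = np.array([6.02214099, 6.02214076])
u = np.array([0.00000018, 0.00000012])

def weighted_mean(x, u, rho):
    """Best linear unbiased estimate of two correlated results (cf. [8])."""
    den = u[0]**2 + u[1]**2 - 2*rho*u[0]*u[1]
    w = np.array([u[1]**2 - rho*u[0]*u[1], u[0]**2 - rho*u[0]*u[1]])
    mean = w @ x / den
    return mean, np.sqrt(u[0]**2 * u[1]**2 * (1 - rho**2) / den)

print(weighted_mean(x, u, 0.17))   # ≈ (6.02214082, 0.00000011), i.e. (20)
print(weighted_mean(x, u, 1/3))    # least informative rho = (0 + 2/3)/2
```

With the least-informative ρ = 1/3, the estimate is 6.022 140 81(11) × 10²³ mol⁻¹, in excellent agreement with (20).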
This value is in excellent agreement with (20), which required a detailed analysis of each contribution to the total uncertainties of the measured values.

Table 1. Reduced results x_i (first row and column), uncertainty ratios a_i = σ_i/σ₄ (diagonal), and correlation coefficients (upper triangle, blue). The lower triangle (red) gives the correlation upper bounds estimated via the model (18). The largest reduced standard uncertainty is σ₄ = 0.424. Adapted from [6].

Example 2
Since 2011, the International Avogadro Coordination has determined N_A by counting the atoms in the same ²⁸Si-enriched monocrystals.
Reference [6] provides guidance on how these results must be used to infer an updated value. Table 1 shows the reduced data, sorted according to ascending uncertainties, the uncertainty ratios a_i = σ_i/σ₄ ≤ 1, and the correlation coefficients (typed blue). N₄ and N₁ are the least and most accurate results, respectively. Taking these data into account, the posterior inference and maximum likelihood estimate are both equal to μ₀ = x_ML = 0.15(15), or N_A = 6.022 140 588(65) × 10²³ mol⁻¹ [6]. If the correlations were set to zero, because unknown, μ₀ = x_ML = 0.25(13), or N_A = 6.022 140 631(54) × 10²³ mol⁻¹, which are significantly different.
If the correlations were unknown, by making the simplest assumption that the results were equally corrected for the same systematic effects, the data model is the same as (17). The upper bounds to the correlation coefficients given by (18) are shown in table 1 (typed red). Two of them are slightly inconsistent. This inconsistency might originate from having neglected the differences between the systematic-correction variances.
The posterior mean and standard uncertainty have been numerically evaluated via (19) and the relevant integrations. They are μ₀ = 0.17(15), or N_A = 6.022 140 598(64) × 10²³ mol⁻¹, not so dissimilar from the inference using the maximally informative data.

Example 3
As an example from the interlaboratory comparisons, which produce a consensus value for the common measurand that averages the measurement results of the participants, we considered a bilateral comparison of stainless steel mass standards [17], where the participants supplied an estimate of the correlation coefficients.
Reference [17] and table 2 provide the differences between the true and nominal masses measured by the pilot (before and after the circulation) and participating laboratory, as well as the associated standard uncertainties, σ_i, and correlation coefficients (typed blue). The upper bounds to the correlation coefficients, calculated by application of (18), are typed red.

Table 2. Weighing results for the 500 g mass standard as differences (expressed in mg) from the nominal mass value (first row and column), associated standard uncertainties (diagonal), and correlation coefficients (upper triangle, blue). The lower triangle (red) gives the correlation upper bounds estimated via the model (18). Adapted from [17].
The posterior mean and standard uncertainty, −0.234(37) mg, have been numerically evaluated via (19) and the relevant integrations. It is worth noting that, in this case, the domain 0 < ρ_ij < σ₁²/(σ_iσ_j) is not fully included in Σ₀. The maximum likelihood estimate and the associated standard uncertainty given in [17] are −0.234(41) mg.

Example 4
As a last example, we consider the results of the frequency measurements of the unperturbed ¹S₀-³P₀ transition in the ¹⁷¹Yb atom [18,19], where the traceability to the International System of Units was provided by a link to international atomic time (TAI) [20]. The authors of [18,19] calculated uncertainty contributions from the primary frequency standards that contributed to TAI during the measurements of 0.07 Hz and 0.06 Hz, respectively. The systematic contributions to the uncertainty of each TAI standard are correlated, though different averages might have been used in the two measurements. The data published by the International Bureau of Weights and Measures allowed the estimate of the ρ value. We calculated ρ = 0.27, after a careful evaluation of the contributions of the relevant primary and secondary standards. Taking this value into account, the posterior inference and maximum likelihood estimate of the ¹S₀-³P₀ transition frequency are ν_Yb = 518 295 836 590 863.671(94) Hz.

Conclusions
Correlation coefficients give the appropriate weights in analyzing the measurement results and expressing the measurand uncertainty. However, the information reported in the literature is often insufficient to estimate them, and additional inputs would be necessary. Therefore, we provided examples of the inference of the measurand value, when only the total uncertainties and correlation limits are associated with the measurement results.
To do this, we determined the least informative sampling distribution by minimizing the Kullback-Leibler divergence relative to a uniform distribution. In the case of a data pair, the inferred correlation coefficient is the midpoint of the interval of possible values. In particular, the correlation coefficient inferred in the absence of constraints is zero. However, the data distribution is not Gaussian, though its marginals are.
Our results apply when common contributions are included in the error budgets, but their amounts are unknown. Obviously, our results do not apply to the correlation of dark uncertainties [21].