Ellipsoidal nested sampling, expression of the model uncertainty and measurement

The measurand value, the conclusions, and the decisions inferred from measurements may depend on the models used to explain and to analyze the results. In this paper, the problems of identifying the most appropriate model and of assessing the model contribution to the uncertainty are formulated and solved in terms of Bayesian model selection and model averaging. As the computational cost of this approach increases with the dimensionality of the problem, a numerical strategy, based on multimodal ellipsoidal nested sampling, is outlined to integrate over the nuisance parameters and to compute the post-data distribution of the measurand. To illustrate the numerical strategy, an elementary example concerning a bimodal, two-dimensional distribution has also been studied by use of MATHEMATICA.


Introduction
The use of Bayesian methods to incorporate the model uncertainty into the data analysis and the uncertainty budget is often computationally expensive, but advances in computing technology have made these methods practical. Consequently, the choice between two or more hypotheses, for example when fitting the shape of an experimental curve, can be made by considering different models, each indexed by one or more parameters; Bayesian model selection and model averaging provide the probabilistic framework to treat the model and data uncertainties simultaneously. Let x be the list of measurement data, (A, B, ..., M, ...) the list of the possible models, θ the measurand, and (θ_A, θ_B, ..., θ_M, ...) the relevant nuisance parameters. The joint distribution of the data, measurand, parameters, and model can be written in terms of conditional probabilities as

P(x, θ, θ_M, M) = L(θ, θ_M, M|x) π(θ, θ_M|M) Π(M),   (1)

where we have introduced the likelihood L(θ, θ_M, M|x) = P(x|θ, θ_M, M), the conditional prior probability distribution of the measurand and parameters π(θ, θ_M|M), and the prior probability of the model Π(M). By conditioning the left-hand side of Eq. (1) on the data and the model, and applying the chain rule, we can write (1) as

P(θ, θ_M|x, M) = L(θ, θ_M, M|x) π(θ, θ_M|M) / Z(x|M),   (2)

where we introduced the so-called "evidence" Z(x|M) = P(x|M). It is the normalising factor of the numerator in (2), i.e. it satisfies the relationship

Z(x|M) = ∫_{Θ⊕Θ_M} L(θ, θ_M, M|x) π(θ, θ_M|M) dθ dθ_M.   (3)

By applying the Bayes theorem to P(x, M) we obtain the discrete probability Prob(M|x) for the model M provided by the data,

Prob(M|x) = P(x|M) Π(M) / P(x),   (4)

which, by use of the law of total probability, can be rewritten in terms of the evidence Z(x|M) of each model as

Prob(M|x) = Z(x|M) Π(M) / Σ_{M'} Z(x|M') Π(M').   (5)

Eq. (5) gives the probability of a model given the observed data, and a simple way to select a model is to choose the most probable one. However, when the relevant probabilities are approximately equal and no single model stands out, it is necessary to average over the considered models.
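The normalisation in Eq. (5) is conveniently carried out in log space once the per-model evidences are available. The following Python sketch (illustrative only; the paper's own computations use MATHEMATICA, and the function name is our own) turns log-evidences ln Z(x|M) and model priors Π(M) into the posterior model probabilities Prob(M|x):

```python
import numpy as np

def model_posteriors(log_evidences, priors):
    """Posterior model probabilities Prob(M|x) from the per-model
    log-evidences ln Z(x|M) and the prior probabilities Pi(M),
    as in Eq. (5)."""
    log_z = np.asarray(log_evidences, dtype=float)
    pi = np.asarray(priors, dtype=float)
    # Work in log space to avoid overflow/underflow, then normalise.
    w = np.log(pi) + log_z
    w -= w.max()
    p = np.exp(w)
    return p / p.sum()

# Two models with equal priors; model B's evidence is e^2 times larger,
# so Prob(B|x) = e^2 / (1 + e^2) ~ 0.88.
post = model_posteriors([0.0, 2.0], [0.5, 0.5])
```

When no entry of `post` dominates, the averaging step discussed above weights each model's post-data distribution of θ by these probabilities.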

Multimodal, ellipsoidal, nested sampling
Since the evaluation of the evidence Z in (3) becomes impractical when the integration space has 20-30 dimensions and, as in most practical cases, the likelihood differs from zero only in a small fraction of the integration domain, we have investigated a nested-sampling technique relating the likelihood values to the prior volume [1,2]. Firstly, p likelihood samples L_{θ_1}, ..., L_{θ_p} are drawn in Θ ⊕ Θ_M according to π(θ, θ_M|M). Next, the smallest, indicated as L_1, is removed and replaced by a new sample, L_new, subject to the constraint L_new > L_1. The prior volume enclosed by the iso-likelihood surface L = L_1 is estimated as p/(p+1), where p/(p+1) is the mean value of the largest of p uniform samples in [0, 1], the total volume of Θ ⊕ Θ_M being normalised to 1. The discarding of the lowest likelihood L_n, the sampling of a replacement constrained to L_new > L_n, and the shrinking of the prior volume enclosed by the associated iso-likelihood surface to V_n = p^n/(p+1)^n are repeated until some stopping criterion is satisfied, for example when the contribution to (3) of the surviving likelihood samples, i.e. L_max p^n/(p+1)^n, where L_max is the maximum sample, is less than some pre-defined value.
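The iteration just described can be condensed into a short sketch. The Python below (an illustration under our own naming, not the paper's MATHEMATICA code) implements the discard/replace/shrink loop for a one-dimensional toy problem, drawing replacements by plain rejection from the prior, which is adequate only in low dimension:

```python
import math
import random

def nested_sampling_evidence(loglike, prior_sample, p=100, max_iter=3000, seed=0):
    """Minimal nested sampling for Z = int L(x) pi(x) dx over a unit prior
    volume: at step n the lowest likelihood L_n is discarded, the enclosed
    prior volume shrinks to V_n = (p/(p+1))^n, and Z accumulates the
    increments L_n * (V_{n-1} - V_n)."""
    rng = random.Random(seed)
    live = [prior_sample(rng) for _ in range(p)]
    logL = [loglike(x) for x in live]
    Z, v_prev = 0.0, 1.0
    shrink = p / (p + 1.0)
    v = 1.0
    for n in range(1, max_iter + 1):
        i = min(range(p), key=logL.__getitem__)   # worst live point
        v = shrink ** n
        Z += math.exp(logL[i]) * (v_prev - v)     # evidence increment
        v_prev = v
        l_min = logL[i]
        while True:                               # replacement, L_new > L_n
            x = prior_sample(rng)
            if loglike(x) > l_min:
                live[i], logL[i] = x, loglike(x)
                break
        # Stopping criterion: surviving contribution L_max * V_n is small.
        if math.exp(max(logL)) * v < 1e-2 * Z:
            break
    # Add the contribution of the surviving live points.
    Z += v * sum(math.exp(l) for l in logL) / p
    return Z

# Toy check: Gaussian likelihood, uniform prior on [0, 1]; the analytic
# evidence is approximately sigma * sqrt(2*pi) ~ 0.125 for sigma = 0.05.
sigma = 0.05
Z = nested_sampling_evidence(
    loglike=lambda x: -(x - 0.5) ** 2 / (2 * sigma ** 2),
    prior_sample=lambda rng: rng.random(),
)
```

The rejection step is the bottleneck that the ellipsoidal refinements discussed next are designed to remove: its acceptance rate decays with the shrinking volume V_n.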
By using the sequence of the discarded likelihoods 0 < L_1 < L_2 < ... < L_N and the differences ΔV_n = V_{n-1} − V_n, with V_0 = 1, the evidence can be approximated as

Z ≈ Σ_{n=1}^{N} L_n ΔV_n.   (6)

The challenge in implementing (6) is sampling within the iso-likelihood surfaces L > L_n. As the Markov chain Monte Carlo algorithm may be inefficient, improvements have been proposed in [3]-[6], which we are presently investigating for application to metrology. Ellipsoidal sampling [3] replaces the iso-likelihood surface L = L_n by a hyper-ellipsoid given by the covariance matrix of the living samples and centred in their mean value, and L_new > L_n is sampled within the intersection of the integration domain and this hyper-ellipsoid [7].
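The ellipsoidal construction can be sketched as follows. This Python fragment (our own illustration; the helper names and the enlargement factor are assumptions of the sketch) builds the bounding hyper-ellipsoid from the live points, scales it so that every live point lies inside, enlarges it slightly, and draws uniform samples from its interior:

```python
import numpy as np

def ellipsoid_from_points(points, enlarge=1.1):
    """Bounding hyper-ellipsoid of the live points: centred in their mean,
    shaped by their covariance matrix, scaled so that the outermost point
    lies inside, then enlarged by the factor `enlarge`."""
    pts = np.asarray(points, float)
    centre = pts.mean(axis=0)
    cov = np.cov(pts, rowvar=False)
    inv = np.linalg.inv(cov)
    d = pts - centre
    # The largest squared Mahalanobis radius fixes the overall scale.
    k = max(row @ inv @ row for row in d)
    return centre, cov * k * enlarge ** 2

def sample_in_ellipsoid(centre, shape, rng):
    """Uniform draw inside the ellipsoid (x-c)^T shape^{-1} (x-c) <= 1."""
    dim = len(centre)
    L = np.linalg.cholesky(shape)
    # Uniform point in the unit ball, affinely mapped onto the ellipsoid.
    u = rng.standard_normal(dim)
    u *= rng.uniform() ** (1.0 / dim) / np.linalg.norm(u)
    return centre + L @ u

rng = np.random.default_rng(0)
pts = rng.standard_normal((200, 2))
centre, shape = ellipsoid_from_points(pts)
x = sample_in_ellipsoid(centre, shape, rng)
```

A candidate drawn this way is kept only if it also lies in the integration domain and satisfies L_new > L_n, which restores the intersection mentioned in the text.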
When the integrand in (3) is a multimodal function, the single hyper-ellipsoid will often stretch to enclose all the maxima of the integrand, and the sampling will result in an unacceptably low acceptance rate of L_new. In this case, a possible strategy is to partition the set of all living points into clusters and then enclose each cluster in a "small" hyper-ellipsoid, centred in the mean value of the cluster points and defined by the covariance matrix calculated from the cluster points.
A method to prevent the hyper-ellipsoids from overlapping is described in [4], but we consider a method to manage the overlap according to [5,6,8]. At each step of the method, the set of all living points is partitioned into clusters. If the number of hyper-ellipsoids is greater than 1, a single hyper-ellipsoid is chosen with probability proportional to its volume; then a sample satisfying L_new > L_n is drawn from it and accepted with probability 1/(number of hyper-ellipsoids containing the sample). If the point is rejected, a new hyper-ellipsoid is chosen.
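The volume-weighted choice and the 1/(number of containing ellipsoids) acceptance together yield uniform sampling over the union of the ellipsoids, since a point covered by n_hit ellipsoids would otherwise be proposed n_hit times too often. A Python sketch of this step (our own helper names; the likelihood constraint is omitted for clarity):

```python
import numpy as np

def inside(x, centre, shape):
    """Membership test for the ellipsoid (x-c)^T shape^{-1} (x-c) <= 1."""
    d = x - centre
    return d @ np.linalg.solve(shape, d) <= 1.0

def draw_inside(centre, shape, rng):
    """Uniform draw inside one ellipsoid via the unit-ball mapping."""
    dim = len(centre)
    L = np.linalg.cholesky(shape)
    u = rng.standard_normal(dim)
    u *= rng.uniform() ** (1.0 / dim) / np.linalg.norm(u)
    return centre + L @ u

def sample_union(ellipsoids, rng):
    """Uniform draw from the union of possibly overlapping ellipsoids:
    pick one with probability proportional to its volume, draw inside it,
    accept with probability 1/(number of ellipsoids containing the draw)."""
    # In a fixed dimension, the volume is proportional to sqrt(det shape).
    vols = np.array([np.sqrt(np.linalg.det(s)) for _, s in ellipsoids])
    probs = vols / vols.sum()
    while True:
        k = rng.choice(len(ellipsoids), p=probs)
        x = draw_inside(*ellipsoids[k], rng)
        n_hit = sum(inside(x, c, s) for c, s in ellipsoids)
        if rng.uniform() < 1.0 / n_hit:
            return x

# Two overlapping unit disks in 2D.
rng = np.random.default_rng(1)
ellipsoids = [(np.zeros(2), np.eye(2)), (np.array([1.0, 0.0]), np.eye(2))]
samples = [sample_union(ellipsoids, rng) for _ in range(200)]
```

In the full algorithm each accepted draw would additionally be checked against the current likelihood constraint L_new > L_n before replacing the discarded point.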
The bimodal function f(x, y), which is centred in (x, y) = (35, 35) and (x, y) = (40, 40), plays the role of a likelihood; 1/V_D is the prior distribution, and UnitBox(x) is by definition equal to 1 for |x| ≤ 1/2 and 0 elsewhere. Our goal is to recover the value of the analytical calculation, that is, I = 180. Fig. 2 shows an ellipsoid (red dashed curve) calculated after sampling 200 points uniformly from the domain of integration. The external ellipsoid (red continuous curve), obtained by enlarging the semi-axes by a factor h = 1.1, enlarges the sampling volume and is useful to better study the contour lines (iso-likelihood surfaces) of the integrand. The minimum point of the integrand (green large disk) is also shown. Strictly speaking, the integrand should be equal to zero outside the support of f(x, y), but in order to implement the nested sampling it is necessary to add to f(x, y) a small appropriate function l(x, y). In our example with function (8), after 1881 steps, the set of the 200 living points is completely shrunk onto the maximum points of the integrand f(x, y), see Fig. 3; moreover, a file containing the list of the 1881 "minimal likelihood" values L_n is available to calculate the evidence Z according to Eq. (6). With the trapezoid rule, the final result is integral = 148.7 with uncertainty = 14.3. The uncertainty has been assessed by use of one of the equivalent formulae suggested in [9]. The set of the lowest-likelihood points is shown in Fig. 4, and the list of the number of clusters at each step is shown in Fig. 5. A realisation of 100 independent calculations of the integral with 200 points, that is, 100 different sets of 200 initial points, has generated the histogram in Fig. 6, having a mean value of 178.1 and a standard deviation of the mean of 1.5, the trapezoid rule having been exploited again.
As expected, the mean value of the histogram is very close to the analytical result, whereas the value obtained from a single calculation can be very different from the true value.

Conclusion
The role of the Bayesian approach in the expression of the model uncertainty in measurements consists of the quantification of both the sampling and model contributions to the post-data uncertainty. Multimodal nested sampling has been fruitful in astrophysics but, to the authors' knowledge, has not yet been exploited in metrology. In any case, developing an algorithm and the relevant software to tackle the numerical integration of multimodal functions over high-dimensional spaces can be fruitful from a more general point of view.