Constructions of statistical estimates of digital measurement error in the case of small samples

The paper presents a computational-experimental method for constructing statistical estimates of the error of digital measurements performed on complex technical objects with metrological support. It describes an algorithm for estimating the probability density function of the random measurement error. The algorithm combines methods of the theory of characteristic functions, the theory of Lie operator series and statistical modeling to process a limited amount of statistical information under small-sample conditions. Solutions are given to two practical problems: estimating the distribution density of the measurement error, and the zero-mark problem of a measuring instrument in a probabilistic formulation. The method can be used to construct statistical estimates for probability distribution densities of general type, including those defined by implicit functions, characteristic functions and operator series.


Introduction
At the current stage of economic development there is a need to manage projects for the creation of high-tech construction objects, construction materials and products [1][2][3][4][5][6][7][8][9][10][11], as well as projects for the creation of automated production facilities equipped with modern high-precision digital measuring instruments and control and measuring devices. At the same time, the share of digital and intelligent measurements is steadily increasing. Therefore, developing methods for processing and interpreting digital measurement results on the basis of statistical processing of sample data, including small samples, is an urgent task.
In modern metrological practice [12][13][14][15][16], various distribution laws, including the uniform distribution, are used to describe random measurement errors. The following errors are uniformly distributed: errors of observation results rounded to the nearest scale mark, with an inaccuracy of a whole (or fractional) scale division; errors of approximate calculations rounded to the nearest significant digit; adjustment errors within the permissible limits; backlash errors; variations in the readings of measuring instruments.
Currently, there is a need to develop specific methods and corresponding analytical apparatus, the main goal of which should be to ensure the most efficient processing and interpretation of a limited amount of statistical information [17][18][19][20][21][22][23], including in conditions of small samples [19][20][21][22][23].
The object of the study is complex technical systems with metrological support. The subject of the study is methods of statistical processing of measurement information. The aim of the study is to develop a computational-experimental method for constructing statistical estimates of the error of digital measurements under small-sample conditions. To achieve this aim, methods of the theory of characteristic functions, Lie series (a kind of operator series) and statistical modeling are used. The novelty of the results lies in the combined application of these methods to problems of metrological support of complex technical systems. The practical and theoretical significance lies in solving the zero-mark problem of a measuring instrument (slippage or no slippage of the grouping center) in a probabilistic formulation.

Methods
We describe the main provisions of the Method of Selecting the Type of Distribution Law (MSTDL), the Method of Characteristic Functions (MCF), the Method of Operator Series (MOS) and the Method of Statistical Modeling (MSM), which form the basis of the computational-experimental method for statistical processing of small samples proposed in this article.

MSTDL
Let there be a sample of $N$ measurements. It is required to find an estimate $F^*(x)$ of the distribution function $F(x)$. This is a simplified formulation of the problem, since no requirements are imposed on the properties of the estimate. The problem is considered solved if an estimate $f^*(x)$ of the distribution density $f(x)$ can be found. Figure 1 shows the region of the plane $(\beta_1, \beta_2)$ [20], where $\beta_1$ is the square of the asymmetry coefficient and $\beta_2$ is the kurtosis coefficient, which are defined as follows:

$$\beta_1 = \frac{\mu_3^2}{\mu_2^3}, \qquad \beta_2 = \frac{\mu_4}{\mu_2^2},$$

where $\mu_k$ is the central moment of order $k$ of the random variable, $k = 2, 3, 4$. The region is divided into subregions, each of which corresponds to a certain class of distributions. The model is chosen by exploiting the significant differences between classes of distributions in asymmetry and peakedness.
For example, for the normal distribution $\beta_1 = 0$, $\beta_2 = 3$, and for the exponential distribution $\beta_1 = 4$, $\beta_2 = 9$. Each of these distributions is therefore represented on the plane by a single point. Other distributions, for example the Student distribution, the log-normal distribution and the gamma distribution, correspond to curves in Figure 1. Still others correspond to entire subregions.
To select a model with this approach, it is necessary to compute estimates of the coefficients $\beta_1$ and $\beta_2$ and then to locate the corresponding point in Figure 1. For large samples computing these estimates is not difficult, but for small samples difficulties may arise [20].
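The estimation step can be illustrated with a minimal sketch that computes moment estimates of the two coefficients from a sample. It assumes the standard definitions, $\beta_1 = \mu_3^2/\mu_2^3$ and $\beta_2 = \mu_4/\mu_2^2$, with central moments estimated by simple averages; the function name `pearson_betas` is hypothetical and not taken from the paper.

```python
import random


def pearson_betas(sample):
    """Return moment estimates (beta1, beta2) for a list of numbers.

    Assumed definitions: beta1 = mu3**2 / mu2**3, beta2 = mu4 / mu2**2,
    where mu_k is the k-th central sample moment (divisor n).
    """
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(sample) / n

    def central_moment(k):
        return sum((x - mean) ** k for x in sample) / n

    mu2, mu3, mu4 = central_moment(2), central_moment(3), central_moment(4)
    if mu2 == 0.0:
        raise ValueError("sample variance is zero")
    return mu3 ** 2 / mu2 ** 3, mu4 / mu2 ** 2


if __name__ == "__main__":
    rng = random.Random(0)
    large = [rng.random() for _ in range(20000)]  # uniform on (0, 1)
    small = large[:5]
    print("large sample:", pearson_betas(large))
    print("small sample:", pearson_betas(small))
```

For a large uniform sample the estimates should lie close to the theoretical point $\beta_1 = 0$, $\beta_2 = 1.8$, while for a sample of five values the point can fall far from it, which is one way to see why the choice of model from the $(\beta_1, \beta_2)$ plane becomes unreliable for small samples.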

MCF
Let us first consider the most commonly used statistic, the sample mean $\bar{x}$. Let the numbers $x_1, x_2, \ldots, x_n$ form a sample from a uniform distribution with density (1), where $a$ and $r$ are constants. It is required to determine the characteristic function and the distribution law of the sample mean $\bar{x}$. The characteristic functions of the uniform distribution (1) and of the distribution of the sample mean have the form (2) [19]. Using (2) and the inversion formula for the Fourier transform [19], we represent the distribution density of the sample mean as the integral (3). Note that as the sample size $n \to \infty$ the statistic $\bar{x}$ is asymptotically normal. When the sample size $n$ is small, (3) cannot be expressed as an analytical function $f(x)$. Characteristic functions provide a simple and powerful method for finding the limiting distribution functions of mean values and sample sums, but for real small samples, apart from some special cases, evaluating an integral of type (3) is difficult.
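Although the inversion integral rarely has a closed form for small $n$, it can be evaluated numerically. The sketch below is an illustration under an explicit assumption that is not confirmed by the text: the uniform density is taken as $1/(2r)$ on $[a - r, a + r]$, so that the characteristic function of the sample mean is $e^{ita}\,[\sin(rt/n)/(rt/n)]^n$. The function names and the tuning parameters `t_max` and `steps` are hypothetical.

```python
import math


def sinc(u):
    """sin(u)/u with the removable singularity at u = 0 filled in."""
    return 1.0 if u == 0.0 else math.sin(u) / u


def mean_density(x, a, r, n, t_max=400.0, steps=20000):
    """Density of the mean of n uniform values on [a - r, a + r] (assumed form).

    Evaluates (1/pi) * integral_0^t_max cos(t (x - a)) * sinc(r t / n)**n dt
    with the trapezoidal rule. Intended for n >= 2, where the integrand
    decays at least as fast as 1/t**2.
    """
    h = t_max / steps
    total = 0.0
    for k in range(steps + 1):
        t = k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * math.cos(t * (x - a)) * sinc(r * t / n) ** n
    return total * h / math.pi


if __name__ == "__main__":
    # For n = 2, a = 0, r = 1 the exact density is the triangle 1 - |x|.
    for x in (0.0, 0.5, 1.5):
        print(x, round(mean_density(x, a=0.0, r=1.0, n=2), 4))
```

The case $n = 2$ is convenient as a check because the mean of two such values has the triangular density $1 - |x - a|/r$ scaled by $1/r$; for $n = 3, 4, \ldots$ the same routine returns a numerical density even though no simple analytical expression is at hand.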
In general, the method of inverting a distribution function on the basis of formula (4) can be formulated as follows.
Let the distribution function $F(x)$ describe, with probability $P$, the values $(x_1, \ldots, x_n)$ of a random variable and take the values $F(x_i)$. Let us arrange the values $x_i$ in ascending order of $F(x_i)$, and let the distribution density at the point $x = x_0$ be nonzero. Then, on the basis of (4), it can be shown that the function (5) is the inverse of the distribution function $F(x)$, that is, the quantile function. Expression (5) exists if the function $F(x)$ is analytic and its derivative is nonzero, $F'(x_0) \neq 0$. A necessary condition for the validity of the inversion (5) is the convergence of the series, which is proved using d'Alembert's ratio test.
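The inversion of an analytic function through a series can be illustrated numerically. The sketch below does not implement the Lie-series formula (4) itself; it uses plain reversion of a truncated Taylor series, which should yield the same coefficients because the Taylor expansion of the inverse of an analytic function is unique. The function names `poly_mul` and `revert_series` are hypothetical.

```python
from math import factorial


def poly_mul(p, q, order):
    """Multiply two coefficient lists (index = power), truncated at `order`."""
    out = [0.0] * (order + 1)
    for i, p_i in enumerate(p):
        if p_i == 0.0:
            continue
        for j, q_j in enumerate(q):
            if i + j > order:
                break
            out[i + j] += p_i * q_j
    return out


def revert_series(a, order):
    """Revert y - y0 = sum_k a[k] (x - x0)**k into x - x0 = sum_k b[k] (y - y0)**k.

    Requires a[0] == 0 and a[1] != 0 (nonzero derivative at x0).
    Returns the list b with b[0] == 0 and entries up to `order`.
    """
    if a[0] != 0.0 or a[1] == 0.0:
        raise ValueError("need a[0] == 0 and a[1] != 0")
    b = [0.0] * (order + 1)
    b[1] = 1.0 / a[1]
    for k in range(2, order + 1):
        # Coefficient of t**k in sum_{j>=2} a[j] * B(t)**j, where B uses b[1..k-1].
        power = b[:]  # B(t)**1
        acc = 0.0
        for j in range(2, k + 1):
            power = poly_mul(power, b, k)  # B(t)**j, truncated at t**k
            if j < len(a):
                acc += a[j] * power[k]
        b[k] = -acc / a[1]
    return b


if __name__ == "__main__":
    # y = exp(x) - 1 around x0 = 0; the inverse is x = ln(1 + y).
    a = [0.0] + [1.0 / factorial(k) for k in range(1, 6)]
    print(revert_series(a, 5))
```

Applied to the Taylor coefficients of a distribution function $F(x)$ around a point $x_0$ with $F'(x_0) \neq 0$, the same routine gives the leading terms of the quantile function around $P_0 = F(x_0)$; how many terms are needed depends on the convergence of the series.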
Let us return to the problem of finding the distribution of the sample mean (3). Taking $x_0 = a$, we write down the first three coefficients of the terms of the series (5).
Determining the subsequent terms presents no fundamental difficulty: with the reference point chosen as $x_0 = a$, it reduces to the evaluation of tabulated integrals [24]. As a result, the quantile of the distribution of the sample mean from a uniform population can be represented in the form (6). Using this relation, we can determine the confidence region for the parameter estimate. Differentiating (6) gives (7). Note that for a sample size different from the one considered in the example, the numerical values of the coefficients will be slightly different.
From (6) and (7) we obtain (8). Recall that it proved impossible to construct an estimate of the distribution of the sample mean using the characteristic functions (3).
Let us now study the problem of estimating the grouping center of a random variable (the zero-mark problem of a measuring instrument). Let a sample of size $n$, $x_1, x_2, \ldots, x_k, \ldots, x_n$, be drawn from a uniform general population and arranged in ascending order: $x_{(1)} \le x_{(2)} \le \ldots \le x_{(n)}$. It is required to find a quantile function for the grouping center and, at the significance level $\alpha$, to draw a conclusion about "slippage" or "no slippage" of the grouping center.
The distributions of the extreme members of the variational series, $x_{(1)}$ and $x_{(n)}$, have the form

$$F_{(1)}(x) = 1 - [1 - F(x)]^n, \qquad F_{(n)}(x) = [F(x)]^n,$$

where $F(x)$ is the distribution function of the uniformly distributed random variable $X$. Without loss of generality, we can take the coordinate of the grouping center to be zero, $a = 0$. We then introduce the statistic $z$ for the grouping center and represent the quantile of its distribution using operator series in the form (11). Note that in this case the beta function in (11) can be defined using recurrence relations.
Similarly, we find the subsequent terms. Hence the critical region [16,20] for the statistic $z$ at the significance level $\alpha$ is defined. The conclusion about slippage (or no slippage) of the grouping center is drawn by applying the standard theory of statistical hypothesis testing [16,20].

MSM
Testing a hypothesis about the distribution law of the measured values $x$ is usually carried out with the help of the Pearson or Kolmogorov criteria and statistics [17,22]. Obviously, for small samples the limiting distributions of these statistics cannot be used. On the other hand, for small samples it is often possible to form statistics that depend only on standard random variables and not on the distribution parameters of the general population. It seems reasonable to determine the distribution function of such statistics either by statistical modeling or analytically with the help of characteristic functions, if this is possible in the particular problem.
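The idea of simulating a statistic that is free of the population parameters can be sketched for the slippage test as follows. The statistic used here, the midrange divided by the half-range, is a hypothetical stand-in chosen because it does not depend on the half-width of the uniform law when the grouping center is zero; it is not claimed to be the statistic $z$ of the paper, and the function names are likewise assumptions.

```python
import random


def scaled_midrange(sample):
    """(min + max) / (max - min): scale-invariant when the true center is 0."""
    lo, hi = min(sample), max(sample)
    if hi == lo:
        raise ValueError("sample range is zero")
    return (lo + hi) / (hi - lo)


def simulate_abs_statistic(n, trials, seed, r=1.0):
    """Sorted |statistic| values for samples of size n from uniform(-r, r)."""
    rng = random.Random(seed)
    return sorted(
        abs(scaled_midrange([rng.uniform(-r, r) for _ in range(n)]))
        for _ in range(trials)
    )


def critical_value(n, alpha, trials=50_000, seed=1):
    """Monte Carlo estimate of c such that P(|statistic| > c) is about alpha."""
    values = simulate_abs_statistic(n, trials, seed)
    index = min(trials - 1, int((1.0 - alpha) * trials))
    return values[index]


if __name__ == "__main__":
    c = critical_value(n=5, alpha=0.05)
    print("critical value for n = 5, alpha = 0.05:", round(c, 3))
```

In use, "slippage" would be declared when the absolute value of the statistic computed from the measured sample exceeds the simulated critical value. Because the statistic is scale-invariant, the critical value obtained with `r = 1` applies to any half-width.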

Results
Suppose that three numbers are random variables uniformly distributed in the interval (0,1).
Let us introduce a statistic formed from $x_1, x_2, x_3$, a sequence of measured values ordered by value (a variational series): $x_1 \le x_2 \le x_3$. The statistic does not depend on the distribution parameters of the general population and is determined only by the specific sample. Determining its distribution function analytically is difficult because the numerator and the denominator are mutually dependent. Therefore, we apply statistical modeling.
The Monte Carlo method was used to calculate the probability values, which are presented in Table 1.
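A Monte Carlo tabulation of this kind can be sketched as follows. Since the statistic itself is not recoverable from the text, the sketch uses a hypothetical stand-in with the same structure (a ratio built from three ordered values, invariant to shift and scale): $(x_2 - x_1)/(x_3 - x_1)$. The function names, the grid of points and the number of trials are assumptions.

```python
import random


def spacing_ratio(sample):
    """Stand-in statistic (x2 - x1) / (x3 - x1) for three values, after sorting."""
    x1, x2, x3 = sorted(sample)
    return (x2 - x1) / (x3 - x1)


def empirical_cdf(points, trials=50_000, seed=0):
    """Monte Carlo estimate of P(statistic <= p) for each p in `points`."""
    rng = random.Random(seed)
    values = [
        spacing_ratio([rng.random() for _ in range(3)]) for _ in range(trials)
    ]
    return [sum(v <= p for v in values) / trials for p in points]


def max_discrepancy(points, seed_a, seed_b, trials=50_000):
    """Largest difference between two independent runs (a stability check)."""
    run_a = empirical_cdf(points, trials, seed_a)
    run_b = empirical_cdf(points, trials, seed_b)
    return max(abs(u - v) for u, v in zip(run_a, run_b))


if __name__ == "__main__":
    grid = [0.1 * k for k in range(1, 10)]
    for p, prob in zip(grid, empirical_cdf(grid)):
        print(f"P(T <= {p:.1f}) ~ {prob:.3f}")
    print("max discrepancy between runs:", max_discrepancy(grid, 1, 2))
```

For this particular stand-in the exact distribution is uniform on (0,1), because the middle value is uniformly distributed between the two extremes, which gives an independent check on the simulation. A small discrepancy between independent runs is the kind of evidence on which a conclusion of statistical stability can rest.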

Discussion
The presented computational-experimental method can be applied to the construction of statistical estimates for other distribution laws (normal, exponential), as well as for distributions of general type, including those defined by implicit functions, characteristic functions and operator series. In this case, the sample size must exceed the number of parameters of the distribution function under study by at least one.

Conclusion
A computational-experimental method for estimating the parameters of complex technical systems with metrological support has been developed. The results of modeling the practical problem of estimating the distribution density of the digital measurement error in the case of a uniform distribution are presented. A solution to the zero-mark problem of a measuring instrument (slippage or no slippage of the grouping center) in a probabilistic formulation is presented.

MOS
The inverse of a single-valued analytic function $y = \varphi(x)$, in the neighborhood of the point $y_0 = \varphi(x_0)$ at which the function $\varphi(x)$ has a nonzero derivative, can be represented by a Lie operator series (4).
The estimate of the grouping center has its own distribution density.
The measured values are obtained as a result of tests. It is required to test, at the significance level $\alpha$, the hypothesis that the measured quantities are uniformly distributed. Independent uniformly distributed random variables with the general distribution density (1) can be represented in terms of random variables uniformly distributed in the interval (0,1).

Figure 2. Assessment of statistical stability of the statistical modeling method.
The results in Figure 2 for the distribution function allow us to conclude that the procedure of constructing the distribution function by the Monte Carlo method is statistically stable.
From these results, an estimate $\hat{f}(x)$ of the uniform distribution density can be found.