Validation metric of multi-output model based on energy distance

Model validation metrics provide a quantitative measure of the agreement between predictions and observations. For multi-output models, a validation metric typically requires the joint probability density function (PDF) of the outputs. The traditional remedy is dimensionality reduction, but this may lose information. To avoid solving for the joint PDF, this paper presents a validation metric for multi-output models based on energy distance. The method also avoids the information loss caused by dimensionality reduction and greatly reduces the computational difficulty. A set of numerical studies on mathematical examples is then designed to verify the correctness and stability of the method. Finally, we apply the metric to the Sandia validation challenge problem.


Introduction
Model validation originated in the aviation, aerospace and nuclear industries. It is used to objectively evaluate the reliability of simulation models, thus guiding the development of high-precision prediction models that reduce or even completely replace physical experiments [1]. With the application of numerical simulation models in other industrial fields, model validation has attracted much attention in recent years. However, the academic and engineering communities have long lacked a complete consensus on the concept of model validation. Up to now, there is no completely unified definition and specification of model validation in China [2].
Model validation metrics are an important part of model validation; they provide a quantitative measure of agreement between a predictive model and physical observations. In engineering design, they are useful for model selection and credibility assessment. At present, validation metrics are classified [3] into four main types: classical hypothesis testing, Bayes factor, frequentist's metric, and area metric. Classical hypothesis testing and the Bayes factor only yield rejection or acceptance of the model, while the frequentist's metric considers only the sample mean and ignores other quantities, such as the dispersion of the samples under uncertainty. The traditional area metric is applicable only to single-output models or to multi-output models with independent outputs, whereas the outputs of multi-output models in engineering are often dependent.
Li et al. [4] proposed a method based on the probability integral transformation (PIT) to handle validation metrics for multi-output models with dependent outputs. Zhao Liang and Yang Zhanping [5] proposed a multi-output validation metric based on the area metric by constructing the covariance matrix of the experimental observation data, which quantifies the correlation between the outputs. Zhao Lufeng and Lyu Zhenzhou et al. [6], based on the mathematical characteristics of random variables, constructed a mixed moment index of the multi-output model consisting of the mathematical expectation matrix and the covariance matrix. The advantage of the PIT method is that the correlation of the outputs is considered during model validation, but it requires the joint PDF of the model outputs, which is difficult to obtain accurately when the output dimension is very high. To avoid solving for the joint PDF, Hu Jiarui and Lyu Zhenzhou [7] combined kernel principal component analysis with the area metric to construct a validation metric that is easy to calculate and highly stable, which overcomes the difficulty of solving for the joint PDF in the traditional validation metrics. Additionally, Xiao Zhao et al. [8] proposed an area-metric-based validation metric for interval variables. Recently, the probability box method [9,10,11] has attracted much attention. In view of the difficulty of solving for the PDF of multiple outputs, Zhang Baoqiang and Su Guoqiang [12] introduced the Mahalanobis distance into the validation of multi-output models.
This paper proposes a new validation metric for multi-output models based on energy distance. The method avoids the need of the PIT method to solve for the joint PDF of the model outputs, and it also avoids the information loss caused by using the Mahalanobis distance to reduce the dimension.

Validation metrics based on energy distance
The energy distance of two random vectors [13] is a weighted distance between their characteristic functions. The characteristic function is the Fourier transform of the PDF and therefore contains all the distribution information of a random vector. Consequently, the energy distance can measure the difference between the distributions of two random vectors.
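Proposition [14]: If the d-dimensional random variables $X$ and $Y$ are independent with $E\|X\| + E\|Y\| < \infty$, and $\varphi_X(t)$, $\varphi_Y(t)$ denote their respective characteristic functions, then their energy distance is
$$\mathcal{E}(X,Y) = 2E\|X-Y\| - E\|X-X'\| - E\|Y-Y'\| = \frac{1}{c_d}\int_{\mathbb{R}^d} \frac{|\varphi_X(t)-\varphi_Y(t)|^2}{\|t\|^{d+1}}\,dt,$$
where $X'$ and $Y'$ are independent copies of $X$ and $Y$, $c_d = \pi^{(d+1)/2}/\Gamma\bigl((d+1)/2\bigr)$, and $\Gamma(\cdot)$ is the complete gamma function. Thus $\mathcal{E}(X,Y) \ge 0$, with equality to zero if and only if $X$ and $Y$ are identically distributed. Based on this feature, the energy distance has been used to provide a characterization of equality of distributions [15,16].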
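As a quick sanity check of the proposition, the one-dimensional case can be worked by hand. This special case is added here only as an illustration and is not part of the derivation in this paper; $F$ and $G$ denote the CDFs of $X$ and $Y$. For $d = 1$ the constant is $c_1 = \pi$, and the energy distance is known to reduce to twice the integrated squared difference of the CDFs:

```latex
\[
\mathcal{E}(X,Y) \;=\; 2E|X-Y| - E|X-X'| - E|Y-Y'|
\;=\; 2\int_{-\infty}^{\infty}\bigl(F(x)-G(x)\bigr)^{2}\,dx .
\]
% Worked example: X = 0 and Y = a > 0 with probability one.
% Left side:  2|0 - a| - 0 - 0 = 2a.
% Right side: F - G = 1 on [0, a) and 0 elsewhere, so 2 * a * 1^2 = 2a.
\[
X \equiv 0,\quad Y \equiv a > 0
\;\;\Longrightarrow\;\;
\mathcal{E}(X,Y) = 2a = 2\int_{0}^{a} 1^{2}\,dx .
\]
```

Both sides agree, and both vanish only when $a = 0$, that is, when the two distributions coincide.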
Additionally, the energy distance has another significant feature: its estimate is easy to obtain and does not depend on the form of the distributions. Let $X_1, \cdots, X_n$ and $Y_1, \cdots, Y_m$ be random samples of $X$ and $Y$; then the estimate of the energy distance between $X$ and $Y$ can be expressed as:
$$\hat{\mathcal{E}}(X,Y) = \frac{2}{nm}\sum_{i=1}^{n}\sum_{j=1}^{m}\|X_i - Y_j\| - \frac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1}^{n}\|X_i - X_j\| - \frac{1}{m^2}\sum_{i=1}^{m}\sum_{j=1}^{m}\|Y_i - Y_j\|.$$
The energy distance defined in the above proposition has no upper bound, so we use the standardized energy distance defined by Rizzo and Székely [17] as the validation metric of the multi-output model:
$$H(X,Y) = \frac{\mathcal{E}(X,Y)}{2E\|X-Y\|}, \qquad 0 \le H(X,Y) \le 1,$$
with equality to zero if and only if $X$ and $Y$ are identically distributed. Here, the norm in the energy distance is the Euclidean norm.
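The sample estimate and its standardized form can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in this paper: the function names are hypothetical, samples are given as lists of equal-length tuples, and the plain double-sum (V-statistic) form of the estimate is assumed.

```python
import math
import random


def mean_distance(a, b):
    """Average Euclidean distance over all pairs (one point from a, one from b)."""
    total = sum(math.dist(p, q) for p in a for q in b)
    return total / (len(a) * len(b))


def energy_distance(x, y):
    """Double-sum estimate: 2*mean|X-Y| - mean|X-X'| - mean|Y-Y'|."""
    return 2.0 * mean_distance(x, y) - mean_distance(x, x) - mean_distance(y, y)


def standardized_energy_distance(x, y):
    """Energy distance divided by 2*mean|X-Y|, so the value lies in [0, 1]."""
    cross = mean_distance(x, y)
    if cross == 0.0:  # every point of x coincides with every point of y
        return 0.0
    return energy_distance(x, y) / (2.0 * cross)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical two-output samples, used only to exercise the functions.
    observed = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
    predicted = [(rng.gauss(0.5, 1), rng.gauss(0, 1)) for _ in range(200)]
    print(standardized_energy_distance(observed, predicted))
```

Only pairwise Euclidean distances are needed, so no joint PDF is estimated and the output dimension enters only through the length of each tuple.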

Numerical Case Studies
In this section, a set of numerical studies is used to explore the validity and stability of the new metric. The physical observations in this section are generated using the following formula, where  2.
According to the three test sets in Table 2, the validity of the validation metric is analyzed. For set 1, the sample size of the random input variables is 1000 for the physical model and 10000 for the predictive models. Applying the validation metric formula given in this paper yields the results for the two predictive models in Table 3.
Table 3. Validation metric results in Set 1.
Model ID    Validation metric result
1           0.0091
2           0.0969
For set 2, the interval input variables are treated as uniformly distributed within their intervals. The sample size of the random input variables is 1000 for the physical model and 10000 for the predictive models. Applying the validation metric formula given in this paper yields the results for the two predictive models in Table 4.
For the random-interval mixed input variables in set 3, the interval-valued means of the input variables are treated as uniformly distributed within their intervals. Ten samples of the means of the two input variables are drawn. For each mean sample, the random input sample size is 1000 for the physical model and 10000 for the predictive models. The resulting 90% confidence intervals of the validation metric for the two predictive models are shown in Table 5.
As shown in Table 3, Table 4 and Table 5, the metric value or confidence interval of Model 1 is smaller than that of Model 2 in all three sets, which is consistent with the expected result. In theory, the metric of Model 1 should be exactly 0, but the computed value is not zero. The reason is that the physical observations and the predictive model outputs are obtained from a limited number of input samples. The multi-output validation metric based on energy distance can therefore effectively measure the difference between the two predictive models in all three test sets.

Stability verification
To verify the stability of the validation metric given in this paper, each of the three test sets is repeated 1000 times.
For set 1, we analyze the impact of the sample size of the random input variables on the validation metric results. The sample size of the random input variables of all models (physical model, Model 1 and Model 2) is set to 10 and 100, respectively, and the PDF of the metric results of the two predictive models is obtained, as shown in Figure 1. For set 2, the same analysis is performed with sample sizes of 10 and 100 for all models, and the resulting PDFs are shown in Figure 2. For set 3, the analysis is the same except that the sample sizes of the random-interval mixed input variables of all models are set to 5 × 10 and 5 × 20, respectively (the first number is the interval sample size and the second is the random sample size); the resulting PDFs are shown in Figure 3.
As Figures 1-3 show, when the sample size of the input variables of all models increases, and with it the number of physical observations, the PDFs of Model 1 and Model 2 shift to the left, the most probable metric value of Model 1 is smaller than that of Model 2, the range of each PDF narrows, and the PDF of Model 1 lies entirely to the left of that of Model 2. This indicates that the validation metric for multi-output models based on energy distance can be used in model selection.
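The repetition study described above can be sketched as follows. Because the model formulas of Table 1 are not reproduced here, the sketch uses hypothetical two-output Gaussian models as stand-ins: `sample_outputs`, its `shift` and `scale` arguments, and the choice of 40 repetitions are all assumptions made for illustration, and summary statistics replace the PDF plots.

```python
import math
import random
import statistics


def mean_distance(a, b):
    """Average Euclidean distance over all pairs drawn from a and b."""
    return sum(math.dist(p, q) for p in a for q in b) / (len(a) * len(b))


def standardized_energy_distance(x, y):
    """(2*Dxy - Dxx - Dyy) / (2*Dxy), a value in [0, 1]."""
    dxy = mean_distance(x, y)
    if dxy == 0.0:
        return 0.0
    return (2.0 * dxy - mean_distance(x, x) - mean_distance(y, y)) / (2.0 * dxy)


def sample_outputs(rng, n, shift=0.0, scale=1.0):
    """Hypothetical two-output model whose outputs share the input z1."""
    out = []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        out.append((scale * z1 + shift, z1 + z2))
    return out


def replicate(n, reps=40, seed=0):
    """Repeat the metric computation and collect the values for both models."""
    rng = random.Random(seed)
    results = {"model1": [], "model2": []}
    for _ in range(reps):
        observed = sample_outputs(rng, n)                      # stand-in physical data
        model1 = sample_outputs(rng, n)                        # same distribution
        model2 = sample_outputs(rng, n, shift=1.0, scale=2.0)  # deliberately wrong
        results["model1"].append(standardized_energy_distance(observed, model1))
        results["model2"].append(standardized_energy_distance(observed, model2))
    return results


if __name__ == "__main__":
    for n in (10, 100):
        for name, values in replicate(n).items():
            print(n, name, statistics.mean(values), statistics.pstdev(values))
```

With these stand-in models, the values for the correct model should concentrate near zero and spread less as the sample size grows, which mirrors the leftward shift and narrowing described for Figures 1-3.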

Sandia validation challenge problem
The validation metric proposed in this paper is applied to the Sandia validation challenge problem [18], which was proposed by Sandia National Laboratories in the United States. The problem takes the temperature response of a material in a specific environment as its object and is used to examine key issues in model validation. The mathematical model describes the temperature of a device constructed of some material under heating and has the form:
$$T(x,t) = T_i + \frac{qL}{k}\left[\frac{(k/\rho C_p)\,t}{L^2} + \frac{1}{3} - \frac{x}{L} + \frac{1}{2}\left(\frac{x}{L}\right)^2 - \frac{2}{\pi^2}\sum_{n=1}^{6}\frac{1}{n^2}\exp\left(-n^2\pi^2\frac{(k/\rho C_p)\,t}{L^2}\right)\cos\left(n\pi\frac{x}{L}\right)\right] \qquad (1)$$
where $T$ is temperature, $x$ is location within the material, $t$ is time since the onset of heating, $T_i$ is the initial ambient temperature (25 °C), $q$ is the heat flux, $L$ is the thickness of the material, and $k$ and $\rho C_p$ are properties of the material with uncertainty. The problem also gives experimental observations at multiple validation locations, which are called the Ensemble (EN) validation data and the Accreditation (AC) validation data [5]. One of the tasks in this challenge is to use the given material property data to describe the uncertainty of the parameters $k$ and $\rho C_p$, and to evaluate the accuracy of the model based on the provided experimental observations.
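To make the temperature model concrete, the following sketch evaluates it in Python. It assumes the commonly cited six-term series form of the thermal challenge model; the function name, argument names, and the numerical values in the usage part are placeholders chosen for illustration and are not taken from the challenge data.

```python
import math


def temperature(x, t, q, L, k, rho_cp, t_init=25.0, n_terms=6):
    """Temperature at location x and time t for a slab of thickness L under heat flux q.

    Assumed form: T = T_i + (q*L/k) * [Fo + 1/3 - u + u^2/2 - (2/pi^2) * series],
    with Fo = (k/rho_cp)*t/L^2 and u = x/L.
    """
    fourier = (k / rho_cp) * t / L**2  # dimensionless time
    u = x / L                          # dimensionless location
    series = sum(
        math.exp(-(n * math.pi) ** 2 * fourier) * math.cos(n * math.pi * u) / n**2
        for n in range(1, n_terms + 1)
    )
    bracket = fourier + 1.0 / 3.0 - u + 0.5 * u**2 - (2.0 / math.pi**2) * series
    return t_init + q * L / k * bracket


if __name__ == "__main__":
    # Placeholder inputs (illustrative only): q in W/m^2, L in m,
    # k in W/(m*degC), rho_cp in J/(m^3*degC).
    for time_s in (100.0, 500.0, 1000.0):
        print(time_s, temperature(0.0, time_s, q=1000.0, L=0.0127, k=0.06, rho_cp=4.0e5))
```

Sampling `k` and `rho_cp` from their uncertainty description and evaluating this function at each validation location would give the multi-output predictions that the metric compares with the observations.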
The challenge problem gives four sets of the model parameters $q$ and $L$ corresponding to the EN data. In each set, the material temperature is measured four times at 100 s intervals from 0 to 1000 s, and the material temperature at 0 s is taken as $T_i$. The EN data therefore include 40 validation locations, each with 4 experimental observations. In addition, one set of the model parameters $q$ and $L$ corresponding to the AC data is given. The material temperature is measured twice at 50 s intervals from 0 to 1000 s at $x = 0$, $L/2$ and $L$, and the material temperature at 0 s is again taken as $T_i$. The AC data therefore include 60 validation locations, each with 2 experimental observations. The experimental observations are taken from reference [18].
In reference [19], the uncertain parameters $k$ and $\rho C_p$ in formula (1) are assumed to be normally distributed. However, according to reference [20], there is a correlation between  and , and a modified model can be obtained by regression analysis.
Using the multi-output validation metric based on energy distance, we can measure the agreement between the original or modified model and the experimental observations. Each model generates 50 model predictions at each of the 40 validation locations, and the procedure is repeated 1000 times to obtain a 95% confidence interval for the validation metric. The metric results are listed in Table 6. As shown in Table 6, the validation metric results of the modified model are improved compared with the original model for both the EN and AC data. This is consistent with the findings reported in the literature.
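The text does not state how the confidence interval is extracted from the repeated metric values. One simple option, assumed here purely for illustration, is a percentile interval over the sorted values; the function name and the stand-in values below are hypothetical.

```python
import math
import random


def percentile_interval(values, level=0.95):
    """Central interval covering roughly `level` of the values (percentile rule)."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    last = len(ordered) - 1
    tail = (1.0 - level) / 2.0
    lower = ordered[math.floor(tail * last)]         # round the lower index down
    upper = ordered[math.ceil((1.0 - tail) * last)]  # round the upper index up
    return lower, upper


if __name__ == "__main__":
    rng = random.Random(0)
    # Stand-in for 1000 repeated metric values; real values would come from
    # recomputing the validation metric on fresh model predictions.
    repeated = [abs(rng.gauss(0.05, 0.01)) for _ in range(1000)]
    print(percentile_interval(repeated, level=0.95))
```

Rounding the indices outward makes the reported interval slightly conservative when the number of repetitions is small.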

Conclusion
The multi-output validation metric based on energy distance avoids solving for the joint PDF of the outputs, greatly reduces the computational difficulty, and avoids the information loss that may be caused by traditional dimensionality reduction. The correctness and stability of the method are verified with mathematical examples, and the metric is then applied to the Sandia validation challenge problem, which further demonstrates the feasibility of the method. It should be pointed out that the method is also applicable to the validation of single-output models.

Figure 1. The PDF of the metric results of the two predictive models in set 1.

Figure 2. The PDF of the metric results of the two predictive models in set 2.

Figure 3. The PDF of the metric results of the two predictive models in set 3.

Table 1. Two predictive model formulas.
The distribution forms of the input variables of the physical model and the predictive models are given in Table 2.

Table 2. Distribution form of input variables.
It can be seen from Table 1 that Model 1 is the correct model and Model 2 is the incorrect model, so the expected result is that the validation metric of Model 1 is smaller than that of Model 2. According to the three test sets in Table

Table 4. Validation metric results in Set 2.

Table 5. Validation metric results in Set 3.

Table 6. Comparison of validation metric results before and after model correction.