COVRATIO statistic for replicated linear functional relationship model

The replicated linear functional relationship model (LFRM) is an errors-in-variables model, in which the variables involved are measured with error. The presence of outliers in a dataset significantly affects parameter estimation. We extend the COVRATIO statistic, which has been used successfully to detect outliers in the unreplicated LFRM, to the replicated model. A simulation study is used to obtain the cut-off point from the 10% upper percentiles. The procedure is illustrated on a real data set, where it successfully identifies the outlier present.


Introduction
The linear functional relationship model (LFRM) is one of the branches of the errors-in-variables model, alongside the structural relationship model and the ultrastructural relationship model. The LFRM can be further divided into unreplicated and replicated forms, with certain recommendations for each [1]. In the replicated LFRM, unlike the unreplicated LFRM, no assumption on the ratio of error variances is needed [2]. Over the years, many authors have discussed parameter estimation for the replicated model [3][4][5][6][7].
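In the replicated LFRM, there exists a linear relationship between $X_i$ and $Y_i$ as follows:
$$Y_i = \alpha + \beta X_i, \qquad (1)$$
where $\alpha$ is the intercept and $\beta$ is the slope parameter. In this model, both the $X_i$ and $Y_i$ variables are subject to errors, $\delta_{ij}$ and $\varepsilon_{ij}$ respectively. The observations can be written as
$$x_{ij} = X_i + \delta_{ij} \quad \text{and} \quad y_{ij} = Y_i + \varepsilon_{ij}; \qquad i = 1, 2, \dots, p \ \text{and} \ j = 1, 2, \dots, m_i. \qquad (2)$$
The error terms are normally distributed, with $\delta_{ij} \sim N(0, \sigma^2)$ and $\varepsilon_{ij} \sim N(0, \tau^2)$. Here we assume that every group contains the same number of elements, i.e. $m_i = m$.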
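As a minimal sketch of equation (2), the following Python snippet draws balanced replicated data. The function name `simulate_replicated_lfrm`, the choice of true values `X_true` and the parameter values are illustrative assumptions, not taken from the study.

```python
import numpy as np

def simulate_replicated_lfrm(X, m, alpha, beta, sigma2, tau2, rng):
    """Draw m replicate pairs (x_ij, y_ij) for each true value X_i (balanced design)."""
    X = np.asarray(X, dtype=float)
    p = X.size
    delta = rng.normal(0.0, np.sqrt(sigma2), size=(p, m))  # measurement errors in x
    eps = rng.normal(0.0, np.sqrt(tau2), size=(p, m))      # measurement errors in y
    x = X[:, None] + delta                   # x_ij = X_i + delta_ij
    y = (alpha + beta * X)[:, None] + eps    # y_ij = Y_i + eps_ij with Y_i = alpha + beta * X_i
    return x, y

rng = np.random.default_rng(0)
X_true = np.linspace(0.0, 10.0, 5)  # hypothetical true values, p = 5 groups
x, y = simulate_replicated_lfrm(X_true, m=4, alpha=1.0, beta=1.0,
                                sigma2=1.0, tau2=0.5, rng=rng)
print(x.shape, y.shape)
```

Each row of `x` and `y` holds the replicates of one group, so the group means used later are simply row means.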
However, even a single outlier in the dataset affects the parameter estimation. An observation can be considered an outlier if it does not follow the pattern of the remaining observations. In the errors-in-variables model, the issue of outliers has received considerable attention, for example when the observations in an experiment are incorrectly recorded or the data are mistakenly entered into the computer [8][9][10]. For the unreplicated LFRM, outlier detection has been established using the COVRATIO statistic [11]. However, outlier detection in the replicated LFRM has not been explored. Therefore, in this study, we propose a method for detecting a single outlier using the COVRATIO statistic by modifying the method developed for the unreplicated LFRM. Section 2 describes maximum likelihood estimation for the replicated LFRM, and Section 3 describes outlier detection using the COVRATIO statistic. The determination of the cut-off point is described in Section 4, and the results are discussed in Section 5. We illustrate the proposed method using a real dataset in Section 6 and conclude in Section 7.

Maximum Likelihood Estimation of replicated linear functional relationship model
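The parameters of the replicated LFRM can be estimated by the maximum likelihood estimation (MLE) method, which involves an iterative technique [2,12]. Under the normal error assumptions, the log-likelihood function is
$$\log L(\alpha, \beta, \sigma^2, \tau^2, X_1, \dots, X_p) = -pm \log(2\pi) - \frac{pm}{2}\left(\log \sigma^2 + \log \tau^2\right) - \frac{1}{2}\left\{ \sum_{i=1}^{p}\sum_{j=1}^{m} \frac{(x_{ij} - X_i)^2}{\sigma^2} + \sum_{i=1}^{p}\sum_{j=1}^{m} \frac{(y_{ij} - \alpha - \beta X_i)^2}{\tau^2} \right\}. \qquad (3)$$
There are $(p + 4)$ parameters to be estimated. The estimates are obtained by differentiating the log-likelihood function in equation (3) with respect to $\alpha$, $\beta$, $\sigma^2$, $\tau^2$ and $X_i$ and setting the derivatives to zero. With $\bar{x}_{i.}$ and $\bar{y}_{i.}$ denoting the group means, $\bar{y}_{..}$ the overall mean of the $y_{ij}$ and $\bar{\hat{X}}$ the mean of the $\hat{X}_i$, the estimated parameters satisfy
$$\hat{\alpha} = \bar{y}_{..} - \hat{\beta}\,\bar{\hat{X}}, \qquad \hat{\beta} = \frac{\sum_{i=1}^{p} \hat{X}_i (\bar{y}_{i.} - \hat{\alpha})}{\sum_{i=1}^{p} \hat{X}_i^2},$$
$$\hat{\sigma}^2 = \frac{1}{pm}\sum_{i=1}^{p}\sum_{j=1}^{m} (x_{ij} - \hat{X}_i)^2, \qquad \hat{\tau}^2 = \frac{1}{pm}\sum_{i=1}^{p}\sum_{j=1}^{m} (y_{ij} - \hat{\alpha} - \hat{\beta}\hat{X}_i)^2,$$
$$\hat{X}_i = \frac{\hat{\tau}^2 \bar{x}_{i.} + \hat{\beta}\hat{\sigma}^2 (\bar{y}_{i.} - \hat{\alpha})}{\hat{\tau}^2 + \hat{\beta}^2 \hat{\sigma}^2}.$$
These equations are solved iteratively, using the unreplicated LFRM as a starting point. The starting values are the unreplicated LFRM estimates $\hat{\alpha}$, $\hat{\beta}$, $\hat{\sigma}^2$ and $\hat{\tau}^2 = \hat{\sigma}^2$, that is, $\hat{\tau}^2 = \lambda\hat{\sigma}^2$ with the ratio of error variances $\lambda = 1$ [11].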
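The iteration can be sketched in Python as below. This is a minimal illustration, not the authors' implementation: it assumes a balanced design, cycles through the stationarity conditions of the normal log-likelihood (updating each $\hat{X}_i$, then $\hat{\alpha}$ and $\hat{\beta}$ jointly, then the two variances), and starts from the unreplicated fit to the group means with $\lambda = 1$. The function name `fit_replicated_lfrm`, the stopping rule and the test data are assumptions.

```python
import numpy as np

def fit_replicated_lfrm(x, y, max_iter=500, tol=1e-10):
    """Iterative ML fit of a balanced replicated LFRM; x and y have shape (p, m)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    xbar, ybar = x.mean(axis=1), y.mean(axis=1)  # group means
    # Starting values: unreplicated LFRM on the group means with lambda = 1
    # (assumes the sample covariance sxy is non-zero).
    sxx = np.sum((xbar - xbar.mean()) ** 2)
    syy = np.sum((ybar - ybar.mean()) ** 2)
    sxy = np.sum((xbar - xbar.mean()) * (ybar - ybar.mean()))
    beta = ((syy - sxx) + np.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
    alpha = ybar.mean() - beta * xbar.mean()
    X = xbar.copy()
    sigma2 = np.mean((x - X[:, None]) ** 2)
    tau2 = sigma2  # lambda = 1
    for _ in range(max_iter):
        old = np.array([alpha, beta, sigma2, tau2])
        # X_i: precision-weighted combination of the x and y information
        X = (tau2 * xbar + beta * sigma2 * (ybar - alpha)) / (tau2 + beta ** 2 * sigma2)
        # alpha and beta jointly, given X (least squares of group means on X)
        Xc = X - X.mean()
        beta = np.sum(Xc * (ybar - ybar.mean())) / np.sum(Xc ** 2)
        alpha = ybar.mean() - beta * X.mean()
        # error variances, given everything else
        sigma2 = np.mean((x - X[:, None]) ** 2)
        tau2 = np.mean((y - alpha - beta * X[:, None]) ** 2)
        if np.max(np.abs(np.array([alpha, beta, sigma2, tau2]) - old)) < tol:
            break
    return {"alpha": alpha, "beta": beta, "sigma2": sigma2, "tau2": tau2, "X": X}

rng = np.random.default_rng(0)
X_true = np.linspace(0.0, 10.0, 20)  # hypothetical true values
x = X_true[:, None] + rng.normal(0.0, 0.1, size=(20, 5))
y = 1.0 + 1.0 * X_true[:, None] + rng.normal(0.0, 0.1, size=(20, 5))
fit = fit_replicated_lfrm(x, y)
print(fit["alpha"], fit["beta"])
```

Each update maximises the log-likelihood over one block of parameters with the others held fixed, so the log-likelihood cannot decrease from one pass to the next.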

Outlier detection using COVRATIO statistic
The COVRATIO statistic was introduced for the linear regression model to identify influential observations or outliers [13]. Since then, many authors have used the COVRATIO statistic for detecting outliers because the procedure is simple, widely used and well established in errors-in-variables models [14][15][16][17]. Following the steps suggested for the unreplicated LFRM, the COVRATIO statistic is slightly modified to accommodate the replicated LFRM [11]. For the replicated linear functional relationship model, the ratio of covariance statistic is
$$\mathrm{COVRATIO}_{(-i)} = \frac{|\mathrm{COV}_{(-i)}|}{|\mathrm{COV}|},$$
where $|\mathrm{COV}|$ is the determinant of the covariance matrix for the full data set and $|\mathrm{COV}_{(-i)}|$ is the determinant of the covariance matrix for the reduced data set with the $i$th observation deleted. The determinant of the covariance matrix can be found using the asymptotic variances of the estimators, obtained by inverting the estimated Fisher information matrix for the replicated LFRM. The covariance matrix of the parameter estimates, and hence its determinant, follows from this inversion.
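The leave-one-out structure of the statistic can be sketched as follows. The helper `covratio` accepts any function returning the determinant of an estimated covariance matrix. Because the replicated LFRM covariance matrix is not reproduced here, the example plugs in the ordinary least squares covariance matrix from the original regression setting [13] as a stand-in. All names and the toy data are assumptions for illustration.

```python
import numpy as np

def covratio(data, cov_det):
    """Return |COV_(-i)| / |COV| for every row i of `data`."""
    data = np.asarray(data, dtype=float)
    full = cov_det(data)
    return np.array([cov_det(np.delete(data, i, axis=0)) / full
                     for i in range(len(data))])

def ols_cov_det(data):
    """Stand-in determinant: |s^2 (Z'Z)^{-1}| for OLS of column 1 on column 0."""
    x, y = data[:, 0], data[:, 1]
    Z = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ coef
    s2 = resid @ resid / (len(y) - 2)  # residual variance with 2 fitted parameters
    return np.linalg.det(s2 * np.linalg.inv(Z.T @ Z))

# Toy data: a near-perfect line with one grossly shifted point at index 10.
x = np.arange(20.0)
y = 1.0 + x + 0.1 * np.sin(x)
y[10] += 10.0
ratios = covratio(np.column_stack([x, y]), ols_cov_det)
flagged = int(np.argmax(np.abs(ratios - 1.0)))
print(flagged)
```

For the replicated LFRM, `ols_cov_det` would be replaced by a function that refits the model and evaluates the determinant of its estimated covariance matrix.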

Determining the cut-off point
A simulation study is performed in R to obtain the cut-off point of the COVRATIO statistic for the replicated linear functional relationship model. Eight different sample sizes are considered: $n$ = 20, 40, 60, 80, 100, 132, 180 and 300. Without loss of generality, the intercept, slope and error variance parameters of the replicated LFRM are fixed at $\alpha = 1$, $\beta = 1$ and $\sigma^2 = 1$, with $\tau^2$ taking the values 0.2, 0.4, 0.6, 0.8 and 1.0. For each combination of sample size $n$ and $\tau^2$, the observed values of $x_{ij}$ and $y_{ij}$ are generated using equation (2). The replicated LFRM is then fitted to the generated data and $|\mathrm{COV}|$ is calculated. Each observation is deleted in turn to obtain $|\mathrm{COV}_{(-i)}|$ and hence $|\mathrm{COVRATIO}_{(-i)} - 1|$. A 90% confidence level is used, so the cut-off point corresponds to the 10% significance level. The process is repeated 5000 times and the 10% upper percentile of the maximum value of $|\mathrm{COVRATIO}_{(-i)} - 1|$ is obtained. These 10% upper percentile values are used as the cut-off points for detecting a single outlier.
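The Monte Carlo loop can be sketched in Python as below (the study itself used R). The function `simulate_cutoff` takes a user-supplied `stat_fn` that returns the COVRATIO values for a simulated data set. How the true values $X_i$ are drawn is not stated in the text, so the uniform draw here is an assumption, as is the placeholder statistic used only to make the sketch runnable.

```python
import numpy as np

def simulate_cutoff(stat_fn, p, m, tau2, n_rep=5000, level=0.10, seed=0):
    """Upper `level` percentile of max_i |COVRATIO_(-i) - 1| over n_rep simulated data sets."""
    rng = np.random.default_rng(seed)
    max_dev = np.empty(n_rep)
    for r in range(n_rep):
        X = rng.uniform(0.0, 10.0, size=p)  # assumption: true X_i drawn uniformly
        x = X[:, None] + rng.normal(0.0, 1.0, size=(p, m))  # sigma^2 = 1
        y = 1.0 + 1.0 * X[:, None] + rng.normal(0.0, np.sqrt(tau2), size=(p, m))  # alpha = beta = 1
        max_dev[r] = np.max(np.abs(stat_fn(x, y) - 1.0))
    return np.quantile(max_dev, 1.0 - level)

# Placeholder statistic so the sketch runs; replace it with the real COVRATIO_(-i) values.
def placeholder_stat(x, y):
    return 1.0 + (y - x - 1.0).ravel() / 10.0

cutoff_value = simulate_cutoff(placeholder_stat, p=10, m=2, tau2=0.2, n_rep=200)
print(cutoff_value)
```

Taking the maximum over observations before taking the percentile controls the chance of flagging any observation in a clean data set at roughly the chosen level.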

Results and discussions
The simulation results for the 10% upper percentiles are shown in table 1. From table 1, we can see that the 10% upper percentile is a decreasing function of the sample size, $n$.
Figure 1. Graph of the power series for the cut-off point at the 10% significance level.
Then, the arithmetic mean of the values for each $n$ is calculated and the fitted power series curve is plotted in figure 1. The curve fits well, with $R^2$ approximately equal to 1. The fitted equation, $y = 5.2418\,n^{-0.407}$, where $n$ is the sample size, is used as the cut-off point for detecting a single outlier. Any observation with $|\mathrm{COVRATIO}_{(-i)} - 1|$ exceeding the cut-off point is considered an outlier at the 10% significance level.
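As an illustration of how such a curve can be evaluated and fitted, the sketch below computes the cut-off for a given $n$ and recovers the two coefficients by least squares on the log-log scale. The fitting method used for figure 1 is not stated, so the log-log regression is an assumption. The points being fitted are generated from the reported curve itself, not taken from table 1.

```python
import numpy as np

def cutoff(n, a=5.2418, b=-0.407):
    """Cut-off point a * n**b, using the coefficients reported for figure 1."""
    return a * np.asarray(n, dtype=float) ** b

def fit_power_curve(n, y):
    """Fit y = a * n**b by ordinary least squares on log(y) versus log(n)."""
    slope, intercept = np.polyfit(np.log(n), np.log(y), 1)
    return np.exp(intercept), slope

# Synthetic check: points lying exactly on the reported curve (not the table 1 values).
n_grid = np.array([20, 40, 60, 80, 100, 132, 180, 300], dtype=float)
a_hat, b_hat = fit_power_curve(n_grid, cutoff(n_grid))
print(a_hat, b_hat)
print(float(cutoff(50)))
```

With real simulation output, `cutoff(n_grid)` would be replaced by the averaged 10% upper percentiles for each sample size.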

Application
The proposed method is applied to a real dataset, the iron in slag dataset with $n = 50$ [18]. The dataset consists of measurements obtained by two different techniques, a magnetic test and a chemical test, both of which are subject to measurement error. It can be considered unreplicated data because there is only a single $x_i$ and $y_i$ observation for each level of $X_i$ [19]. The scatter plot of the dataset is shown in figure 2. In the unreplicated LFRM, an assumption on the ratio of error variances, λ, is needed to estimate the parameters. In the absence of knowledge of this ratio, the data are transformed into pseudo-replicates and the MLE for the balanced replicated LFRM is used to estimate all parameters. The data are divided into 5 groups of 10 observations each, so that the pseudo-replicates are balanced.
Since there is no outlier in the original iron in slag data, the data are modified as shown in figure 3 [9]. The purpose is to test whether the proposed method can accurately detect an outlier present in the dataset. An outlier is inserted at a randomly chosen position, namely the 32nd observation, and the proposed method is then applied. The COVRATIO statistic is calculated for each observation, and any observation whose $|\mathrm{COVRATIO}_{(-i)} - 1|$ exceeds the cut-off point is considered an outlier. When the sample size is 50, the cut-off point given by the fitted curve is $5.2418 \times 50^{-0.407} \approx 1.07$.
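The grouping and flagging steps can be sketched as follows. The text does not say how the 50 observations are assigned to the 5 groups, so sorting by $x$ and cutting into consecutive blocks is an assumption. The function names and the COVRATIO values in the usage example are hypothetical and are not results from the iron in slag data.

```python
import numpy as np

def make_pseudo_replicates(x, y, n_groups):
    """Split paired data into balanced groups of neighbouring x values (assumed grouping rule)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    if x.size % n_groups != 0:
        raise ValueError("number of observations must be divisible by n_groups")
    order = np.argsort(x, kind="stable")  # sort by x so each group spans similar x values
    m = x.size // n_groups
    # Also return the original indices so flagged points can be traced back.
    return x[order].reshape(n_groups, m), y[order].reshape(n_groups, m), order.reshape(n_groups, m)

def flag_outliers(covratio_values, n):
    """Indices whose |COVRATIO - 1| exceeds the cut-off 5.2418 * n**(-0.407)."""
    cut = 5.2418 * n ** (-0.407)
    return np.flatnonzero(np.abs(np.asarray(covratio_values, float) - 1.0) > cut)

# Hypothetical inputs, only to show the shapes and the flagging rule.
x_demo = np.arange(50.0)[::-1]
y_demo = 2.0 * x_demo
xg, yg, idx = make_pseudo_replicates(x_demo, y_demo, n_groups=5)
demo_ratios = np.ones(50)
demo_ratios[31] = 3.0  # pretend the 32nd observation is far from 1
print(xg.shape, flag_outliers(demo_ratios, n=50))
```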

Conclusion
In conclusion, the COVRATIO statistic can be used to detect a single outlier at the 90% confidence level in the replicated LFRM. The cut-off point is developed at the 0.10 significance level through a simulation study. The COVRATIO statistic is used because it is a simple, widely used procedure that is easy to adapt to the replicated LFRM. As an illustration, the proposed method has been applied to a real dataset, where it identifies the single outlier present in the data set.

Acknowledgement
We are most grateful to the University of Malaya and the National Defence University of Malaysia for supporting this work. We also wish to thank the referee for their helpful comments and suggestions.