Restricted Ridge Regression estimator as a parameter estimation method in the multiple linear regression model for the multicollinearity case

Multiple linear regression analysis is a statistical technique used to analyse the relationship between a dependent variable and two or more independent (regressor) variables. The ordinary least squares method is commonly used to estimate the parameters. The most frequently occurring problem in multiple linear regression analysis is the presence of multicollinearity. Under multicollinearity, least squares estimation produces estimates with large variance, so another method is needed to overcome the problem. That method is ridge regression, in which a ridge bias constant k is added to the X′X matrix. A later study developed the method by using prior information on the parameter β and introduced the Restricted Ridge Regression method. Prior information on β is non-sample information arising from past experience and the opinions of experts in similar situations involving the same parameters β. This study explains the use of the Restricted Ridge Regression method to overcome multicollinearity in a regression model. In an application to a data set, the Restricted Ridge Regression estimate of β has a smaller mean square error (MSE) than the ordinary least squares estimate.


Introduction
Consider the multiple linear regression model y = Xβ + ε, where y is the n × 1 vector of observations (dependent variable), X is the n × p matrix of regressor variables, β is the p × 1 vector of regression parameters, and ε is the n × 1 random error vector with E(ε) = 0 and Var(ε) = σ²I [1]. The Gauss-Markov theorem states that the least squares estimator β̂_LS = (X′X)⁻¹X′y is the unbiased estimator with minimum variance when the classical error assumptions are satisfied [2]. However, a linear dependency between regressor variables can make the least squares estimator less effective. Multicollinearity is a condition in which the regressor variables in a multiple linear regression model are almost linearly dependent. This condition causes the variance of the least squares estimator to be large and the estimator to become unstable, which leads to less precise interpretation of the regression model [1]. When a multiple linear regression model suffers from multicollinearity, the least squares estimates of the regression coefficients become unstable. In 1970, Hoerl and Kennard introduced a new method to solve this problem, known as ridge regression. The basic principle of ridge regression is to add a bias constant to the main diagonal of the X′X matrix; the estimator is obtained by minimizing the sum of squared errors of the multiple linear regression model subject to β′β < c, where c is an optimal value [3].
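To illustrate the instability described above, the following is a minimal sketch (with hypothetical synthetic data, not the paper's dataset) that computes the least squares estimator β̂ = (X′X)⁻¹X′y and the condition number of X′X, which becomes very large when regressors are nearly linearly dependent:

```python
import numpy as np

# Hypothetical synthetic data (not the paper's dataset): x2 is almost an
# exact multiple of x1, so the regressors are nearly linearly dependent.
rng = np.random.default_rng(0)
n = 30
x1 = rng.normal(size=n)
x2 = 2.0 * x1 + 1e-4 * rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])
y = 1.0 + 3.0 * x1 + 0.5 * x2 + rng.normal(scale=0.1, size=n)

# Least squares estimator: beta_hat = (X'X)^{-1} X'y
beta_ls = np.linalg.solve(X.T @ X, X.T @ y)

# A very large condition number of X'X signals multicollinearity.
cond = float(np.linalg.cond(X.T @ X))
print(beta_ls, cond)
```

With this near-dependence, the condition number of X′X is many orders of magnitude above 1, and small perturbations of y produce large swings in the estimated coefficients of x1 and x2.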
The ridge regression estimator is

β̂_R(k) = (X′X + kI)⁻¹X′y.   (1)

This estimator is biased and depends on the ridge bias constant k, so it is necessary to find a constant k that optimizes the ridge estimates by producing a small variance and a small bias [3]. Groß and Sarkar developed a method that combines prior information on the parameter β with the ridge regression (RR) method [4,5]. The Restricted Ridge Regression (RRR) method is expected to overcome multicollinearity and produce small MSE values.
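A minimal sketch of the ridge estimator, using hypothetical data and standard NumPy: at k = 0 it reduces to ordinary least squares, and the norm of the estimate shrinks as k grows.

```python
import numpy as np

def ridge_estimator(X, y, k):
    """Ridge estimator beta_R(k) = (X'X + k I)^{-1} X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)

# Hypothetical data; at k = 0 ridge reduces to ordinary least squares,
# and the norm of the estimate shrinks as k grows.
rng = np.random.default_rng(1)
X = rng.normal(size=(20, 3))
y = rng.normal(size=20)
beta0 = ridge_estimator(X, y, 0.0)
beta_ls = np.linalg.solve(X.T @ X, X.T @ y)
same_as_ls = bool(np.allclose(beta0, beta_ls))
norm0 = float(np.linalg.norm(beta0))
norm1 = float(np.linalg.norm(ridge_estimator(X, y, 1.0)))
print(same_as_ls, norm0, norm1)
```

The shrinkage of ‖β̂_R(k)‖ with k is the mechanism by which ridge trades a small bias for a large variance reduction under multicollinearity.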

Prior information of parameter
Berger defines prior information on the parameter β as non-sample information arising from past experience with similar situations involving the same parameters β [6]. Shalabh et al. state that prior information can be expressed in three forms: exact, stochastic, and inequality restrictions [7]. This paper focuses on the exact linear restriction, written as Rβ = r, where R is a nonzero m × p matrix with rank(R) = m < p and r is an m × 1 vector.
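As a concrete (hypothetical) instance of such an exact linear restriction with m = 1 and p = 3, suppose prior information states that the second and third coefficients sum to one:

```python
import numpy as np

# Hypothetical exact linear restriction R beta = r with m = 1, p = 3:
# prior information states beta_2 + beta_3 = 1.
R = np.array([[0.0, 1.0, 1.0]])   # m x p nonzero matrix
r = np.array([1.0])               # m x 1 vector
rank_R = int(np.linalg.matrix_rank(R))  # rank(R) = m = 1 < p = 3
print(R.shape, rank_R)
```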

Restricted ridge regression estimator
The Restricted Ridge Regression method estimates the parameter β by minimizing the sum of squared errors of the multiple linear regression model subject to its constraints: minimize (y − Xβ)′(y − Xβ) subject to Rβ = r and β′β < u, where u is a limiting constant, Rβ = r is the prior information on the parameters, and β′β < u is the bound required for the regression model to reduce the multicollinearity problem. The Lagrange method is used to derive the Restricted Ridge Regression estimator [8,9]:

β̂_RRR(k) = β̂_R(k) + (X′X + kI)⁻¹R′[R(X′X + kI)⁻¹R′]⁻¹(r − Rβ̂_R(k)),   (3)

where β̂_R(k) = (X′X + kI)⁻¹X′y is the ridge estimator from equation (1). The covariance matrix and bias of β̂_RRR(k) follow from this expression, and the scalar mean square error is MSE(β̂) = tr(Cov(β̂)) + bias(β̂)′bias(β̂).
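A minimal sketch of this estimator, assuming the standard Lagrange-multiplier form β̂_RRR(k) = β̂_R(k) + (X′X + kI)⁻¹R′[R(X′X + kI)⁻¹R′]⁻¹(r − Rβ̂_R(k)); the data and the restriction below are hypothetical. By construction the restriction Rβ = r is satisfied exactly by the estimate:

```python
import numpy as np

def ridge_estimator(X, y, k):
    """Ridge estimator beta_R(k) = (X'X + k I)^{-1} X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)

def restricted_ridge(X, y, k, R, r):
    """Lagrange-form restricted ridge estimator: the ridge estimate
    corrected toward the exact restriction R beta = r."""
    p = X.shape[1]
    A = np.linalg.inv(X.T @ X + k * np.eye(p))
    b_k = A @ (X.T @ y)
    correction = A @ R.T @ np.linalg.solve(R @ A @ R.T, r - R @ b_k)
    return b_k + correction

# Hypothetical data and restriction (beta_1 = beta_2).
rng = np.random.default_rng(2)
X = rng.normal(size=(25, 4))
y = rng.normal(size=25)
R = np.array([[1.0, -1.0, 0.0, 0.0]])
r = np.array([0.0])
b_rrr = restricted_ridge(X, y, 0.1, R, r)
satisfies = bool(np.allclose(R @ b_rrr, r))  # restriction holds exactly
print(b_rrr, satisfies)
```

Multiplying the estimator by R shows why: R β̂_RRR = Rβ̂_R + (RAR′)(RAR′)⁻¹(r − Rβ̂_R) = r, so the prior information is honored for every k.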

Results and discussion
The data in this paper were taken from the paper entitled "A new biased estimator in linear regression and a detailed analysis of the widely-analysed dataset on Portland cement" by Kaçıranlar et al. [11]. The standardized data, one dependent variable and 4 regressors, are shown in Table 1. The working principle of the Restricted Ridge Regression method is to use prior information on the parameters in the form of the linear restriction equation Rβ = r obtained from previous research. In this study, the prior information is derived from the parameter values of the ridge regression estimate β̂_R(k) from equation (1); the linear restriction Rβ = r (from section 2) is then built from β̂_R(k) and its transpose. Table 3 shows the Restricted Ridge Regression estimator based on equation (3) for some values of k ∈ (0,1]. Here MSE denotes the mean square error, computed as the mean of the squared differences between the observed values y and their predictions ŷ. Table 4 shows β̂_LS, β̂_R, and β̂_RRR, the parameter estimates from ordinary least squares, ridge regression, and Restricted Ridge Regression, respectively. The parameters of Restricted Ridge Regression have the smallest MSE, which means this estimation gives a stable regression estimator.
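The MSE comparison above can be sketched as follows. This uses a synthetic stand-in for the standardized cement data (the real values are in Table 1 of the paper) and scans k over (0, 1] as Table 3 does; the in-sample MSE of ordinary least squares (k = 0) is necessarily the smallest among ridge fits, since OLS minimizes the residual sum of squares:

```python
import numpy as np

def ridge_estimator(X, y, k):
    """Ridge estimator beta_R(k) = (X'X + k I)^{-1} X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)

def fit_mse(X, y, beta):
    """Mean of squared differences between observations y and
    predictions X beta, as in the paper's Tables 3-4."""
    resid = y - X @ beta
    return float(np.mean(resid ** 2))

# Synthetic stand-in for the standardized cement data (13 observations,
# 4 regressors); the real values are in Table 1 of the paper.
rng = np.random.default_rng(3)
X = rng.normal(size=(13, 4))
y = X @ np.array([0.5, 0.3, -0.2, 0.1]) + 0.05 * rng.normal(size=13)
mse_ols = fit_mse(X, y, ridge_estimator(X, y, 0.0))
for k in (0.01, 0.1, 0.5, 1.0):
    print(k, fit_mse(X, y, ridge_estimator(X, y, k)))
```

Note that the paper's comparison of estimators on the real cement data additionally involves the restriction Rβ = r; this sketch only shows how the k-scan and the MSE criterion are computed.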

Conclusion
The multiple linear regression model has unknown regression coefficient parameters, so it is necessary to estimate them, usually by the least squares method. However, regression models often suffer from multicollinearity, characterized by a linear relationship between regressor variables, which causes the least squares estimator (β̂_LS) to become unstable. This paper applies the Restricted Ridge Regression method to the Portland cement dataset. The results show that multicollinearity makes β̂_LS unstable for estimating the model, while β̂_R produces stable estimates with MSE 0.037 and β̂_RRR produces stable estimates with MSE 1.387692e-09, where MSE(β̂_RRR) < MSE(β̂_R).