A Comparison of Two Ridge Regression Estimators Using the LAD Method with Simulation

Multicollinearity is one of the most important and persistent problems in regression analysis because of its effect on the model estimators: the independent variables become so closely related that the regression results are unclear. The aim of this research is to address the multicollinearity problem. One of the available remedies is ridge regression based on least absolute deviation (LAD) estimators. We propose a new ridge parameter, obtained by modifying the median estimator of B. M. Golam Kibria (K_MED), as a contribution to solving the multicollinearity problem, and compare the two estimators. A simulation study using the mean squared error (MSE) criterion shows that the proposed estimator (K_CN) is the best.


Introduction
Multiple linear regression is an advanced technique that supports accurate inference by optimizing the use of data to identify causal relationships between the phenomena in question. A regression coefficient can be defined as the change that occurs in the response (dependent) variable for a change in an explanatory (independent) variable. The problem arises when the independent variables are so closely related that the regression results are unclear; estimating the individual effects of the variables in the regression equation then becomes impossible, because multicollinearity is a situation in which two or more independent variables move together, making it impossible to specify which of the independent variables caused the observed change in the dependent variable [13]. Therefore, to solve the multicollinearity problem, we combine a ridge parameter with the Least Absolute Deviation (LAD) method. Extensive research has been done on the performance of ridge regression and of the LAD method separately. Both methods are robust, but each suits a different type of problem. An early attempt to combine the two methods in one estimation procedure is due to Pfaffenberger and Dielman (1989) [15], who obtained the ridge biasing parameter K from the LAD estimates of the parameters using the formula of Hoerl et al. (1975) [6], but with the LAD estimates of the parameters and of the error variance in place of the Ordinary Least Squares (OLS) estimates. The LAD method provides a strong alternative to OLS, especially when the data follow a non-normal distribution and contain outliers: LAD estimates are far less influenced by extreme values than OLS estimates.
On the other hand, although LAD was proposed as an alternative to the OLS regression method, it is used far less and can therefore be considered a nontraditional technique [2]. Ridge regression is one of the standard remedies for multicollinearity. A. E. Hoerl (1962) [5] was the first to suggest the ridge parameter to control inflation and general instability of the estimates. Hoerl and Kennard (1970) [8] showed that a suitable value of K reduces MSE(β̂(K)). Hocking et al. (1976) [7] state that, for the known optimum K, the resulting generalized ridge estimator outperforms all other estimators within the class of biased estimators they considered. However, the optimum value of K depends entirely on unknown parameters and must be estimated. Hoerl and Kennard (1970) [8] suggested replacing the unknown parameters by their corresponding unbiased estimators; Hoerl, Kennard and Baldwin (1975) [6] proposed a different estimator of K (K_HKB) based on the harmonic mean. From the Bayesian viewpoint, Lawless and Wang (1976) [11] proposed K_LW, and Hocking, Speed and Lynn (1976) [7] proposed K_HSL. B. M. Golam Kibria (2003) [9] proposed estimating the ridge parameter using the arithmetic mean, the geometric mean, and the median. In this research we take the ridge parameters above and propose a new one, then compare the proposed estimator with Kibria's median estimator (K_MED). The proposed ridge parameter modifies the median estimator by subtracting a term involving the condition number, which is the ratio of the largest to the smallest singular value [1]. A simulation study is then used to compare the two estimators and determine which is best, the best estimator being the one with the smallest mean squared error (MSE). The paper is organized as follows. In Section 2, the main ideas of OLS and OLS-ridge are discussed.
In Section 3, the main ideas of LAD and LAD-ridge are discussed and the ridge parameters are reviewed. In Section 4, the proposed ridge parameter is introduced. In Section 5, the efficiency of the ridge parameters is verified through a simulation study comparing the parameters, and the results are analyzed.

Ordinary Least Square Estimator (OLS)
The OLS method is one of the oldest and most common unbiased estimation methods used to estimate the parameters of the linear regression model:

$$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \dots + \beta_p x_{pi} + \varepsilon_i, \quad i = 1, 2, \dots, n \qquad (1)$$

where n is the number of observations, p the number of independent variables, x the independent variables, y the dependent variable, ε the model error, and β the regression coefficients. The OLS method minimizes the sum of squared deviations. The resulting β̂ is an unbiased estimator of β, since E(β̂) = β, and has minimal variance V(β̂) = σ²(X'X)⁻¹, which attains the Rao-Cramér lower bound, so β̂ is the best linear unbiased estimator (BLUE) of β. Taking the derivative of the sum of squares with respect to β and setting it equal to zero gives:

$$\hat{\beta} = (X'X)^{-1}X'Y \qquad (2)$$

Many researchers have sought other solutions to the multicollinearity problem, and one of these is the family of proposed ridge parameters [13].
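As a concrete sketch of the OLS solution β̂ = (X'X)⁻¹X'Y, the following minimal NumPy example (not from the paper) solves the normal equations directly; on noiseless data the fit recovers the true coefficients exactly.

```python
import numpy as np

def ols_estimate(X, y):
    """OLS estimator beta_hat = (X'X)^{-1} X'y for the model y = X beta + eps."""
    return np.linalg.solve(X.T @ X, X.T @ y)

# Noiseless check: with eps = 0, OLS recovers the true coefficients.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(50), rng.normal(size=(50, 2))])
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true
beta_hat = ols_estimate(X, y)
```

`np.linalg.solve` is preferred over explicitly inverting X'X, since it is numerically more stable.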

Ordinary Ridge Regression Estimators
Different methods have been proposed to deal with collinear data by modifying the OLS method, introducing some bias into the estimates of the regression parameters. The most common of these is ridge regression, which is used to solve the multicollinearity problem. The ridge regression estimates depend on a parameter K, called the ridge parameter or biasing constant, whose value K ≥ 0 is chosen by the researcher according to suitable criteria set by Hoerl and Kennard [8]. The ridge regression estimator β̂(K) is:

$$\hat{\beta}(K) = (X'X + KI)^{-1}X'Y$$

The ordinary ridge estimator does not provide a single solution to the multicollinearity problem; it provides a set of solutions that depend on the value of K. An explicit ideal value for K cannot be found, but many choices have been proposed for this parameter [13].
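The ridge estimator above can be sketched in a few lines of NumPy (illustrative, not the paper's code). With K = 0 it reduces to OLS, and increasing K shrinks the coefficient vector toward zero, which is the bias traded for reduced variance.

```python
import numpy as np

def ridge_estimate(X, y, k):
    """Ridge estimator beta_hat(K) = (X'X + K I)^{-1} X'y, with K >= 0."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)

# K = 0 gives OLS; a larger K shrinks the coefficient norm.
rng = np.random.default_rng(1)
X = rng.normal(size=(60, 3))
y = X @ np.array([1.0, 1.0, 1.0]) + rng.normal(size=60)
b0 = ridge_estimate(X, y, 0.0)   # OLS
b5 = ridge_estimate(X, y, 5.0)   # shrunken estimate
```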

Least Absolute Deviation (LAD)
The LAD method is considered one of the most common robust regression techniques. LAD estimates are not affected considerably by extreme values, compared with OLS estimates. However, the behavior of LAD estimates is less well understood, particularly in small samples, and the inference process is less straightforward [2]. Inference in LAD estimation is an active area of research. Koenker and Bassett [10] suggested employing LAD estimation in the Wald, likelihood ratio (LR), and Lagrange multiplier (LM) tests; these methods can be used to test coefficient significance in the regression model. Dielman and Pfaffenberger [3] studied inference for regression based on LAD estimation when the data are independent but not necessarily normal. Although LAD estimation has been proposed as an alternative to least squares regression, it is used considerably less and thus can be regarded as a nontraditional technique. The LAD parameter estimates are the parameter values that minimize

$$\sum_{i=1}^{n} |y_i - a - b x_i|$$

where a and b are the parameters in the simple regression case; this generalizes to multiple regression. The criterion itself is simpler, since the absolute value of a residual is a more direct measure of its size than the squared residual; however, the calculation of LAD estimates is more complicated, and several algorithms exist for computing them [16]. The LAD method is a powerful alternative to OLS, especially when the data follow a non-normal distribution and contain outliers. The OLS regression method produces unbiased parameter estimates with minimum variance when the data are independent, identically distributed, and normal.
However, if unusual errors occur, OLS may not perform well, particularly if the errors follow a distribution that tends to produce outliers; hence much research has aimed to develop estimation approaches that are robust to such outlier-producing error distributions [2]. The LAD method belongs to the robust regression category, which can be described as a family of regression techniques rather than a single technique. Robust regression is intended to be resistant to outliers and thus to handle potential problems when the dataset does not comply with the requirements of OLS, especially under high kurtosis or skewness; in these conditions OLS regression may be less efficient than some robust regression methods. LAD can be more efficient than OLS when the residual distribution is skewed or has high kurtosis, since LAD is less affected by outliers than OLS [14].
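The text notes that several algorithms exist for computing LAD estimates [16]. One standard approach (an illustrative choice here, not necessarily the algorithm of [16]) formulates the LAD problem as a linear program: introduce auxiliary variables u_i ≥ |y_i − x_i'β| and minimize their sum. The sketch below uses SciPy's `linprog`; note how one gross outlier leaves the fit unchanged.

```python
import numpy as np
from scipy.optimize import linprog

def lad_estimate(X, y):
    """LAD fit: minimize sum_i |y_i - x_i' beta| via a linear program.
    Decision vector is [beta, u] with u_i bounding |residual_i|."""
    n, p = X.shape
    c = np.concatenate([np.zeros(p), np.ones(n)])         # minimize sum of u
    A_ub = np.block([[X, -np.eye(n)],                     #  Xb - u <= y
                     [-X, -np.eye(n)]])                   # -Xb - u <= -y
    b_ub = np.concatenate([y, -y])
    bounds = [(None, None)] * p + [(0, None)] * n         # beta free, u >= 0
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return res.x[:p]

# Data that are exactly linear except for one gross outlier:
rng = np.random.default_rng(1)
x = rng.normal(size=20)
X = np.column_stack([np.ones(20), x])
y = 1.0 + 2.0 * x
y[0] += 50.0                                              # single outlier
beta_lad = lad_estimate(X, y)                             # still near (1, 2)
```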

Least Absolute Deviation Ridge Regressions
Although the LAD method of estimation is robust, strong multicollinearity may still exist between the independent variables in a linear regression, which motivates applying LAD estimation instead of OLS estimation within ridge regression. The multicollinearity problem exists naturally in most real-life data sets, and strong or moderate multicollinearity must be dealt with in one way or another. The LAD-ridge estimator is defined [17] as the minimizer of the sum of absolute residuals subject to a ridge penalty on the coefficients:

$$\hat{\beta}_{LAD}(K) = \arg\min_{\beta} \left[ \sum_{i=1}^{n} |y_i - x_i'\beta| + K \sum_{j=1}^{p} \beta_j^2 \right] \qquad (4)$$

In canonical form the model is written

$$Y = Z\alpha + \varepsilon \qquad (6)$$

where Z = XU and α = U'β, using the singular value decomposition of the matrix, X = GΛ^{1/2}U', with G an n×p matrix with orthonormal columns, Λ a p×p diagonal matrix of the eigenvalues of X'X, and U a p×p orthogonal matrix of the eigenvectors of X'X; the LAD estimation of α then follows. In general, the optimal K for LAD-ridge differs from that for OLS-ridge; however, a LAD-based K and an OLS-based K both aim at the same optimal value of K for the LAD-ridge estimation, and likewise, if they are used for OLS-ridge, both aim at the single optimal value of K for that method. Therefore, the MSE of each biasing parameter can be evaluated for both the LAD-ridge and the OLS-ridge estimation methods, when these biasing parameters (the LAD-based version and the OLS-based version) are taken as estimators of the optimal K for the OLS-ridge or the LAD-ridge estimation method. In general, LAD-ridge estimators are more robust than OLS-ridge estimators, but the efficiency of LAD-ridge and the smallest MSE of a biasing parameter do not always imply the smallest total MSE of the regression parameter estimators; the reason can be a larger bias in the intercept compared with the bias in the slopes [17]. Many solutions exist for the multicollinearity problem, and one of them is ridge regression, a biased estimation method in which the ridge parameter allows some bias in order to reduce the MSE. Some ridge estimators previously proposed for the LAD estimator are given next.
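As an illustrative sketch only (the paper does not give an algorithm here), the ridge-penalized LAD criterion — sum of absolute residuals plus K times the squared coefficient norm — can be minimized numerically. The objective is convex but nonsmooth, so a derivative-free Powell search is used below; this is an assumption for demonstration, not the authors' method.

```python
import numpy as np
from scipy.optimize import minimize

def lad_ridge_estimate(X, y, k):
    """Minimize sum_i |y_i - x_i'b| + k * b'b (ridge-penalized LAD).
    Convex but nonsmooth, so a derivative-free Powell search is used;
    illustrative only, not a production algorithm."""
    obj = lambda b: np.sum(np.abs(y - X @ b)) + k * (b @ b)
    res = minimize(obj, x0=np.zeros(X.shape[1]), method="Powell")
    return res.x

# Heavy-tailed errors (Student t, 3 df) to mimic an outlier-prone setting:
rng = np.random.default_rng(2)
X = rng.normal(size=(50, 2))
y = X @ np.array([1.0, 2.0]) + rng.standard_t(df=3, size=50)
b_unpen = lad_ridge_estimate(X, y, 0.0)    # plain LAD
b_pen = lad_ridge_estimate(X, y, 50.0)     # strongly penalized, shrunken
```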

Ridge Parameter
The least absolute deviation ridge estimator does not provide a single solution to the multicollinearity problem; it provides a set of solutions that depend on the value of K (the ridge parameter). An explicit ideal value for K cannot be found, but many choices have been proposed for this ridge parameter. Some of these can be summarized as follows. Hoerl and Kennard (1970) [8] showed that the value of K_i which minimizes MSE(β̂(K)) is

$$K_i = \frac{\sigma^2}{\alpha_i^2} \qquad (7)$$

where σ² represents the error variance of the linear regression model, α_i is the i-th element of α, and α = U'β, with U the orthogonal matrix of eigenvectors and λ the eigenvalues of X'X. Hocking et al. (1976) [7] show that, for the known optimal K, the generalized ridge regression estimator is superior to all other estimators in the class of biased estimators they considered. However, the optimal value of K fully depends on the unknown σ² and α_i², which must be estimated from the observed data. Hoerl and Kennard (1970) [8] suggested replacing σ² and α_i² by their corresponding unbiased LAD estimators:

$$\hat{K}_i = \frac{\hat{\sigma}^2}{\hat{\alpha}_i^2} \qquad (8)$$

where σ̂² = ε̂'ε̂/(n − p − 1) is the residual mean square estimate, an unbiased LAD-based estimator of σ², and α̂_i is the i-th element of α̂, an unbiased LAD estimator of α. Kibria (2003) [9] proposed estimating K using the geometric mean of the α̂_i² in eq. (8), giving the estimator

$$\hat{K}_{GM} = \frac{\hat{\sigma}^2}{\left(\prod_{i=1}^{p} \hat{\alpha}_i^2\right)^{1/p}} \qquad (9)$$

Kibria (2003) [9] also proposed estimating K using the median of the ratios in eq. (8), which produces the following estimator for p ≥ 3:

$$\hat{K}_{MED} = \operatorname{Median}\left\{\frac{\hat{\sigma}^2}{\hat{\alpha}_i^2}\right\}, \quad i = 1, 2, \dots, p \qquad (10)$$

where p represents the number of independent variables. It can be observed from eqs. (9) and (10) that the two estimators coincide when the α̂_i are all equal (i = 1, 2, …, p).
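The plug-in rules above can be sketched as follows. This is a minimal illustration assuming a fitted coefficient vector is available; OLS is used in the example purely for brevity, whereas the paper plugs in the LAD fit.

```python
import numpy as np

def kibria_ridge_parameters(X, y, beta_hat):
    """Plug-in ridge parameters from the ratios sigma2_hat / alpha_hat_i^2,
    with alpha_hat = U' beta_hat (U: eigenvectors of X'X)."""
    n, p = X.shape
    _, U = np.linalg.eigh(X.T @ X)
    alpha_hat = U.T @ beta_hat
    resid = y - X @ beta_hat
    sigma2_hat = resid @ resid / (n - p - 1)        # residual mean square
    ratios = sigma2_hat / alpha_hat**2
    k_gm = sigma2_hat / np.prod(alpha_hat**2) ** (1.0 / p)  # geometric-mean rule
    k_med = np.median(ratios)                                # median rule
    return k_gm, k_med

rng = np.random.default_rng(3)
X = rng.normal(size=(40, 3))
y = X @ np.array([1.0, 1.0, 1.0]) + rng.normal(size=40)
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)        # OLS plug-in for brevity
k_gm, k_med = kibria_ridge_parameters(X, y, beta_hat)
```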

Proposed ridge estimator
A ridge parameter for LAD estimation is proposed here that involves the condition number, denoted CN, defined as the ratio of the largest to the smallest singular value; if the condition number is very large, the matrix is said to be ill-conditioned, and if it is infinite, the determinant of the matrix is zero [1]. The proposed estimator, denoted K̂_CN, is defined as:

$$\hat{K}_{CN} = \operatorname{Median}\left\{\frac{\hat{\sigma}^2}{\hat{\alpha}_i^2}\right\} - \frac{1}{CN}, \quad i = 1, 2, \dots, p$$

where CN refers to the condition number and p represents the number of independent variables. The proposed estimator modifies K̂_MED in eq. (10) by subtracting 1/CN. This amount varies with the strength of multicollinearity in the model. If the condition number is very large, K̂_CN coincides with K̂_MED, since in that case the fraction 1/CN approaches zero.
On the other hand, if the condition number is very small (approximately equal to one), then the probability that Median{σ̂²/α̂_i²} − 1/CN, i = 1, 2, …, p, is negative becomes large.
If K̂_CN equals zero, the ridge regression estimator coincides with the least absolute deviation estimator, and the data set is not affected by the multicollinearity problem [13].
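The proposed parameter can be computed as sketched below. Again, OLS is plugged in only for brevity (an assumption of this sketch; the paper uses the LAD fit). On a nearly collinear design the condition number is large, so the correction 1/CN is small and K̂_CN stays close to K̂_MED.

```python
import numpy as np

def k_cn(X, y, beta_hat):
    """Proposed parameter: median rule minus 1/CN, where CN is the
    condition number of X (largest / smallest singular value)."""
    n, p = X.shape
    s = np.linalg.svd(X, compute_uv=False)
    cn = s[0] / s[-1]
    _, U = np.linalg.eigh(X.T @ X)
    alpha_hat = U.T @ beta_hat
    resid = y - X @ beta_hat
    sigma2_hat = resid @ resid / (n - p - 1)
    k_med = np.median(sigma2_hat / alpha_hat**2)
    return k_med - 1.0 / cn, k_med, cn

# Nearly collinear design: second column is almost a copy of the first.
rng = np.random.default_rng(4)
z = rng.normal(size=(40, 3))
X = np.column_stack([z[:, 0], z[:, 0] + 0.001 * z[:, 1], z[:, 2]])
y = X @ np.ones(3) + rng.normal(size=40)
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
kcn, kmed, cn = k_cn(X, y, beta_hat)
```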

Simulation Study
In this section we discuss a simulation study comparing the performance of the ridge estimators in order to find out which ridge parameter is most efficient, i.e., has the smallest MSE. Three independent variables (p = 3) were used in the linear regression model:

$$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_3 x_{3i} + \varepsilon_i, \quad i = 1, 2, \dots, n \qquad (12)$$

Following McDonald and Galarneau (1975) [12] and Gibbons (1981) [4], the independent variables were generated using the device

$$x_{ij} = (1 - \rho^2)^{1/2} z_{ij} + \rho z_{i,p+1}, \quad i = 1, 2, \dots, n, \ j = 1, 2, \dots, p$$

where the z_{ij} are independent standard normal pseudo-random numbers, Ɛ ~ N(0, σ²) is independent normal, and ρ is specified so that the correlation between any two independent variables is ρ². In this paper it is assumed that β₀ = β₁ = β₂ = β₃ = 1; the correlation values used were ρ = 0.3, 0.6, 0.9, the sample sizes were n = 25, 50, 200, and the values of σ² were 0.5, 1, 2. The MATLAB (2020a) program was used to compare the two ridge parameters and show which one is best. The tables below report the MSE values; the best parameter is the one with the smallest MSE.
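The McDonald-Galarneau generation device can be sketched as follows (a Python illustration; the paper's simulation was run in MATLAB). With ρ = 0.9 the pairwise correlation between predictors should be near ρ² = 0.81, which a large sample confirms empirically.

```python
import numpy as np

def generate_predictors(n, p, rho, rng):
    """McDonald-Galarneau device: x_ij = sqrt(1 - rho^2) z_ij + rho z_{i,p+1},
    with z iid N(0,1), giving pairwise correlation rho^2 between predictors."""
    z = rng.normal(size=(n, p + 1))
    return np.sqrt(1.0 - rho**2) * z[:, :p] + rho * z[:, [p]]

rng = np.random.default_rng(42)
X = generate_predictors(100000, 3, 0.9, rng)
emp_corr = np.corrcoef(X, rowvar=False)   # off-diagonals near 0.81
```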

Analysis of simulation results
From Tables 1, 2 and 3, which contain the simulation results for the MSE of the ridge parameter K, we observed the following: when σ² = 0.5 in experiment 1 and ρ = 0.3, one ridge parameter is best for both sample sizes n = 25 and n = 50, while another is best for the large sample size n = 200 and, in experiment 2, for ρ = 0.6.