Variable selection in the gamma regression model using the binary gray wolf optimization algorithm

In real-life applications, large numbers of variables accumulate quickly. Variable selection is a very useful tool for improving prediction accuracy by identifying the variables most relevant to the study. The gamma regression model is one of the most widely applied models in several scientific fields. The gray wolf optimization (GWO) algorithm is a nature-inspired algorithm that can be employed efficiently for variable selection. In this paper, a chaotic GWO is proposed to perform variable selection for the gamma regression model. Simulation studies and a real data application are used to evaluate the performance of the proposed procedure in terms of prediction accuracy and variable selection criteria. The results demonstrate the efficiency of the proposed method compared with other popular methods.


Introduction
The gamma regression model is a widely applied method in areas such as automobile insurance claims and medical science [1][2][3], specifically when the response variable under study follows a gamma distribution [4,5].
In many real applications, recent developments in technology have made it possible to measure a large number of variables. In regression modeling, the presence of a huge number of variables has a negative effect because it leads to overfitting of the regression model. Therefore, identifying a small subset of important variables from a large set of variables plays an important role in building accurate predictive regression models [6].
When the number of variables increases, traditional variable selection methods, such as stepwise selection, forward selection, and backward elimination, become computationally exhaustive searches and require a long computing time. Penalization methods, such as the least absolute shrinkage and selection operator (lasso) [7], the smoothly clipped absolute deviation (SCAD) [8], the elastic net [9], and the adaptive lasso [10], have become attractive methods for simultaneously performing variable selection and model estimation.

Gamma regression model
In epidemiological, social, and economic studies, positively skewed data often arise. The gamma distribution is a well-known distribution that fits this type of data. The gamma regression model (GRM) is used to model the relationship between a non-negative skewed response variable and potential explanatory variables [23].
Assume that $y_i$ is the response variable, which follows a gamma distribution with a shape parameter and a scale parameter.
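For reference, one common way of writing the gamma density is sketched below. The symbols $\nu$ (shape) and $\theta_i$ (scale) are assumptions chosen for illustration, and other parameterizations of the same distribution are also in use.

```latex
f(y_i) = \frac{y_i^{\nu-1}}{\Gamma(\nu)\,\theta_i^{\nu}}
         \exp\!\left(-\frac{y_i}{\theta_i}\right), \quad y_i > 0,
\qquad
E(y_i) = \nu\theta_i,
\qquad
\operatorname{Var}(y_i) = \nu\theta_i^{2}.
```

Under this parameterization the variance grows with the square of the mean, which is why the distribution suits positively skewed responses.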
The maximum likelihood (ML) method applied to Eq. (4) is the most common method of estimating the coefficients of the GRM. Assuming that the observations are independent and that the reciprocal link $1/\mathbf{x}_i^{T}\boldsymbol{\beta}$ is used, the log-likelihood function is formed, and the ML estimator is then obtained by computing the first derivative of Eq. (3) and setting it equal to zero. Using the iteratively weighted least squares (IWLS) algorithm, the parameters are updated in each iteration.
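The IWLS iteration described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it assumes the reciprocal link, NumPy, the usual GLM starting value (fitted mean equal to the response), and a hypothetical function name `fit_gamma_iwls`.

```python
import numpy as np


def fit_gamma_iwls(X, y, max_iter=100, tol=1e-10):
    """Fit a gamma GLM with the reciprocal link mu = 1 / (X @ beta) by IWLS.

    X is an (n, p) design matrix (add a column of ones for an intercept)
    and y is a vector of strictly positive responses.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    mu = y.copy()              # assumed starting value: fitted mean = response
    eta = 1.0 / mu             # linear predictor under the reciprocal link
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        w = mu ** 2                    # weights: (dmu/deta)^2 / V(mu) = mu^2
        z = eta - (y - mu) / mu ** 2   # working response: eta + (y - mu) * deta/dmu
        XtW = X.T * w                  # equivalent to X.T @ diag(w)
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        eta = X @ beta_new
        if np.any(eta <= 0):
            # The reciprocal link needs a positive linear predictor.
            raise FloatingPointError("linear predictor left the region eta > 0")
        mu = 1.0 / eta
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta
```

Because the reciprocal link is the canonical link of the gamma family, the fitted coefficients satisfy $X^{T}(y - \hat{\mu}) = 0$ at convergence, which gives a simple check on the result.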

Chaotic grey wolf optimization algorithm
Mirjalili, et al. [24] presented a new swarm intelligence metaheuristic algorithm known as the grey wolf optimizer (GWO). The GWO simulates the leadership and hunting behavior of grey wolves. Its simulation of the leadership hierarchy found in nature distinguishes it from other swarm algorithms. Hunting in the GWO algorithm is simulated through this hierarchy, in which the pack is divided into different groups and levels such as alpha, beta, and omega [24].
Gray wolves belong to the Canidae family and are classified as top predators because they sit at the top of the food chain. The first level of the leadership hierarchy is the alpha ($\alpha$). The alphas are the leaders; they may be female or male, and they are responsible for making all the decisions related to hunting, sleeping, time to wake, and so on. The second level in the hierarchy is the beta ($\beta$); these wolves help the alphas in making decisions. The beta wolves respect the alpha wolves, reinforce their decisions, and act as their advisers. At the lowest level is the omega ($\omega$), which plays the role of scapegoat for the pack. Omega wolves must submit to the wolves of all other levels. It may seem that the omega wolves are not important members of the pack, but it has been observed that a pack without them faces internal fighting and problems. This is because the omega absorbs the venting of violence and frustration of all the other wolves, which helps to satisfy the whole pack and to preserve the dominance structure [25]. Wolves that are not alpha ($\alpha$), beta ($\beta$), or omega ($\omega$) are called subordinates or delta ($\delta$); these wolves must submit to the alpha ($\alpha$) and beta ($\beta$) wolves, but they dominate the omega ($\omega$) wolves.
The mathematical model for each level of the leadership hierarchy in the GWO is built on update equations in which $t$ denotes the current iteration, the components of $L$ are linearly reduced from 2 to 0 over the course of the iterations, and $r_1$ and $r_2$ are random vectors in $[0, 1]$ [26].
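A minimal sketch of the encircling move in the standard GWO formulation is given below, written with this paper's symbols $L$, $r_1$, and $r_2$. The coefficient forms $a = 2L r_1 - L$ and $c = 2 r_2$ follow the usual GWO literature and are assumptions here, as are the symbol $c$ and the function names. NumPy draws the random vectors from the half-open interval $[0, 1)$.

```python
import numpy as np


def control_parameter(t, max_iter):
    """L decreases linearly from 2 (at t = 0) to 0 (at t = max_iter)."""
    return 2.0 * (1.0 - t / max_iter)


def encircle(x, x_prey, L, rng):
    """One encircling move of a wolf at position x around the prey at x_prey."""
    r1 = rng.random(x.shape)      # random vector r1
    r2 = rng.random(x.shape)      # random vector r2
    a = 2.0 * L * r1 - L          # components lie in [-L, L]
    c = 2.0 * r2                  # components lie in [0, 2]
    d = np.abs(c * x_prey - x)    # distance-like term between wolf and prey
    return x_prey - a * d         # updated wolf position
```

As $L$ shrinks toward 0 the step $a \cdot d$ vanishes, so late iterations keep the wolves close to the prey position.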

Hunting
There are three main steps in hunting prey: (1) searching for the prey, (2) encircling it, and (3) attacking it. The mathematical behavior of the gray wolf algorithm is simulated by assuming that the alpha ($\alpha$), beta ($\beta$), and delta ($\delta$) wolves have better knowledge of the potential location of the prey.
Mathematical equations in this regard are developed in terms of three positions, $x_1$, $x_2$, and $x_3$, associated with the alpha, beta, and delta wolves, where $a$ is a random value in the interval $[-L, L]$. The gray wolves are compelled to attack the prey when the random value satisfies $|a| < 1$. Searching for the prey reflects the exploration ability, and attacking the prey reflects the exploitation ability. The arbitrary values of $L$ are utilized to force the search to move away from the prey [27].
In the binary version (bGWO), the position is updated as $x^{t+1} = \mathrm{crossover}(x_1, x_2, x_3)$ (14), where $\mathrm{crossover}(x, y, z)$ is a suitable crossover between the solutions $x$, $y$, $z$, and $x_1$, $x_2$, $x_3$ are binary vectors representing the effect of a wolf in bGWO moving toward the alpha, beta, and delta gray wolves, in that order. $x_1$, $x_2$, and $x_3$ are calculated using Eqs. (15), (18), and (21), respectively, where rand is a random number drawn from the uniform distribution on the closed interval $[0, 1]$.
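The binary update can be sketched as below. The sigmoid transfer function with constants 10 and 0.5 and the equal-probability crossover are assumptions borrowed from a common bGWO formulation; they are not stated in the text above, and the function names are hypothetical.

```python
import numpy as np


def binary_step(x_leader, x_wolf, L, rng):
    """Binary vector describing the pull of one leader on a wolf."""
    r1 = rng.random(x_wolf.shape)
    r2 = rng.random(x_wolf.shape)
    a = 2.0 * L * r1 - L                       # components in [-L, L]
    c = 2.0 * r2
    d = np.abs(c * x_leader - x_wolf)
    # Assumed sigmoid transfer: maps the continuous step a*d into (0, 1).
    cstep = 1.0 / (1.0 + np.exp(-10.0 * (a * d - 0.5)))
    # Threshold against a uniform random number to get a binary step.
    bstep = (cstep >= rng.random(x_wolf.shape)).astype(int)
    # A bit is 1 if the leader has it set or the binary step switches it on.
    return ((x_leader + bstep) >= 1).astype(int)


def crossover(x1, x2, x3, rng):
    """Take each bit from x1, x2, or x3 with equal probability (assumed rule)."""
    u = rng.random(x1.shape)
    return np.where(u < 1 / 3, x1, np.where(u < 2 / 3, x2, x3))


def bgwo_update(x_wolf, alpha, beta, delta, L, rng):
    """New binary position of a wolf given the three leading wolves."""
    x1 = binary_step(alpha, x_wolf, L, rng)
    x2 = binary_step(beta, x_wolf, L, rng)
    x3 = binary_step(delta, x_wolf, L, rng)
    return crossover(x1, x2, x3, rng)
```

In a variable selection setting, each bit of the returned vector would indicate whether the corresponding explanatory variable is kept in the GRM.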

Computational results
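In this section, the performance of our proposed variable selection method, CGWO, is tested. Further, the performance of CGWO is compared with that of the GWO, the Bayesian information criterion (BIC), and the Akaike information criterion (AIC), which are defined as, respectively, $\mathrm{BIC} = -2\ell(\hat{\boldsymbol{\beta}}) + q\log(n)$ and $\mathrm{AIC} = -2\ell(\hat{\boldsymbol{\beta}}) + 2q$ (27), where $\ell(\hat{\boldsymbol{\beta}})$ is the log-likelihood for the GRM, $n$ is the sample size, and $q$ is the number of selected variables.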
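The two information criteria can be computed as in the sketch below, using the standard definitions $\mathrm{AIC} = -2\ell + 2q$ and $\mathrm{BIC} = -2\ell + q\log(n)$. The gamma log-likelihood is parameterized by the mean and the shape, which is an assumption made for illustration, and the function names are hypothetical.

```python
import math


def gamma_loglik(y, mu, shape):
    """Gamma log-likelihood with mean mu[i] and a common shape (scale = mu / shape)."""
    total = 0.0
    for yi, mi in zip(y, mu):
        total += ((shape - 1.0) * math.log(yi)
                  - shape * yi / mi
                  - shape * math.log(mi / shape)
                  - math.lgamma(shape))
    return total


def aic(loglik, q):
    """Akaike information criterion for a model with q selected variables."""
    return -2.0 * loglik + 2.0 * q


def bic(loglik, q, n):
    """Bayesian information criterion for q selected variables and n observations."""
    return -2.0 * loglik + q * math.log(n)
```

For both criteria smaller values are preferred, and the BIC penalizes each additional variable more heavily than the AIC once $\log(n) > 2$.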

Simulation results
In this section, the same simulation settings as those of Algamal and Lee [29] and Wang, et al. [30] are used. The sample size is specified as part of these settings. It can be seen from Tables 1 to 3 that the CGWO method shows a significant improvement, with a much better average MSE than the GWO, AIC, and BIC methods. In terms of variable selection performance, the proposed method selects very few irrelevant variables compared with GWO, AIC, and BIC: the number of true zero coefficients that are correctly set to zero is high compared with the others. In simulation 3 (Table 3), the model is dense and, therefore, all the methods have zero values for criterion C. On the other hand, CGWO is the best because the number of nonzero variables that it identifies as irrelevant is smaller than with GWO, AIC, and BIC. It is worth noting that AIC has inferior performance in all simulation examples compared with the GWO, BIC, and CGWO methods.
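The evaluation criteria discussed above can be computed as in the following sketch. Criterion C counts the true zero coefficients that are correctly left out. The label I for the count of nonzero coefficients wrongly left out is an assumption, since the text describes that count without naming it, and the function names are hypothetical.

```python
def selection_counts(beta_true, selected):
    """Return (C, I) for one fitted model.

    beta_true: true coefficients used to generate the data.
    selected:  booleans, True where the method kept the variable.
    C: true zero coefficients correctly excluded.
    I: true nonzero coefficients wrongly excluded (label assumed here).
    """
    c = sum(1 for b, s in zip(beta_true, selected) if b == 0 and not s)
    i = sum(1 for b, s in zip(beta_true, selected) if b != 0 and not s)
    return c, i


def mse(y, y_hat):
    """Mean squared error between observed and predicted responses."""
    return sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y)
```

For a dense model, where every true coefficient is nonzero, C is necessarily 0 for every method, so only the second count and the MSE separate the methods.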

Real application result
To demonstrate the benefit of the proposed method in a real application, a chemistry dataset with $(n, p) = (65, 15)$ of imidazo[4,5-b]pyridine derivatives [31] is used. The response of interest is the biological activity (IC$_{50}$) [32]. A chi-square goodness-of-fit test is used to check whether the biological activity variable follows the gamma distribution. The test statistic equals 9.3657 with a p-value of 0.9534, which indicates that the gamma distribution fits this response variable very well. The estimate of the dispersion parameter is 0.0066. Table 4 summarizes the MSE and the selected variables for each method in the real data application.
FISCAS 2020 Journal of Physics: Conference Series 1591 (2020) 012036 IOP Publishing doi:10.1088/1742-6596/1591/1/012036
As seen from the results in Table 4, CGWO remarkably reduces the MSE compared with GWO, AIC, and BIC. In terms of selected variables, on the other hand, it is clearly seen from Table 4 that CGWO selects only 6 of the 15 variables when the gamma model is assumed. CGWO selected the explanatory variables $x_1$, $x_2$, $x_7$, $x_8$, $x_{11}$, and $x_{15}$. These selected variables are identified as relevant to the study. Compared with GWO and BIC, CGWO includes fewer variables and attains a lower MSE.