Application of Generalized Poisson Regression for the development of modeling motorcycle accidents based on rider characteristics

Motorcycles are a mode of transportation involved in many traffic accidents in developing countries. These accidents can occur intentionally or unintentionally and cause great loss of life and property. Therefore, this work aims to develop a model of the number of motorcycle accidents in East Java Province, Indonesia, based on driver characteristics using Generalized Poisson Regression. The model identifies the relationship between the number of motorcycle accidents and the driver characteristics that contribute to accidents. These characteristics combine socioeconomic and movement factors. The analysis includes descriptive statistics and accident modeling using Generalized Poisson Regression. Model parameters are estimated by Maximum Likelihood Estimation (MLE), with the Newton-Raphson algorithm used to find the optimal solution. Two candidate models were obtained, and the best one was selected. Based on the parameter significance test, population density, the percentage of low education, the gender ratio, and the percentage of accident perpetrators without a driver’s license significantly affect the number of motorcycle accidents in East Java Province.


Introduction
Motorcycle accidents are unpredictable events. They can occur intentionally or unintentionally and can involve motorcycles and other vehicles or road users. Such accidents result in loss of life and property [1]. In 2018, motorcycle accidents accounted for the largest share of traffic accidents in Indonesia, around 63% of all traffic accidents. Within ASEAN, this ranks second [2]. The highest accident losses occur in Indonesia, with an estimated 6.03 billion USD per year [3]. Therefore, accident prediction is needed to help minimize motorcycle accidents.
Mathematical approaches are often used to predict accidents, including motorcycle accidents. One of them is the Generalized Linear Model (GLM) with a Poisson distribution, which is the basis of the Poisson Regression model. The Poisson distribution assumes equidispersion, and this assumption must be met [4,5]. However, several studies report violations of this assumption, especially in the form of overdispersion [6]. Dispersion arises from the relationship between the mean and the variance. When the assumption is violated, the deviance value becomes large, so the resulting model is less accurate. Therefore, Generalized Poisson Regression (GPR) was chosen as an alternative, because it can accommodate both overdispersion and underdispersion.
GPR is an extension of Poisson Regression that allows analysis of the relationship between the response variable and one or more predictor variables [7]. GPR can be applied to various kinds of discrete data and accommodates both overdispersion and underdispersion. It has been reported to model traffic accident data better than Negative Binomial Regression, Poisson Regression, and Conway-Maxwell-Poisson (COM-Poisson) regression [8].
The research aims to develop a model of motorcycle accidents in East Java based on rider characteristics using the GPR method, and to determine which variables are significant. The model relates the response variable, namely the number of accidents, to several predictor variables that together describe the rider. The rider characteristics considered are socioeconomic and movement factors. Among the socioeconomic factors, those suspected of contributing to accidents are adolescent age, low education level, gender, and the lack of a valid license among drivers who caused a collision. Drivers aged 15 to 29, including motor vehicle drivers, are often involved in road accidents [9,10]. Research [11] examines accident factors by gender and finds that men have a higher accident rate than women. Research [12] shows that lower levels of education are associated with higher death rates in accidents. Unlicensed drivers are more prone to driving recklessly, so when they are involved in an accident they are more likely to be at fault and to be seriously hurt than licensed drivers [13]. Meanwhile, the movement-related factor is population density. High population density, combined with a lack of concern for safety and order, leads to a high number of traffic violations and, in turn, to a rise in traffic accidents [14].
Several previous studies applied the GPR model to the number of traffic accidents in general, while others examined driver characteristics using other models. However, to our knowledge, no work has yet modeled the number of accidents specifically for motorcycles, even though this is a popular mode of transportation that is widely used by the public and most often involved in high-severity accidents in developing countries.

Generalized Poisson Regression Model
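Generalized Poisson Regression is an extension of Poisson regression that performs better under overdispersion. Let $y_i$, $i = 1, 2, \ldots, n$, be the count response variable. In its standard form, the GPR model has a probability mass function defined as follows.
$$f(y_i; \mu_i, \phi) = \left(\frac{\mu_i}{1+\phi\mu_i}\right)^{y_i} \frac{(1+\phi y_i)^{y_i-1}}{y_i!} \exp\left(-\frac{\mu_i(1+\phi y_i)}{1+\phi\mu_i}\right) \tag{1}$$
for $y_i = 0, 1, 2, \ldots$, where $\mu_i$ is the mean and $\phi$ is the dispersion parameter. The formulation of the GPR model is
$$\mu_i = \exp\left(\mathbf{x}_i^{T}\boldsymbol{\beta}\right) \tag{2}$$
where $\mathbf{x}_i$ is the vector of predictor variables and $\boldsymbol{\beta}$ is the vector of regression coefficients.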
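As a minimal sketch of how such a model evaluates probabilities, the Python functions below compute the log of a generalized Poisson probability mass function and the mean under a log link. The parameterization (mean `mu`, dispersion `phi`, with `phi = 0` reducing to the ordinary Poisson case) is an assumption based on the widely used restricted generalized Poisson form, and the function names are hypothetical.

```python
import math

def gp_log_pmf(y, mu, phi):
    """Log-probability of count y under an assumed generalized Poisson form.

    mu is the mean and phi the dispersion parameter (assumed notation).
    phi = 0 gives the ordinary Poisson distribution; phi > 0 inflates the
    variance to mu * (1 + phi * mu) ** 2.
    """
    if 1 + phi * mu <= 0 or 1 + phi * y <= 0:
        return float("-inf")  # outside the valid parameter region
    return (y * math.log(mu / (1 + phi * mu))
            + (y - 1) * math.log(1 + phi * y)
            - mu * (1 + phi * y) / (1 + phi * mu)
            - math.lgamma(y + 1))  # lgamma(y + 1) = log(y!)

def gp_mean(x, beta):
    """Log link: mu = exp(beta0 + beta1 * x1 + ...); x excludes the intercept."""
    return math.exp(beta[0] + sum(b * v for b, v in zip(beta[1:], x)))
```

Working on the log scale avoids overflow in the factorial and power terms when counts are large, which matters for accident counts in the tens or hundreds.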

Parameter Estimation Method
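2.2.1. Maximum Likelihood Estimation
Maximum Likelihood Estimation (MLE) is a method that estimates parameters by maximizing the likelihood function of an assumed distribution [16]. MLE is used to obtain the parameter estimates of the Generalized Poisson Regression model with the following steps, where $\mu_i = \exp(\mathbf{x}_i^{T}\boldsymbol{\beta})$ is the mean and $\phi$ is the dispersion parameter.
1. Determine the likelihood function.
$$L(\boldsymbol{\beta}, \phi) = \prod_{i=1}^{n} \left(\frac{\mu_i}{1+\phi\mu_i}\right)^{y_i} \frac{(1+\phi y_i)^{y_i-1}}{y_i!} \exp\left(-\frac{\mu_i(1+\phi y_i)}{1+\phi\mu_i}\right) \tag{3}$$
2. Determine the log-likelihood function.
$$\ln L(\boldsymbol{\beta}, \phi) = \sum_{i=1}^{n} \left[ y_i \ln\left(\frac{\mu_i}{1+\phi\mu_i}\right) + (y_i-1)\ln(1+\phi y_i) - \frac{\mu_i(1+\phi y_i)}{1+\phi\mu_i} - \ln(y_i!) \right] \tag{4}$$
3. Maximize the log-likelihood function.
$$\frac{\partial \ln L(\boldsymbol{\beta}, \phi)}{\partial \boldsymbol{\beta}} = \mathbf{0}, \qquad \frac{\partial \ln L(\boldsymbol{\beta}, \phi)}{\partial \phi} = 0 \tag{5}$$
Maximizing the likelihood function yields equations in a nonlinear form that cannot be solved analytically, so a numerical algorithm is needed to find the solution. This study uses the Newton-Raphson algorithm to estimate the parameters.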
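The estimation steps can be sketched in Python as follows. This is only an illustration under stated assumptions: the log-likelihood uses the restricted generalized Poisson form with mean `mu` and dispersion `phi`, the gradient and Hessian are approximated by finite differences rather than derived analytically, a step-halving safeguard is added to the plain Newton-Raphson update, and the data are made up. None of these implementation details come from the study itself.

```python
import math

def gp_loglik(params, X, y):
    """Generalized Poisson log-likelihood; params = [beta0, beta1, ..., phi]."""
    *beta, phi = params
    total = 0.0
    try:
        for xi, yi in zip(X, y):
            mu = math.exp(beta[0] + sum(b * v for b, v in zip(beta[1:], xi)))
            if 1 + phi * mu <= 0 or 1 + phi * yi <= 0:
                return float("-inf")  # outside the valid parameter region
            total += (yi * math.log(mu / (1 + phi * mu))
                      + (yi - 1) * math.log(1 + phi * yi)
                      - mu * (1 + phi * yi) / (1 + phi * mu)
                      - math.lgamma(yi + 1))
    except (OverflowError, ValueError):
        return float("-inf")  # numerically unusable trial point
    return total

def gradient(f, p, h=1e-5):
    """Central-difference approximation of the gradient of f at p."""
    g = []
    for j in range(len(p)):
        up, dn = p[:], p[:]
        up[j] += h
        dn[j] -= h
        g.append((f(up) - f(dn)) / (2 * h))
    return g

def hessian(f, p, h=1e-4):
    """Finite-difference Hessian built from differences of gradients."""
    n = len(p)
    H = [[0.0] * n for _ in range(n)]
    for j in range(n):
        up, dn = p[:], p[:]
        up[j] += h
        dn[j] -= h
        gu, gd = gradient(f, up), gradient(f, dn)
        for k in range(n):
            H[k][j] = (gu[k] - gd[k]) / (2 * h)
    return H

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= factor * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def newton_raphson(f, start, tol=1e-6, max_iter=50):
    """Maximize f with updates p - H^{-1} g, halving steps that do not improve f."""
    p = start[:]
    for _ in range(max_iter):
        g = gradient(f, p)
        if max(abs(v) for v in g) < tol:
            break  # score is numerically zero
        try:
            step = solve(hessian(f, p), g)
        except ZeroDivisionError:
            break  # singular Hessian approximation
        current, t = f(p), 1.0
        while t > 1e-8:
            cand = [pi - t * si for pi, si in zip(p, step)]
            if f(cand) > current:
                break
            t /= 2
        else:
            break  # no improving step found
        p = cand
    return p

# Made-up illustration data: one predictor, ten observations.
X = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.8], [0.9], [1.2], [1.3], [1.5]]
y = [1, 6, 0, 9, 2, 15, 3, 30, 8, 40]
start = [math.log(sum(y) / len(y)), 0.0, 0.0]  # Poisson-like starting point
estimate = newton_raphson(lambda p: gp_loglik(p, X, y), start)
```

The returned list holds the intercept, the slope, and the dispersion estimate in that order. In practice an established statistical package would normally be preferred over hand-written derivatives.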

Best Model Selection
The best model is the one with the smallest Akaike Information Criterion (AIC) value. AIC is a criterion for selecting models in econometrics and is defined as follows:
$$\mathrm{AIC} = -2\ln L(\hat{\theta}) + 2k \tag{6}$$
where $L(\hat{\theta})$ is the maximized likelihood value and $k$ is the number of parameters [17].
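The comparison can be illustrated with a short Python sketch. The log-likelihood values below are hypothetical placeholders, not results from this study, and the parameter counts assume an intercept, one coefficient per predictor, and one dispersion parameter.

```python
def aic(log_likelihood, n_params):
    """AIC = -2 * ln L + 2 * k; smaller values indicate a better trade-off."""
    return -2.0 * log_likelihood + 2.0 * n_params

# Hypothetical (log-likelihood, number of parameters) pairs for two models.
candidates = {
    "Model 1": (-180.2, 7),  # five predictors + intercept + dispersion
    "Model 2": (-180.9, 6),  # four predictors + intercept + dispersion
}
scores = {name: aic(ll, k) for name, (ll, k) in candidates.items()}
best = min(scores, key=scores.get)
```

With these placeholder numbers the smaller model wins because the drop in log-likelihood is smaller than the penalty saved by removing one parameter.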

Descriptive Statistical Analysis
The area of East Java is 46,428.57 km², divided into 38 regencies/cities: 29 regencies and nine cities. Descriptive statistical analysis is carried out before the modeling analysis.
Based on Table 1, the response variable (Y) is the number of motorcycle accidents in East Java Province, with a mean of 63.18 and a standard deviation of 55.365. This means that in 2020 the number of motorcycle accidents in East Java Province was quite high.

Poisson Distribution Test
The Kolmogorov-Smirnov test is used to check whether the data follow a Poisson distribution. The goodness-of-fit test of the Poisson distribution can be seen in Table 2. The results show a Sig. (2-tailed) value of 0.153, which is greater than the significance level of α/2 = 0.025. Based on the goodness-of-fit test, it can be concluded that the data follow a Poisson distribution. After the response variable is shown to follow a Poisson distribution, the analysis continues with multicollinearity testing and testing of the equidispersion assumption.
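A one-sample Kolmogorov-Smirnov check of this kind can be sketched in Python as below. The sketch assumes that the Poisson mean is estimated by the sample mean and that the p-value comes from the asymptotic Kolmogorov distribution; the study does not state these details, and the function names are hypothetical. For discrete data with an estimated mean, this asymptotic p-value is only approximate.

```python
import math

def poisson_cdf_values(lam, k_max):
    """Return [F(0), ..., F(k_max)] for a Poisson(lam) distribution."""
    cdf, term, total = [], math.exp(-lam), 0.0
    for k in range(k_max + 1):
        total += term
        cdf.append(min(total, 1.0))
        term *= lam / (k + 1)  # recurrence P(k+1) = P(k) * lam / (k+1)
    return cdf

def ks_poisson(counts):
    """KS statistic D and asymptotic two-sided p-value against Poisson(mean)."""
    n = len(counts)
    lam = sum(counts) / n  # assumed: mean estimated from the sample
    k_max = max(counts)
    cdf = poisson_cdf_values(lam, k_max)
    d = 0.0
    for k in range(k_max + 1):
        ecdf = sum(1 for c in counts if c <= k) / n
        d = max(d, abs(ecdf - cdf[k]))
    z = math.sqrt(n) * d
    if z < 0.2:
        return d, 1.0  # the Kolmogorov series is numerically 1 here
    p = 2 * sum((-1) ** (j - 1) * math.exp(-2 * j * j * z * z)
                for j in range(1, 101))
    return d, min(max(p, 0.0), 1.0)
```

Because both the empirical and the Poisson distribution functions are step functions that jump only at integers, it is enough to compare them at the integer points from 0 to the largest observed count.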

Multicollinearity Test
In a regression model, the independent variables are expected to be uncorrelated. A multicollinearity test was used to evaluate whether the independent variables are correlated. The model is considered free of multicollinearity if the Variance Inflation Factor (VIF) values are less than 10. Table 3 shows that the VIF values of all predictor variables are less than 10. Thus, multicollinearity was not found in the observed data, and the analysis could proceed with the regression model.
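For illustration, VIF values can be computed as the diagonal of the inverse of the predictors' correlation matrix, which is equivalent to $1/(1 - R_j^2)$ from regressing each predictor on the others. The Python sketch below uses that identity with plain lists; the helper names are hypothetical, and the study does not say which software produced its VIF values.

```python
import math

def correlation_matrix(cols):
    """Pearson correlation matrix of a list of equally long columns."""
    n = len(cols[0])
    centered = [[v - sum(c) / n for v in c] for c in cols]
    norms = [math.sqrt(sum(v * v for v in c)) for c in centered]
    p = len(cols)
    return [[sum(a * b for a, b in zip(centered[i], centered[j]))
             / (norms[i] * norms[j]) for j in range(p)] for i in range(p)]

def invert(A):
    """Matrix inverse by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        d = M[c][c]
        M[c] = [v / d for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [row[n:] for row in M]

def vif(cols):
    """VIF of each predictor: diagonal of the inverse correlation matrix."""
    inv = invert(correlation_matrix(cols))
    return [inv[j][j] for j in range(len(cols))]
```

For example, two predictors with a correlation of 0.8 each have a VIF of about 2.78, well below the threshold of 10 used here.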

Equidispersion Assumption Test
The Poisson distribution assumes equidispersion, where the mean equals the variance. However, this assumption is often not met because the data are overdispersed or underdispersed. Figure 1 shows that the variance increases as the mean increases, which indicates that the equidispersion assumption is not met. Table 4 shows that the dispersion value is 29.83908, which is greater than one, so the data on the number of motorcycle accidents violate the assumption and exhibit overdispersion. Therefore, the GPR model is used to handle the overdispersion in the data.
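Two common ways to estimate a dispersion value from a fitted Poisson model are the Pearson chi-square statistic and the deviance, each divided by the residual degrees of freedom. The Python sketch below shows both. Which of these, if either, produced the value reported above is not stated, so the choice of estimator and the function names are assumptions.

```python
import math

def pearson_dispersion(y, mu_hat, n_params):
    """Pearson chi-square divided by the residual degrees of freedom."""
    chi2 = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu_hat))
    return chi2 / (len(y) - n_params)

def deviance_dispersion(y, mu_hat, n_params):
    """Poisson deviance divided by the residual degrees of freedom."""
    dev = 2.0 * sum((yi * math.log(yi / mi) if yi > 0 else 0.0) - (yi - mi)
                    for yi, mi in zip(y, mu_hat))
    return dev / (len(y) - n_params)
```

Here `mu_hat` holds the fitted means from the Poisson regression and `n_params` counts its estimated coefficients. Values near one are consistent with equidispersion, values well above one point to overdispersion, and values below one point to underdispersion.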

Generalized Poisson Regression Model
Model selection uses the stepwise method. There are two steps, so two models are obtained. Model 1 is the full model, in which all predictor variables are included. Model 2 involves only four predictor variables, excluding X2. Model selection can be seen in Table 5.
The results for the best model are shown in Table 6. Furthermore, the model can be written as follows. (7)
The results of the parameter significance test show that population density (X1), percentage of low education (X3), gender ratio (X4), and percentage of accident perpetrators who do not have a driving license (X5) have p-values below the significance level (α = 5%). Thus, the variables that have a significant effect are:
a) Population density (X1). One cause of the increase in traffic accidents is population density combined with individuals who are less concerned about a safe and orderly environment, so that drivers commit many traffic violations [18,19].
b) Percentage of low education (X3). The highest mortality occurs among drivers with low education, and the highest accident risk group is those who graduated only from elementary school [12]. As the education level increases, the accident rate decreases.
c) Gender ratio (X4). Research [11] identified men as having a higher accident rate than women, so accidents may be more frequent where men outnumber women in an area. Female drivers are reported to be more careful and responsible in traffic, so their accident rate is lower than that of male drivers.
d) Percentage of accident perpetrators who do not have a driving license (X5). According to [13,20], unlicensed drivers are more likely to engage in risky driving behavior. They are therefore more likely to be at fault than licensed drivers and more seriously injured when involved in an accident.

Conclusion
The study analyzes the GPR model for the number of motorcycle accidents in East Java Province in 2020. The GPR model can handle the overdispersion that occurs in data with a Poisson distribution. The best model, selected as the one with the smallest AIC and SBC values, uses four predictor variables. The significance test shows that population density (X1), percentage of low education (X3), gender ratio (X4), and the percentage of accident perpetrators who do not have a driver's license (X5) each have a significant effect on the number of motorcycle accidents in East Java Province. In further research, GPR can be used to analyze other factors thought to affect the increase in motorcycle accidents, so that the results can inform the government bodies responsible for policies on handling accidents according to their causes.

Figure 1. Plot of the mean against the variance.
The presence of dispersion can cause inaccuracies in the resulting model. The relationship between the mean and the variance in Poisson Regression is expressed by the dispersion parameter. If the dispersion parameter is greater than 1, the data are overdispersed; if it is less than 1, the data are underdispersed.

Table 2. Goodness-of-fit test for Poisson distribution.

Table 3. VIF value of each variable.
The best model is the one with the smallest AIC value. Based on AIC, Model 2 is the best model for the motorcycle accident data in East Java Province.

Table 6. The best result of the Generalized Poisson Regression model.