Bayesian spatial modeling of poverty risk in Kelantan

Poverty varies across rural areas, across states and regions, and within some urban areas. Such areal data tend to exhibit spatial autocorrelation. A Bayesian hierarchical model is commonly used to estimate risk using a combination of available covariate data and a set of spatial random effects. These random effects are commonly modelled by conditional autoregressive (CAR) prior distributions, a type of Markov random field model. Spatial autocorrelation between the random effects ϕ in CAR models is induced by an n × n neighbourhood matrix, W. However, many studies assume that W is fixed when fitting the model. Therefore, this study evaluates the performance of the Poisson log-linear Leroux CAR model with m-nearest neighbourhood weight matrices using a simulation study. Simulated poverty data are created for the 66 districts of Kelantan under different scenarios relating to the random effects and the covariate. Poisson log-linear Leroux CAR models with m = 1, 5 and 10 nearest neighbours are applied to the simulated poverty data. The performance of the models is evaluated using bias, root mean square error (RMSE) and the Deviance Information Criterion (DIC). The results show that neither the choice of neighbourhood matrix (m = 1, 5 or 10) nor the scenario affects the bias of the regression parameter β or of the risk Rk, or the RMSE of the risk Rk. Nevertheless, the models differ in the RMSE of the regression parameter β. The results suggest that the Poisson log-linear Leroux CAR model with m = 5 nearest neighbours performs best overall for the simulated poverty data. It consistently gives good results across different strengths of spatial autocorrelation in the random effects ϕ and the covariate. The model also gives the lowest DIC in all scenarios, indicating a better fit than the other models.
The findings of this study give guidance on choosing suitable m-nearest neighbourhood matrices for estimating poverty risk in Kelantan.


Introduction
Poverty persists as a global problem that has long plagued humanity in rich and developing countries alike. As a result, widely varying definitions of poverty have emerged. Poverty can be defined in terms of the number of households or families with a total income of less than half or two-thirds of the community's average income [1]. Many effects of poverty, such as limited or no access to education, disease, increased suicide rates, unsafe and degraded environments, and social discrimination, have been discussed in studies that aim to minimize these effects, particularly on the productive resources needed to ensure sustainable livelihoods [2,3,4,5,6]. The number of poor households varies from place to place. Modelling data on adjacent spatial units is a common problem in many statistical applications. To model these data, one must account for spatial autocorrelation; that is, areas close together have more similar values on average than areas further apart. Most studies address this by adding a set of autocorrelated random effects to the linear predictor of the regression model. These random effects are most often modelled by a conditional autoregressive (CAR) prior as part of a hierarchical Bayesian model [7,8,9]. An n × n neighbourhood matrix, W, induces the spatial autocorrelation in CAR models. However, many studies assume that W is fixed when fitting the model. Therefore, this research compares the performance of a Poisson log-linear Leroux CAR model under different m-nearest neighbourhood matrices W, applying each W specification to simulated poverty data under various scenarios.

Materials and Methods
This section shows the methodology used in this study.

Distance-based neighbour
The spatial autocorrelation between the random effects in the Poisson log-linear Leroux CAR model is determined by a binary n × n neighbourhood matrix, W. The element wkj represents a measure of closeness between area k and area j, for k, j = 1,…,n. One alternative is to define neighbours as a function of geographical distance. The distance between areas k and j is often measured as the Euclidean distance between their respective centroids. Under the m-nearest neighbour rule, wkj = 1 if area k is one of the m nearest neighbours of area j, and wkj = 0 otherwise. This study uses three values of m: 1, 5 and 10.
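The m-nearest neighbour rule above can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the study: the function name `knn_weight_matrix`, the toy centroids and the optional `symmetric` flag are all assumptions introduced here. The flag reflects the fact that the m-nearest neighbour relation is not symmetric in general, whereas CAR software often expects a symmetric W; the source does not state how, or whether, W was symmetrised.

```python
import math


def knn_weight_matrix(centroids, m, symmetric=False):
    """Binary m-nearest-neighbour matrix built from (x, y) centroids.

    Follows the convention in the text: w[k][j] = 1 if area k is one of
    the m nearest neighbours of area j, and 0 otherwise.
    """
    n = len(centroids)
    if not 1 <= m < n:
        raise ValueError("m must satisfy 1 <= m < n")
    w = [[0] * n for _ in range(n)]
    for j in range(n):
        # Rank all other areas by Euclidean distance to the centroid of area j.
        others = sorted(
            (k for k in range(n) if k != j),
            key=lambda k: math.dist(centroids[k], centroids[j]),
        )
        for k in others[:m]:
            w[k][j] = 1
    if symmetric:
        # Assumption (not from the source): call k and j neighbours if
        # either one is among the m nearest neighbours of the other.
        for k in range(n):
            for j in range(k + 1, n):
                if w[k][j] or w[j][k]:
                    w[k][j] = w[j][k] = 1
    return w


# Hypothetical centroids for four areas lying on a line.
centroids = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0), (7.0, 0.0)]
W = knn_weight_matrix(centroids, m=1)
W_sym = knn_weight_matrix(centroids, m=1, symmetric=True)
```

With the symmetric option, the number of neighbours per area can exceed m (here the two interior areas each have two neighbours), so the neighbour counts differ between areas as well as between choices of m.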

Poisson log-linear Leroux Conditional Autoregressive (CAR) model
A Bayesian hierarchical model is adopted to model the data using covariate information $x_k = (x_{k1}, \ldots, x_{kp})$ and a random effect $\phi_k$ for each area $k$. The random effects $\phi = (\phi_1, \ldots, \phi_n)$ are included to model any spatial autocorrelation in the data that persists after adjusting for the available covariate information. The random effects are modelled by a CAR prior distribution, which is a Gaussian Markov random field (GMRF) model. The prior is specified through a set of univariate full conditional distributions $f(\phi_k \mid \phi_{-k})$, where $\phi_{-k} = (\phi_1, \ldots, \phi_{k-1}, \phi_{k+1}, \ldots, \phi_n)$ for $k = 1, \ldots, n$. In this study, the random effects are given the Leroux CAR prior [10].
In this study, the poverty data are counts, so the Poisson log-linear Leroux CAR model is employed. The formulation used in this analysis is:

$$
\begin{aligned}
Y_k \mid E_k, R_k &\sim \text{Poisson}(E_k R_k), \\
\ln(R_k) &= x_k^{\top}\beta + \phi_k, \\
\phi_k \mid \phi_{-k}, W, \tau^2, \rho &\sim \text{N}\!\left( \frac{\rho \sum_{j=1}^{n} w_{kj}\phi_j}{\rho \sum_{j=1}^{n} w_{kj} + 1 - \rho},\; \frac{\tau^2}{\rho \sum_{j=1}^{n} w_{kj} + 1 - \rho} \right). \qquad (1)
\end{aligned}
$$

In the above equation, $R_k$ is the poverty risk in area $k$. If $R_k = 1$, then $E(Y_k) = E_k$, which corresponds to the average risk, while if $R_k = 1.2$, then $E(Y_k) = 1.2E_k$, meaning 20% more cases than expected. Here $\rho$ is the level of spatial autocorrelation in the random effects: $\rho = 1$ indicates strong spatial autocorrelation between the random effects and corresponds to the intrinsic CAR model, while $\rho = 0$ corresponds to independence ($\phi_k \sim \text{N}(0, \tau^2)$). Finally, $\tau^2$ is the variance parameter of the conditional distribution of $\phi_k \mid \phi_{-k}$. Inference for this type of model is typically based on Markov chain Monte Carlo (MCMC) simulation, using a combination of Gibbs sampling and Metropolis-Hastings steps. The software used for this study is the R package cited as [11], which provides Bayesian spatial modelling with conditional autoregressive priors.
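As a small numerical illustration of how the autocorrelation parameter moves the Leroux prior between independence and the intrinsic CAR model, the sketch below evaluates the mean and variance of the full conditional for a single area. It assumes the standard form of the Leroux prior; the function name `leroux_full_conditional` and the toy values of W and the random effects are hypothetical and are not taken from the study.

```python
def leroux_full_conditional(k, phi, w, rho, tau2):
    """Mean and variance of phi[k] given all other random effects.

    Standard Leroux form (assumed here):
        mean = rho * sum_j w[k][j] * phi[j] / (rho * n_k + 1 - rho)
        var  = tau2 / (rho * n_k + 1 - rho)
    where n_k = sum_j w[k][j] is the number of neighbours of area k.
    """
    n_k = sum(w[k])  # the diagonal of w is zero, so this counts neighbours
    neighbour_sum = sum(w[k][j] * phi[j] for j in range(len(phi)))
    denom = rho * n_k + 1.0 - rho
    return rho * neighbour_sum / denom, tau2 / denom


# Hypothetical example: area 0 has two neighbours, areas 1 and 2.
w = [[0, 1, 1],
     [1, 0, 0],
     [1, 0, 0]]
phi = [0.0, 0.4, -0.2]

# rho = 0: independence, so the conditional is N(0, tau2).
mean_ind, var_ind = leroux_full_conditional(0, phi, w, rho=0.0, tau2=0.1)

# rho = 1: intrinsic CAR, so the mean is the neighbour average and the
# variance is tau2 divided by the number of neighbours.
mean_icar, var_icar = leroux_full_conditional(0, phi, w, rho=1.0, tau2=0.1)
```

Intermediate values of the autocorrelation parameter shrink the conditional mean towards zero and give a variance between the two extremes.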

Simulation Study
This section presents a simulation study that compares the performance of the Poisson log-linear Leroux CAR model with m = 1, 5 and 10 nearest neighbours.

Data generation
Simulated data are generated on an irregular lattice given by the real map of the 66 districts of Kelantan. Simulated poverty counts Yk are generated from model (1). For simplicity, a single covariate is used and the regression coefficient is fixed at β = 0.10. The expected numbers of poverty cases, Ek, are those for the 66 districts of Kelantan. The data generation follows the disease mapping study of Lee [4], whose simulation used the n = 271 intermediate geographies (IG) in the Greater Glasgow health board, took the expected numbers of disease cases to be the expected numbers of cancer cases for the 271 IG, and considered several scenarios for the random effects. The random effects ϕ and the covariate x are generated from multivariate normal distributions given by ϕ ~ N(0, τ²R) and x ~ N(0, σ²V). Both τ² and σ² are fixed at 0.10. Two different choices of R are used, corresponding to weakly and strongly spatially autocorrelated random effects ϕ. Two different choices of V are used, corresponding to two structures for the covariate: independent and spatially autocorrelated. The poverty data are simulated under each of the following scenarios:
1. Scenario 1: weakly autocorrelated random effects ϕ and a spatially independent covariate.
2. Scenario 2: strongly autocorrelated random effects ϕ and a spatially independent covariate.
3. Scenario 3: weakly autocorrelated random effects ϕ and a strongly spatially autocorrelated covariate.
4. Scenario 4: strongly autocorrelated random effects ϕ and a strongly spatially autocorrelated covariate.
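The data-generating steps can be sketched as follows, using only the Python standard library. Several ingredients are assumptions made for illustration, because the source does not give them: the exponential-decay correlation function, the `phi_scale` and `x_scale` parameters that set weak or strong autocorrelation, the toy centroids and the constant expected counts. The function names are likewise hypothetical.

```python
import math
import random


def exp_correlation(centroids, scale):
    """Assumed correlation structure: corr(k, j) = exp(-distance_kj / scale)."""
    return [[math.exp(-math.dist(a, b) / scale) for b in centroids]
            for a in centroids]


def cholesky(a):
    """Lower-triangular L with L L^T = a, for symmetric positive-definite a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][t] * low[j][t] for t in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def draw_mvn(variance, corr, rng):
    """One draw from N(0, variance * corr) via the Cholesky factor."""
    low = cholesky(corr)
    z = [rng.gauss(0.0, 1.0) for _ in corr]
    sd = math.sqrt(variance)
    return [sd * sum(low[i][t] * z[t] for t in range(i + 1))
            for i in range(len(z))]


def draw_poisson(mean, rng):
    """Knuth's multiplication method; adequate for small to moderate means."""
    limit, count, prod = math.exp(-mean), 0, rng.random()
    while prod > limit:
        count += 1
        prod *= rng.random()
    return count


def simulate_counts(centroids, expected, beta, tau2, sigma2,
                    phi_scale, x_scale, rng):
    """Simulate one data set; x_scale=None gives an independent covariate."""
    n = len(centroids)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    corr_x = identity if x_scale is None else exp_correlation(centroids, x_scale)
    phi = draw_mvn(tau2, exp_correlation(centroids, phi_scale), rng)
    x = draw_mvn(sigma2, corr_x, rng)
    # Poisson log-linear model: ln(R_k) = x_k * beta + phi_k, Y_k ~ Poisson(E_k R_k).
    risk = [math.exp(beta * xk + pk) for xk, pk in zip(x, phi)]
    y = [draw_poisson(e * r, rng) for e, r in zip(expected, risk)]
    return y, x, risk


# Hypothetical inputs: five areas with expected counts of 20 each.
rng = random.Random(2024)
centroids = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (2.0, 2.0), (3.0, 1.0)]
y, x, risk = simulate_counts(centroids, [20.0] * 5, beta=0.10, tau2=0.10,
                             sigma2=0.10, phi_scale=1.0, x_scale=None, rng=rng)
```

Under these assumptions, a larger `phi_scale` would play the role of the strongly autocorrelated scenarios and a smaller one the weakly autocorrelated scenarios; repeating the call one hundred times per scenario mirrors the replication scheme described in the text.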
In this study, one hundred simulated data sets are generated under each of the four scenarios. The Poisson log-linear Leroux CAR model with m = 1, 5 and 10 nearest neighbours is applied in each scenario. The bias and root mean square error (RMSE) of the estimated β and poverty risks Rk, together with the Deviance Information Criterion (DIC), are used to evaluate the relative performance of the three models.
The results of the simulation study are presented in Table 1. Neither the choice of W matrix nor the scenario affects the bias of the regression parameter β or the risk Rk. In all cases, the biases are less than 0.021 (β) and 0.003 (Rk) in absolute value, which is approximately 0. This shows that the difference between the expected value of each estimator and the true value of the parameter is small, so the Poisson log-linear Leroux CAR model with m = 1, 5 or 10 nearest neighbours estimates the true values of β and Rk with negligible bias.
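For reference, the two accuracy measures can be computed from the per-replicate estimates as in the minimal sketch below. The estimates listed are invented for illustration and are not results from the study; how the risk Rk is aggregated over areas is not stated in the source, so only a scalar parameter is shown.

```python
import math


def bias(estimates, truth):
    """Mean of the estimates minus the true parameter value."""
    return sum(estimates) / len(estimates) - truth


def rmse(estimates, truth):
    """Square root of the mean squared error of the estimates."""
    return math.sqrt(sum((e - truth) ** 2 for e in estimates) / len(estimates))


# Hypothetical estimates of beta from four simulated data sets (true beta = 0.10).
beta_hat = [0.08, 0.12, 0.11, 0.09]
beta_bias = bias(beta_hat, 0.10)
beta_rmse = rmse(beta_hat, 0.10)
```

An estimator can have bias close to 0 and still have a large RMSE if its estimates scatter widely around the true value, which is why the two measures can rank the models differently.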

Results and Discussion
Overall, all the models produce similar RMSE for the poverty risk Rk in all scenarios. However, the models differ in terms of the RMSE for the regression parameter β. When the random effects ϕ are weakly autocorrelated and the covariate is spatially independent, the Poisson log-linear Leroux CAR model with m = 10 performs the worst in terms of RMSE for β, with a value of 0.49. When ϕ is weakly autocorrelated and the covariate is strongly spatially autocorrelated, the m = 10 model also performs the worst, with the highest RMSE (0.114). On the other hand, when ϕ is strongly autocorrelated and the covariate is spatially independent, the model with m = 1 nearest neighbour performs the worst, with an RMSE of 0.229. This model also performs the worst when both ϕ and the covariate are strongly spatially autocorrelated.
Finally, the DIC, which measures how well a model fits the data while penalizing model complexity, was calculated in each scenario and is presented in the bottom part of Table 1.
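To make the criterion concrete, the sketch below computes a DIC for a Poisson model from posterior draws of the fitted means, using the usual definition: the posterior mean deviance plus the effective number of parameters. It assumes the plug-in estimate is the posterior mean of the fitted values; software packages may use a different plug-in, and the function names and toy numbers here are hypothetical.

```python
import math


def poisson_deviance(y, mu):
    """Deviance defined as -2 times the Poisson log-likelihood."""
    loglik = sum(yk * math.log(mk) - mk - math.lgamma(yk + 1)
                 for yk, mk in zip(y, mu))
    return -2.0 * loglik


def dic(y, mu_samples):
    """Return (DIC, p_D) from posterior draws of the fitted means.

    mu_samples is a list of draws; each draw is a list of fitted means,
    one per area. Assumes the plug-in is the posterior mean of the draws.
    """
    s = len(mu_samples)
    mean_dev = sum(poisson_deviance(y, mu) for mu in mu_samples) / s
    mu_bar = [sum(col) / s for col in zip(*mu_samples)]
    p_d = mean_dev - poisson_deviance(y, mu_bar)  # effective parameters
    return mean_dev + p_d, p_d


# Hypothetical counts for three areas and two posterior draws of the means.
y_obs = [12, 7, 20]
draws = [[11.0, 8.0, 19.0], [13.0, 6.5, 21.0]]
dic_value, p_d = dic(y_obs, draws)
```

A lower DIC indicates a better trade-off between fit and complexity, so the comparison is only meaningful between models fitted to the same data set.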

Conclusion
This study has evaluated the performance of a Poisson log-linear Leroux CAR model with m = 1, 5 and 10 nearest neighbours. There were clear differences in the number of neighbours assigned under m = 1, 5 and 10. The performance of these models has been evaluated through simulation, particularly their reliability in estimating β and the poverty risk Rk. In all scenarios, all the models produce bias of approximately 0 for both β and Rk. The RMSE of Rk is similar regardless of the strength of spatial autocorrelation in the random effects and the covariate; neither the choice of W matrix (m = 1, 5 or 10) nor the scenario affects it. Nevertheless, the models differ in the RMSE of the regression parameter β. When the random effects ϕ are weakly autocorrelated, whatever the strength of spatial autocorrelation in the covariate, the model with m = 10 has the highest RMSE. When ϕ is strongly autocorrelated, whatever the strength of spatial autocorrelation in the covariate, the model with m = 1 has the highest RMSE of β. For DIC, the model with m = 5 nearest neighbours gave the lowest values in all the scenarios, which indicates a better-fitting model compared with the other models.
Overall, these results suggest that the Poisson log-linear Leroux CAR model with m = 5 nearest neighbours performs best overall for the simulated poverty data, since it consistently gives good results across different strengths of spatial autocorrelation in the random effects and the covariate. This study may assist other researchers conducting studies that involve GIS applications and spatial data for the districts of Kelantan, especially poverty studies. Future work will consider other measures of spatial closeness, including adjacency-based and other distance-based neighbourhood matrices.