Spatial autoregressive stochastic frontier model with application to Indonesia’s aquaculture

Stochastic frontier analysis (SFA) is one of the popular methods for measuring efficiency and productivity. SFA generally assumes that decision making units are independent. However, this assumption may be inappropriate since decision making units are probable interdependent depending on their spatial proximities. This paper aims to incorporate spatial lag into standard SFA model. The performance of spatial autoregressive stochastic frontier model (SAR-SFA) is compared to standard SFA using simulation and real data. Simulation study is undertaken using various degrees of spatial autocorrelation. Furthermore, SAR-SFA is applied to estimate technical efficiency of aquaculture in Indonesia. Simulation results showed that the SAR-SFA generally produces better performance than the standard SFA. If spatial dependencies are significant, then the estimation of production frontier from standard SFA could be misleading. It affects the distribution of technical efficiency and its ranking. Moran’s I test indicated that Indonesia’s aquaculture has significant positive spatial autocorrelation. If standard SFA is applied to estimate technical efficiency of Indonesia’s aquaculture, then the resulted technical efficiency is implausible. This issue is well handled by the SAR-SFA model. Moreover, SAR-SFA model serves procedure to estimate efficiency spillovers.


Introduction
Various methods have been proposed for measuring productivity. Stochastic frontier analysis (SFA) is one of the alternative methods that uses regression techniques to estimate productivity. The primary feature of SFA is that the error has two components, one component to explain inefficiency and another one to explain noise or random shocks such as engine failure and bad weather [1,2]. SFA generally assumes that inefficiency and noise terms between decision making units (DMUs) are independent [3]. However, the non-autocorrelation assumptions in SFA may be inappropriate. DMUs could be interdependent via supply-chain networks or various externalities. Economic activities in a specific area attracts labor and intermediate goods in its neighbor areas or vice versa. Moreover, spatial impendence between DMUs causes the knowledge spillovers.
Fusco [4] stated that if spatial autocorrelation is significant, then conventional method used to estimate the parameters of SFA can produce considerable bias. In order to deal with these issues, several studies have been conducted to incorporate spatial lag into standard SFA model. Druska and Horrace [5] integrated stochastic frontier model and spatial error model using generalized moments method. Fusco and Vidoli [6] incorporated spatial lag of inefficiency term into standard SFA and estimating efficiency using maximum likelihood method. Adetutu et al. [7] accounted for spatial dependencies by allowing spatial lag of inputs and exogenous variables into SFA model. Glass et al. [8,9,10] proposed a ICMSDS 2020 Journal of Physics: Conference Series 1863 (2021) 012044 IOP Publishing doi: 10.1088/1742-6596/1863/1/012044 2 stochastic frontier model with spatial autoregressive structure. Spatial autoregressive stochastic frontier model has another advantage as it allows to study efficiency spillovers. Tsukamoto [11] developed a spatial autoregressive stochastic frontier model that can simultaneously estimates the production frontier and determinants of inefficiency.
This paper employs spatial autoregressive stochastic frontier model to estimate production frontier and technical efficiency using simulation and real data. Simulation is conducted to evaluate the performance of spatial autoregressive stochastic frontier model compared to non-spatial stochastic frontier model using various degrees of spatial autocorrelation. Furthermore, spatial autoregressive stochastic frontier model is applied to measure technical efficiency of Indonesia's aquaculture. Efficiency spillovers for all provinces are also derived to determine whether specific province is net efficiency importer or net efficiency exporter.

Stochastic Frontier Model
Stochastic frontier model was proposed by Aigner et al. [1] and Meeusen and van Den Broeck [2] at the same time. Stochastic frontier model can be expressed as follows: ̃= (̃; ) + − ; = 1,2, … , where ̃ is the output of ith DMUs in natural logarithmic form, ̃ is the vector of inputs of ith DMUs in natural logarithmic form, is the vector of parameters, (̃; ) is the specified production function, ~ (0, 2 ) is noise term, and ~ + (0, 2 ) is inefficiency term. The error structure can be written as = − and the log-likelihood function is given by [3]: where = √ 2 + 2 , = / , and (. ) is the cumulative distribution function of the standard normal distribution. The parameter is called inefficiency parameter. If → ∞ then the deviance to the frontier is caused by inefficiency. Conversely, if → 0 then the deviance to the frontier is caused by noise and the model can be estimated by ordinary least squares. Finally, technical efficiency score (TE score) for each DMUs can be estimated as follows [12]: where * = − 2 / 2 , * = / , and (. ) is the density function of the standard normal distribution.

Spatial Autoregressive Stochastic Frontier Model
This paper applies spatial autoregressive stochastic frontier model proposed by Glass et al. [10] for cross-section data as follows: where ̃ is the output of ith DMUs in natural logarithmic form, ̃ is the vector of inputs of ith DMUs in natural logarithmic form, is the vector of parameters, (̃; ) is the specified production function, is spatial lag parameter, ~ (0, 2 ) is noise term, ~ + (0, 2 ) is inefficiency term, and is the spatial weights. This paper uses threshold method to construct spatial weights where the distances are calculated using Great Circle Distance (GCD). The formula of GCD and threshold method are as below [13,14]: where is the Earth's radius, ( , ) and ( , ) are the respective coordinates. The value of is 3959 if the distance in miles and is 6371 if the distance in kilometres. The optimal upper distance bound is chosen when it produces the smallest p-value of Moran's I test. The log-likelihood function can be specified as [10]: where = √ 2 + 2 , = / , (. ) is the cumulative distribution function of the standard normal distribution, is the identity matrix, is spatial lag parameter, and is spatial weights matrix. Following Fan et al. [15] approach, spatial autoregressive stochastic frontier model can be rewritten as ̃≡ (̃| , ) + ƍ( , 2 ) + where ƍ( , 2 ) = (√2 )/[ (1 + 2 )] 1/2 and the spatial lag parameter ( ) can be estimated using following concentrated log-likelihood function: where is a constant, 0 is residual of ordinary least squares from regressing ̃ and ̃, 1 is residual of ordinary least squares from regressing ̃ and ̃, is identity matrix, is spatial lag parameter, and is spatial weights matrix. Using the resulted estimation of spatial parameter (̂), the production function parameters ( ) can be estimated as: Furthermore, using pseudo-likelihood method, the inefficiency parameter ( ) is estimated by maximizing the following log-likelihood function: The solution of above log-likelihood function is the pseudo-likelihood estimate of inefficiency parameter (̂). By substituting ̂, the pseudo-likelihood estimates for ̂2 and ̂ are also derived. Therefore, the estimates for 2 and 2 are calculated as below: Following Jondrow et al. [12], the technical efficiency score for each DMUs can be estimated as: is the density function of the standard normal distribution and (. ) is the cumulative distribution function of the standard normal distribution.
Glass et al. [10] introduced the concept of efficiency spillover using spatial autoregressive stochastic frontier model. The relative direct, relative indirect and relative total efficiencies are calculated as below: where is efficiency spillover to the ith area from all other areas and is efficiency spillover from the jth area to all other areas.

Design of Simulation
This study uses coordinates of 34 provinces in Indonesia as spatial layout. The spatial weights matrix ( ) is constructed using threshold method where the distances are calculated using Great Circle Distance (GCD). The data for input variable ( ) are generated from uniform distribution with interval (0, 10). Inefficiency terms ( ) are randomly chosen from half normal distribution + (0,1) and noise components ( ) are randomly drawn from normal distribution (0, 0.001 2 ). Furthermore, the intercept and slope are set to be 0 = 2 and 1 = 0.3, while spatial lag parameter ( ) is simulated from 0.1 to 0.9 to reflect various degrees of spatial autocorrelation. Finally, the output variable is calculated as follows: For each setting, the calibration process is replicated 100 times with new sample of and . The parameters 0 , 1 , 2 , 2 , input , and spatial weights matrix are fixed for each replication. To evaluate the performance of the models, this study uses Root Mean Squared Error (RMSE) criteria as follows: where ̂ is the corresponding estimate to be evaluated and is the number of replications.

Application to Indonesia's Aquaculture
Data was acquired for Indonesia's aquaculture from the Ministry of Marine Affairs and Fisheries Republic of Indonesia. Units of analysis are 34 provinces in Indonesia. There are two input variables and one output variable that are used in this study. Table 1 lists the input and output variables.
where is the aquaculture production of ith province, 1 is the area of aquaculture of ith province, 2 is the number of aquaculture producers of ith province, is the spatial lag parameter, ~ (0, 2 ) is the noise term, ~ + (0, 2 ) is the inefficiency term, and is the spatial weights. Table 2 presents the comparison of RMSE between standard stochastic frontier model (SFA) and spatial autoregressive stochastic frontier model (SAR-SFA) for various values of . For intercept (̂0), RMSE of SFA is smaller than SAR-SFA until = 0.3, but become larger since = 0.4. If the data has high spatial autocorrelation, then the deviance of SFA estimate to the true intercept is substantial. For ̂1 , ̂2, ̂2, and ̂2, RMSE of SFA is not much different with RMSE of SAR-SFA if the data has low spatial autocorrelation. If spatial autocorrelation is considerable ( = 0.8 and = 0.9), then RMSE of SAR-SFA is much smaller than the SFA. Interestingly, for inefficiency parameter (̂), RMSE of SAR-SFA is always smaller than the SFA. Spatial autocorrelation affects the estimate of 0 and hence the production frontier considerably. Figure 1 shows that the estimate of 0 from standard SFA tends to be biased as its expected value deviates from the true parameter (red dashed line). The bias become larger as the spatial autocorrelation increases. On the other hand, the estimate of 0 from SAR-SFA tends to be unbiased. The estimate of 0 from SAR-SFA model has higher probability to be closed to the true parameter as the spatial autocorrelation increases. The estimate of 0 in stochastic frontier model is important since it reflects the technology index. The estimate of 0 determines the position of production frontier and hence affects the technical efficiency scores.   Figure 2 depicts the comparison of production frontier estimates between SFA and SAR-SFA for various . If spatial autocorrelation is insignificant, then SFA and SAR-SFA frontiers are not much different. Both are close but below the true frontier line. If spatial autocorrelation is highly significant, that is if = 0.8 or = 0.9, then the deviance of SFA frontier to the true frontier are larger. Meanwhile, SAR-SFA frontier is closer to the true frontier. This affects the technical efficiency score of each DMUs. All DMUs are in SFA frontier line if spatial autocorrelation is highly significant. As a result, all DMUs are recorded as fully efficient if standard SFA is employed. Conversely, SAR-SFA frontiers are always above the DMUs.

Simulation Results
The distribution of technical efficiency scores of SFA and SAR-SFA for any degrees of spatial autocorrelation ( ) are presented in Figure 3. For the range of from 0.1 to 0.6, the technical efficiency scores of SFA and SAR-SFA are not substantial different in median and interquartile range. For = 0.7, the interquartile range of SFA is much smaller than the SAR-SFA, but with a slightly different in median. If spatial autocorrelation is very high, represented by = 0.8 and = 0.9, then the difference of technical efficiency scores between SFA and SAR-SFA are large. SFA produces high technical efficiency scores for all DMUs and hence are recorded as fully efficient. As a result, the median of technical efficiency scores of SFA is larger than the SAR-SFA, while the interquartile range of SFA is smaller than the SAR-SFA. The technical efficiency scores of SAR-SFA are relatively persistent for any degrees of spatial autocorrelation.

Application to Indonesia's Aquaculture
Spatial distribution of Indonesia's aquaculture productions is shown in Figure 4. Some provinces in the middle of Indonesia such as South Sulawesi, Central Sulawesi, Southeast Sulawesi, East Nusa Tenggara, and West Nusa Tenggara along with some provinces in Java such as West Java and East Java are the largest aquaculture producers in Indonesia. Provinces in the west side of Indonesia are generally moderate aquaculture producers, while provinces in the east of Indonesia (Papua and West Papua) are small aquaculture producers.

Figure 4. Spatial distribution of aquaculture in Indonesia
This study chooses optimal upper distance bound before calculating spatial weights using threshold method. Optimal upper distance bound is chosen when it produces smallest p-value of Moran's I test. Figure 5 presents p-value of Moran's I test for various upper distance bound. Smallest p-value is produced when upper distance bound is 1000 kilometres. Figure 6 depicts Moran's scatterplot of aquaculture production in Indonesia using optimal upper distance bound. The resulted p-value is 0.0017, thus indicates that there is significant positive spatial autocorrelation in Indonesia's aquaculture. Table 3 presents the comparison of estimation results using SFA and SAR-SFA models. Both are resulting insignificant intercept. Coefficient of number of producers (̂2) is not significant in SFA model but become significant in SAR-SFA model. Moreover, ̂2 is insignificant in SFA model but is significant in SAR-SFA model. Conversely, ̂2 is significant in SFA model but is insignificant in SAR-SFA model. This indicates that in SFA model the deviance to the frontier is caused by noise, while in SAR-SFA model the deviance to the frontier is caused by inefficiency. The estimate of spatial lag   Input coefficients in Cobb-Douglas production function could be interpreted as elasticities. However, coefficients in SAR-SFA model should be interpreted carefully since the marginal effect of inputs is a function of the SAR variable. Therefore, elasticities in SAR-SFA model are calculated from direct effect only. Table 4 presents the elasticities for each input from SFA and SAR-SFA models. In SFA model, elasticity of area of aquaculture is 0.6234 (if area of aquaculture increases in 1%, aquaculture production will increase 0.62%). Elasticity of number of aquaculture producers is 0.2933 (if number of aquaculture producers increases in 1%, aquaculture production will increase 0.29%). Therefore, in SFA model, the coefficient of return to scale is 0.9167 (decreasing return to scale). Elasticity of area of aquaculture from SAR-SFA model is lower than SFA model, that is 0.5237 (if area of aquaculture increases in 1%, aquaculture production will increase 0.52%). On the other side, elasticity of number of aquaculture producers from SAR-SFA model is higher than SFA model, that is 0.4836 (if number of aquaculture producers increases in 1%, aquaculture production will increase 0.48%). Therefore, in SAR-SFA model, the coefficient of return to scale is 1.0073 (increasing return to scale). Spatial autocorrelation affects the score of technical efficiency. Figure 7 and Figure 8 visualize the distribution of technical efficiency scores of SFA and SAR-SFA, respectively. Technical efficiencies of SFA are distributed in very high scores, while technical efficiencies of SAR-SFA are varies from low to high scores. If traditional SFA method is employed, then all aquaculture producers are recorded as fully efficient. This result is implausible since according to Ministry of Marine Affairs and Fisheries report, the productivity of aquaculture in Indonesia is still low as the majority of aquaculture producers are small scale/traditional with the limitation to capital and technology access. Conversely, SAR-SFA produces more reasonable technical efficiency scores. The average of technical efficiency scores from SAR-SFA model is 0.51. This shows that aquaculture in Indonesia is still inefficient. There is opportunity to enhance aquaculture production up to 49% by eliminating the inefficiency effects. The comparison of spatial distributions between SFA and SAR-SFA technical efficiency scores are presented in Figure 9 and Figure 10. The figures show that several provinces experience considerable shifting in their ranking after spatial dependencies are considered. The ranking of West Sumatera increases from middle quantile to higher quantile after SAR-SFA is applied. Meanwhile, the ranking of South Kalimantan decreases from middle quantile to low quantile.  The advantage of SAR-SFA is that it enables to study efficiency spillovers. Table 5 shows the relative direct, indirect and total efficiencies of aquaculture in all Indonesian provinces. The average estimate of direct relative efficiency is 0.613. The estimates of are generally higher than the , thus

Conclusion
Stochastic frontier model incorporating spatial effects has better performance than the traditional SFA, especially if spatial autocorrelation is significant. If the data has serious spatial autocorrelation, then the production frontier estimates using traditional SFA could be misleading. As a result, the resulted technical efficiency score is implausible. This issue is well handled by the spatial autoregressive stochastic frontier model (SAR-SFA). SAR-SFA provides another advantage that it enables to study efficiency spillovers. Application to Indonesia's aquaculture shows that most provinces in Indonesia are net efficiency importer. However, there are several provinces that are net efficiency exporter such as West Sumatera, Central Sulawesi, South Sulawesi, and Southeast Sulawesi. Finally, this paper recommends that stochastic frontier model incorporating spatial effect should be used to compute technical efficiency because economic activities are likely interdependent depending on their geographical proximities.