Detecting Spatial Autocorrelation for a Small Number of Areas: a practical example

Moran’s I is commonly used to detect spatial autocorrelation in spatial data. However, Moran’s I may lead to underestimating spatial dependence when used for a small number of areas. This led to the development of Modified Moran’s I, which is designed to work when there are few areas. In this paper, both methods will be presented. Many R programs enable calculating Moran’s I, but to date, none have been available for calculating Modified Moran’s I. This paper aims to present both methods and provide the R code for calculating Modified Moran’s I, with an application to a case study of dengue fever across 14 regions in Makassar, Indonesia.


Introduction
Everything with a geographic location inevitably creates a spatial pattern [1]. Spatial analysis is used to analyse data related to geographic location. Spatial data can be in the form of discrete or continuous data and consists of three types, namely: areal data (also known as lattice data), geostatistical data (also known as point-reference), and point patterns [2]. Areal data are based on administrative boundaries such as regions, districts, counties, municipalities, postcode sections, and census tracts. Geostatistical data are suitable when data are observed at point locations. When focusing on the occurrence of an event and location are random, spatial point patterns are suitable.
Moran's I statistic proposed by Moran [3] in 1950 is one of the global indexes of spatial autocorrelation which is commonly used as a first step to detect spatial dependence in data [4]. Other global indexes are the Gamma index of spatial autocorrelation [5], Geary's C [6], Getis and Ord's G [7], and Join Count Statistics [8]. Moran's I measures the strength of spatial autocorrelation with values ranging from -1 to 1 [9]. Several modified versions of Moran's I have been suggested to improve its power [10][11][12][13]. However, none of these modifications address the case when there are a small number of areas. Moran's I may lead to underestimating spatial dependence when used for a small number of areas [14,15]. This led to the development of a modified Moran's I (MMI) [14] which is designed to work when there are few areas.
where and represent the attribute values at areas i and k, n is the number of areas, ̅ is the average of the y values over the n areas, and is the i-k th element of the spatial weights matrix, W, which measures the spatial dependence between areas i and k [9, 16]. Moran's I formula can be written in notation matrix as follows: = , then Moran's I simplifies to:

Modified Moran's I (MMI)
A modified Moran's I (MMI) that can detect spatial dependence even for very few areas was proposed by Carrijo and Da Silva [14] and is given as follows: (2) or in matrix notation: is the row-standardised spatial weights matrix transposed. Based on equations (1) and (2), we can see that the difference between Moran's I and MMI is that in MMI formulae (equation 2), both the numerator and denominator consider neighbouring structure, while in Moran's I only the numerator considers the neighbouring structure. Another difference is that Moran's I is asymptotically normally distributed [17] while MMI is Student's t distributed [14].

Spatial neighbouring matrix
For areal data, the simplest spatial weights matrix is defined by the binary neighbourhood matrix as follows [16]: 1 if areas and share a boundary 0 otherwise.
For point referenced data, other criteria are used to determine whether points are neighbours. The resulting matrix is necessarily symmetric, that is = , and the constraint = 0 is imposed. The concept of a neighbourhood matrix is important in exploring areal data. Different forms of spatial neighbourhood matrices include rook contiguity (common border), bishop contiguity (common vertex), and queen contiguity (common border and vertex) [9]. Explanation visual illustration of these spatial proximities is given as follows:

Modified Moran's I ( )
By using the binary neighbourhood structure and queen contiguity, a column-standardised weight matrix ( ) and are given as follows:  The following code calculates the MMI in R.
Code 3. R code for calculating MMI # Compute modified Moran's I Z<-(y-mean(y)) Zw <-t(Wr) %*% y -mean(y) a <-t(Z) %*% Zw b <-sqrt(t(Z) %*% Z) c <-sqrt(t(Zw) %*% Zw) I.mod <-(a/(b*c)) I.mod Result is as follows: Using the specified weights matrix above, the value of Moran's I is negative (-0.44) while modified Moran's I (MMI) is positive (≈1). The latter statistic provides a much better indication of the spatial autocorrelation of this data, whereas Moran's I indicates that the spatial autocorrelation is negative and not very strong in magnitudean apparent contradiction to that which is readily observable [14].

Application on Dengue Fever dataset
The values of Moran's I and MMI using a dengue fever dataset in Makassar from 2002 and 2017 are provided below.  Table 1 shows that Moran's I is always smaller in magnitude than the MMI for each year. The largest difference between these two statistics is observed in 2013, where the value of Moran's I result is negative (-0.1) while the MMI value is positive (0.2). However, neither of these statistics suggests there is substantive spatial autocorrelation present in any year.

Conclusion
Moran's I can be misleading when there are a small number of areas. This can potentially affect the magnitude of the spatial autocorrelation and even the sign (negative/positive). This can indicate weak negative autocorrelation when the data has substantive positive clustering, for example. MMI is a valuable approach to obtain a more accurate measure of spatial autocorrelation when there are a small number of areas. The availability of the R code for calculating MMI is beneficial for researchers, especially for spatial modellers, and we hope this encourages greater usage of MMI.