Spatial autocorrelation

Let's start by clarifying two terms that are easy to confuse: autocorrelation and autoregression. Time series autocorrelation occurs when the error terms of a model are correlated across time. In other words, past errors in the model affect present outcomes. This violates the assumption of uncorrelated errors that ordinary least squares needs in order to be BLUE (the best linear unbiased estimator), and it results in biased standard errors. Biased standard errors are a problem because you can no longer reliably judge whether an independent variable's effect on the dependent variable is statistically significant. An example of time series autocorrelation is when past forecasting errors affect the present value of the dependent variable. A system that exhibits memory can also produce autocorrelation. Autoregression is a means of correcting for this by regressing a variable on lagged values of itself.
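To make the distinction concrete, here is a minimal sketch in plain Python. It simulates errors with and without memory, measures their lag-1 autocorrelation, and then regresses the series on its own lagged value. The function names, the persistence value of 0.8, the sample size and the random seeds are all assumptions chosen for illustration, not something taken from a particular dataset or library.

```python
import random


def simulate_ar1(n, rho, seed=0):
    """Simulate e_t = rho * e_{t-1} + u_t with standard normal shocks u_t."""
    rng = random.Random(seed)
    errors = [rng.gauss(0.0, 1.0)]
    for _ in range(n - 1):
        errors.append(rho * errors[-1] + rng.gauss(0.0, 1.0))
    return errors


def lag1_autocorrelation(series):
    """Sample correlation between the series and itself shifted by one step."""
    n = len(series)
    mean = sum(series) / n
    num = sum((series[t] - mean) * (series[t - 1] - mean) for t in range(1, n))
    den = sum((x - mean) ** 2 for x in series)
    return num / den


def fit_ar1(series):
    """Least-squares slope from regressing each value on its lagged value (no intercept)."""
    num = sum(series[t] * series[t - 1] for t in range(1, len(series)))
    den = sum(series[t - 1] ** 2 for t in range(1, len(series)))
    return num / den


independent = simulate_ar1(5000, rho=0.0, seed=1)  # errors with no memory
persistent = simulate_ar1(5000, rho=0.8, seed=2)   # errors that carry over

print("lag-1 autocorrelation, independent errors:", round(lag1_autocorrelation(independent), 3))
print("lag-1 autocorrelation, persistent errors: ", round(lag1_autocorrelation(persistent), 3))
print("estimated persistence from autoregression:", round(fit_ar1(persistent), 3))
```

The first number should sit near zero and the other two near the assumed persistence of 0.8. The first two lines measure autocorrelation, while the last one is a (very small) autoregression.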

Spatial autocorrelation is another type of autocorrelation, but instead of spanning time it spans space. If a variable is correlated with itself through space, it is said to exhibit spatial autocorrelation. This can be due to misspecification of the model, measurement bias or many other reasons. Another term for spatial autocorrelation is spatially dependent errors. Moran's I and Geary's C are the tests most commonly used to detect it.

Spatial autocorrelation also arises when an area affects nearby regions. For example, imagine a high-crime neighborhood. The surrounding areas would be expected to exhibit a higher than average crime rate as well, due to spillover effects. This spillover effect weakens the further you move from the crime epicenter. The mechanism behind the spatial autocorrelation in this example could be transportation routes or poor police coverage that extend the crime to outlying areas. Another example is the effect of large shopping centers on the number of stores: stores may be concentrated at the shopping center while outlying areas are devoid of stores, in line with Hotelling's Law.
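As a rough illustration of how Moran's I picks up clustering, the sketch below computes the statistic by hand on a small grid. It assumes binary "rook" weights (cells count as neighbors when they share an edge), and the two 4x4 grids are made-up data. The helper names are hypothetical, and no significance test is included.

```python
def rook_neighbors(rows, cols):
    """Map each grid cell index to the indices of cells sharing an edge with it."""
    neighbors = {}
    for r in range(rows):
        for c in range(cols):
            adjacent = []
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    adjacent.append(rr * cols + cc)
            neighbors[r * cols + c] = adjacent
    return neighbors


def morans_i(values, neighbors):
    """Moran's I with binary weights: w_ij = 1 if j is a neighbor of i, else 0."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    # Sum of products of deviations over every (area, neighbor) pair.
    cross = sum(dev[i] * dev[j] for i, adj in neighbors.items() for j in adj)
    total_weight = sum(len(adj) for adj in neighbors.values())
    return (n / total_weight) * cross / sum(d * d for d in dev)


grid = rook_neighbors(4, 4)

# Made-up data: high values clustered in the left half of the grid.
clustered = [1, 1, 0, 0] * 4
# Made-up data: high and low values alternating like a checkerboard.
checkerboard = [(r + c) % 2 for r in range(4) for c in range(4)]

print("clustered:   ", morans_i(clustered, grid))
print("checkerboard:", morans_i(checkerboard, grid))
```

The clustered grid gives a clearly positive value (similar values sit next to each other), while the checkerboard gives a strongly negative one (neighbors are always dissimilar). Values near zero would suggest no spatial pattern.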

Spatial autoregression (SAR) models correct for spatial autocorrelation by adding the dependent variable values of surrounding territories (referred to as spatially lagged values of the dependent variable) as regressors. If you were modeling crime across zip codes, you would include, for each zip code, the crime rate of nearby zip codes as an independent variable. This is similar to how time series autoregression models add lagged values of the dependent variable as regressors.
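In standard notation (not used elsewhere in this post), such a model is commonly written y = ρWy + Xβ + ε, where W is a spatial weights matrix and Wy is the spatial lag. The sketch below only shows how that extra regressor could be built. It assumes row-standardized weights, so each area's spatial lag is the plain average of its neighbors' values. The four areas, their crime rates and the adjacency list are hypothetical.

```python
def spatial_lag(values, neighbors):
    """Average of each area's neighbors' values (row-standardized weights)."""
    return {
        area: sum(values[other] for other in adjacent) / len(adjacent)
        for area, adjacent in neighbors.items()
    }


# Hypothetical crime rates for four zip codes laid out in a line: A - B - C - D.
crime_rate = {"A": 9.0, "B": 7.0, "C": 4.0, "D": 2.0}
neighbors = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}

lagged_crime = spatial_lag(crime_rate, neighbors)

# Each row pairs an area's own crime rate with the regressor a SAR model would add.
for area in crime_rate:
    print(area, crime_rate[area], lagged_crime[area])
```

Because each area's value also feeds into its neighbors' lags, the spatial lag is correlated with the error term, so these models are usually estimated with maximum likelihood or instrumental variables rather than plain least squares.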

Further Reading

Hunter College

Canada Forestry Service

Cornell U

North Carolina State University

University of West Alabama