Spatial Lattice Models

Download all the  files named NY8_utm.* as well as the file NY_nb.gal in http://www.colorado.edu/geography/class_homepages/geog_4023_s09/NY_data/ to your computer and change the path in Matlab to the same location.

The NY data represent the number of cases of leukemia (of all types) ni an eight county region of upstate New York for the years 1978-1980.  Available ancillary data include the Population for each census tract in 1980, the population percentage over 65 years of age, and the distance of each tract centroid to a number of waste sites containing trichloroethane (TCE) (the subject of the movie A Civil Action starrring John Travolta and Robert Duvall)

Global Moran's I


###########################################################
###########################################################
library(spdep)
library(rgdal)
library(RColorBrewer)

# Load the shapefile
NY <- readOGR(".", "NY8_utm18") #readOGR is in rgdal
# Neighbors given by author
NY_nb <- read.gal("NY_nb.gal", region.id=row.names(as(NY, "data.frame")))
summary(NY_nb)

# Plot boundaries and edges
par(mar=c(3,3,3,1)+0.1) # Make margins smaller
plot(NY, border="grey60", axes=TRUE)
title('Spatial Connectivity')
plot(NY_nb, coordinates(NY), pch=19, cex=0.6, add=TRUE)

# Construct row-standardized spatial weights
NY_W <- nb2listw(NY_nb, style="W", zero.policy=TRUE)
# Construct unstandardized spatial weights
NY_B <- nb2listw(NY_nb, style="B", zero.policy=TRUE)


###########
# Calculate Moran's I of cases
moran(NY$Cases, NY_W, n=length(NY$Cases), S0=Szero(NY_W))
moran(NY$Cases, NY_B, n=length(NY$Cases), S0=Szero(NY_B))
###########################################################
###########################################################


While the Moran's I measures autocorrelation, it is difficult to interpret the actual level.  The range of allowable Moran's I statistics depends on the W matrix used, and more specifically on the "eigenvalues" of the weight matrix.  If you don't know what an eigvenvalue is, that is OK,  just use the following code and take my word for it.


###########################################################
###########################################################
# Calculate the range of possible I statistics
1/range(eigenw(NY_W))
1/range(eigenw(NY_B))

######################
# Calculate tests assuming normality
# randomisation =T applies a correction for kurtosis
moran.test(NY$Cases, NY_W)
moran.test(NY$Cases, NY_W, randomisation=F)
moran.test(NY$Cases, NY_B)
moran.test(NY$Cases, NY_B, randomisation=F)
# Just to show that we are calculating a z-score for I
pnorm((0.110387402--0.003571429)/sqrt(0.001282228),lower.tail=F)
# This should correspond to the last moran's I p-value.

###########################################################
###########################################################

What is the general formula for calculating the p-values above?


###########################################################
###########################################################
# Calculate tests using permutation
set.seed(1234)
Wperm <- moran.mc(NY$Cases, NY_W, nsim=999)
Wperm

set.seed(1234)
Bperm <- moran.mc(NY$Cases, NY_B, nsim=999)
Bperm


###########################################################
###########################################################

Rhetorical Question: Why do we simulate 999 and not 1000 values?  We aready have the 1000th value, the I statistic from our data.  If the null hypothesis is true, than our data are data could also have been obtained via random permutation.  Always simulate 1 less than you need for hypothesis testing.  (This is different than the geostatistical simulations we did earlier, there we did not already have a random field).  For publication, I would do more permulations, at least 10,000 if you have the time (and you do, if the dataset does not have thousands of observations).  Generally, do as many replications as you can afford to (and computer power is cheap)

We will now simulate Moran's I under the assumption that each count is a Poisson random variable, not a Gaussian random variable.  A Poisson distribution has one parameter: the expected number of cases, which we will calculate as the global disease rate times the local population.

 
###########################################################
###########################################################

# Create a function for  Moran's I of counts assuming constant risk
# Calculate global rate of disease
r <- sum(NY$Cases)/sum(NY$POP8)
# Calculate predicted number of cases in each tract
rni <- r*NY$POP8
# Function to simulate poisson RV with local risk rate rni
CR <- function(var, mle) rpois(length(var), lambda=mle)
# We will use the boot function, which will return something caled mle, but we will tell it to use the local risk rate
# Wrapper function to calculate Moran's I
MoranI.pboot <- function(data, i, listw, n, S0, ...) {
  return(moran(x=data, listw=listw, n=n, S0=S0)$I)
}
set.seed(1234)

boot2 <- boot(NY$Cases, statistic=MoranI.pboot, R=999, sim="parametric",
  ran.gen=CR, listw=NY_B, n=length(NY$Cases), S0=Szero(NY_B), mle=rni)

pnorm((boot2$t0 - mean(boot2$t))/sd(boot2$t), lower.tail=FALSE)


#####
# Plot the histograms of I: These are also at Figure 7.8 in the Waller and Gotway text
oopar <- par(mfrow=c(1,2))
xlim <- range(c(Bperm$res, boot2$t[,1])) # Make the range big enough for both histograms
hist(Bperm$res[-length(Bperm$res)],main="Permutation bootstrap", xlab=expression(I[std]), xlim=xlim, density=15, angle=45, ylim=c(0,260))
abline(v=Bperm$statistic, lty=2) #Add line at sample I
hist(boot2$t, col=rgb(0.7,0.7,0.7), main="Parametric (Poisson) bootstrap", xlab=expression(I[CR]), xlim=xlim, ylim=c(0,260))
hist(Bperm$res[-length(Bperm$res)], density=15, angle=45, add=TRUE)
abline(v=boot2$t0, lty=2) #Add line at sample I
par(oopar)

###########################################################
###########################################################

In your own words, summarize the permutation approach and the Constant Risk approach to calculating Moran's I.
What are the assumptions underlying the Null Hypothesis of each of the three ways of calculating the p-statistic.
Is there evidence for the clustering of leukemia cases?


NOTE, you will be queried for a legend in the following code.


Local Moran's I

###########################################################
###########################################################
################################
# Moran's I scatterplot
oopar <- par(mfrow=c(1,2))

msp <- moran.plot(NY$Cases, listw=nb2listw(NY_nb, style="W"), quiet=TRUE)
title("Moran scatterplot")


infl <- apply(msp$is.inf, 1, any) % find the influential statistics
x <- NY$Cases
# Cut data into "Low" and "High"
lhx <- cut(x, breaks=c(min(x), mean(x), max(x)), labels=c("L", "H"), include.lowest=TRUE)
# Calculate spatial lag
wx <- lag(nb2listw(NY_nb, style="W"), NY$Cases)
# Cut lag into "Low" and "High"
lhwx <- cut(wx, breaks=c(min(wx), mean(wx), max(wx)), labels=c("L", "H"), include.lowest=TRUE)
# Interact (L H) and (L H) to form (LL, LH, HL, HH)
lhlh <- interaction(lhx, lhwx, infl, drop=TRUE)
cols <- rep(1, length(lhlh))
cols[lhlh == "H.L.TRUE"] <- 2
cols[lhlh == "L.H.TRUE"] <- 3
cols[lhlh == "H.H.TRUE"] <- 4
# There are no L.L.TRUEs
# Create Four Color scheme
Blue4 <- brewer.pal(4,'Blues')
plot(NY, col=Blue4[cols])
legend(locator(), legend=c("None", "HL", "LH", "HH"), fill=Blue4, bty="n", cex=0.8, y.intersp=0.8)
title("Tracts with influence")
par(oopar)

###########################################################
###########################################################

###################################################
# Local Moran's I
# Standard Moran's I
lm1 <- localmoran(NY$Cases, listw=nb2listw(NY_nb, style="B"))

###############################
r <- sum(NY$Cases)/sum(NY$POP8)     # Global Rate
rni <- r*NY$POP8                 # Rate for each region
lw <- nb2listw(NY_nb, style="B")     # W matrix
sdCR <- (NY$Cases - rni)/sqrt(rni)    # Standardized CR values
wsdCR <- lag(lw, sdCR)
I_CR <- sdCR * wsdCR


###################
# Plot Standard and CR local Moran's I
# This is the left of Figure 7.10 in book (with different color scale)
NY$Standard <- lm1[,1]
NY$"Constant_risk" <- I_CR
Blue5 <- brewer.pal(5,'RdBu')
cut.points <- quantile(I_CR,c(0,.2,.4,.6,.8,1))
cut.points[6] <- max(lm1)
spplot(NY, c('Standard','Constant_risk'), at=cut.points, col.regions=Blue5)



###################################################
# Simulate local Moran's I for Constant Rate
set.seed(1234)
nsim <- 999
N <- length(rni)
sims <- matrix(0, ncol=nsim, nrow=N)  # Matrix of results
for (i in 1:nsim) {
  # Simulate a Poisson number with local rate
  y <- rpois(N, lambda=rni) 
  # Standardize the simulated value
  sdCRi <- (y - rni)/sqrt(rni)
  # Calculate spatial lag of standardized values
  wsdCRi <- lag(lw, sdCRi)
  # Determine Local I (=Z times Lag Z)
  sims[,i] <- sdCRi * wsdCRi
}
# Rank the results
xrank <- apply(cbind(I_CR, sims), 1, function(x) rank(x)[1])
diff <- nsim - xrank
diff <- ifelse(diff > 0, diff, 0)
# Determine percentiles
pval <- punif((diff + 1)/(nsim + 1))


###############################


NY$Standard <- lm1[,5]
NY$Constant_risk <- pval
Blue7 <- brewer.pal(7,'RdBu')
spplot(NY, c("Standard", "Constant_risk"), at=c(0,0.01,0.05,0.1,0.9,0.95,0.99,1), col.regions=Blue7)

#########################################################333
############################################################3
Comment on the differences in the local Moran's I statistic using these measures.  Where does clustering appear to be in the data?

Repeat the analysis using the GLOBAL moran's I using Poisson Bootstrap and for the Standardized Risk assuming a constant rate, i.e. Eq. 7.10 in Waller and Gotway.  Hint, you can modify the code of the local moran's I, but instead of reporting N local values, sum them up for one global I.


Spatial Autocorrelation Models


##################################
# Spatial Autoregression
nylm <- lm(Z~PEXPOSURE+PCTAGE65P+PCTOWNHOME , data=NY)
summary(nylm)
nysar<-spautolm(Z~PEXPOSURE+PCTAGE65P+PCTOWNHOME , data=NY, listw=NY_W)
summary(nysar)
# Moran's Test
lm.morantest(nylm, NY_W)


# Repeat with population weights
nylmw <- lm(Z~PEXPOSURE+PCTAGE65P+PCTOWNHOME, data=NY, weights=POP8)
summary(nylmw)
nysarw<-spautolm(Z~PEXPOSURE+PCTAGE65P+PCTOWNHOME , data=NY, listw=NY_W, weights=POP8)
summary(nysarw)
# Moran's Test
lm.morantest(nylmw, NY_W)

################################################
################################################

Compare the regression results.  Is there evidence of spatial clustering of residuals?  Does the answer change is one controls for population size?  If so, explain why?

Calculate the p-value using the Wald-Statistic and the likelihood ratio.  Show your calculations.  Hint: to extract the log-likelihood of a model, use the logLik function.

Now fit a spatial lag model
################################################
################################################
# Fit lag sar model
nylag <- lagsarlm(Z~PEXPOSURE+PCTAGE65P+PCTOWNHOME, data=NY, listw=NY_W)
################################################
################################################

Report both the coefficient values, and estimate the spatial multiplier effect of the coefficients.

.