Spatial Lattice Models
Download
all the files named NY8_utm.* as well as the file NY_nb.gal in
http://www.colorado.edu/geography/class_homepages/geog_4023_s09/NY_data/
to your computer and change the path in Matlab to the same location.
The
NY data represent the number of cases of leukemia (of all types) ni an
eight county region of upstate New York for the years 1978-1980.
Available ancillary data include the Population for each census
tract in 1980, the population percentage over 65 years of age, and the
distance of each tract centroid to a number of waste sites containing
trichloroethane (TCE) (the subject of the movie A Civil Action
starrring John Travolta and Robert Duvall)
Global Moran's I
###########################################################
###########################################################
library(spdep)
library(rgdal)
library(RColorBrewer)
# Load the shapefile
NY <- readOGR(".", "NY8_utm18") #readOGR is in rgdal
# Neighbors given by author
NY_nb <- read.gal("NY_nb.gal", region.id=row.names(as(NY, "data.frame")))
summary(NY_nb)
# Plot boundaries and edges
par(mar=c(3,3,3,1)+0.1) # Make margins smaller
plot(NY, border="grey60", axes=TRUE)
title('Spatial Connectivity')
plot(NY_nb, coordinates(NY), pch=19, cex=0.6, add=TRUE)
# Construct row-standardized spatial weights
NY_W <- nb2listw(NY_nb, style="W", zero.policy=TRUE)
# Construct unstandardized spatial weights
NY_B <- nb2listw(NY_nb, style="B", zero.policy=TRUE)
###########
# Calculate Moran's I of cases
moran(NY$Cases, NY_W, n=length(NY$Cases), S0=Szero(NY_W))
moran(NY$Cases, NY_B, n=length(NY$Cases), S0=Szero(NY_B))
###########################################################
###########################################################
While
the Moran's I measures autocorrelation, it is difficult to interpret
the actual level. The range of allowable Moran's I statistics
depends on the W matrix used, and more specifically on the
"eigenvalues" of the weight matrix. If you don't know what an
eigvenvalue is, that is OK, just use the following code and take
my word for it.
###########################################################
###########################################################
# Calculate the range of possible I statistics
1/range(eigenw(NY_W))
1/range(eigenw(NY_B))
######################
# Calculate tests assuming normality
# randomisation =T applies a correction for kurtosis
moran.test(NY$Cases, NY_W)
moran.test(NY$Cases, NY_W, randomisation=F)
moran.test(NY$Cases, NY_B)
moran.test(NY$Cases, NY_B, randomisation=F)
# Just to show that we are calculating a z-score for I
pnorm((0.110387402--0.003571429)/sqrt(0.001282228),lower.tail=F)
# This should correspond to the last moran's I p-value.
###########################################################
###########################################################
What is the general formula for calculating the p-values above?
###########################################################
###########################################################
# Calculate tests using permutation
set.seed(1234)
Wperm <- moran.mc(NY$Cases, NY_W, nsim=999)
Wperm
set.seed(1234)
Bperm <- moran.mc(NY$Cases, NY_B, nsim=999)
Bperm
###########################################################
###########################################################
Rhetorical
Question: Why do we simulate 999 and not 1000 values? We aready
have the 1000th value, the I statistic from our data. If the null
hypothesis is true, than our data are data could also have been
obtained via random permutation. Always simulate 1 less than you
need for hypothesis testing. (This is different than the
geostatistical simulations we did earlier, there we did not already
have a random field). For publication, I would do more
permulations, at least 10,000 if you have the time (and you do, if the
dataset does not have thousands of observations). Generally, do
as many replications as you can afford to (and computer power is cheap)
We
will now simulate Moran's I under the assumption that each count is a
Poisson random variable, not a Gaussian random variable. A
Poisson distribution has one parameter: the expected number of cases,
which we will calculate as the global disease rate times the local
population.
###########################################################
###########################################################
# Create a function for Moran's I of counts assuming constant risk
# Calculate global rate of disease
r <- sum(NY$Cases)/sum(NY$POP8)
# Calculate predicted number of cases in each tract
rni <- r*NY$POP8
# Function to simulate poisson RV with local risk rate rni
CR <- function(var, mle) rpois(length(var), lambda=mle)
# We will use the boot function, which will return something caled mle, but we will tell it to use the local risk rate
# Wrapper function to calculate Moran's I
MoranI.pboot <- function(data, i, listw, n, S0, ...) {
return(moran(x=data, listw=listw, n=n, S0=S0)$I)
}
set.seed(1234)
boot2 <- boot(NY$Cases, statistic=MoranI.pboot, R=999, sim="parametric",
ran.gen=CR, listw=NY_B, n=length(NY$Cases), S0=Szero(NY_B), mle=rni)
pnorm((boot2$t0 - mean(boot2$t))/sd(boot2$t), lower.tail=FALSE)
#####
# Plot the histograms of I: These are also at Figure 7.8 in the Waller and Gotway text
oopar <- par(mfrow=c(1,2))
xlim <- range(c(Bperm$res, boot2$t[,1])) # Make the range big enough for both histograms
hist(Bperm$res[-length(Bperm$res)],main="Permutation
bootstrap", xlab=expression(I[std]), xlim=xlim, density=15, angle=45,
ylim=c(0,260))
abline(v=Bperm$statistic, lty=2) #Add line at sample I
hist(boot2$t, col=rgb(0.7,0.7,0.7), main="Parametric (Poisson) bootstrap", xlab=expression(I[CR]), xlim=xlim, ylim=c(0,260))
hist(Bperm$res[-length(Bperm$res)], density=15, angle=45, add=TRUE)
abline(v=boot2$t0, lty=2) #Add line at sample I
par(oopar)
###########################################################
###########################################################
In your own words, summarize the permutation approach and the Constant Risk approach to calculating Moran's I.
What are the assumptions underlying the Null Hypothesis of each of the three ways of calculating the p-statistic.
Is there evidence for the clustering of leukemia cases?
NOTE, you will be queried for a legend in the following code.
Local Moran's I
###########################################################
###########################################################
################################
# Moran's I scatterplot
oopar <- par(mfrow=c(1,2))
msp <- moran.plot(NY$Cases, listw=nb2listw(NY_nb, style="W"), quiet=TRUE)
title("Moran scatterplot")
infl <- apply(msp$is.inf, 1, any) % find the influential statistics
x <- NY$Cases
# Cut data into "Low" and "High"
lhx <- cut(x, breaks=c(min(x), mean(x), max(x)), labels=c("L", "H"), include.lowest=TRUE)
# Calculate spatial lag
wx <- lag(nb2listw(NY_nb, style="W"), NY$Cases)
# Cut lag into "Low" and "High"
lhwx <- cut(wx, breaks=c(min(wx), mean(wx), max(wx)), labels=c("L", "H"), include.lowest=TRUE)
# Interact (L H) and (L H) to form (LL, LH, HL, HH)
lhlh <- interaction(lhx, lhwx, infl, drop=TRUE)
cols <- rep(1, length(lhlh))
cols[lhlh == "H.L.TRUE"] <- 2
cols[lhlh == "L.H.TRUE"] <- 3
cols[lhlh == "H.H.TRUE"] <- 4
# There are no L.L.TRUEs
# Create Four Color scheme
Blue4 <- brewer.pal(4,'Blues')
plot(NY, col=Blue4[cols])
legend(locator(), legend=c("None", "HL", "LH", "HH"), fill=Blue4, bty="n", cex=0.8, y.intersp=0.8)
title("Tracts with influence")
par(oopar)
###########################################################
###########################################################
###################################################
# Local Moran's I
# Standard Moran's I
lm1 <- localmoran(NY$Cases, listw=nb2listw(NY_nb, style="B"))
###############################
r <- sum(NY$Cases)/sum(NY$POP8) # Global Rate
rni <- r*NY$POP8 # Rate for each region
lw <- nb2listw(NY_nb, style="B") # W matrix
sdCR <- (NY$Cases - rni)/sqrt(rni) # Standardized CR values
wsdCR <- lag(lw, sdCR)
I_CR <- sdCR * wsdCR
###################
# Plot Standard and CR local Moran's I
# This is the left of Figure 7.10 in book (with different color scale)
NY$Standard <- lm1[,1]
NY$"Constant_risk" <- I_CR
Blue5 <- brewer.pal(5,'RdBu')
cut.points <- quantile(I_CR,c(0,.2,.4,.6,.8,1))
cut.points[6] <- max(lm1)
spplot(NY, c('Standard','Constant_risk'), at=cut.points, col.regions=Blue5)
###################################################
# Simulate local Moran's I for Constant Rate
set.seed(1234)
nsim <- 999
N <- length(rni)
sims <- matrix(0, ncol=nsim, nrow=N) # Matrix of results
for (i in 1:nsim) {
# Simulate a Poisson number with local rate
y <- rpois(N, lambda=rni)
# Standardize the simulated value
sdCRi <- (y - rni)/sqrt(rni)
# Calculate spatial lag of standardized values
wsdCRi <- lag(lw, sdCRi)
# Determine Local I (=Z times Lag Z)
sims[,i] <- sdCRi * wsdCRi
}
# Rank the results
xrank <- apply(cbind(I_CR, sims), 1, function(x) rank(x)[1])
diff <- nsim - xrank
diff <- ifelse(diff > 0, diff, 0)
# Determine percentiles
pval <- punif((diff + 1)/(nsim + 1))
###############################
NY$Standard <- lm1[,5]
NY$Constant_risk <- pval
Blue7 <- brewer.pal(7,'RdBu')
spplot(NY, c("Standard", "Constant_risk"), at=c(0,0.01,0.05,0.1,0.9,0.95,0.99,1), col.regions=Blue7)
#########################################################333
############################################################3
Comment
on the differences in the local Moran's I statistic using these
measures. Where does clustering appear to be in the data?
Repeat
the analysis using the GLOBAL moran's I using Poisson Bootstrap and for
the
Standardized Risk assuming a constant rate, i.e. Eq. 7.10 in Waller and
Gotway. Hint, you can modify the code of the local moran's I, but
instead of reporting N local values, sum them up for one global I.
Spatial Autocorrelation Models
##################################
# Spatial Autoregression
nylm <- lm(Z~PEXPOSURE+PCTAGE65P+PCTOWNHOME , data=NY)
summary(nylm)
nysar<-spautolm(Z~PEXPOSURE+PCTAGE65P+PCTOWNHOME , data=NY, listw=NY_W)
summary(nysar)
# Moran's Test
lm.morantest(nylm, NY_W)
# Repeat with population weights
nylmw <- lm(Z~PEXPOSURE+PCTAGE65P+PCTOWNHOME, data=NY, weights=POP8)
summary(nylmw)
nysarw<-spautolm(Z~PEXPOSURE+PCTAGE65P+PCTOWNHOME , data=NY, listw=NY_W, weights=POP8)
summary(nysarw)
# Moran's Test
lm.morantest(nylmw, NY_W)
################################################
################################################
Compare
the regression results. Is there evidence of spatial clustering
of residuals? Does the answer change is one controls for
population size? If so, explain why?
Calculate the p-value
using the Wald-Statistic and the likelihood ratio. Show your
calculations. Hint: to extract the log-likelihood of a model, use
the logLik function.
Now fit a spatial lag model
################################################
################################################
# Fit lag sar model
nylag <- lagsarlm(Z~PEXPOSURE+PCTAGE65P+PCTOWNHOME, data=NY, listw=NY_W)
################################################
################################################
Report both the coefficient values, and estimate the spatial multiplier effect of the coefficients.
.