Kriging and Conditional Simulation
In this lab we will consider how to answer questions of the type, "How
much of the study region has lead concentrations above 80 ppb?"
This question depends on the value of the map everywhere, not
just at one point. Thus, kriging is inappropriate for
considering this question.
Load the following libraries: gstat, RColorBrewer, RandomFields and
lattice.
Now load the jura data and create spatial data frames our of the data.
##########################################
# Load and summarize the raw datadata
data(jura)
# Turn categorical data into factors.
jura.pred$Landuse <- factor(prediction.dat$Landuse,
labels=levels(juragrid.dat$Landuse))
jura.pred$Rock <- factor(prediction.dat$Rock,
labels=levels(juragrid.dat$Rock))
coordinates(jura.pred) <- ~Xloc + Yloc
summary(jura.pred)
jura.grid$Landuse <- factor(juragrid.dat$Landuse,
labels=levels(juragrid.dat$Landuse))
jura.grid$Rock <- factor(juragrid.dat$Rock,
labels=levels(juragrid.dat$Rock))
coordinates(jura.grid) <- ~Xloc + Yloc
summary(jura.grid)
gridded(jura.grid) = TRUE
# Create a color scale that we will use across maps for visual
consistency
# Create color scale, from 0 in increments of 25 ppb
col.breaks <- seq(from=0,by=25,length=8)
col.breaks[8] <- 300 # The last class goes from 150 to 300
##########################################
Exploratory Statistics.
Plot the raw data:
##############################
# Plot the data
locs <- coordinates(jura.pred)
Z <- jura.pred$Pb
scatter.color(locs[,1],locs[,2],Z,"YlGnBu",col.breaks,legend=T)
# Histogram of the data
hist(Z)
# The data are clearly nonGaussian
hist(log(Z))
########################
We are in a tough position here, the data are clearly nonGaussian, and
are perhaps lognormal.
We could proceed as if the data are normal, lognormal, or
unspecified. We will procede first with the untransformed
data.
Aside:
Note, there is an interesting spatial sampling system here.
There is a square grid (oriented approximatly 20 degrees East
of North) of 107 grid nodes with spacing of 250 meters. 38
nodes were then selected (via a stratified random sample) for clustered
sampling. From each of these nodes, a second point was
selected 100 meters away, followed by a third point 40 meters from the
second, a fourth point 16 meters from the third, and a fifth point 6
meters from the fourth. The direction of these points was
randomly chosen.
Their are two main reasons for designing a sample like this: 1) the
initial grid ensures fairly uniform accuracy of the kriging map, and 2)
the purposeful creation of points at many short, and intermediate
distances allows for more accurate estimation of the semivariogram at
short and intermediate distances.
Simulation with Gaussian data
We will now calculate the variogram using the tools in the gstat
library. I will fit the variogram automatically, but I must first fit
it by eye to provide a useful starting point.
#######################################################
# Gaussian kriging
# Create a gstat object of lead data
Pb.gstat <- gstat(id="Pb",formula=Pb~1, data=jura.pred,
nmax=50,nmin = 20)
# Calculate variogram
Pb.var <- variogram(Pb.gstat,width = .05)
plot(Pb.var)
fit.eye <- vgm(400,"Sph",.8,400)
# Fit using weighted least squares (N/h^2)
Pb.fit.WLS <- fit.variogram(Pb.var,fit.eye)
# Fit using s.d. as weights
Pb.fit.sd <- fit.variogram(Pb.var,fit.eye,fit.method=2)
plot(Pb.var,Pb.fit.sd)
Pb.gstat<- gstat(Pb.gstat,model=Pb.fit.sd,fill.all=T)
# Krige these data
# Now Krige the data
Pb.krige <- predict.gstat(Pb.gstat,jura.grid,nsim=0)
# Now plot the data
plot.OK <-
spplot(Pb.krige[1],at=col.breaks,col.regions=brewer.pal(7, "YlGnBu"),
main = "Ordinary Kriging Prediction")
plot(plot.OK)
# Simulate maps
n.sim = 250
Pb.cs <- predict.gstat(Pb.gstat,jura.grid,nsim=n.sim)
# Now plot four realizations
spplot(Pb.cs[1:4],at=col.breaks,col.regions=brewer.pal(7, "YlGnBu"))
#################################################################
Question:
Comment on the similarities and difference between the kriging map and
the conditional simulations.
Suppose that we are interested in the probability that lead exceeds
100 ppb (This amount of lead in the soil is considered unhealthy by the
EPA).
############################################
# Calculate pct of map above 100 ppb
# Turn the simulation data into a 5957 x n.sim matrix
sim.matrix <- matrix(rep(0,5957*n.sim),nrow=5957,ncol=n.sim) #
Create empty matrix
for (i in 1:n.sim){ sim.matrix[,i] <- Pb.cs[[i]]} #
Turn each map into a column of sim.matrix
pct.100 <- apply(sim.matrix>100,2,mean)
mean(pct.100)
hist(pct.100)
quantile(pct.100,c(.025,.95)) # 95% confidence intervals
#####################################
# Now compare the mean of the kriging simulations with the kriging map
########################################
# Calculate map of pixel means across all simulations
# Since each map is one column, we can calculate pixel means by
averaging across rows
map.mean <-
data.frame(mean=apply(sim.matrix,1,mean),x=jura.grid$Xloc,y=jura.grid$Yloc)
gridded(map.mean)<-~x+y
plot.mean <-
spplot(map.mean,zcol="mean",at=col.breaks,col.regions=brewer.pal(7,
"YlGnBu"),
main = "Mean of conditional simulations")
plot(plot.OK, split=c(1,1,2,1), more=T)
plot(plot.mean, split=c(2,1,2,1))
##############################
Question:
Compare
the kriging map with the mean across all the conditional simulations.
What does this tell you about the relationsip between kriging
and
conditional simulations.
Simulation with the normal score transform
One workaround for
when the data are not normal is to transform the data to
look normal. The normal score transform works as
follows.
1) Rank the data
2) Assign to the ranks "normal scores," i.e. quantiles from the normal
distribution.
Example, if you have 10 data, they are ranked and assigned the
.05,.15,...,.95 percentiles of the normal distribution.
3) Conduct your analysis using the normal scores
4) Back transform from the normal scores to the original histogram.
#######################################
# Normal Score kriging of the data
temp.Z <- Z+1e-7 *rnorm(length(Z)) #break any ties
with small random numbers.
nZ <- qnorm(length(Z))[rank(temp.Z)] # Create normal score
transform
hist(nZ) # This should appear normal
jura.pred$ns.Pb <- nZ
nsPb.gstat <- gstat(id="ns.Pb", formula=ns.Pb~1,
data=jura.pred,nmin=20,nmax=50)
# Calculate variogram
nsPb.var <- variogram(nsPb.gstat,width = .075)
plot(nsPb.var)
fit.eye <- vgm(.5,"Sph",.8,.5)
# Fit using s.d. as weights
nsPb.fit.sd <- fit.variogram(nsPb.var,fit.eye,fit.method=2)
plot(nsPb.var,nsPb.fit.sd)
#Update the gstat object with the model
nsPb.gstat<- gstat(nsPb.gstat,model=nsPb.fit.sd,fill.all=T)
# Krige these data
nsPb.krige <- predict.gstat(nsPb.gstat,jura.grid,nsim=0)
#Back transform these data
nsPb.krige[[1]] <-
approx(sort(nZ),sort(temp.Z),nsPb.krige[[1]],rule=2)[[2]]
plot.ns <-
spplot(nsPb.krige[1],at=col.breaks,col.regions=brewer.pal(7, "YlGnBu"),
main="Normal Score Prediction")
# Plot the OK and normal score kriging maps
print(plot.OK, split=c(1,1,2,1),more=T)
print(plot.ns, split=c(2,1,2,1))
# Now simulate the data
nsPb.cs <- predict.gstat(nsPb.gstat,jura.grid,nsim=n.sim,)
nsPb.cs2 <- nsPb.cs
# Back transform data
for (i in 1:n.sim) {nsPb.cs2[[i]] <-
approx(sort(nZ),sort(temp.Z),nsPb.cs2[[i]],rule=2)[[2]]}
# Now plot four realizations
spplot(nsPb.cs2[1:4],at=col.breaks,col.regions=brewer.pal(7, "YlGnBu"))
# Calculate pct of map above 100 ppb
# Turn simulation data into a 5957 x n.sim matrix
# Create an empty matrix
nssim.matrix <- matrix(rep(0,5957*n.sim),nrow=5957,ncol=n.sim)
# Put each map into a column of nssim.matrix
for (i in 1:n.sim){ nssim.matrix[,i] <- nsPb.cs2[[i]]}
nspct.100 <- apply(nssim.matrix>100,2,mean)
#############################################
Questions
Compare the simulations of the normal score kriging with those from
Ordinary Kriging.
Compare
1) the empirical semivariogram of the OK kriging map and the model
used, 2) the empirical semivariogram of one of the OK simulations
with the semivariogram model used, and 3) the semivariogram of one of
thenormal score simulations with the semivariogram model used.
Comment on the similarities and differences.
To plot the semivariogram of, for example, the second conditional
simulation, use
##############################################
# Plot semivariogram of conditional simulation
sim.var <-
variogram(sim2~1,~Xloc+Yloc,data=as.data.frame(Pb.cs[2]),width=.01)
plot(sim.var,Pb.fit.sd,ylim=c(0,1000))
##########################
Questions
Using
qqplot, compare 1) the distribution of one of the normal score
simulations and the original Z data, with 2) the distribution of one of
the OK simulations and the original Z data. Plot these side
by
side. Comment on these plots
Comment on the data
attributes that are preserved by OK simulation and normal score
simulation. (Hint, what semivariogram is modeled in normal
score
simulation, it isn't obvious)
Compare the expected proportion of
the map above 100 ppb from the two simulation methods. Plot
histograms of the two percentages on the same page (Use
par(mfrow=c(2,1)).
Simulation with Indicator Kriging (IK)
Correlation is a measure of similarity. In the normal score
transform, one is implicitly measuring similarity not of the original
data values, but of the ranks. Many researchers may be
uncomfortable with such a radical reinterpretation of the data.
Indicator Kriging is often a viable candidate. In
indicator kriging, one recodes the data to be zero is they
are below a certain value, and one if above. In our case, we
are interested in predicting how much of the map is above 100 ppb.
We don't care about other values, just this one threshold, so
indicator kriging is an option here.
###########################
# Indicator kriging of the data
# Create an indicator variable
jura.pred$i.Pb <- jura.pred$Pb>100
iPb.gstat <- gstat(id="i.Pb", formula=i.Pb~1,
data=jura.pred,nmin=20,nmax=50)
##############################
Do the variography on your own.
#################################
iPb.krige <-
predict.gstat(iPb.gstat,jura.grid,nsim=0,indicator=T)
# Plot the kriging map
spplot(iPb.krige[1],at=seq(0,by=.2,1),col.regions=brewer.pal(5,
"YlGnBu"))
# Now simulate the data
iPb.cs <-
predict.gstat(iPb.gstat,jura.grid,nsim=n.sim,indicator=T)
# Now plot four realizations
spplot(iPb.cs[1:2],at=c(0,.5,1),col.regions=brewer.pal(2, "YlGnBu"))
# This will return a warning about wanting three colors, not two
# Calculate pct of map above 100
# Turn simulation data into a 5957 x n.sim matrix
isim.matrix <- matrix(rep(0,5957*n.sim),nrow=5957,ncol=n.sim)
for (i in 1:n.sim){ isim.matrix[,i] <- iPb.cs[[i]]}
ipct.100 <- apply(isim.matrix,2,mean)
##################################
Question
Comment on the differences and similarities between the underlying
assumptions and the final results of these three methods.