Kriging and Conditional Simulation

In this lab we will consider how to answer questions of the type, "How much of the study region has lead concentrations above 80 ppb?"  This question depends on the value of the map everywhere, not just at one point.  Thus, kriging is inappropriate for considering this question.

Load the following libraries: gstat, RColorBrewer, RandomFields and lattice.

Now load the jura data and create spatial data frames our of the data.

##########################################
# Load and summarize the raw datadata
data(jura)
# Turn categorical data into factors.
jura.pred$Landuse <- factor(prediction.dat$Landuse, labels=levels(juragrid.dat$Landuse))
jura.pred$Rock <- factor(prediction.dat$Rock, labels=levels(juragrid.dat$Rock))
coordinates(jura.pred) <- ~Xloc + Yloc
summary(jura.pred)

jura.grid$Landuse <- factor(juragrid.dat$Landuse, labels=levels(juragrid.dat$Landuse))
jura.grid$Rock <- factor(juragrid.dat$Rock, labels=levels(juragrid.dat$Rock))
coordinates(jura.grid) <- ~Xloc + Yloc
summary(jura.grid)
gridded(jura.grid) = TRUE

# Create a color scale that we will use across maps for visual consistency
# Create color scale, from 0 in increments of 25 ppb
col.breaks <- seq(from=0,by=25,length=8)
col.breaks[8] <- 300 # The last class goes from 150 to 300
##########################################


Exploratory Statistics.
Plot the raw data:
##############################
# Plot the data
locs <- coordinates(jura.pred)
Z <- jura.pred$Pb
scatter.color(locs[,1],locs[,2],Z,&quot;YlGnBu&quot;,col.breaks,legend=T)

#  Histogram of the data
hist(Z)
#  The data are clearly nonGaussian
hist(log(Z))
########################


We are in a tough position here, the data are clearly nonGaussian, and are perhaps lognormal.  We could proceed as if the data are normal, lognormal, or unspecified.  We will procede first with the untransformed data.

Aside:
Note, there is an interesting spatial sampling system here.  There is a square grid (oriented approximatly 20 degrees East of North) of 107 grid nodes with spacing of 250 meters.  38 nodes were then selected (via a stratified random sample) for clustered sampling.  From each of these nodes, a second point was selected 100 meters away, followed by a third point 40 meters from the second, a fourth point 16 meters from the third, and a fifth point 6 meters from the fourth.  The direction of these points was randomly chosen.

Their are two main reasons for designing a sample like this: 1) the initial grid ensures fairly uniform accuracy of the kriging map, and 2) the purposeful creation of points at many short, and intermediate distances allows for more accurate estimation of the semivariogram at short and intermediate distances.


Simulation with Gaussian data

We will now calculate the variogram using the tools in the gstat library. I will fit the variogram automatically, but I must first fit it by eye to provide a useful starting point.


#######################################################
# Gaussian kriging
# Create a gstat object of lead data
Pb.gstat <- gstat(id=&quot;Pb&quot;,formula=Pb~1, data=jura.pred, nmax=50,nmin = 20)

# Calculate variogram
Pb.var <- variogram(Pb.gstat,width = .05)
plot(Pb.var)
fit.eye <- vgm(400,&quot;Sph&quot;,.8,400)

# Fit using weighted least squares (N/h^2)
Pb.fit.WLS <- fit.variogram(Pb.var,fit.eye)
# Fit using s.d. as weights
Pb.fit.sd <- fit.variogram(Pb.var,fit.eye,fit.method=2)
plot(Pb.var,Pb.fit.sd)

Pb.gstat<- gstat(Pb.gstat,model=Pb.fit.sd,fill.all=T)

# Krige these data
# Now Krige the data
Pb.krige <- predict.gstat(Pb.gstat,jura.grid,nsim=0)
# Now plot the data
plot.OK <- spplot(Pb.krige[1],at=col.breaks,col.regions=brewer.pal(7, "YlGnBu"),
    main = "Ordinary Kriging Prediction")
plot(plot.OK)

# Simulate maps
n.sim = 250
Pb.cs <- predict.gstat(Pb.gstat,jura.grid,nsim=n.sim)
# Now plot four realizations
spplot(Pb.cs[1:4],at=col.breaks,col.regions=brewer.pal(7, "YlGnBu"))
#################################################################


Question:
Comment on the similarities and difference between the kriging map and the conditional simulations.

Suppose that we are interested in the probability that lead exceeds 100 ppb (This amount of lead in the soil is considered unhealthy by the EPA).

############################################
# Calculate pct of map above 100 ppb
# Turn the simulation data into a 5957 x n.sim matrix
sim.matrix <- matrix(rep(0,5957*n.sim),nrow=5957,ncol=n.sim) # Create empty matrix
for (i in 1:n.sim){ sim.matrix[,i] <- Pb.cs[[i]]}  # Turn each map into a column of sim.matrix

pct.100 <- apply(sim.matrix>100,2,mean)

mean(pct.100)
hist(pct.100)
quantile(pct.100,c(.025,.95)) # 95% confidence intervals
#####################################

# Now compare the mean of the kriging simulations with the kriging map

########################################
# Calculate map of pixel means across all simulations
# Since each map is one column, we can calculate pixel means by averaging across rows
map.mean <- data.frame(mean=apply(sim.matrix,1,mean),x=jura.grid$Xloc,y=jura.grid$Yloc)

gridded(map.mean)<-~x+y
plot.mean <- spplot(map.mean,zcol=&quot;mean&quot;,at=col.breaks,col.regions=brewer.pal(7, "YlGnBu"),
    main = &quot;Mean of conditional simulations&quot;)
plot(plot.OK, split=c(1,1,2,1), more=T)
plot(plot.mean, split=c(2,1,2,1))
##############################


Question:
Compare the kriging map with the mean across all the conditional simulations.  What does this tell you about the relationsip between kriging and conditional simulations.



Simulation with the normal score transform

One workaround for when the data are not normal is to transform the data to look normal.  The normal score transform works as follows.
1) Rank the data
2) Assign to the ranks "normal scores," i.e. quantiles from the normal distribution.
Example, if you have 10 data, they are ranked and assigned the .05,.15,...,.95 percentiles of the normal distribution.
3) Conduct your analysis using the normal scores
4) Back transform from the normal scores to the original histogram.


#######################################
# Normal Score kriging of the data
temp.Z <- Z+1e-7 *rnorm(length(Z))  #break any ties with small random numbers.
nZ <- qnorm(length(Z))[rank(temp.Z)] # Create normal score transform
hist(nZ) # This should appear normal
jura.pred$ns.Pb <- nZ
nsPb.gstat <- gstat(id=&quot;ns.Pb&quot;, formula=ns.Pb~1, data=jura.pred,nmin=20,nmax=50)

# Calculate variogram
nsPb.var <- variogram(nsPb.gstat,width = .075)
plot(nsPb.var)
fit.eye <- vgm(.5,&quot;Sph&quot;,.8,.5)
# Fit using s.d. as weights
nsPb.fit.sd <- fit.variogram(nsPb.var,fit.eye,fit.method=2)
plot(nsPb.var,nsPb.fit.sd)
#Update the gstat object with the model
nsPb.gstat<- gstat(nsPb.gstat,model=nsPb.fit.sd,fill.all=T)

# Krige these data
nsPb.krige <- predict.gstat(nsPb.gstat,jura.grid,nsim=0)

#Back transform these data
nsPb.krige[[1]] <- approx(sort(nZ),sort(temp.Z),nsPb.krige[[1]],rule=2)[[2]]

plot.ns <- spplot(nsPb.krige[1],at=col.breaks,col.regions=brewer.pal(7, "YlGnBu"),
    main="Normal Score Prediction")
# Plot the OK and normal score kriging maps
print(plot.OK, split=c(1,1,2,1),more=T)
print(plot.ns, split=c(2,1,2,1))

# Now simulate the data
nsPb.cs <- predict.gstat(nsPb.gstat,jura.grid,nsim=n.sim,)
nsPb.cs2 <- nsPb.cs
# Back transform data
for (i in 1:n.sim) {nsPb.cs2[[i]] <- approx(sort(nZ),sort(temp.Z),nsPb.cs2[[i]],rule=2)[[2]]}

# Now plot four realizations
spplot(nsPb.cs2[1:4],at=col.breaks,col.regions=brewer.pal(7, "YlGnBu"))

# Calculate pct of map above 100 ppb
# Turn simulation data into a 5957 x n.sim matrix
# Create an empty matrix
nssim.matrix <- matrix(rep(0,5957*n.sim),nrow=5957,ncol=n.sim)
# Put each map into a column of nssim.matrix
for (i in 1:n.sim){ nssim.matrix[,i] <- nsPb.cs2[[i]]}

nspct.100 <- apply(nssim.matrix>100,2,mean)
#############################################



Questions
Compare the simulations of the normal score kriging with those from Ordinary Kriging.

Compare 1) the empirical semivariogram of the OK kriging map and the model used, 2) the empirical semivariogram of one of the OK simulations with the semivariogram model used, and 3) the semivariogram of one of thenormal score simulations with the semivariogram model used.  Comment on the similarities and differences.

To plot the semivariogram of, for example, the second conditional simulation, use
##############################################
# Plot semivariogram of conditional simulation
sim.var <- variogram(sim2~1,~Xloc+Yloc,data=as.data.frame(Pb.cs[2]),width=.01)
plot(sim.var,Pb.fit.sd,ylim=c(0,1000))

##########################


Questions
Using qqplot, compare 1) the distribution of one of the normal score simulations and the original Z data, with 2) the distribution of one of the OK simulations and the original Z data.  Plot these side by side.  Comment on these plots

Comment on the data attributes that are preserved by OK simulation and normal score simulation.  (Hint, what semivariogram is modeled in normal score simulation, it isn't obvious)

Compare the expected proportion of the map above 100 ppb from the two simulation methods.  Plot histograms of the two percentages on the same page (Use par(mfrow=c(2,1)).

Simulation with Indicator Kriging (IK)


Correlation is a measure of similarity.  In the normal score transform, one is implicitly measuring similarity not of the original data values, but of the ranks.  Many researchers may be uncomfortable with such a radical reinterpretation of the data.  Indicator Kriging is often a viable candidate.  In indicator kriging, one recodes  the data to be zero is they are below a certain value, and one if above.  In our case, we are interested in predicting how much of the map is above 100 ppb.  We don't care about other values, just this one threshold, so indicator kriging is an option here.

###########################
# Indicator kriging of the data
# Create an indicator variable
jura.pred$i.Pb <- jura.pred$Pb>100
iPb.gstat <- gstat(id=&quot;i.Pb&quot;, formula=i.Pb~1, data=jura.pred,nmin=20,nmax=50)

##############################


Do the variography on your own.

#################################
iPb.krige <- predict.gstat(iPb.gstat,jura.grid,nsim=0,indicator=T)
# Plot the kriging map
spplot(iPb.krige[1],at=seq(0,by=.2,1),col.regions=brewer.pal(5, "YlGnBu"))

# Now simulate the data
iPb.cs <- predict.gstat(iPb.gstat,jura.grid,nsim=n.sim,indicator=T)
# Now plot four realizations
spplot(iPb.cs[1:2],at=c(0,.5,1),col.regions=brewer.pal(2, "YlGnBu"))
# This will return a warning about wanting three colors, not two

# Calculate pct of map above 100
# Turn simulation data into a 5957 x n.sim matrix
isim.matrix <- matrix(rep(0,5957*n.sim),nrow=5957,ncol=n.sim)
for (i in 1:n.sim){ isim.matrix[,i] <- iPb.cs[[i]]}

ipct.100 <- apply(isim.matrix,2,mean)

##################################



Question
Comment on the differences and similarities between the underlying assumptions and the final results of these three methods.