These data represent average December-February sea surface temperatures
for a portion of the equatorial Pacific Ocean, for the fifty years from
1950-1951 through 1999-2000.
library(lattice)
load('Pac_Temp.Rdata')
grid <- expand.grid(x=lon,y=lat)
grid$Temp <- Temp
p1 <- levelplot(Temp[,1]~x+y,grid,aspect='iso',main='1951')
p2 <- levelplot(Temp[,2]~x+y,grid,aspect='iso',main='1952')
p3 <- levelplot(Temp[,3]~x+y,grid,aspect='iso',main='1953')
p4 <- levelplot(Temp[,4]~x+y,grid,aspect='iso',main='1954')
print(p1,position=c(0,.5,.5,1),more=T)
print(p2,position=c(.5,.5,1,1),more=T)
print(p3,position=c(0,0,.5,.5),more=T)
print(p4,position=c(.5,0,1,.5))
# Remove time average from each row (each map)
T.demean <- sweep(Temp,1,apply(Temp,1,mean))
# T = U*S*V'
T.svd <- svd(T.demean)
dim(T.svd$u) #U values
length(T.svd$d) # singular values (square to get eigen values)
dim(T.svd$v) #eigen values
sum(T.demean^2) # N*variance of data
sum(T.svd$d^2) #Also N*variance of data
The
variance of each component is the square of the singular value d.
How many principal components are there? What fraction of
the variance is in the first PC? The first 5 PC's?
cov(T.svd$v[,1:5])*49
# Note that N=50. I multiplied by (N-1) because this is the demoninator of sample covariance.
What are the variance and covariance of the eigen vectors v?
# Map the first 2 components
grid$eof <- T.svd$u
e1 <- levelplot(eof[,1]~x+y,grid,aspect='iso',main='EOF 1')
e2 <- levelplot(eof[,2]~x+y,grid,aspect='iso',main='EOF 2')
print(e1,position=c(0,.5,1,1),more=T)
print(e2,position=c(0,0,1,.5))
# Plot the time series of the first two components
Time1 <- T.svd$v[,1]*sqrt(50)
Time2 <- T.svd$v[,2]*sqrt(50)
par(mfrow=c(2,1))
plot(seq(from=1951,length=50),Time1,type='l',xlab = 'Year', ylab='Temp',main='Time Series or EOF 1')
plot(seq(from=1951,length=50),Time2,type='l',xlab = 'Year', ylab='Temp',main='Time Series or EOF 2')
Calculate a smoothed spectral periodogram for the first two time series. Can you interpret the first EOF?
Compare
the first two time series. Can you interpret the second Time
Series relative to the first? Hint, what is the correlation
between these two time series if you exclude observations 33 and
48 (i.e. excluding 1983 and 1997).
#Regress each map pixel on time series
beta <- rep(0,400)
r2 <- beta
for (i in 1:400) {
pixel_lm <- lm(Temp[i,]~Time1)
r2[i] <- summary(pixel_lm)$r.squared
beta[i] <- summary(pixel_lm)$coefficients[2]
}
grid$r2 <- r2
grid$beta <- beta
p1 <- levelplot(eof[,1]*T.svd$d[1]/sqrt(50)~x+y,grid,aspect='iso',main='EOF 1 (rescaled)')
# Note, I multiplied the eof by its standard deviation
# The variance is d^2/N, so the std. dev. is d/sqrt(N)
p2 <- levelplot(beta~x+y,grid,aspect='iso',main='Beta')
p3 <- levelplot(r2~x+y,grid,aspect='iso',main='R squared')
print(p1,position=c(0,2/3,1,1),more=T)
print(p2,position=c(0,1/3,1,2/3),more=T)
print(p3,position=c(0,0,1,1/3))
Comparing the first EOF and beta, explain the EOF map in terms of the
relationship between original data and the time series. Looking
at the R-squared plot, where is the original data most strongly
correlated with the first time series.
These
data are just the DJF averages. Suppose we data for every month
over these 50 years. Thinking back to the previous lab, what
source of temperature variation do you expect to be picked up by EOF1
in monthly data, (i.e. El Nino, seasonal, diurnal, etc.)? Given
that, what will happen to EOF2 with the monthly data? Will the
EOF's from this series be comparable to the EOF's from these data?
How so, (or how not)? Hint, think about the relationship
between the first two EOFs in these data.
One more thing to keep
in mind: The time series are instantaneously uncorrelated (by
construction). But are they necessarily uncorrelated in time?
No.
acf(T.svd[,1:2])
The will be uncorrelated in time if the components exihibit
autocorrelation of the same time scale (i.e. if their ACF plots are the
same). The same holds for the EOF spatial fields...
library(gstat)
eof <- cbind(T.svd$u[,1]-mean(T.svd$u[,1]),T.svd$u[,2]-mean(T.svd$u[,2]))
g <- gstat(id='A',formula=eof[,1]/sd(eof[,1])~1,locations= ~ x+y,data=(grid2))
g <- gstat(g,id='B',formula=eof[,2]/sd(eof[,2])~1,locations= ~ x+y,data=(grid2))
plot(variogram(g))
Are the fields spatially uncorrelated?