Spectral Analyis
soi <- scan("soi.dat")
soi12 <- ts(scan("soi.dat"), start=1950, frequency=12)
# Three ways of calculating the raw periodogram
par(mfrow=c(3,1))
P1 <- spec.pgram(soi,log='no',taper=0,pad=0,fast=FALSE,demean=TRUE,detrend=FALSE)
soi_demean <- soi-mean(soi)
P2 <- abs(fft(soi_demean))^2/length(soi_demean)
freq <- seq(from=0,to=1,length=length(soi))
plot(freq,P2,type='l')
P3 <- spec.pgram(soi12,log='no',taper=0,pad=0,fast=FALSE,demean=TRUE,detrend=FALSE)
You should see that the second plot is equivalent to the first plot,
except that it also a a mirror image for frequencies between 1/2 and 1.
The third plot is equivalent to the first, except it is
scaled by a factor or 12.
There is an identity called Parcival's inequality: which states that
the average of the periodogram from 0 to 1 should be equal to the
sample variance:
mean(P2)
var(soi_demean) # The difference is due to dividing by N-1 rather than N
N <- length(soi_demean)
var(soi_demean)*(N-1)/N
There is at least one other "bizarre" element of the periodogram: the
periodogram at frequency 0 divide by N is equal to the sample mean
squared.
P2[1]/N # Should be 0 because we demeaned
#suppose we did not demean...
P3 <- abs(fft(soi))^2/length(soi)
P3[1]/N
mean(soi)^2
In effect, it is impossible to estimate the periodogram at frequency
zero, the estimate is always relatd to the mean. Related to
this, is the effect of a trend in the data. You will see the
impact of the trend later, but for the moment, let's just remove any
trends.
P <- spec.pgram(soi,log='no',taper=0,pad=0,fast=FALSE,demean=TRUE,detrend=TRUE)
abline(v=1/12,lty='dotted')
abline(v=1/48,lty='dotted')
abline(v=2/12,lty='dotted')
abline(v=3/12,lty='dotted')
Their is an obvious peak at the one year cycle, and there appears to be
a small peak at around 4 years, as well as peaks at 1/2 year, and 1/3
year. The yearly peak is obvious. The 4 year peak corresponds to El
Nino. The peaks at 1/2 and 1/3 year intervals are more troublesome.
These are likely not to be real, but are 'harmonics' of the
one year cycle. Harmonics sometimes occur when there is a
very large peak that is not exactly sinusoidal, but only approximately
sinusoidal.
Estimating the autocovariance function from the periodogram:
par(mfrow=c(2,1))
Cov_est <- Re(fft(P2,inverse=TRUE))/N
lag <- seq(from=0, to=8, by=1/12)
plot(lag,Cov_est[1:(length(lag))],ylab='Covariance',type='l',main='Covariance')
acf(soi12,type='covariance',lag.max=12*8,main='Covariance')
The only difference between these two values, is that the denominator
in the top plot is N, and in the bottom plot the denominator is N-1 for
the first lag, N-2 for the second lag, N-3 for the third, etc.
Compare the autocovariance plots and the periodogram.
Confidence Intervals and Consistency
The periodogram is extraordinarily noisy. It is possible to
construct 1-2α confidence intervals for a periodogram value, according to
2 I/ [χ22(α)] < f
< 2 I/ [χ22(1-α)]. (This is a chi-square with 2 degrees of freedom)
For example:
P = spec.pgram(soi,log='no',taper=0,pad=0,fast=FALSE,demean=TRUE,detrend=TRUE,plot='no')
U = qchisq(.025,2)
L = qchisq(.975,2)
c(2*P1$spec[10]/L, P1$spec[10], 2*P1$spec[10]/U)
c(2*P1$spec[38]/L, P1$spec[38], 2*P1$spec[38]/U)
Note, that the confidence interval is not a function of N. If
we collect more data, we will not be able to estimate the periodogram
any better! The periodogram is unbiased, but it is
inconsistent. (Remember, consistency means that not only is
the estimate asymptotically unbiased, but consistency also means that
the estimate get better as N get larger.)
To consistently estimate the periodogram, one has two choices: 1)
smooth periodogram by averaging over small windows, and 2) splitting
the data into sections, calucating a periodogram for each section, and
then averaging.
par(mfrow=c(3,2))
window_1 <- kernel('daniell',3)
window_2 <- kernel('daniell',6)
window_3 <- kernel('daniell',c(2,2))
plot(window_1)
spec.pgram(soi,kernel=window_1,log='no',taper=0,fast=F,detrend=T)
plot(window_2)
spec.pgram(soi,kernel=window_2,log='no',taper=0,fast=F,detrend=T)
plot(window_3)
spec.pgram(soi,kernel=window_3,log='no',taper=0,fast=F,detrend=T)
Try other window sizes as well, and explain the effect of window size
on the periodogram estimate. Calculate confidence intervals for the maximum value using window 3. (Note, the degrees of freedom are returned by spec.pgram)
The second approach is given below. Note, that I use the time series
version of the data, simply so that I can take advantage of the window
function.
k <- matrix(0,nrow=48,ncol=5)
k[,1] <- spec.pgram(window(soi12,start=1950,end=1958),log='no',taper=0,fast=F,detrend=T)$spec
k[,2] <- spec.pgram(window(soi12,start=1956,end=1964),log='no',taper=0,fast=F,detrend=T)$spec
k[,3] <- spec.pgram(window(soi12,start=1962,end=1970),log='no',taper=0,fast=F,detrend=T)$spec
k[,4] <- spec.pgram(window(soi12,start=1968,end=1976),log='no',taper=0,fast=F,detrend=T)$spec
k[,5] <- spec.pgram(window(soi12,start=1974,end=1982),log='no',taper=0,fast=F,detrend=T)$spec
# This could have been automated or put in a loop, but I wanted to make it more obvious.
freq <- spec.pgram(window(soi12,start=1974,end=1982),log='no',taper=0,fast=F,detrend=T)$freq
plot(freq,apply(k,1,mean),type='l')
abline(v=1/4,lty='dotted')
abline(v=1,lty='dotted')
When there is a very strong periodoc signal, it is commonly the case
that is upsets the estimation at other frequencies. This
effect is called 'leakage,' essentially, some of the signal
leaks out to other frequencies. The harmonics are one example
of this, but it can happen throughout the entire spectrum. It
is often desirable to filter out strong periodicities. We
will do this by subtracting a 12 month moving average from the series.
This should reduce the effect of all cycles with periods less
than or equal to a year. Note, that it will also dampen any
cycles that are slighly longer than one year as well, (if such exist).
####################
# Filtering
k = kernel("modified.daniell", 6) #-- 12 month filter
plot(k)
soi_filter = kernapply(soi,k)
plot.ts(soi_filter)
spec.pgram(soi_filter,kernel('daniell',3),log='no',taper=0,fast=F)
Describe
the effect of taking a 12 month moving average on the data series and
periodogram. Find the precise period of the maximum frequency
in
the periodogram.
Finally, we finish our analysis of the periodogram by looking at the
effect of a trend on the periodogram.
##################################################
#
N <- length(soi)
summary(lm(soi~seq(1,N)))
# Effect of not removing trend
par(mfrow=c(2,1))
spec.pgram(soi,kernel('daniell',c(2,2)),log='no',taper=0,fast=F,main='With
detrending')
spec.pgram(soi,kernel('daniell',c(2,2)),log='no',taper=0,fast=F,detrend=F,demean=T,main='Without
detrending')
What is the effect of removing or not removing the trend?
What is the significance of a trend for the detection of long
wave cycles?
Bivariate Analysis.
Along with the soi index, we also have data on fish recruitment in the
North Pacific.
rec <- scan("recruit.dat")
acf(cbind(soi,rec),lag.max=12*6)
ccf(soi,rec)
Compare the cross correlation functions and explain these results.
How do you interpret the larger magnitudes on the
negative portion than the positive portion? What is leading
what here?
It is also useful to determine which frequencies are contributing to
the cross-corrlation. This is done using the 'coherence
function'.
x = ts(cbind(soi,rec))
s = spec.pgram(x, kernel("daniell",c(2,2)), taper=0)
s$df # degrees of freedom
f = qf(.999, 2, s$df-2)
c = f/(18+f) # confidence interval
plot(s, plot.type = "coherency", ci.lty = 2)
abline(h=c)
What frequencies appear to be contributing most to the
cross-correlation between the soi and recruitment series?
t
t