Spectral Analyis

soi <- scan("soi.dat")
soi12 <- ts(scan("soi.dat"), start=1950, frequency=12)

# Three ways of calculating the raw periodogram
par(mfrow=c(3,1))
P1 <- spec.pgram(soi,log='no',taper=0,pad=0,fast=FALSE,demean=TRUE,detrend=FALSE)
soi_demean <- soi-mean(soi)
P2 <- abs(fft(soi_demean))^2/length(soi_demean)
freq <- seq(from=0,to=1,length=length(soi))
plot(freq,P2,type='l')
P3 <- spec.pgram(soi12,log='no',taper=0,pad=0,fast=FALSE,demean=TRUE,detrend=FALSE)


You should see that the second plot is equivalent to the first plot, except that it also a a mirror image for frequencies between 1/2 and 1.  The third plot is equivalent to the first, except it is scaled by a factor or 12.

There is an identity called Parcival's inequality: which states that the average of the periodogram from 0 to 1 should be equal to the sample variance:
mean(P2)
var(soi_demean) # The difference is due to dividing by N-1 rather than N
N <- length(soi_demean)
var(soi_demean)*(N-1)/N
There is at least one other "bizarre" element of the periodogram: the periodogram at frequency 0 divide by N is equal to the sample mean squared.
P2[1]/N # Should be 0 because we demeaned
#suppose we did not demean...
P3 <- abs(fft(soi))^2/length(soi)
P3[1]/N
mean(soi)^2
In effect, it is impossible to estimate the periodogram at frequency zero, the estimate is always relatd to the mean.  Related to this, is the effect of a trend in the data.  You will see the impact of the trend later, but for the moment, let's just remove any trends.


P <- spec.pgram(soi,log='no',taper=0,pad=0,fast=FALSE,demean=TRUE,detrend=TRUE)
abline(v=1/12,lty='dotted')
abline(v=1/48,lty='dotted')
abline(v=2/12,lty='dotted')
abline(v=3/12,lty='dotted')

Their is an obvious peak at the one year cycle, and there appears to be a small peak at around 4 years, as well as peaks at 1/2 year, and 1/3 year. The yearly peak is obvious. The 4 year peak corresponds to El Nino. The peaks at 1/2 and 1/3 year intervals are more troublesome.  These are likely not to be real, but are 'harmonics' of the one year cycle.  Harmonics sometimes occur when there is a very large peak that is not exactly sinusoidal, but only approximately sinusoidal.

Estimating the autocovariance function from the periodogram:



par(mfrow=c(2,1))
Cov_est <- Re(fft(P2,inverse=TRUE))/N
lag <- seq(from=0, to=8, by=1/12)
plot(lag,Cov_est[1:(length(lag))],ylab='Covariance',type='l',main='Covariance')
acf(soi12,type='covariance',lag.max=12*8,main='Covariance')
The only difference between these two values, is that the denominator in the top plot is N, and in the bottom plot the denominator is N-1 for the first lag, N-2 for the second lag, N-3 for the third, etc.

Compare the autocovariance plots and the periodogram.

Confidence Intervals and Consistency

The periodogram is extraordinarily noisy.  It is possible to construct 1-2α confidence intervals for a periodogram value, according to
2 I/ [χ22(α)] <  f < 2 I/ [χ22(1-α)]. (This is a chi-square with 2 degrees of freedom)

For example:

P = spec.pgram(soi,log='no',taper=0,pad=0,fast=FALSE,demean=TRUE,detrend=TRUE,plot='no')
U = qchisq(.025,2)
L = qchisq(.975,2)
c(2*P1$spec[10]/L, P1$spec[10], 2*P1$spec[10]/U)
c(2*P1$spec[38]/L, P1$spec[38], 2*P1$spec[38]/U)

Note, that the confidence interval is not a function of N.  If we collect more data, we will not be able to estimate the periodogram any better!  The periodogram is unbiased, but it is inconsistent.  (Remember, consistency means that not only is the estimate asymptotically unbiased, but consistency also means that the estimate get better as N get larger.)

To consistently estimate the periodogram, one has two choices: 1) smooth periodogram by averaging over small windows, and 2) splitting the data into sections, calucating a periodogram for each section, and then averaging.

par(mfrow=c(3,2))
window_1 <- kernel('daniell',3)
window_2 <- kernel('daniell',6)
window_3 <- kernel('daniell',c(2,2))
plot(window_1)
spec.pgram(soi,kernel=window_1,log='no',taper=0,fast=F,detrend=T)
plot(window_2)
spec.pgram(soi,kernel=window_2,log='no',taper=0,fast=F,detrend=T)
plot(window_3)
spec.pgram(soi,kernel=window_3,log='no',taper=0,fast=F,detrend=T)

Try other window sizes as well, and explain the effect of window size on the periodogram estimate.  Calculate confidence intervals for the maximum value using window 3.  (Note, the degrees of freedom are returned by spec.pgram)

The second approach is given below. Note, that I use the time series version of the data, simply so that I can take advantage of the window function.

k <- matrix(0,nrow=48,ncol=5)
k[,1] <- spec.pgram(window(soi12,start=1950,end=1958),log='no',taper=0,fast=F,detrend=T)$spec
k[,2] <- spec.pgram(window(soi12,start=1956,end=1964),log='no',taper=0,fast=F,detrend=T)$spec
k[,3] <- spec.pgram(window(soi12,start=1962,end=1970),log='no',taper=0,fast=F,detrend=T)$spec
k[,4] <- spec.pgram(window(soi12,start=1968,end=1976),log='no',taper=0,fast=F,detrend=T)$spec
k[,5] <- spec.pgram(window(soi12,start=1974,end=1982),log='no',taper=0,fast=F,detrend=T)$spec
# This could have been automated or put in a loop, but I wanted to make it more obvious.
freq <- spec.pgram(window(soi12,start=1974,end=1982),log='no',taper=0,fast=F,detrend=T)$freq
plot(freq,apply(k,1,mean),type='l')
abline(v=1/4,lty='dotted')
abline(v=1,lty='dotted')

When there is a very strong periodoc signal, it is commonly the case that is upsets the estimation at other frequencies.  This effect is called 'leakage,'  essentially, some of the signal leaks out to other frequencies.  The harmonics are one example of this, but it can happen throughout the entire spectrum.  It is often desirable to filter out strong periodicities.  We will do this by subtracting a 12 month moving average from the series.  This should reduce the effect of all cycles with periods less than or equal to a year.  Note, that it will also dampen any cycles that are slighly longer than one year as well, (if such exist).
####################
# Filtering
k = kernel("modified.daniell", 6) #-- 12 month filter
plot(k)
soi_filter = kernapply(soi,k)
plot.ts(soi_filter)
spec.pgram(soi_filter,kernel('daniell',3),log='no',taper=0,fast=F)

Describe the effect of taking a 12 month moving average on the data series and periodogram.  Find the precise period of the maximum frequency in the periodogram.

Finally, we finish our analysis of the periodogram by looking at the effect of a trend on the periodogram.   
##################################################
#
N <- length(soi)
summary(lm(soi~seq(1,N)))

# Effect of not removing trend
par(mfrow=c(2,1))
spec.pgram(soi,kernel('daniell',c(2,2)),log='no',taper=0,fast=F,main='With
detrending')
spec.pgram(soi,kernel('daniell',c(2,2)),log='no',taper=0,fast=F,detrend=F,demean=T,main='Without
detrending')

What is the effect of removing or not removing the trend?  What is the significance of a trend for the detection of long wave cycles?

Bivariate Analysis.

Along with the soi index, we also have data on fish recruitment in the North Pacific.
rec <- scan("recruit.dat")
acf(cbind(soi,rec),lag.max=12*6)
ccf(soi,rec)
Compare the cross correlation functions and explain these results.  How do you interpret the larger magnitudes on the negative portion  than the positive portion? What is leading what here?

It is also useful to determine which frequencies are contributing to the cross-corrlation.  This is done using the 'coherence function'.
x = ts(cbind(soi,rec))
s = spec.pgram(x, kernel("daniell",c(2,2)), taper=0)
s$df # degrees of freedom
f = qf(.999, 2, s$df-2)
c = f/(18+f) # confidence interval
plot(s, plot.type = "coherency", ci.lty = 2)
abline(h=c)
What frequencies appear to be contributing most to the cross-correlation between the soi and recruitment series?
t
t