
4.2 Methods

4.2.3 Singular spectrum analysis

Singular Spectrum Analysis (SSA) is a powerful tool in modern time series analysis. SSA is an EOF-based method that can be applied for various purposes: trend analysis, smoothing, extraction of seasonality or of cycles with varying periods and amplitudes, filling of data gaps, and finding structures in short time series [Hassani, 2007]. In climate science SSA is mostly used to detect modulated oscillations in meteorological or geophysical records. In the present work it was applied to ice draft anomalies.

The basic idea of SSA is to slide a window down a time series and look for patterns that account for a high proportion of the variance [Allen and Smith, 1996]. One may think of an EOF analysis of a single time series rather than of a field of time series. SSA typically involves four steps, which are known as the caterpillar method:

Step 1: Embedding:

The considered time series $d$ of length $N$, $d(t):\ t = 1,\dots,N$, is first centred to have zero mean.

Sliding a window of width $M$ down the time series produces a set of $M$ lagged vectors, which are then arranged in an $(N' \times M)$ matrix $D$, called the trajectory matrix:

$$D = \begin{pmatrix} d(1) & d(2) & \cdots & d(M) \\ d(2) & d(3) & \cdots & d(M+1) \\ \vdots & \vdots & \ddots & \vdots \\ d(N') & d(N'+1) & \cdots & d(N) \end{pmatrix}. \tag{4.6}$$

The window width $M$ defines the longest periodicity that can be detected by SSA, and increasing $M$ increases the spectral resolution of the method. $D$ is a Hankel matrix, i.e. all elements along the skew diagonals are equal. It consists of $M$ lag-shifted copies of $d(t)$, which are $N' = N - M + 1$ long [Kondrashov and Ghil, 2006].
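For illustration, the embedding step might be implemented as in the following sketch (the use of Python/NumPy, the function name and the interface are choices of this illustration, not part of the cited methods):

```python
# Minimal sketch of Step 1 (embedding), following Eq. 4.6;
# names and interface are illustrative, not from the cited sources.
import numpy as np

def trajectory_matrix(d, M):
    """Build the (N' x M) Hankel trajectory matrix of a series d,
    with N' = N - M + 1; the series is centred first."""
    d = np.asarray(d, dtype=float)
    d = d - d.mean()                   # centre to zero mean
    N = d.size
    Nprime = N - M + 1
    # Row t holds the lagged window d(t), d(t+1), ..., d(t+M-1).
    return np.stack([d[i:i + M] for i in range(Nprime)])
```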

Step 2: EOF Analysis:

In the next step the trajectory matrix is used to form the lag-covariance matrix $C_D$:

$$C_D = \frac{1}{N'}\, D^T D. \tag{4.7}$$

The lag-covariance matrix has the dimensions $(M \times M)$. This way of computing $C_D$ is the method of Broomhead and King [1986]; an alternative approach is the method of Vautard and Ghil [1989], and both versions are explained in Allen and Smith [1996]. Once the lag-covariance matrix is computed, it is diagonalised and its eigenvalues are ranked in decreasing order:

$$\Lambda_D = E_D^T C_D E_D, \tag{4.8}$$

where $\Lambda_D$ is a diagonal matrix containing the eigenvalues and $E_D$ contains the eigenvectors (EOFs) of $C_D$. The total variance of the time series $d(t)$ is given by the trace of $C_D$, and the eigenvalues $\lambda_k$ explain the partial variance in the direction of $E_k$. The principal components $a_k$ can be obtained by projecting the time series onto each EOF:

$$a_k(t) = \sum_{j=1}^{M} d(t+j-1)\, e_k(j). \tag{4.9}$$
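A corresponding sketch of the EOF analysis (Eqs. 4.7 to 4.9), again under the assumption of a NumPy implementation with illustrative function names:

```python
# Sketch of Step 2 (EOF analysis), Broomhead-King variant; illustrative only.
import numpy as np

def ssa_eofs(D):
    """Eigenvalues (in decreasing order) and EOFs of
    C_D = D^T D / N' (Eqs. 4.7 and 4.8)."""
    Nprime = D.shape[0]
    C_D = D.T @ D / Nprime             # (M x M) lag-covariance matrix
    lam, E = np.linalg.eigh(C_D)       # eigh, since C_D is symmetric
    order = np.argsort(lam)[::-1]      # rank eigenvalues in decreasing order
    return lam[order], E[:, order]

def principal_components(D, E):
    """Eq. 4.9: a_k(t) = sum_j d(t+j-1) e_k(j), i.e. column k of D E."""
    return D @ E                       # (N' x M) matrix of PCs
```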

The trajectory matrix can be reconstructed through

$$D = \sum_{j=1}^{M} a_j e_j^T = D'_1 + D'_2 + \dots + D'_M, \tag{4.10}$$

where the matrices $D'_j$ are elementary matrices of rank one. Each elementary matrix is related to one eigenvector and one eigenvalue of $C_D$, which explains a certain fraction of the variance in $d(t)$ (Fig. 4.2).

Step 3: Grouping:

In the grouping step the elementary matrices $D'_j$ can be split into groups, each corresponding to a certain kind of signal in $d(t)$, which can then be summed. From the sum a new trajectory matrix $D_2$ can be constructed, containing only the signal components corresponding to the group of selected eigenvalues [Hassani, 2007]. The selection of the eigenvalues, and thus the amount of variance to retain, can be based on a "scree diagram" (eigenvalue spectrum). Summing up all elementary matrices $D'_{1,\dots,M}$ would give back the initial trajectory matrix $D$, as the sketch below illustrates.
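A sketch of the grouping step under the same assumptions; the index set `group` is a user choice, in practice read off the scree diagram:

```python
# Sketch of Step 3 (grouping): sum of the rank-one elementary matrices
# D'_j = a_j e_j^T of Eq. 4.10 over a selected index set; illustrative only.
import numpy as np

def grouped_trajectory(A, E, group):
    """Reduced trajectory matrix D_2 built from the selected EOFs;
    A holds the PCs columnwise, E the EOFs columnwise."""
    Nprime, M = A.shape[0], E.shape[0]
    D2 = np.zeros((Nprime, M))
    for j in group:
        D2 += np.outer(A[:, j], E[:, j])   # elementary matrix D'_j
    return D2
```

Choosing `group = range(M)` reproduces the full trajectory matrix $D$, consistent with Eq. 4.10.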

Step 4: Diagonal Averaging:

The final step extracts from $D_2$ the time series which is an additive component of the initial time series $d(t)$ [Hassani, 2007]. The first operation in this step is also called Hankelisation, as it turns the matrix $D_2$ into a Hankel matrix by replacing every element with the average of the respective skew diagonal. The elements of the new time series are then read from the skew diagonals, as in matrix $D$ on the right hand side of equation 4.6.

With $K$ as the set of EOFs on which the reconstruction is based, $M_t$ as the normalisation factor, $L_t$ as the lower bound of summation and $U_t$ as the upper bound of summation, the new time series $r_K(t)$ can thus be obtained by

$$r_K(t) = \frac{1}{M_t} \sum_{k \in K} \sum_{j=L_t}^{U_t} a_k(t-j+1)\, e_k(j), \tag{4.11}$$

with

$$(M_t, L_t, U_t) = \begin{cases} (t,\ 1,\ t) & \text{for } 1 \le t \le M-1,\\ (M,\ 1,\ M) & \text{for } M \le t \le N',\\ (N-t+1,\ t-N+M,\ M) & \text{for } N'+1 \le t \le N. \end{cases}$$
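Diagonal averaging amounts to averaging $D_2$ along its skew diagonals; a direct (if not optimally vectorised) sketch under the same assumptions:

```python
# Sketch of Step 4 (diagonal averaging / Hankelisation), implementing
# Eq. 4.11: element t of the reconstruction is the mean over the skew
# diagonal i + j = t of D_2; illustrative only.
import numpy as np

def diagonal_average(D2):
    Nprime, M = D2.shape
    N = Nprime + M - 1
    r = np.zeros(N)
    M_t = np.zeros(N)                  # number of entries per skew diagonal
    for i in range(Nprime):
        for j in range(M):
            r[i + j] += D2[i, j]
            M_t[i + j] += 1
    return r / M_t                     # r_K(t) of Eq. 4.11
```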

For extracting significant oscillations from a time series it is sufficient to stop after step two of the SSA procedure and to evaluate the spectrum of eigenvalues. According to SSA theory, every harmonic oscillation present in the time series produces a pair of eigenvalues, corresponding to a sine and a cosine component with equal frequencies, amplitudes and phases [Hassani, 2007]. As both eigenvalues of such a pair explain the same amount of variance, oscillations in $d(t)$ can be detected by finding plateaus in the plotted eigenvalue spectrum. However, as noted by Allen and Smith [1996], the pair selection criterion can be misleading when trends are present in the data.
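The plateau criterion can be cast as a simple heuristic: flag neighbouring eigenvalues whose relative difference falls below some tolerance. The tolerance below is an arbitrary choice of this illustration, and, as just noted, any candidate pair found this way still requires a significance test:

```python
# Heuristic sketch of the pair-selection criterion: a plateau in the
# ranked eigenvalue spectrum suggests a sine/cosine pair; the tolerance
# is an assumed, tunable parameter.
def candidate_pairs(lam, tol=0.05):
    """Indices (k, k+1) of nearly equal neighbouring eigenvalues,
    given eigenvalues lam sorted in decreasing order."""
    return [(k, k + 1)
            for k in range(len(lam) - 1)
            if (lam[k] - lam[k + 1]) / lam[k] < tol]
```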

Fig. 4.2: (a) Light thin line: Daily mean ice draft time series (in metres). Dark bold line: The same series reconstructed with the first 10 eigenvalues. (b) Dark bold line: The same series as in (a), reconstructed with the first 50 eigenvalues.

Furthermore, considering only the highest-ranked eigenvalues of the spectrum as significant may be wrong, as real systems are mostly contaminated with coloured noise. Nor is the stability of an EOF pair under varying window widths a sufficient condition for a significant oscillation. SSA should therefore never be used without an appropriate significance test [Allen and Smith, 1996].

The significance test developed by Allen and Smith [1996] is based on a Monte Carlo approach. First, the mean, variance and lag-1 autocorrelation are estimated from $d(t)$ in order to simulate a large number of random AR(1) processes corresponding to these parameters (for details of the algorithm see Allen and Smith [1996]). From each process a surrogate lag-covariance matrix $C_R$ is computed and projected onto the EOFs of the original data:

$$\Lambda_R = E_D^T C_R E_D. \tag{4.12}$$

The statistical distribution of the diagonal elements of $\Lambda_R$ is used to find the 2.5th and 97.5th percentiles. These are plotted as error bars in the eigenvalue spectrum of the initial lag-covariance matrix $C_D$; 95% of the surrogate realisations then lie within those limits. The null hypothesis to be rejected is that the time series has been generated by AR(1) noise. For a pair of sinusoidal EOFs considered as signal, however, the true confidence level is in fact lower than 95% [Allen and Smith, 1996]. Once the signal EOFs are determined, one aims to show at the 95% (or 97.5%) level that the remainder of the time series has been generated by AR(1) noise.

The described test therefore includes a second stage: If some components of the time series are believed to represent significant signals after stage one of the test, the second stage checks whether the remainder of the time series is attributable to AR(1) noise. This test makes use of steps three and four of the caterpillar method.

First, the matrix $D_2$ is constructed using only the eigenvalues which were identified as signal. The AR(1) process parameters are then estimated such that, when the process is added to the time series $r_K(t)$, the composite has the same variance and lag-1 autocorrelation as the original series $d(t)$ [Allen and Smith, 1996]. As in stage one, the AR(1) surrogates are centred and new lag-covariance matrices $C_R$ are calculated; their eigenvalue spectrum is used to calculate the desired error bars. To overcome the problem of artificial variance compression described by Allen and Smith [1996], the matrices $C_D$ and $C_R$ are projected onto the EOFs $E_N$ of the composite null hypothesis, rather than onto the EOFs derived from the data itself:

$$\Lambda'_D = E_N^T C_D E_N, \tag{4.13}$$

$$\Lambda'_R = E_N^T C_R E_N. \tag{4.14}$$

The eigenvectors $E_N$ are calculated by diagonalising the lag-covariance matrix $C_N$, which is based on the composite null hypothesis that the data consist only of the selected signal component plus AR(1) noise. In spectral analysis the results of SSA are presented as EOF scree diagrams, ordered by frequency.
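Stage one of the Monte Carlo test might be sketched as follows (again in NumPy; the surrogate count, seed handling and function names are assumptions of this illustration, and the AR(1) parameter estimation is simplified relative to the full algorithm of Allen and Smith [1996]):

```python
# Sketch of stage one of the Monte Carlo SSA test: AR(1) surrogates with
# the data's lag-1 autocorrelation and variance are embedded, their
# lag-covariance matrices are projected onto the data EOFs (Eq. 4.12),
# and 2.5th/97.5th percentile error bars are taken. Illustrative only.
import numpy as np

def _embed(x, M):
    """(N' x M) trajectory matrix of a centred series x."""
    return np.stack([x[i:i + M] for i in range(x.size - M + 1)])

def mc_ssa_errorbars(d, M, E_D, n_surr=1000, seed=0):
    """2.5th and 97.5th percentiles of the diagonal of Eq. 4.12
    over n_surr AR(1) surrogates; E_D holds the data EOFs."""
    rng = np.random.default_rng(seed)
    d = np.asarray(d, float) - np.mean(d)
    N = d.size
    gamma = np.corrcoef(d[:-1], d[1:])[0, 1]   # lag-1 autocorrelation
    sigma = d.std() * np.sqrt(1.0 - gamma**2)  # innovation standard deviation
    proj = np.empty((n_surr, M))
    for s in range(n_surr):
        x = np.empty(N)
        x[0] = rng.normal(0.0, d.std())
        for t in range(1, N):                  # AR(1) recursion
            x[t] = gamma * x[t - 1] + rng.normal(0.0, sigma)
        x -= x.mean()                          # centre each surrogate
        R = _embed(x, M)
        C_R = R.T @ R / R.shape[0]             # surrogate lag-covariance
        proj[s] = np.diag(E_D.T @ C_R @ E_D)   # diagonal of Eq. 4.12
    return np.percentile(proj, [2.5, 97.5], axis=0)
```

Eigenvalues of $C_D$ that fall outside the returned band would then, in the spirit of the test described above, be candidates for rejection of the AR(1) null hypothesis.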