The break signal in climate records:
Brownian motion or Random deviations?
Ralf Lindau
Break signal
Climate records are affected by
breaks resulting from relocations or changes in the measuring
techniques.
For the detection, differences of neighboring stations are considered to reduce the dominating natural variance.
Homogenization algorithms identify breaks by searching for the
maximum external variance.
Benchmark datasets
Benchmarking data sets are used to assess the skill of homogenization algorithms.
These are artificial data sets with known breaks so that an evaluation of the algorithms is possible.
However, benchmark datasets should reflect the statistical properties of real data as closely as possible.
An important question is how to model the breaks:
1. As a free random walk (Brownian motion)
2. As a random deviation from a fixed level (random noise)
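The two break models can be sketched as follows. This is a minimal illustration, not the benchmark generator itself: the function name `simulate_breaks` and its parameters are hypothetical, and breaks are assumed to occur independently at each time step with a fixed probability.

```python
import numpy as np

def simulate_breaks(n_steps, p_break, sigma, kind, rng):
    """Return a step-function break signal of length n_steps.

    kind="BM": each break adds an independent random jump to the previous level.
    kind="RD": each break sets a new independent level around a fixed zero level.
    """
    is_break = rng.random(n_steps) < p_break   # break positions
    draws = rng.normal(0.0, sigma, n_steps)    # one random number per time step
    if kind == "BM":
        # The jumps are independent; the level is their cumulative sum.
        return np.cumsum(np.where(is_break, draws, 0.0))
    if kind == "RD":
        # The levels are independent; hold the latest level until the next break.
        level = np.empty(n_steps)
        current = rng.normal(0.0, sigma)       # level of the first segment
        for t in range(n_steps):
            if is_break[t]:
                current = draws[t]
            level[t] = current
        return level
    raise ValueError("kind must be 'BM' or 'RD'")
```

With the same break positions, the BM signal can drift arbitrarily far from zero, while the RD signal keeps scattering around the fixed level.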
Conceptual model
Same signal, two approaches:
Which of the two ΔT is assumed to be an independent random variable?
The deviations or
the jumps?
Random deviations
Approach
To distinguish BM and RD type breaks we use the following approach.
We assume that the climate time series consists of four superimposed signals:
Climate, noise, BM and RD type breaks
Breaks and noise are assumed to be normally distributed. The climate signal is expected to be more complicated, but it cancels out in the next step.
Breaks occur randomly with an average probability (say 5%).
Spatial difference
The difference between two neighboring stations x1 and x2 is:
The climate signal is cancelled out, because it is the same at two
neighboring stations. However, noise due to the different weather at the two stations remains.
Spatiotemporal difference D
Now we have the difference time series of station pairs. Within these time series the temporal difference between two time points i and i+L is formed:
D is the sum (or difference) of 12 random numbers: three signals (noise, BM and RD breaks) at two stations and two time points.
Finally, we calculate the variance of D for classes of constant time lags L:
Var(D(L))
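The two differencing steps can be sketched for a single station pair as below. The function name `lag_variance` is hypothetical, and the sample variance per lag class is taken with NumPy's `np.var`, which removes the sample mean of D before squaring.

```python
import numpy as np

def lag_variance(x1, x2, max_lag):
    """Variance of D(L) = d(i+L) - d(i), with d = x1 - x2, for L = 1..max_lag."""
    # Spatial difference: a climate signal common to both stations cancels here.
    d = np.asarray(x1, dtype=float) - np.asarray(x2, dtype=float)
    lags = np.arange(1, max_lag + 1)
    # Temporal difference for each lag class, then its variance.
    var_d = np.array([np.var(d[lag:] - d[:-lag]) for lag in lags])
    return lags, var_d
```

For two stations that share the climate signal and carry only independent white noise of unit variance, every lag class should give a value close to 4.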
Variance of D
A common rule is:
12 variance terms. Covariance only for breaks of the same station. These occur twice (once for each station):
Covariance of RD breaks
For external pairs: E(Cov) = 0
For internal pairs: E(Cov) = Var(d)
The probability of finding k breaks within a time span L:
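A minimal sketch of how the break-count probability feeds into the RD covariance. It assumes (not stated in the text) that breaks occur independently at each time step, so the number of breaks within L steps is binomial, and that internal pairs are those with no break between the two time points. The function names are hypothetical.

```python
from math import comb, exp

def prob_k_breaks(k, lag, p_break):
    """Binomial probability of exactly k breaks within `lag` time steps."""
    return comb(lag, k) * p_break**k * (1.0 - p_break) ** (lag - k)

def no_break_probability_approx(lag, p_break):
    """Exponential approximation of P(k = 0), valid for small p_break."""
    return exp(-p_break * lag)

def rd_covariance(lag, p_break, var_d):
    """Expected covariance of RD deviations at two points `lag` steps apart.

    Only pairs without a break in between (k = 0) contribute Var(d);
    pairs separated by at least one break contribute zero.
    """
    return var_d * prob_k_breaks(0, lag, p_break)
```

The k = 0 term is what turns the RD covariance into an approximately exponential function of the lag.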
Variance of BM breaks
A classical BM is defined as:
At time step i it consists of the sum of i random numbers:
Breaks do not occur every year, but only with a probability $p_b$:
Analogously for i+L:
Covariance of BM breaks
Our previous findings for the variance were:
Together they give:
The covariance of two time steps within a Brownian motion is equal to the variance of the earlier one, because the later value contains all the random numbers that constitute the earlier one:
$\mathrm{Var}(b(i)) + \mathrm{Var}(b(i+L)) - 2\,\mathrm{Cov}(b(i), b(i+L)) = L\,p_b\,\sigma_b^2$
We obtain a linear function in L.
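The linear growth can be checked numerically. The sketch below is only an illustration: the function name, the number of realisations, and the starting index are arbitrary choices, and the break model (a normal jump with probability p_b at each step) is the same assumption as in the derivation.

```python
import numpy as np

def bm_increment_variance(lag, p_break, sigma, n_series=20000, start=10, seed=0):
    """Monte Carlo estimate of Var(b(i+L) - b(i)) for BM type breaks."""
    rng = np.random.default_rng(seed)
    n_steps = start + lag + 1
    shape = (n_series, n_steps)
    # A jump occurs with probability p_break; otherwise the level is unchanged.
    jumps = np.where(rng.random(shape) < p_break,
                     rng.normal(0.0, sigma, shape), 0.0)
    b = np.cumsum(jumps, axis=1)
    # Variance across realisations of the increment over `lag` steps.
    return np.var(b[:, start + lag] - b[:, start])
```

For example, `bm_increment_variance(40, 0.05, 1.0)` should come out close to the theoretical value 40 * 0.05 * 1.0 = 2.0, up to sampling error.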
Variance of D
We return to the original formula:
and insert our findings:
The variance of D(L) has three additive components:
1. Linear function for BM type breaks
2. Exponential function for RD type breaks
3. Constant offset for the noise
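For orientation, the three components can be written out as below. This is a sketch, not the slide's own formula: the prefactors 2 and 4 are an assumption that follows from counting two independent stations with identical parameters, the exponential stands in for the probability of no RD break within L time steps, and $\sigma_n$ is a hypothetical symbol for the noise standard deviation.

```latex
\mathrm{Var}\bigl(D(L)\bigr) \approx
  \underbrace{2\,p_b\,\sigma_b^2\,L}_{\text{BM breaks}}
  + \underbrace{4\,\sigma_d^2\,\bigl(1 - e^{-p_d L}\bigr)}_{\text{RD breaks}}
  + \underbrace{4\,\sigma_n^2}_{\text{noise}}
```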
Test with simulated data
Panel               σ_b   p_b    σ_d   p_d    σ (noise)
RD breaks + noise   0.0   0.00   0.1   0.05   0.1
BM breaks + noise   0.1   0.05   0.0   0.00   0.1
RD + BM + noise     0.1   0.05   0.1   0.05   0.1
The variance follows the theory exactly when the known parameters are inserted. But how good is a retrieval without a priori knowledge?
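A test of this kind could be sketched as follows. All function names are hypothetical. The theoretical curve assumes two independent stations with identical parameters, which gives the prefactors 2 and 4, and per-step break probabilities, so that (1 - p_d)^L is the probability of no RD break within L steps (close to exp(-p_d L) for small p_d).

```python
import numpy as np

def simulate_difference_series(n_pairs, n_steps, p_b, s_b, p_d, s_d, s_n, rng):
    """Difference series x1 - x2 for n_pairs station pairs (climate already cancelled)."""
    def station():
        shape = (n_pairs, n_steps)
        # BM breaks: cumulative sum of sparse random jumps.
        bm = np.cumsum(np.where(rng.random(shape) < p_b,
                                rng.normal(0.0, s_b, shape), 0.0), axis=1)
        # RD breaks: one independent level per segment between breaks.
        segment = np.cumsum(rng.random(shape) < p_d, axis=1)
        levels = rng.normal(0.0, s_d, (n_pairs, n_steps + 1))
        rd = np.take_along_axis(levels, segment, axis=1)
        noise = rng.normal(0.0, s_n, shape)
        return bm + rd + noise
    return station() - station()

def pooled_lag_variance(d, lags):
    """Mean square of D(L) = d(i+L) - d(i), pooled over all pairs and start points."""
    return np.array([np.mean((d[:, lag:] - d[:, :-lag]) ** 2) for lag in lags])

def theory(lags, p_b, s_b, p_d, s_d, s_n):
    """Linear BM term + saturating RD term + constant noise offset."""
    lags = np.asarray(lags, dtype=float)
    return (2 * p_b * s_b**2 * lags
            + 4 * s_d**2 * (1 - (1 - p_d) ** lags)
            + 4 * s_n**2)

if __name__ == "__main__":
    rng = np.random.default_rng(42)
    params = (0.05, 0.1, 0.05, 0.1, 0.1)   # p_b, s_b, p_d, s_d, s_n
    d = simulate_difference_series(2000, 100, *params, rng)
    lags = np.array([1, 5, 10, 25, 50])
    for lag, sim, th in zip(lags, pooled_lag_variance(d, lags), theory(lags, *params)):
        print(f"L={lag:3d}  simulated={sim:.4f}  theory={th:.4f}")
```

The parameter values in the usage part follow the combined RD + BM + noise case; setting s_b or s_d to zero reproduces the other two cases.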
Retrieval approach
We had:
In short form:
Two tangents, one at the beginning, one at the end:
Retrieval application
Two-step retrieval:
1. Two tangents as first guess
2. Exhaustive search around it.
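The two steps might look like the sketch below. The short-form model V(L) = a L + c (1 - exp(-p L)) + n and the symbols a, c, p, n are assumptions chosen for this illustration, as are the number of points used for each tangent and the search grid. The tangent at the end gives the slope a and the intercept c + n; the tangent at the beginning gives the intercept n and the slope a + c p.

```python
import itertools
import numpy as np

def model(lags, a, c, p, n):
    """Assumed short form: linear term + saturating exponential + offset."""
    return a * lags + c * (1.0 - np.exp(-p * lags)) + n

def tangent_first_guess(lags, var_d, m_begin=3, m_end=20):
    """First guess of (a, c, p, n) from straight-line fits at both ends."""
    s0, i0 = np.polyfit(lags[:m_begin], var_d[:m_begin], 1)   # tangent at the beginning
    s1, i1 = np.polyfit(lags[-m_end:], var_d[-m_end:], 1)     # tangent at the end
    a = s1              # slope at large lags
    n = i0              # value extrapolated to lag zero
    c = i1 - i0         # gap between the two intercepts
    p = (s0 - s1) / c   # extra initial slope divided by the gap
    return a, c, p, n

def exhaustive_search(lags, var_d, guess, factors=np.linspace(0.7, 1.3, 7)):
    """Try all combinations of scaled parameters; keep the least-squares best."""
    best, best_sse = guess, np.sum((model(lags, *guess) - var_d) ** 2)
    for f in itertools.product(factors, repeat=4):
        trial = tuple(g * fi for g, fi in zip(guess, f))
        sse = np.sum((model(lags, *trial) - var_d) ** 2)
        if sse < best_sse:
            best, best_sse = trial, sse
    return best, best_sse
```

With noisy data the tangents are less stable than on an exact curve, so the first guess may need more points per tangent, and a non-positive c would have to be caught before the division.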
Nice geometrical interpretation
Retrieval test for sparse data
100 station pairs:
Large scatter for high lags.
But the retrieval works well; it is the data itself that varies.
Data
ISTI data restricted to the US and 1900-2000:
At least 80 years of data.
Distance less than 100 km.
1459 station pairs result.
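A pair selection of this kind could be sketched as below. The station records (dictionaries with `id`, `lat`, `lon`, `n_years`) are a hypothetical layout, not the ISTI format. The great-circle distance via the haversine formula and the application of the 80-year criterion to each station separately are assumptions as well.

```python
from itertools import combinations
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    h = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))

def select_pairs(stations, max_km=100.0, min_years=80):
    """Return id pairs of sufficiently long stations closer than max_km."""
    long_enough = [s for s in stations if s["n_years"] >= min_years]
    return [(a["id"], b["id"])
            for a, b in combinations(long_enough, 2)
            if haversine_km(a["lat"], a["lon"], b["lat"], b["lon"]) < max_km]
```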
Result
At short time lags the $1 - e^{-x}$ increase caused by RD type breaks is visible.
For long time lags the linear increase indicates BM type breaks.
The offset determines the noise.
BM: $p_b \sigma_b^2 = 0.45\ \mathrm{K^2\,cty^{-1}}$
RD: $p_d = 17.1\ \mathrm{cty^{-1}}$, $\sigma_d^2 = 0.12\ \mathrm{K^2}$
Conclusion
Brownian motion and random deviation break types can be
distinguished by calculating the variance of the spatiotemporal difference.
The application shows that US data contain both break types.
But we did not consider:
Possible trend effects
Stationarity of the variance
Lag covariance for RD
The covariance is an
exponential function of the time lag.
$C(L) = a \exp(-bL)$, with $a = \sigma_b^2$ and $b = k/(n-k)$
As a byproduct, we have a nice method to also retrieve the break strength $\sigma_b$ and the break number $k$.
Input: $\sigma_b = 1.000$, $k = 5.000$
Output: $\sigma_b = 1.000$, $k = 4.984$
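One way such a retrieval could work is a straight-line fit to the logarithm of the covariance. This is an assumed technique for illustration; the fitting method actually used is not stated. The function name is hypothetical, n is taken to be the length of the time series, and all covariance values must be positive.

```python
import numpy as np

def retrieve_strength_and_number(lags, cov, n):
    """Retrieve break strength and break number from C(L) = a exp(-b L).

    Fits log C(L) = log a - b L, then inverts a = strength**2 and b = k / (n - k).
    """
    slope, intercept = np.polyfit(lags, np.log(cov), 1)
    a, b = np.exp(intercept), -slope
    strength = np.sqrt(a)
    k = b * n / (1.0 + b)   # solves b = k / (n - k) for k
    return strength, k
```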
US data, not normalized
The covariance reflects mainly the mean difference between two stations.
Therefore, the covariance (and variance) is strongly dependent on the distance.
Averaging over different distance classes would be dangerous.
[Figure legend: distance classes 50 km, 150 km, 250 km, 350 km]
US data, normalized
Normalization with the time series mean helps. The expected function of the break covariance (exponential function) becomes visible.
But now the variance behaves strangely.
Minimum at L/4. Reaching the original value at L/2, increasing further for larger L.
[Figure legend: distance classes 50 km, 150 km, 250 km, 350 km]
Simulated data
The normalization causes a deformation and a shift of both the covariance and the variance function.
[Figure panels: not normalised, normalised]
Rationale
The covariance of two time points a and b is:
The mixed product is:
Normally we say:
Then we have just the shift:
However, the mixed product is not zero, but depends on the length of the segment.
For long segments: