The break signal in climate records: Brownian motion or Random deviations?

(1)

The break signal in climate records:

Brownian motion or Random deviations?

Ralf Lindau

(2)

Break signal

Climate records are affected by breaks resulting from relocations or changes in the measuring techniques.

To detect breaks, differences between neighboring stations are considered, which reduces the dominating natural variance.

Homogenization algorithms identify breaks by searching for the maximum external variance.

(3)

Benchmark datasets

Benchmarking data sets are used to assess the skill of homogenization algorithms.

These are artificial data sets with known breaks so that an evaluation of the algorithms is possible.

However, benchmark datasets should reflect the statistical properties of real data as closely as possible.

An important question is how to model the breaks:

1. As free random walk (Brownian motion)

2. As random deviation from a fixed level (random noise)
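The two candidate break models can be illustrated with a minimal simulation sketch. The function name `simulate_breaks` and its arguments are hypothetical, and drawing an initial level for the first RD segment is an assumption made here, not something stated on the slide.

```python
import numpy as np

def simulate_breaks(n_years, p_break, s_break, kind, rng):
    """Return a break signal of length n_years (illustrative sketch).

    kind="BM": each break adds a random jump to the previous level (random walk).
    kind="RD": each break draws a new level around a fixed level of zero.
    """
    is_break = rng.random(n_years) < p_break       # does a break occur in this year?
    draws = rng.normal(0.0, s_break, n_years)      # one normal random number per year
    if kind == "BM":
        # The jumps are the independent random variable: levels accumulate.
        return np.cumsum(np.where(is_break, draws, 0.0))
    if kind == "RD":
        # The deviations are the independent random variable: hold the last level.
        level = np.empty(n_years)
        current = rng.normal(0.0, s_break)         # assumed level of the first segment
        for i in range(n_years):
            if is_break[i]:
                current = draws[i]
            level[i] = current
        return level
    raise ValueError("kind must be 'BM' or 'RD'")

rng = np.random.default_rng(0)
bm_signal = simulate_breaks(100, 0.05, 0.1, "BM", rng)
rd_signal = simulate_breaks(100, 0.05, 0.1, "RD", rng)
```

In the BM case the signal can drift arbitrarily far from zero over time, whereas in the RD case every segment level is scattered around the same fixed level.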

(4)

Conceptual model

Same signal, two approaches:

Which of the two ΔT is assumed to be an independent random variable?

The deviations or the jumps?

Random deviations

(5)

Approach

To distinguish BM and RD type breaks we use the following approach.

We assume that the climate time series consists of four superimposed signals:

Climate, noise, BM and RD type breaks

Breaks and noise are assumed to be normally distributed. The climate signal is expected to be more complicated, but it cancels out in the next step.

Breaks occur randomly with a given average probability per year (say 5%).



(6)

Spatial difference

The difference between two neighboring stations x₁ and x₂ is:

The climate signal is cancelled out, because it is the same at two neighboring stations. However, noise due to the different weather at the two stations remains.
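The formula itself is not preserved in this text. Under the four-signal model described above, a plausible reconstruction is the following, where the symbols c (climate), n (noise), b (BM type breaks) and d (RD type breaks) are names assumed here for illustration:

```latex
\begin{align*}
x_k(i) &= c(i) + n_k(i) + b_k(i) + d_k(i), \qquad k = 1, 2 \\
x_1(i) - x_2(i) &= \bigl(n_1(i) - n_2(i)\bigr) + \bigl(b_1(i) - b_2(i)\bigr) + \bigl(d_1(i) - d_2(i)\bigr)
\end{align*}
```

The common climate term c(i) drops out, leaving six station-specific terms per time point.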



(7)

Spatiotemporal difference D

Now we have the difference time series of station pairs. Within each of these time series, the temporal difference between two time points i and i+L is formed:

D is the sum (or difference) of 12 random numbers.

Finally, we calculate the variance of D for classes of constant time lags L:

Var(D(L))
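The computation of Var(D(L)) per lag class can be sketched as follows. The function name `variance_of_D` is hypothetical, pooling all station pairs and all starting points i into one variance per lag is an assumption, and missing values are not handled.

```python
import numpy as np

def variance_of_D(diff_series, max_lag):
    """Var(D(L)) for L = 1..max_lag, pooled over all station pairs (sketch).

    diff_series: 2-D array, one row per station pair, holding x1 - x2 per year.
    """
    diff_series = np.atleast_2d(np.asarray(diff_series, dtype=float))
    result = np.empty(max_lag)
    for lag in range(1, max_lag + 1):
        # D = (x1 - x2)(i + L) - (x1 - x2)(i) for every valid start index i
        D = diff_series[:, lag:] - diff_series[:, :-lag]
        result[lag - 1] = D.var()   # population variance over all pairs and all i
    return result

# Usage with a random placeholder array standing in for real difference series.
rng = np.random.default_rng(0)
placeholder = rng.normal(0.0, 0.1, size=(5, 100))
var_D = variance_of_D(placeholder, max_lag=50)
```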



(8)

Variance of D

A common rule is:

12 variance terms. Covariance only for breaks of the same station. These occur two times (for each station):
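The formulas of this slide are not preserved in the text. The rule referred to is presumably the variance of a sum, which in general form reads as below. The second line applies it to the break signal y of a single station (y is a placeholder symbol assumed here), where the minus sign comes from the temporal difference:

```latex
\begin{align*}
\operatorname{Var}\Bigl(\sum_{j} a_j\Bigr) &= \sum_{j} \operatorname{Var}(a_j) + 2 \sum_{j<k} \operatorname{Cov}(a_j, a_k) \\
\operatorname{Var}\bigl(y(i+L) - y(i)\bigr) &= \operatorname{Var}\bigl(y(i+L)\bigr) + \operatorname{Var}\bigl(y(i)\bigr) - 2 \operatorname{Cov}\bigl(y(i), y(i+L)\bigr)
\end{align*}
```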



(9)

Covariance of RD breaks

For external pairs: E(Cov) = 0

For internal pairs: E(Cov) = Var(d)

The probability of finding k breaks within a time span L:



(10)

Variance of BM breaks



A classical BM is defined as:

At time step i it consists of the sum of i random numbers:

Breaks do not occur each year, but only with a probability p_b:

Analogously for i+L:
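The equations of this slide are missing from the text. A reconstruction consistent with the statements above could look as follows. The symbols b (BM signal), ε (normally distributed jump) and β (indicator of whether a break occurs in a given year) are assumed names, and independence of β and ε is assumed:

```latex
\begin{align*}
b(i) &= b(i-1) + \beta_i \, \epsilon_i, \qquad \epsilon_i \sim N(0, s_b^2), \quad \beta_i \sim \mathrm{Bernoulli}(p_b) \\
b(i) &= \sum_{j=1}^{i} \beta_j \, \epsilon_j \\
\operatorname{Var}\bigl(b(i)\bigr) &= i \, p_b \, s_b^2 \\
\operatorname{Var}\bigl(b(i+L)\bigr) &= (i+L) \, p_b \, s_b^2
\end{align*}
```

Each yearly increment has variance p_b s_b², so the variance grows linearly with the number of elapsed years.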

(11)

Covariance of BM breaks



Our previous findings for the variance were:

Together they give:

The covariance of two time steps within a Brownian motion is equal to the variance of the earlier one, because the two values share all the random numbers that constitute the earlier one:

Var(b(i)) + Var(b(i+L)) − 2 Cov(b(i), b(i+L)) = L p_b s_b²

We obtain a linear function in L.
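The linear dependence on L can be checked numerically with a small Monte Carlo sketch. The parameter values p_b = 0.05 and s_b = 0.1 are those used in the simulated-data test of this presentation. The number of series, the start index and the chosen lags are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)
p_b, s_b = 0.05, 0.1
n_years, n_series = 100, 20000

# One BM-type break signal per row: jumps occur with probability p_b per year.
is_break = rng.random((n_series, n_years)) < p_b
jumps = np.where(is_break, rng.normal(0.0, s_b, (n_series, n_years)), 0.0)
b = np.cumsum(jumps, axis=1)

i = 20                                  # arbitrary start year
lags = np.array([10, 30, 60])
# Empirical variance of b(i+L) - b(i) across all simulated series
empirical = np.array([np.var(b[:, i + L] - b[:, i]) for L in lags])
theory = lags * p_b * s_b ** 2          # expected: L * p_b * s_b^2

for L, e, t in zip(lags, empirical, theory):
    print(f"L={L:2d}  empirical={e:.5f}  theory={t:.5f}")
```

The empirical values should scatter around the theoretical line, with the scatter shrinking as the number of simulated series increases.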

(12)

Variance of D

We return to the original formula:

and insert our findings:

The variance of D(L) has three additive components:

1. Linear function for BM type breaks

2. Exponential function for RD type breaks

3. Constant offset for the noise
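The three components can be combined into a model function, sketched below. The prefactors 2, 4 and 4 are not taken from the slides. They follow from assuming two independent stations with identical statistics, temporally uncorrelated noise with standard deviation s_n (a symbol assumed here), and an RD covariance that decays with the probability (1 − p_d)^L of finding no break within L years.

```python
import numpy as np

def var_D_model(L, p_b, s_b, p_d, s_d, s_n):
    """Sketch of Var(D(L)) as the sum of BM, RD and noise contributions.

    Assumptions (not from the source): two independent, statistically
    identical stations and noise that is uncorrelated in time.
    """
    L = np.asarray(L, dtype=float)
    bm = 2.0 * L * p_b * s_b ** 2                         # linear in L
    no_break = np.exp(L * np.log1p(-p_d))                 # (1 - p_d)^L
    rd = 4.0 * s_d ** 2 * (1.0 - no_break)                # saturating exponential
    noise = 4.0 * s_n ** 2                                # constant offset
    return bm + rd + noise

lags = np.arange(1, 81)
curve = var_D_model(lags, p_b=0.05, s_b=0.1, p_d=0.05, s_d=0.1, s_n=0.1)
```

For small p_d the factor (1 − p_d)^L is close to exp(−p_d L), which gives the 1 − e^-x shape.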



(13)

Test with simulated data

Panel               s_b   p_b    s_d   p_d    s_n (noise)
RD breaks + noise   0.0   0.00   0.1   0.05   0.1
BM breaks + noise   0.1   0.05   0.0   0.00   0.1
RD + BM + noise     0.1   0.05   0.1   0.05   0.1

The variance follows exactly the theory when the known parameters are inserted. But how good is a retrieval without a priori knowledge?

(14)

Retrieval approach

We had:



Written compactly:

Two tangents, one at the beginning, one at the end:

(15)

Retrieval application

Two-step retrieval:

1. Two tangents as first guess

2. Exhaustive search around it.
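A possible reading of the two-step retrieval is sketched below. The compact form V(L) = a L + b (1 − exp(−c L)) + d, the way the tangents are estimated from the first and last few points, and the multiplicative search grid are all assumptions made for illustration. The names `model`, `first_guess` and `exhaustive_search` are hypothetical.

```python
import itertools
import numpy as np

def model(L, a, b, c, d):
    """Assumed compact form: linear part + saturating exponential + offset."""
    return a * L + b * (1.0 - np.exp(-c * L)) + d

def first_guess(L, V, n_edge=3):
    """First guess from two straight lines fitted at both ends of the curve."""
    # Tangent at the end: slope is a, intercept is b + d.
    a, end_intercept = np.polyfit(L[-n_edge:], V[-n_edge:], 1)
    # Tangent at the beginning: slope is a + b*c, intercept is d.
    start_slope, d = np.polyfit(L[:n_edge], V[:n_edge], 1)
    b = max(end_intercept - d, 1e-12)          # guard against non-positive b
    c = max((start_slope - a) / b, 0.0)
    return a, b, c, d

def exhaustive_search(L, V, guess, factors=np.linspace(0.5, 1.5, 11)):
    """Try every combination of scaled parameters around the first guess."""
    best = tuple(guess)
    best_sse = np.sum((model(L, *best) - V) ** 2)
    for scale in itertools.product(factors, repeat=4):
        candidate = tuple(g * f for g, f in zip(guess, scale))
        sse = np.sum((model(L, *candidate) - V) ** 2)
        if sse < best_sse:
            best, best_sse = candidate, sse
    return best, best_sse

# Usage on a noise-free synthetic curve with arbitrary parameter values.
L = np.arange(1, 81, dtype=float)
V = model(L, 0.01, 0.04, 0.17, 0.04)
guess = first_guess(L, V)
best, best_sse = exhaustive_search(L, V, guess)
```

In this reading, the slope of the late tangent gives the BM part, the intercept of the early tangent gives the noise offset, and the gap between the two intercepts gives the RD amplitude.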

Nice geometrical interpretation

(16)

Retrieval test for sparse data

100 station pairs:

Large scatter for high lags.

But the retrieval works well; it is the data itself that varies.

(17)

Data

ISTI data restricted to the US and the period 1900–2000:

At least 80 years of data.

Distance less than 100 km.

1459 station pairs result.
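The selection criteria above could be applied as in the following sketch. The station table layout, the function names and the use of the haversine great-circle distance are assumptions. Whether the 80-year criterion refers to each station or to the overlap of a pair is not stated, so the sketch applies it per station.

```python
import math
from itertools import combinations

def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2.0 * radius_km * math.asin(math.sqrt(h))

def select_pairs(stations, max_km=100.0, min_years=80):
    """stations: dict mapping a station name to (lat, lon, n_years)."""
    # Keep only stations with enough data (assumed per-station criterion).
    long_enough = {k: v for k, v in stations.items() if v[2] >= min_years}
    return [
        (a, b)
        for a, b in combinations(sorted(long_enough), 2)
        if haversine_km(*long_enough[a][:2], *long_enough[b][:2]) < max_km
    ]

# Hypothetical stations, for illustration only.
stations = {
    "A": (40.0, -100.0, 90),
    "B": (40.5, -100.0, 85),
    "C": (45.0, -100.0, 95),
    "D": (40.1, -100.0, 50),
}
pairs = select_pairs(stations)
```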

(18)

Result

At short time lags the 1 – e^-x increase caused by RD type breaks is visible.

For long time lags the linear increase indicates BM type breaks.

The offset determines the noise.

BM: p_b s_b² = 0.45 K² cty^-1

RD: p_d = 17.1 cty^-1, s_d² = 0.12 K²

(19)

Conclusion

Brownian motion and random deviation break types can be

distinguished by calculating the variance of the spatiotemporal difference.

The application shows that US data contain both break types.

But we did not consider:

Possible trend effects

Stationarity of the variance

(20)

Lag covariance for RD

The covariance is an exponential function of the time lag.

C(L) = a exp(-bL)

a = s_b²

b = k/(n-k)

As a byproduct we have a nice method to retrieve also the break strength s_b and the break number k.

Input: s_b = 1.000, k = 5.000

Output: s_b = 1.000, k = 4.984
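The retrieval of s_b and k from the lag covariance can be sketched with a log-linear fit. The function name is hypothetical, n is assumed to be the length of the time series, and fitting a straight line to log C(L) is one possible choice that requires all covariance values to be positive. It is not necessarily the method used for the numbers above.

```python
import numpy as np

def retrieve_break_parameters(lags, cov, n):
    """Retrieve break strength s_b and break number k from C(L) = a exp(-b L)."""
    # log C(L) = log a - b L is a straight line in L.
    slope, intercept = np.polyfit(lags, np.log(cov), 1)
    a, b = np.exp(intercept), -slope
    s_b = np.sqrt(a)              # from a = s_b^2
    k = b * n / (1.0 + b)         # inverts b = k / (n - k)
    return s_b, k

# Usage on an exact synthetic covariance function.
n = 100
lags = np.arange(0, 30)
cov = 1.0 ** 2 * np.exp(-(5.0 / (n - 5.0)) * lags)
s_b, k = retrieve_break_parameters(lags, cov, n)
```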

(21)

US data, not normalized

The covariance reflects mainly the mean difference between two stations.

Therefore, the covariance (and variance) is strongly dependent on the distance.

Averaging over different distance classes would be dangerous.

[Figure legend: distance classes 50 km, 150 km, 250 km, 350 km]

(22)

US data, normalized

Normalization with the time series mean helps. The expected function of the break covariance (e-function) becomes visible.

But now the variance behaves strangely: it has a minimum at L/4, returns to the original value at L/2, and increases further for larger L.

[Figure legend: distance classes 50 km, 150 km, 250 km, 350 km]

(23)

Simulated data

The normalization causes a deformation and a shift of both the covariance and the variance function.

[Figure panels: not normalised; normalised]

(24)

Rationale

The covariance of two time points a and b is:

The mixed product is:

Normally we say:

Then we have just the shift:

However, the mixed product is not zero, but depends on the length of the segment.

For long segments:
