The break signal in climate records:
Random walk or random deviations?
Ralf Lindau
Dipdoc Seminar – 30 May 2016
Break signal
Climate records are affected by breaks resulting from relocations or changes in the measuring techniques.
For the detection, differences between neighboring stations are considered in order to reduce the dominating natural variance.
Homogenization algorithms identify breaks by searching for the maximum external variance (the variance explained by the jumps).
Benchmark datasets
Benchmarking data sets are used to assess the skill of homogenization algorithms.
These are artificial data sets with known breaks so that an evaluation of the algorithms is possible.
However, benchmark datasets should reflect the statistical properties of real data as closely as possible.
An important question is how to model the breaks:
1. As free random walk (Brownian motion)
2. As random deviation from a fixed level (random noise)
Conceptual model
Same signal, two approaches:
Which of the two ΔT is assumed to be an independent random variable: the deviations or the jumps?
Depending on our choice, different statistical properties of the break signal will result.
Different effects of identical σ
[Figure panels: Random deviations; Brownian motion]
Difference:
The introduced break variance of random deviations is larger by a factor of 2 compared to Brownian motion.
Reason:
All jumps are created by the sum of two random numbers, while it is only one in case of Brownian motion.
Preliminary:
$\mathrm{Var}_{BM} = \frac{1}{2}\,\mathrm{Var}_{RD}$
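The factor of 2 can be checked numerically. The following is a minimal sketch, assuming Gaussian random numbers and the same σ for both break types; the function name `jump_variances` and its parameters are illustrative and not part of the original slides.

```python
import numpy as np

def jump_variances(sigma=1.0, n_jumps=200_000, seed=0):
    """Compare the variance of the jumps for RD and BM breaks (same sigma)."""
    rng = np.random.default_rng(seed)
    # Random deviations: the segment levels are independent draws,
    # so each jump is the difference of two random numbers.
    levels = rng.normal(0.0, sigma, size=n_jumps + 1)
    rd_jumps = np.diff(levels)
    # Brownian motion: the jumps themselves are the independent draws.
    bm_jumps = rng.normal(0.0, sigma, size=n_jumps)
    return rd_jumps.var(), bm_jumps.var()

var_rd, var_bm = jump_variances()
print("jump variance ratio RD / BM:", var_rd / var_bm)
```

The ratio should come out close to 2, in line with the statement that each RD jump is built from two random numbers and each BM jump from one.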
Linearly growing variance of BM
The variance of a Brownian motion grows linearly in time.
At the end of a BM time series the variance is:
Var = k σ², with k: number of breaks and σ²: break variance
The average variance (over time) is Var_BM = (k/2) σ²
For RD the variance is (shown before):
Var_RD = 2 σ²
k is on the order of 5; for difference time series it is twice that: 10.
Thus k/4 is on the order of 2.5.
Brownian motion created by the same σ is much easier to detect.
$\mathrm{Var}_{BM} = \frac{k}{4}\,\mathrm{Var}_{RD}$
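The linear growth of the BM variance can be illustrated with a small ensemble simulation. This is a sketch under stated assumptions: the break signal starts at zero, the k break positions are drawn uniformly without replacement, and the jumps are Gaussian. The function name and the parameter values (n = 100, k = 10, 4000 series) are chosen for illustration only.

```python
import numpy as np

def simulate_bm_break_signal(n, k, sigma, rng):
    """One BM-type break signal of length n with k breaks.

    Assumption: the signal starts at zero and every break adds an
    independent N(0, sigma^2) jump to the current level.
    """
    positions = rng.choice(np.arange(1, n), size=k, replace=False)
    jumps = np.zeros(n)
    jumps[positions] = rng.normal(0.0, sigma, size=k)
    return np.cumsum(jumps)

n, k, sigma = 100, 10, 1.0
rng = np.random.default_rng(0)
signals = np.array(
    [simulate_bm_break_signal(n, k, sigma, rng) for _ in range(4000)]
)

# Ensemble variance at each time step (the expectation is zero by construction).
variance_in_time = np.mean(signals**2, axis=0)
end_variance = variance_in_time[-1]      # compare with k * sigma^2
mean_variance = variance_in_time.mean()  # compare with k/2 * sigma^2
print(end_variance, mean_variance)
```

With these settings the two printed values can be compared with k σ² and (k/2) σ² respectively.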
Which type is more realistic?
There are indications for both break types:
For random deviations:
Relocations are bound to a fixed position.
Stations have geographical names and their positions are not free to fluctuate away.
For random walk:
Changes in measuring techniques can be seen as elimination of error sources one after the other.
Ideal case: Today most errors are eliminated.
Then the break signal can be seen as Brownian motion backward in time.
Different "schools"
For a long time we were not aware that there are these two approaches.
Williams et al. (2012) modelled random walk.
Venema et al. (2012) modelled random deviations.
Only the standard deviations applied were communicated.
But these are not comparable for RD and BM.
Platforms & Stairs
Venema et al. (2012) analyzed the statistics of the retrieved signal to decide whether breaks are BM or RD type.
[Figure panels: Platforms; Stairs (segment levels T1, T2, T3)]
Platform probability: p(RD) = 0.67, p(actual) = 0.59, p(BM) = 0.50
However, the result was hardly significant because of the small number of cases.
And (more important):
The result is dependent on the performance of the homogenization algorithm.
Platforms are difficult to detect
Running a homogenization algorithm on artificial pure RD data results in a platform frequency of 0.62–0.64 (< 0.67).
In the retrieved signal, the platforms are underestimated.
The detected frequency is therefore not suited as an independent parameter to distinguish RD from BM.
It would be preferable to be independent of the retrieved break signal and to derive the break parameters directly from the data.
Two superimposed signals
We assume that the climate time series consists of two superimposed signals, inhomogeneities and noise:
$x(i) = \varepsilon_b(S(i)) + \varepsilon_n(i), \quad \varepsilon_b \sim N(0, \sigma_b^2), \quad \varepsilon_n \sim N(0, \sigma_n^2)$
Each yearly value can be thought of as the sum of two random numbers, ε_b and ε_n, where ε_b depends on the segment number S, which is defined as the number of breaks preceding the year in question.
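A series of this form can be generated as follows. This is a minimal sketch for the random deviation case, assuming that the k break positions are drawn uniformly without replacement from the n - 1 gaps between years; the function name `simulate_rd_series` and the default parameter values are illustrative only.

```python
import numpy as np

def simulate_rd_series(n=100, k=5, sigma_b=1.0, sigma_n=0.5, seed=0):
    """Return x(i) = eps_b(S(i)) + eps_n(i) and the segment number S(i)."""
    rng = np.random.default_rng(seed)
    # A break at position p separates year p-1 from year p (0-based).
    positions = rng.choice(np.arange(1, n), size=k, replace=False)
    marker = np.zeros(n, dtype=int)
    marker[positions] = 1
    segment = np.cumsum(marker)                    # S(i): breaks before year i
    eps_b = rng.normal(0.0, sigma_b, size=k + 1)   # one level per segment
    eps_n = rng.normal(0.0, sigma_n, size=n)       # one noise value per year
    return eps_b[segment] + eps_n, segment

x, segment = simulate_rd_series()
print(segment[0], segment[-1], x.shape)
```

For the Brownian motion case the only change would be to replace `eps_b[segment]` by the cumulative sum of the segment-wise random numbers.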
Random deviation breaks
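In the case of random deviation breaks we calculate the lag covariance C(L):
$C(L) = \frac{1}{n-L} \sum_{i=1}^{n-L} \bigl(x(i) - \bar{x}\bigr)\bigl(x(i+L) - \bar{x}\bigr)$
For external pairs (the two years lie in different segments): E(C(L)) = 0
For internal pairs (both years lie in the same segment): E(C(L)) = σ_b²
With p_int denoting the probability of internal pairs:
$C(L) = p_{int} \cdot \sigma_b^2$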
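The lag covariance defined above translates directly into a few lines of code. This is a sketch only; the function name `lag_covariance` and the optional `mean` argument (which allows a known expectation to be used instead of the sample mean) are additions for illustration.

```python
import numpy as np

def lag_covariance(x, lag, mean=None):
    """C(L) = 1/(n-L) * sum_{i=1}^{n-L} (x(i) - m) (x(i+L) - m).

    m is the sample mean of x unless a known mean is passed explicitly.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    if not 0 <= lag < n:
        raise ValueError("lag must satisfy 0 <= lag < len(x)")
    m = x.mean() if mean is None else mean
    d = x - m
    return float(np.sum(d[: n - lag] * d[lag:]) / (n - lag))

# Noise-free toy series with one break: two segments at levels +1 and -1.
toy = [1, 1, 1, -1, -1, -1]
c0 = lag_covariance(toy, 0)
c1 = lag_covariance(toy, 1)
c3 = lag_covariance(toy, 3)
print(c0, c1, c3)
```

In the toy series, four of the five pairs at lag 1 are internal and contribute +1, while the single external pair contributes -1. At lag 3 all pairs are external.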
Probability of internal pairs
Probability of a specific year to belong to a segment of length l:
$p_{year}(l) = \frac{l\,(k+1)}{n} \cdot \binom{n-1-l}{k-1} \Big/ \binom{n-1}{k}$
Probability of a specific year to have sufficient spacing to the next break:
$p_{empty}(l, L) = \frac{l - \min(l, L)}{l}$
The probability of internal pairs is the sum over all segment lengths of the product:
$p_{int}(L) = \sum_{l=1}^{n-k} p_{year}(l) \cdot p_{empty}(l, L)$
The probability of internal pairs increases with segment length l and decreases with time lag L.
Probability of internal pairs
The long version of the product:
$p_{int} = \sum_{l=1}^{n-k} \frac{l\,(k+1)}{n} \cdot \frac{\binom{n-1-l}{k-1}}{\binom{n-1}{k}} \cdot \frac{l - \min(l, L)}{l}$
By some purely arithmetic transformations we get:
$p_{int} = \binom{n-1-L}{k} \Big/ \binom{n-1}{k}$
By some further approximations we get:
$p_{int} = e^{-\frac{k}{n-k} L}$
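The binomial expression and its exponential approximation can be compared with a direct count of internal pairs in simulated break configurations. This is a sketch under the assumption that the k breaks are placed uniformly without replacement in the n - 1 gaps between years, and that the pairs are counted over i = 1, ..., n - L as in the lag covariance; all function names are illustrative.

```python
import math
import numpy as np

def p_int_binomial(n, k, lag):
    """Probability that none of the lag gaps between two years holds a break."""
    return math.comb(n - 1 - lag, k) / math.comb(n - 1, k)

def p_int_exponential(n, k, lag):
    """Exponential approximation exp(-k L / (n - k))."""
    return math.exp(-k * lag / (n - k))

def p_int_monte_carlo(n, k, lag, n_series=5000, seed=0):
    """Fraction of pairs (i, i+lag) that lie in the same segment."""
    rng = np.random.default_rng(seed)
    internal = 0
    for _ in range(n_series):
        positions = rng.choice(np.arange(1, n), size=k, replace=False)
        marker = np.zeros(n, dtype=int)
        marker[positions] = 1
        segment = np.cumsum(marker)
        internal += np.count_nonzero(segment[: n - lag] == segment[lag:])
    return internal / (n_series * (n - lag))

n, k, lag = 100, 5, 10
p_bin = p_int_binomial(n, k, lag)
p_exp = p_int_exponential(n, k, lag)
p_mc = p_int_monte_carlo(n, k, lag)
print(p_bin, p_exp, p_mc)
```

For moderate lags the three values should lie close together; the exponential form is an approximation and deviates more as L approaches n.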
Lag covariance for RD
The covariance is an exponential function of the time lag:
C(L) = a exp(-bL)
a = σ_b²       (break strength σ_b)
b = k/(n-k)    (break number k)
As a byproduct we have a nice method to retrieve the strength and number of breaks directly from the data.
Input:  σ_b = 1.000, k = 5.000
Output: σ_b = 1.000, k = 4.984
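A retrieval of this kind might look like the sketch below. It is not the procedure used for the numbers above, whose details are not given on the slide. Assumptions made here: an ensemble of synthetic RD series with known zero expectation (so no sample mean is subtracted), a log-linear least-squares fit over lags 1 to 10, and the inversion k = b n / (1 + b) that follows from b = k/(n-k). All names are illustrative.

```python
import numpy as np

def simulate_rd_series(n, k, sigma_b, sigma_n, rng):
    """x(i) = eps_b(S(i)) + eps_n(i) with independent segment levels."""
    positions = rng.choice(np.arange(1, n), size=k, replace=False)
    marker = np.zeros(n, dtype=int)
    marker[positions] = 1
    segment = np.cumsum(marker)
    levels = rng.normal(0.0, sigma_b, size=k + 1)
    return levels[segment] + rng.normal(0.0, sigma_n, size=n)

def mean_lag_covariance(series, lags):
    """Ensemble-mean C(L); assumes the expectation of the series is zero."""
    n = series.shape[1]
    return np.array(
        [np.mean(series[:, : n - lag] * series[:, lag:]) for lag in lags]
    )

def fit_rd_parameters(lags, cov, n):
    """Fit C(L) = a exp(-b L) and convert (a, b) to (sigma_b, k)."""
    slope, intercept = np.polyfit(lags, np.log(cov), 1)
    a, b = np.exp(intercept), -slope
    return np.sqrt(a), b * n / (1.0 + b)

n, k_true, sigma_b_true = 100, 5, 1.0
rng = np.random.default_rng(1)
series = np.array(
    [simulate_rd_series(n, k_true, sigma_b_true, 0.5, rng) for _ in range(3000)]
)
lags = np.arange(1, 11)  # lag 0 is excluded because it also contains the noise
sigma_b_est, k_est = fit_rd_parameters(lags, mean_lag_covariance(series, lags), n)
print(sigma_b_est, k_est)
```

Lag 0 is left out of the fit because the noise variance contributes only there. The retrieved k depends somewhat on the lag range, since the exponential is an approximation of the binomial expression.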
Brownian motion type
For Brownian motion type breaks the covariance depends only on the segment number of the earlier of the two years, because all random numbers ε_b that constitute the break signal at x(i) are also contained in x(j).
$\mathrm{Cov}(x(i), x(j)) = (S(i) + 1)\,\sigma_b^2, \quad i < j$
The segment number is a stochastic variable that, on average, grows linearly in time:
$S(i) = \frac{i-1}{n-1}\,k, \quad i \le n$
Consequently, the covariance also grows linearly with time:
$\mathrm{Cov}(x(i), x(j)) = \left(1 + \frac{i-1}{n-1}\,k\right)\sigma_b^2, \quad i < j \le n$
Time dependent Cov for BM
The covariance is a linear function of time:
C(i) = a i + b
a = k/(n-1) σ_b²
b = (1 - k/(n-1)) σ_b²
Input:  σ_b = 1.000, k = 5.000
Output: σ_b = 1.005, k = 4.920
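The linear relation can be inverted in the same spirit: a + b = σ_b² and k = (n-1) a / (a + b). The sketch below is one possible way to do this and is not necessarily how the numbers above were obtained. Assumptions: an ensemble of synthetic BM series with zero expectation, an initial level drawn from N(0, σ_b²) so that the first segment already carries one random number, and C(i) estimated as the ensemble average of x(i) x(j) over all j > i. All names are illustrative.

```python
import numpy as np

def simulate_bm_series(n, k, sigma_b, sigma_n, rng):
    """BM-type break signal (random initial level plus k jumps) plus noise."""
    positions = rng.choice(np.arange(1, n), size=k, replace=False)
    steps = np.zeros(n)
    steps[0] = rng.normal(0.0, sigma_b)                 # level of first segment
    steps[positions] = rng.normal(0.0, sigma_b, size=k)  # jumps at the breaks
    return np.cumsum(steps) + rng.normal(0.0, sigma_n, size=n)

def time_dependent_covariance(series):
    """C(i) for years i = 1..n-1: mean of x(i) x(j) over all j > i and series."""
    n = series.shape[1]
    return np.array(
        [np.mean(series[:, [i]] * series[:, i + 1:]) for i in range(n - 1)]
    )

def fit_bm_parameters(cov, n):
    """Fit C(i) = a i + b and convert (a, b) to (sigma_b, k)."""
    years = np.arange(1, n)              # 1-based year index i
    a, b = np.polyfit(years, cov, 1)
    return np.sqrt(a + b), (n - 1) * a / (a + b)

n, k_true, sigma_b_true = 100, 5, 1.0
rng = np.random.default_rng(2)
series = np.array(
    [simulate_bm_series(n, k_true, sigma_b_true, 0.5, rng) for _ in range(8000)]
)
sigma_b_est, k_est = fit_bm_parameters(time_dependent_covariance(series), n)
print(sigma_b_est, k_est)
```

Only pairs with i < j enter the estimate, so the independent noise term does not contribute to C(i).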
Mixed applications
Temporal covariance applied to random deviations: very small break size.
Input:  σ_b = 1.000, k = 5.000
Output: σ_b = 0.046, k = 14.95
Lag covariance applied to Brownian motion: very small break number.
Input:  σ_b = 1.000, k = 5.000
Output: σ_b = 3.471, k = 0.729
Conclusion
Brownian motion and random deviation break types can be distinguished by calculating:
1. Lag covariance C(L)
2. Time dependent covariance C(i)
For random deviations C(L) decreases with L.
For Brownian motion C(i) increases with i.
The two other combinations yield either a very small break size or a very small break number.
As a byproduct we get an estimate of break size and number without running a full homogenization algorithm.
Platforms & Stairs
Venema et al. (2012) analyzed the statistics of the retrieved signal to decide whether breaks are BM or RD type.
They distinguish platforms from stairs.
[Figure: sketches of a platform and a stair, with segment levels T1, T2, T3]
Platform probability for RD
For RD break types T1, T2, T3 are iid random variables (not the case for BM).
There are 6 possibilities of rank order, which all have the same probability:
T1 < T2 < T3 (upward stair)
T1 < T3 < T2
T2 < T1 < T3
T2 < T3 < T1
T3 < T1 < T2
T3 < T2 < T1 (downward stair)
Upward and downward stairs each have a probability of 1/6.
Every other combination is a platform (either T2 is the smallest or T2 is the largest element of the triple).
For RD break types the probability of platforms is 2/3.
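The rank-order argument can be checked with a short Monte Carlo experiment, which also reproduces the value of 1/2 quoted earlier for BM. This is a sketch assuming Gaussian random numbers; for BM the three levels are built as cumulative sums of independent jumps. The function name `platform_fraction` is illustrative.

```python
import numpy as np

def platform_fraction(levels):
    """Fraction of triples (T1, T2, T3) in which T2 is the largest or smallest."""
    t1, t2, t3 = levels.T
    is_platform = ((t2 > t1) & (t2 > t3)) | ((t2 < t1) & (t2 < t3))
    return is_platform.mean()

rng = np.random.default_rng(0)
n_triples = 200_000

# RD: the three segment levels are iid.
rd_levels = rng.normal(size=(n_triples, 3))
# BM: the levels are cumulative sums of iid jumps, so they are not independent.
bm_levels = np.cumsum(rng.normal(size=(n_triples, 3)), axis=1)

p_rd = platform_fraction(rd_levels)
p_bm = platform_fraction(bm_levels)
print(p_rd, p_bm)
```

For BM a platform occurs exactly when two consecutive jumps have opposite signs, which is why its probability is 1/2 for symmetric jumps.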