Detection of inhomogeneities in Daily climate records to Study Trends in Extreme Weather
Detection of Breaks in Random Data,
in Data Containing True Breaks, and in Real Data
Ralf Lindau
Internal and External Variance
Consider the differences of one station compared to a neighbour or a
reference.
Breaks are defined by abrupt changes in the station-reference time series.
Internal variance: within the subperiods
External variance: between the means of different subperiods
Criterion:
Maximum external variance attained by
a minimum number of breaks
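A minimal sketch of how the criterion could be evaluated for a fixed number of breaks. The slides do not name a search method; the dynamic-programming search and the function name `best_breaks` are assumptions made for illustration. It returns the maximum (absolute) external variance and the break positions for exactly k breaks.

```python
def best_breaks(x, k):
    """Return (max external variance, break positions) for exactly k breaks.

    A break position b means that a new subperiod starts at index b.
    Illustrative dynamic-programming search, O(k * n^2).
    """
    n = len(x)
    mean = sum(x) / n
    # Prefix sums give every segment sum in O(1).
    prefix = [0.0]
    for value in x:
        prefix.append(prefix[-1] + value)

    def gain(i, j):
        # Contribution n_i * (segment mean - overall mean)^2 of segment x[i:j].
        length = j - i
        seg_mean = (prefix[j] - prefix[i]) / length
        return length * (seg_mean - mean) ** 2

    neg_inf = float("-inf")
    # score[s][j]: best summed gain when x[:j] is split into s + 1 segments.
    score = [[neg_inf] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    for j in range(1, n + 1):
        score[0][j] = gain(0, j)
    for s in range(1, k + 1):
        for j in range(s + 1, n + 1):
            for i in range(s, j):
                cand = score[s - 1][i] + gain(i, j)
                if cand > score[s][j]:
                    score[s][j] = cand
                    back[s][j] = i
    # Trace the break positions back from the end of the series.
    breaks, j = [], n
    for s in range(k, 0, -1):
        j = back[s][j]
        breaks.append(j)
    breaks.reverse()
    return score[k][n] / n, breaks


# Made-up station-minus-reference differences with one jump.
diff = [0.1, -0.2, 0.0, 1.1, 0.9, 1.0, 1.2]
print(best_breaks(diff, 1))
```

Repeating the call for k = 0, 1, 2, ... shows how the external variance grows with every additional break, which is why a stop criterion is needed.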
Decomposition of Variance
$n$: total number of years
$N$: number of subperiods
$n_i$: years within a subperiod
The sum of external and
internal variance is constant.
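The constancy of the sum can be checked numerically. The sketch below assumes the standard decomposition: external variance as the $n_i$-weighted squared deviations of the subperiod means from the overall mean, internal variance as the squared deviations within each subperiod, both divided by n. The function name and the sample series are illustrative only.

```python
def variance_decomposition(x, breaks):
    """Split the total variance of x into external and internal parts.

    breaks: sorted start indices of the 2nd, 3rd, ... subperiod.
    Returns (external, internal, total), each normalised by n.
    """
    n = len(x)
    mean = sum(x) / n
    edges = [0] + list(breaks) + [n]
    external = internal = 0.0
    for start, stop in zip(edges[:-1], edges[1:]):
        segment = x[start:stop]
        seg_mean = sum(segment) / len(segment)
        external += len(segment) * (seg_mean - mean) ** 2
        internal += sum((value - seg_mean) ** 2 for value in segment)
    total = sum((value - mean) ** 2 for value in x) / n
    return external / n, internal / n, total


series = [1.0, 2.0, 0.0, 4.0, 6.0, 5.0]   # made-up values
for breaks in ([3], [2, 4]):
    ext, inn, tot = variance_decomposition(series, breaks)
    print(breaks, ext + inn, tot)          # the sum matches the total each time
```

Moving or adding breaks only shifts variance between the two parts, so maximising the external variance is the same as minimising the internal variance.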
Three Questions
How do random data behave?
Needed as a stop criterion for the number of significant breaks.
How do real breaks behave theoretically?
How do real data behave?
Segment averages
with stddev = 1
Segment averages $x_i$ scatter randomly:
mean: 0
stddev: $1/\sqrt{n_i}$
because any deviation from zero can be seen as inaccuracy due to the limited number of members.
$\chi^2$-distribution
The external variance is equal to the mean square sum of a standard normally distributed random variable.
It is a weighted measure for the variability of the subperiods' means.
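A small simulation sketch of this statement, under assumptions not taken from the slides: white noise with stddev 1, a fixed (not optimised) segmentation of n = 21 years with k = 7 breaks, and deviations measured from the sample mean. The weighted square sum $\sum_i n_i (x_i - \bar{x})^2$ should then follow a $\chi^2$-distribution with k degrees of freedom, that is, mean k and variance 2k.

```python
import random
import statistics


def weighted_square_sum(n_years, edges, rng):
    """Return sum_i n_i * (segment mean - overall mean)^2 for white noise."""
    x = [rng.gauss(0.0, 1.0) for _ in range(n_years)]
    mean = sum(x) / n_years
    total = 0.0
    for start, stop in zip(edges[:-1], edges[1:]):
        seg = x[start:stop]
        total += len(seg) * (sum(seg) / len(seg) - mean) ** 2
    return total


rng = random.Random(0)                      # fixed seed for repeatability
edges = [0, 3, 6, 9, 12, 15, 17, 19, 21]    # 8 subperiods, i.e. 7 breaks
samples = [weighted_square_sum(21, edges, rng) for _ in range(20000)]
print(statistics.mean(samples))             # should be close to k = 7
print(statistics.variance(samples))         # should be close to 2k = 14
```

The break positions are fixed in advance here. Searching for the best positions is a different question, treated later via the exceeding probability.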
From $\chi^2$ to $\beta$ distribution
[Figure: data for n = 21 years, k = 7 breaks]
$X \sim \chi^2(a)$ and $Y \sim \chi^2(b)$ $\Rightarrow$ $X / (X+Y) \sim \beta(a/2, b/2)$
If we normalize a $\chi^2$-distributed variable by the sum of itself and another, independent $\chi^2$-distributed variable, the result will be $\beta$-distributed.
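The relation can be illustrated by sampling. The sketch assumes independent X and Y and uses a = 7 and b = 13 (k = 7 breaks and n - 1 - k = 13 for n = 21 years) purely as example values. It compares the sample mean and variance of X / (X + Y) with the theoretical moments of a beta distribution with parameters a/2 and b/2.

```python
import random
import statistics


def chi2_sample(dof, rng):
    # A chi^2 variable with dof degrees of freedom is Gamma(shape=dof/2, scale=2).
    return rng.gammavariate(dof / 2.0, 2.0)


rng = random.Random(1)   # fixed seed for repeatability
a, b = 7, 13             # example degrees of freedom
ratios = []
for _ in range(50000):
    x, y = chi2_sample(a, rng), chi2_sample(b, rng)
    ratios.append(x / (x + y))

alpha, beta = a / 2.0, b / 2.0
beta_mean = alpha / (alpha + beta)
beta_var = alpha * beta / ((alpha + beta) ** 2 * (alpha + beta + 1.0))
print(statistics.mean(ratios), beta_mean)
print(statistics.variance(ratios), beta_var)
```

In the break-detection setting, X plays the role of the external and Y of the internal variance, so the ratio is the external share of the total variance.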
$$p(v) = \frac{v^{\frac{k}{2}-1}\,(1-v)^{\frac{n-1-k}{2}-1}}{B\left(\frac{k}{2}, \frac{n-1-k}{2}\right)}$$
with
$$B(a,b) = \frac{\Gamma(a)\,\Gamma(b)}{\Gamma(a+b)}$$
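Incomplete Beta Function
External variance v is $\beta$-distributed and depends on n (years) and k (breaks):
$$p(v) = \frac{v^{\frac{k}{2}-1}\,(1-v)^{\frac{n-1-k}{2}-1}}{B\left(\frac{k}{2}, \frac{n-1-k}{2}\right)}$$
Solvable for even k and odd n:
$$P(v) = \sum_{l=0}^{i} \binom{m}{l}\, v^{l}\,(1-v)^{m-l}, \qquad i = \frac{k}{2}-1, \qquad m = \frac{n-3}{2}$$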
The exceeding probability P gives the best (maximum) solution for v
Incomplete Beta Function
$$P(v) = 1 - \int_0^v p(v')\,dv'$$
We are interested in the best solution, with the highest external variance.
We need the exceeding probability for high $\mathrm{var}_{ext}$.
P(v) for different k
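A sketch that evaluates the exceeding probability for several k and cross-checks two routes: the finite binomial sum for even k and odd n (with i = k/2 - 1 and m = (n - 3)/2) and a direct numerical integration of the beta density with parameters k/2 and (n - 1 - k)/2. The function names, the evaluation point v = 0.5 and the midpoint rule are choices made for illustration.

```python
import math


def beta_pdf(v, n, k):
    """Density p(v) of the external variance for n years and k breaks."""
    a, b = k / 2.0, (n - 1 - k) / 2.0
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return math.exp((a - 1) * math.log(v) + (b - 1) * math.log(1 - v) - log_beta)


def exceed_prob(v, n, k):
    """P(v) as a finite binomial sum, valid for even k and odd n."""
    if k % 2 or n % 2 == 0:
        raise ValueError("closed form needs even k and odd n")
    i, m = k // 2 - 1, (n - 3) // 2
    return sum(math.comb(m, l) * v**l * (1 - v) ** (m - l) for l in range(i + 1))


def exceed_prob_numeric(v, n, k, steps=20000):
    """Cross-check: 1 - integral of p from 0 to v, midpoint rule."""
    h = v / steps
    return 1.0 - h * sum(beta_pdf((j + 0.5) * h, n, k) for j in range(steps))


for k in (2, 4, 6, 8):
    print(k, exceed_prob(0.5, 21, k), exceed_prob_numeric(0.5, 21, k))
```

For a fixed v, the exceeding probability grows with k: with more breaks, a given external variance is easier to reach by chance.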
Can we give a formula for $dv/dk$ in order to derive $v(k)$?
[Figure: P(v) for 2 to 20 breaks]
Increasing the break number from k to k+1 has two consequences:
1. The probability function changes.
2. The number of combinations increases.
dv/dk sketch
P(v) is a complicated function and hard to invert into v(P).
Thus, dv is concluded from dP / slope.
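A numerical sketch of this step. P(v) falls monotonically with v, so it can be inverted by bisection, and a small change dP translates into dv = dP / slope. The target probability used here, one over the number of possible break combinations, is an assumption made for illustration: it treats the combinations as independent, which the slides do not claim.

```python
import math


def exceed_prob(v, n, k):
    """Exceeding probability P(v) for even k and odd n (binomial-sum form)."""
    i, m = k // 2 - 1, (n - 3) // 2
    return sum(math.comb(m, l) * v**l * (1 - v) ** (m - l) for l in range(i + 1))


def slope(v, n, k, h=1e-6):
    """dP/dv by a central difference; negative because P falls with v."""
    return (exceed_prob(v + h, n, k) - exceed_prob(v - h, n, k)) / (2 * h)


def invert(target, n, k, tol=1e-12):
    """Find v with P(v) = target by bisection on [0, 1]."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if exceed_prob(mid, n, k) > target:
            lo = mid   # P still too large: the root lies at higher v
        else:
            hi = mid
    return 0.5 * (lo + hi)


n, k = 21, 4
target = 1.0 / math.comb(n - 1, k)   # assumed target, see the text above
v_exact = invert(target, n, k)
# Linearised update for a slightly smaller target: dv = dP / slope.
d_p = -0.1 * target
v_linear = v_exact + d_p / slope(v_exact, n, k)
print(v_exact, v_linear, invert(target + d_p, n, k))
```

The linearised value should stay close to the bisection result as long as dP is small, which is the idea behind deriving dv from dP and the slope instead of inverting P(v) analytically.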
[Figure: sketch of P(v) for k breaks and for k+1 breaks]
$$P(v) = \sum_{l=0}^{i} \binom{m}{l}\, v^{l}\,(1-v)^{m-l}$$
And the solution is:
[Equation: $dv/dk$ expressed in terms of v, k, n and a constant c]
Solution
[Equations: the relative gain $\frac{1}{1-v}\frac{dv}{dk^*}$ as a function of $k^*$ (with a term in $\ln 5$), its integration over $k^*$, and the resulting closed form for $1 - v(k^*)$]
Constancy of the Solution
[Figure: panels for 101 years and 21 years]
The solution for the exponent is constant for different lengths of
time series (21 and 101 years).
The existing algorithm Prodige
Original formulation of Caussinus and Mestre for the penalty term in Prodige:
$$C_k(Y) = \ln\left(1 - \frac{\sum_{j=1}^{k+1} n_j\,(\bar{Y}_j - \bar{Y})^2}{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}\right) + \frac{2l}{n-1}\,\ln(n) \rightarrow \min$$
Translation into terms used by us:
$$\ln(1-v) + \frac{2k}{n-1}\,\ln(n) \rightarrow \min$$
Normalisation by $k^* = k / (n-1)$:
$$\ln(1-v) + 2k^*\,\ln(n) \rightarrow \min$$
Derivation to get the minimum:
$$-\frac{1}{1-v}\,\frac{dv}{dk^*} + 2\ln(n) = 0$$
$$\frac{1}{1-v}\,\frac{dv}{dk^*} = 2\ln(n)$$
In Prodige it is postulated that the relative gain of external variance is a constant for given n.
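A minimal sketch of how the Caussinus and Mestre criterion could be evaluated for one given segmentation, written in the translated form ln(1 - v) + 2l/(n - 1) ln(n). The function name, the default l = k and the sample series are assumptions made for illustration. This is not the Prodige implementation.

```python
import math


def caussinus_mestre(y, breaks, l=None):
    """Return ln(1 - v) + 2 l / (n - 1) * ln(n) for one segmentation.

    breaks: sorted start indices of the 2nd, 3rd, ... subperiod (k = len(breaks)).
    l: count used in the penalty term; assumed equal to k when not given.
    Note: a segmentation with v = 1 (no internal variance) is not handled.
    """
    n = len(y)
    k = len(breaks)
    if l is None:
        l = k
    mean = sum(y) / n
    edges = [0] + list(breaks) + [n]
    between = 0.0
    for start, stop in zip(edges[:-1], edges[1:]):
        seg = y[start:stop]
        between += len(seg) * (sum(seg) / len(seg) - mean) ** 2
    total = sum((value - mean) ** 2 for value in y)
    v = between / total            # relative external variance
    return math.log(1.0 - v) + 2.0 * l / (n - 1) * math.log(n)


# Made-up difference series with a jump after the fifth value.
y = [0.1, -0.2, 0.0, 0.3, -0.1, 2.1, 1.8, 2.0, 2.2, 1.9]
print(caussinus_mestre(y, []))     # no break: the criterion is zero
print(caussinus_mestre(y, [5]))    # break at the jump: clearly negative
```

A break is accepted only when the drop in ln(1 - v) outweighs the penalty, which per break amounts to 2 ln(n) / (n - 1).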
Our Results vs Prodige
We know the function for the relative gain of external variance.
Its uncertainty, as given by isolines of exceeding probabilities $2^{-i}$, is characterised by constant distances.
Prodige proposes a constant of 2 ln(n) ≈ 9 (for n = 101).
[Figure: isolines of exceeding probability 1/128, 1/64, 1/32, 1/16, 1/8, 1/4]
Wrong Direction
[Figure: panels for n = 101 years and n = 21 years]
True Breaks
Only true for constant lengths
True breaks with fixed distances behave identically to random data.
For realistic random lengths the exponent is slightly increased.
[Figure: data and theory for sub-periods with random lengths and for sub-periods with constant lengths]