What is the correct number of break points hidden in a climate record?

(1)


(2)

Defining breaks

Consider the differences between one station and a reference (kriged ensemble of surrounding stations). Breaks are defined by abrupt changes in the station-reference time series.

Internal variance: within the subperiods

External variance: between the means of different subperiods

Criterion:

Maximum external variance attained by a minimum number of breaks

(3)

Decomposition of Variance

n: total number of years

N: number of subperiods

n_i: years within subperiod i

The sum of external and internal variance is constant.
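The constancy of the sum can be checked numerically. The sketch below assumes that all variances are normalised by the total length n, so that (1/n) Σ_j (x_j − x̄)² equals the internal part (1/n) Σ_i Σ_j (x_ij − x̄_i)² plus the external part (1/n) Σ_i n_i (x̄_i − x̄)². The function name `variance_parts` and the break positions are illustrative choices, not taken from the talk.

```python
import random

def variance_parts(x, breaks):
    """Split the total variance of x into internal and external parts.

    `breaks` lists the start index of every segment after the first one.
    All three variances are normalised by the total length n.
    """
    n = len(x)
    mean = sum(x) / n
    edges = [0] + sorted(breaks) + [n]
    internal = external = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        seg = x[lo:hi]
        seg_mean = sum(seg) / len(seg)
        # scatter of the values around their own segment mean
        internal += sum((y - seg_mean) ** 2 for y in seg) / n
        # weighted scatter of the segment mean around the overall mean
        external += len(seg) * (seg_mean - mean) ** 2 / n
    total = sum((y - mean) ** 2 for y in x) / n
    return internal, external, total

random.seed(0)
series = [random.gauss(0.0, 1.0) for _ in range(100)]
internal, external, total = variance_parts(series, [30, 70])
print(internal + external, total)
```

Whatever break positions are chosen, the two printed numbers agree up to rounding, so maximising the external variance is the same as minimising the internal variance.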

(4)

Two questions

The title of this talk asks: How many breaks?

Where are they situated?

Testing all permutations is not feasible.

The best solution for a fixed number of breaks can be found by Dynamical Programming

Number of break combinations for n = 100 and k = 10:

$$\binom{n-1}{k} = \binom{99}{10} = \frac{99 \cdot 98 \cdots 90}{10 \cdot 9 \cdots 1} \approx 10^{13}$$

(5)

Dynamical Programming (1)

Find the optimum positions for a fixed number of breaks.

Consider not only the complete time series, but all possible truncated variants.

(6)

Dynamical Programming (2)

Find the first break by simply testing all permutations.

(7)

Dynamical Programming (3)

Fill up all truncated variants. The internal variance now consists of two parts: that of the truncated variant plus that of the rest.

(8)

Dynamical Programming (4)

Fill up all truncated variants. The internal variance consists of two parts: that of the truncated variant plus that of the rest.

Search the minimum out of n.

(9)

Dynamical Programming (5)

The 2-break optimum for the full length is found.

To begin the search for 3 breaks we need, as before, the previous solutions for all lengths, including the shorter ones.

This needs n²/2 searches, which for larger numbers of breaks k is much less than testing all permutations (n over k).
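The five steps above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implementation used for the talk: the function name `optimal_breaks`, the prefix-sum trick for the segment sums of squares, and the convention that a break position p marks the first index of a new segment are all choices made here.

```python
from math import comb

def optimal_breaks(x, k_max):
    """For k = 0..k_max return (break positions, minimum internal sum of squares).

    A break position p means that a new segment starts at index p.
    """
    n = len(x)
    s1, s2 = [0.0], [0.0]              # prefix sums of x and x**2
    for y in x:
        s1.append(s1[-1] + y)
        s2.append(s2[-1] + y * y)

    def sse(lo, hi):
        """Internal sum of squares of the segment x[lo:hi]."""
        s = s1[hi] - s1[lo]
        return (s2[hi] - s2[lo]) - s * s / (hi - lo)

    inf = float("inf")
    # cost[k][j]: best internal sum of squares of the truncated series x[:j] with k breaks
    cost = [[inf] * (n + 1) for _ in range(k_max + 1)]
    last = [[0] * (n + 1) for _ in range(k_max + 1)]
    for j in range(1, n + 1):
        cost[0][j] = sse(0, j)
    for k in range(1, k_max + 1):
        for j in range(k + 1, n + 1):
            for t in range(k, j):      # t: start of the last segment
                # truncated variant with k-1 breaks plus the rest
                c = cost[k - 1][t] + sse(t, j)
                if c < cost[k][j]:
                    cost[k][j], last[k][j] = c, t

    solutions = []
    for k in range(k_max + 1):
        positions, j = [], n
        for kk in range(k, 0, -1):     # walk back through the stored last breaks
            j = last[kk][j]
            positions.append(j)
        solutions.append((sorted(positions), cost[k][n]))
    return solutions

series = [0.0] * 10 + [5.0] * 10
positions, internal = optimal_breaks(series, 2)[1]
print(positions, internal)
print(comb(99, 10))   # number of combinations an exhaustive test would need
```

Each additional break costs on the order of n²/2 evaluations of the inner sum, whereas the last printed number is the count an exhaustive search over 10 breaks in 100 years would face.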

(10)

Position & Number

Solved:

The optimum positions for a fixed number of breaks are known by Dynamical Programming.

Left:

Find the optimum number of breaks.

The external variance increases in any case with an increasing number of breaks.

Use as benchmark the behaviour of a random time series.

(11)

Segment averages

with stddev = 1

Segment averages x_i scatter randomly:

mean: 0

stddev: $1/\sqrt{n_i}$

because any deviation from zero can be seen as inaccuracy due to the limited number of members.
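A quick simulation illustrates the scatter of the segment averages. It is only a sketch: the segment length n_i = 25, the number of repetitions and the seed are arbitrary assumptions.

```python
import random
from math import sqrt

rng = random.Random(3)
n_i = 25                     # assumed segment length
# averages of many independent segments drawn from N(0, 1)
means = [sum(rng.gauss(0.0, 1.0) for _ in range(n_i)) / n_i
         for _ in range(20000)]
avg = sum(means) / len(means)
sd = sqrt(sum((m - avg) ** 2 for m in means) / len(means))
print(round(avg, 3), round(sd, 3), 1 / sqrt(n_i))
```

The simulated standard deviation of the averages should lie close to 1/sqrt(n_i), here 0.2, while their mean stays near zero.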

(12)

External Variance

The external variance is a weighted measure for the variability of the subperiods' means.

It is equal to the mean square sum of a random, normally distributed variable.

(13)

χ²-distribution

n: length of time series (number of years)

k: number of breaks

N = k+1: number of subsegments

[ ]: mean over several break position permutations

[var_ext] = (N-1)/n = k/n

On average, the external variance increases linearly with k.

However, we consider the best member as found by DP.

$$\mathrm{var}_{ext} \sim \chi^2_N$$

The external variance is chi²-distributed.

Def.:

Take N values out of N(0,1), square them and add them up.

By repeating this, a $\chi^2_N$-distribution is obtained.
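The definition translates directly into a small simulation. The sketch below uses only the standard library; N = 5, the sample size and the seed are arbitrary assumptions, and the comparison values N and 2N are the known mean and variance of a chi²-distribution with N degrees of freedom.

```python
import random

def chi2_sample(dof, rng):
    """One draw: take `dof` values out of N(0,1), square them and add them up."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(dof))

rng = random.Random(1)
N = 5
draws = [chi2_sample(N, rng) for _ in range(20000)]
mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / len(draws)
print(round(mean, 2), round(var, 2))   # compare with N and 2N
```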

(14)

21-years random data (1)

1000 random time series are created.

Only 21 years long, so that explicit tests of all permutations are possible.

The mean increases linearly.

However, the maximum is relevant (the best solution as found by DP). Can we describe this function?

First guess:

$$v = 1 - (1 - k^*)^4$$

(15)

21-years random data (2)

Above, we expected the data for a fixed number of breaks to be chi²-distributed.

(16)

From χ² to β distribution

The random data do not fit a chi²-distribution exactly.

The reason is that chi² has no upper bound.

But var_ext cannot exceed 1.

A kind of confined chi² is the beta distribution.

[Figure: distribution of the data for n = 21 years, k = 7 breaks]

(17)

From χ² to β distribution

[Figure: distribution of the data for n = 21 years, k = 7 breaks]

$$X \sim \chi^2(a) \text{ and } Y \sim \chi^2(b) \;\Rightarrow\; \frac{X}{X+Y} \sim \beta\!\left(\frac{a}{2}, \frac{b}{2}\right)$$

If we normalise a chi²-distributed variable by the sum of itself and another chi²-distributed variable, the result is β-distributed.

The β-distribution fits the data well and is the theoretical distribution for the external variance of all break position permutations.
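The relation between the two distributions can be illustrated by simulation. The sketch below assumes a = k degrees of freedom for the external part and b = n − 1 − k for the internal part, with n = 21 and k = 7 as in the figure; it compares the simulated mean of X/(X+Y) with the mean of a beta distribution with parameters a/2 and b/2, which is a/(a+b).

```python
import random

def chi2_sample(dof, rng):
    """Sum of `dof` squared standard normal values."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(dof))

rng = random.Random(2)
n, k = 21, 7
a, b = k, n - 1 - k                      # assumed degrees of freedom
ratios = []
for _ in range(20000):
    x = chi2_sample(a, rng)
    y = chi2_sample(b, rng)
    ratios.append(x / (x + y))           # confined to the interval [0, 1]
sim_mean = sum(ratios) / len(ratios)
beta_mean = (a / 2) / (a / 2 + b / 2)    # mean of beta(a/2, b/2)
print(round(sim_mean, 3), beta_mean)
```

Unlike a chi²-distributed variable, every simulated ratio stays between 0 and 1, which is the property the external variance requires.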



(18)

From χ² to β distribution

[Figure: curves labelled 7, 11, 15 and 17]

$$p(v) = \frac{v^{\frac{k}{2}-1}\,(1-v)^{\frac{n-1-k}{2}-1}}{B\!\left(\frac{k}{2}, \frac{n-1-k}{2}\right)}$$

with

$$B(a,b) = \frac{\Gamma(a)\,\Gamma(b)}{\Gamma(a+b)}$$

We are interested in the best solution, with the highest external variance, as provided by DP.

We need the exceeding probability for high var_ext.

(19)

Incomplete Beta Function

External variance v is β-distributed and depends on n (years) and k (breaks):

$$p(v) = \frac{v^{\frac{k}{2}-1}\,(1-v)^{\frac{n-1-k}{2}-1}}{B\!\left(\frac{k}{2}, \frac{n-1-k}{2}\right)}$$

The exceeding probability P gives the best (maximum) solution for v:

$$P(v) = 1 - \int_0^v p(v')\,dv'$$

Solvable for even k and odd n:

$$P(v) = \sum_{l=0}^{i-1} \binom{m}{l}\, v^{l}\,(1-v)^{m-l} \qquad \text{with } i = \frac{k}{2},\; m = \frac{n-3}{2}$$

(20)

Example 21 years, 4 breaks

$$P(v) = \sum_{l=0}^{i-1} \binom{m}{l}\, v^{l}\,(1-v)^{m-l} \qquad \text{with } i = \frac{k}{2},\; m = \frac{n-3}{2}$$

k = 4 ⇒ i = 2

n = 21 ⇒ m = 9

$$P(v) = (1-v)^9 + 9\,v\,(1-v)^8$$
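The finite sum is easy to evaluate directly. The sketch below assumes the form P(v) = Σ_{l=0}^{i−1} C(m,l) v^l (1−v)^(m−l) with i = k/2 and m = (n−3)/2; the function name `exceed_prob` is an illustrative choice. For n = 21 and k = 4 it is compared with the two-term expression (1−v)^9 + 9v(1−v)^8.

```python
from math import comb

def exceed_prob(v, n, k):
    """Exceeding probability P(v) for even k and odd n."""
    if k % 2 != 0 or n % 2 != 1:
        raise ValueError("the closed form needs even k and odd n")
    i, m = k // 2, (n - 3) // 2
    return sum(comb(m, l) * v ** l * (1 - v) ** (m - l) for l in range(i))

v = 0.5
two_terms = (1 - v) ** 9 + 9 * v * (1 - v) ** 8
print(exceed_prob(v, 21, 4), two_terms)
```

P(v) starts at 1 for v = 0 and falls to 0 for v = 1, as expected for an exceeding probability.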

(21)

Theory and Data

Theory (curve):

$$P(v) = (1-v)^9 + 9\,v\,(1-v)^8$$

Random data (hatched) fit well.

(22)

Nominal Combination Number

For n = 21 and k = 4 there are

$$\binom{n-1}{k} = \binom{20}{4} = 4845$$

break combinations.

If they were all independent, we could read the maximum external variance at an exceeding probability of (4845)^-1 ≈ 0.0002, being 0.7350.

However, we suspect that the break combinations are not independent. And we know the correct value of var_ext.

(23)

Effective and Nominal

Remember:

var_ext= 0.5876 for k=4

The reverse reading leads to a 23 times higher exceeding probability.

This shows that the break permutations are strongly dependent and the effective number of combinations is smaller than the nominal.

However, the theoretical function is correct.
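The reverse reading can be reproduced numerically. The sketch below assumes the closed form P(v) = (1−v)^9 + 9v(1−v)^8 for n = 21 and k = 4, evaluates it at var_ext = 0.5876, and compares it with the nominal probability 1/4845. The helper name `exceed_prob` and the interpretation of 1/P as an effective number of combinations are assumptions made for this illustration.

```python
from math import comb

def exceed_prob(v, n, k):
    """Exceeding probability P(v) for even k and odd n."""
    i, m = k // 2, (n - 3) // 2
    return sum(comb(m, l) * v ** l * (1 - v) ** (m - l) for l in range(i))

nominal = comb(20, 4)                    # 4845 nominal break combinations
p_nominal = 1 / nominal
p_actual = exceed_prob(0.5876, 21, 4)    # probability at the known var_ext
ratio = p_actual / p_nominal
effective = 1 / p_actual                 # implied effective number of combinations
print(round(ratio, 1), round(effective))
```

The ratio comes out near 23, so the implied effective number of combinations is only a few hundred rather than 4845.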

(24)

From 21 years to 101 years

As we now know the theoretical function, we quit the explicit check by random data.

And skip from unrealistically short time series (n = 21) to more realistic ones (n = 101).

Again the numerical values of the external variance are known and we can infer the effective combination numbers.

Can we give a formula for dv/dk in order to derive v(k)?

[Figure; axis label: 20 breaks]

(25)

dv/dk sketch

Increasing the break number from k to k+1 has two consequences:

1. The probability function changes.

2. The number of combinations increases.

Both increase the external variance.

[Figure: curves for k breaks and k+1 breaks]

(26)

Using the Slope

P(v) is a complicated function and hard to invert into v(P).

Thus, dv is derived from dP / slope.

We just derived P(v) by integrating p(v), so that the slope p(v) is known.

$$P(v) = \sum_{l=0}^{i-1} \binom{m}{l}\, v^{l}\,(1-v)^{m-l}$$

[Figure: curves for k breaks and k+1 breaks]

(27)

The Slope

$$\frac{d\,\ln(P(v))}{dv} = -\frac{p(v)}{P(v)}$$

Insert the known functions:

$$\frac{d\,\ln(P(v))}{dv} = -\frac{m\binom{m-1}{i-1}\, v^{i-1}\,(1-v)^{m-i}}{\sum_{l=0}^{i-1}\binom{m}{l}\, v^{l}\,(1-v)^{m-l}}$$

The last summand dominates:

$$\frac{d\,\ln(P(v))}{dv} \approx -\frac{m\binom{m-1}{i-1}\, v^{i-1}\,(1-v)^{m-i}}{\binom{m}{i-1}\, v^{i-1}\,(1-v)^{m-i+1}}$$

Reduce and replace m and i:

$$\frac{d\,\ln(P(v))}{dv} \approx -\frac{m-i+1}{1-v} = -\frac{n-1-k}{2\,(1-v)}$$
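The reduction can be written out step by step. The worked version below is a sketch that uses only i = k/2, m = (n−3)/2 and the identity 1/B(i, m−i+1) = m!/((i−1)!(m−i)!) = m·C(m−1, i−1) for the normalisation of the density; it keeps just the last summand of P(v) in the denominator.

```latex
\begin{align*}
\frac{m\binom{m-1}{i-1}}{\binom{m}{i-1}}
  &= m\,\frac{(m-1)!}{(i-1)!\,(m-i)!}\cdot\frac{(i-1)!\,(m-i+1)!}{m!}
   = m-i+1, \\
\frac{v^{i-1}\,(1-v)^{m-i}}{v^{i-1}\,(1-v)^{m-i+1}}
  &= \frac{1}{1-v}, \\
m-i+1
  &= \frac{n-3}{2}-\frac{k}{2}+1 = \frac{n-1-k}{2}, \\
\frac{d\,\ln P(v)}{dv}
  &\approx -\frac{m-i+1}{1-v} = -\frac{n-1-k}{2\,(1-v)} .
\end{align*}
```

The powers of v cancel completely, so the approximate slope of ln P depends on v only through 1/(1−v).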

(28)

Distance between the Curves

$$\ln P_{i+1} - \ln P_{i} = \ln\sum_{l=0}^{i}\binom{m}{l}\, v^{l}\,(1-v)^{m-l} - \ln\sum_{l=0}^{i-1}\binom{m}{l}\, v^{l}\,(1-v)^{m-l}$$

The last summand dominates:

$$\ln P_{i+1} - \ln P_{i} \approx \ln\left[\binom{m}{i}\, v^{i}\,(1-v)^{m-i}\right] - \ln\left[\binom{m}{i-1}\, v^{i-1}\,(1-v)^{m-i+1}\right]$$

Reduce and replace m and i:

$$\ln P_{i+1} - \ln P_{i} \approx \ln\left(\frac{m+1-i}{i}\,\frac{v}{1-v}\right)$$

$$\ln P_{k+1} - \ln P_{k} \approx \frac{1}{2}\,\ln\left(\frac{n-1-k}{k}\,\frac{v}{1-v}\right)$$

(29)

Effective combination growth

Nominal number of combinations:

$$\binom{n-1}{k}$$

Growth per additional break:

$$\binom{n-1}{k+1} \Big/ \binom{n-1}{k} \approx \frac{n-1-k}{k}$$

Nominal Growth Rate: -2 ln((n-1-k) / k)

ln: logarithmic sketch

minus: the number of combinations is reciprocal to the exceeding probability

2: the exceeding probability is only known for even break numbers

However, break combinations are not independent and we know the effective number of combinations.

(30)

Ratio: nominal / effective

k1   k2   k    nominal   effective   c = nom/eff
 2    4    3   -2.552    -7.784      0.328
 4    6    5   -2.186    -6.952      0.315
 6    8    7   -1.963    -6.356      0.309
 8   10    9   -1.765    -5.889      0.300
10   12   11   -1.645    -5.503      0.299
12   14   13   -1.514    -5.173      0.293
14   16   15   -1.435    -4.885      0.294
16   18   17   -1.363    -4.627      0.295
18   20   19   -1.292    -4.394      0.295

The ratio nominal / effective is approximately constant with c = 0.3.

(31)

Very Rough Solution

$$\frac{1}{1-v}\,\frac{dv}{dk} = \frac{2}{n-1-k}\left[c\,\ln\left(\frac{n-1-k}{k}\right) + \frac{1}{2}\,\ln\left(\frac{n-1-k}{k}\,\frac{v}{1-v}\right)\right]$$

Normalisation: k* = k / (n-1)

$$\frac{1}{1-v}\,\frac{dv}{dk^*} = \frac{2}{1-k^*}\left[c\,\ln\left(\frac{1-k^*}{k^*}\right) + \frac{1}{2}\,\ln\left(\frac{1-k^*}{k^*}\,\frac{v}{1-v}\right)\right]$$

for small k* (v ≈ 4k*):

$$\frac{1}{1-v}\,\frac{dv}{dk^*} \approx 2c\,\ln(n) + \ln(4) = 2 \cdot 0.3 \cdot \ln(100) + \ln(4) = 2.76 + 1.39 = 4.15$$

for n = 100

(32)

The Two Contributions

$$\frac{1}{1-v}\,\frac{dv}{dk^*} = \frac{2}{1-k^*}\left[c\,\ln\left(\frac{1-k^*}{k^*}\right) + \frac{1}{2}\,\ln\left(\frac{1-k^*}{k^*}\,\frac{v}{1-v}\right)\right]$$

[Figure: truth and estimate as a function of k]

(33)

Exact Solution

[Equations not recoverable from the extracted slide: the relative gain (1/(1-v)) dv/dk* as a function of k*, its integration over k*, and the resulting expression for v(k*). The legible fragments include a term with ln(5).]

(34)

Constance of Solution

[Figure panels: 101 years, 21 years]

The solution for the exponent is constant for different lengths of time series (21 and 101 years).

(35)

The existing algorithm Prodige

Original formulation of Caussinus & Lyazrhi for the penalty term as adopted by Mestre for Prodige:

$$C_k(Y) = \ln\left(1 - \frac{\sum_{j=1}^{k+1} n_j\,(\bar{Y}_j - \bar{Y})^2}{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}\right) + \frac{2l}{n-1}\,\ln(n) \rightarrow \min$$

Translation into terms used by us:

$$\ln(1-v) + \frac{2k}{n-1}\,\ln(n) \rightarrow \min$$

Normalisation by k* = k / (n-1):

$$\ln(1-v) + 2k^*\,\ln(n) \rightarrow \min$$

Derivation to get the minimum:

$$-\frac{1}{1-v}\,\frac{dv}{dk^*} + 2\,\ln(n) = 0$$

$$\frac{1}{1-v}\,\frac{dv}{dk^*} = 2\,\ln(n)$$

In Prodige it is postulated that the relative gain of external variance is a constant for given n.

(36)

Our Results vs Prodige

We know the function for the relative gain of external variance.

Its uncertainty, as given by isolines of exceeding probabilities for 2^-i, is characterised by constant distances.

Caussinus and Lyazrhi (adopted by Mestre) propose just a constant of 2 ln(n) ≈ 9

[Figure: isolines for exceeding probabilities 1/128, 1/64, 1/32, 1/16, 1/8, 1/4]
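To make the comparison concrete, the penalised criterion can be evaluated for a given curve v(k). The sketch below assumes the criterion in the form ln(1 − v) + 2 k/(n−1) ln(n) → min and uses the rough random-data curve 1 − v = (1 − k*)^4 as input; the function name `criterion_minimum` is an illustrative choice and not part of Prodige.

```python
from math import log

def criterion_minimum(v_by_k, n):
    """Return the break number k that minimises ln(1 - v_k) + 2 k/(n-1) ln(n).

    v_by_k[k] is the external variance of the best solution with k breaks
    and must be smaller than 1.
    """
    scores = [log(1.0 - v) + 2.0 * k / (n - 1) * log(n)
              for k, v in enumerate(v_by_k)]
    return min(range(len(scores)), key=scores.__getitem__)

n = 101
# rough behaviour of a random series without breaks: 1 - v = (1 - k*)**4
random_curve = [1.0 - (1.0 - k / (n - 1)) ** 4 for k in range(21)]
print(criterion_minimum(random_curve, n))
print(round(2 * log(21), 2), round(2 * log(101), 2))   # constant for n = 21 and n = 101
```

For this random curve the penalty outweighs the gain at every step, so no break is accepted. The second line prints the constant 2 ln(n) for the two series lengths used in the talk.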

(37)

Wrong Direction

n = 101 years n = 21 years

(38)

Conclusion

We have found a general mathematical formulation of how the external variance of a random time series increases when more and more breaks, as given by Dynamical Programming, are inserted.

This is much more accurate than existing estimates and can be used in future as a benchmark to define the optimum number of breaks.

(39)

Integrated result

What does the function found look like after integration?

Crosses: Test data

Line: Theory

Error bars: 90 and 95 percentile

(40)

Appendix (1)

Consider the individual summands of the sum as defined in

$$P(v) = \sum_{l=0}^{i-1} \binom{m}{l}\, v^{l}\,(1-v)^{m-l}$$

where l_i runs from zero to i. The factor of change f between a certain summand and its successor is:

$$f = \frac{\binom{m}{l_i+1}}{\binom{m}{l_i}}\,\frac{v}{1-v}$$

The ratio of consecutive binomial coefficients can be replaced and it follows:

$$f = \frac{m-l_i}{l_i+1}\,\frac{v}{1-v}$$

m and i can be replaced by n and k:

$$f \approx \frac{n-1-l_k}{l_k}\,\frac{v}{1-v}$$

Inserting k instead of l_k gives a lower limit for f, because (n-1-l_k)/l_k, the rate of change of the binomial coefficients, decreases monotonically with k:

$$f \ge \frac{n-1-k}{k}\,\frac{v}{1-v}$$

Normalised by 1/(n-1):

$$f \ge \frac{1-k^*}{k^*}\,\frac{v}{1-v}$$

(41)

Appendix (2)

The approximate solution is known with 1-v = (1-k*)⁴:

$$f = \frac{1-k^*}{k^*}\,\frac{1-(1-k^*)^4}{(1-k^*)^4}$$

$$f = \frac{1-(1-k^*)^4}{k^*\,(1-k^*)^3}$$

$$f = \frac{4k^* - 6k^{*2} + 4k^{*3} - k^{*4}}{k^*\,(1-k^*)^3}$$

$$k^* \to 0:\; f \to 4 \qquad\qquad k^* \to 1:\; f \to \infty$$

We can conclude that each element of the sum given above is larger than the prior element by a factor f.

For small k* the factor f is about 4 or greater, and it grows to infinity for large k*. Consequently, we can approximate the sum by its last summand according to:

$$P(v) = \sum_{l=0}^{i-1} \binom{m}{l}\, v^{l}\,(1-v)^{m-l} \approx \binom{m}{i-1}\, v^{i-1}\,(1-v)^{m-i+1}$$

(42)

Application (1)

Insert in each of 1000 random time series 5 breaks of variance 1.

The change of external variance for low break numbers (1, 2, 3 up to about 10) increases, lying above the theoretical function for random time series without any break (arrow).

Variances of break numbers higher than 5 increase, because the inserted 5 breaks are not always the biggest.

(43)

Application (2)

Stop the break search when the growth rate of the external variance first drops below the theoretical one for zero breaks.

[Figure: one example of the 1000 test time series]

Crosses: Observations

Thin line: Inserted breaks

Fat line: Detected breaks

On average over 1000 samples:

Added variance: 86%

(theoretically 5/6)

Remaining after correction: 27%

Average detected break number: 5.48
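The stopping rule can be sketched as follows. This is an illustration under several assumptions and not the procedure used for the numbers above: the benchmark is the rough solution 1 − v = (1 − k*)^4 for a random series without breaks, the growth rate is taken as the relative gain (v[k+1] − v[k]) / (1 − v[k]) per added break, and the function names and the example values in `v_by_k` are hypothetical.

```python
def benchmark_relative_gain(k, n, alpha=4.0):
    """Relative gain per added break implied by 1 - v = (1 - k*)**alpha, k* = k/(n-1)."""
    return 1.0 - ((n - 2 - k) / (n - 1 - k)) ** alpha

def detected_break_number(v_by_k, n):
    """Return the first k at which adding one more break gains less than the benchmark.

    v_by_k[k] is the external variance of the best solution with k breaks.
    """
    for k in range(len(v_by_k) - 1):
        gain = (v_by_k[k + 1] - v_by_k[k]) / (1.0 - v_by_k[k])
        if gain < benchmark_relative_gain(k, n):
            return k
    return len(v_by_k) - 1   # the gain never dropped below the benchmark

# hypothetical external variances for 0..5 breaks in a series of 101 years
v_by_k = [0.0, 0.30, 0.50, 0.60, 0.605, 0.61]
print(detected_break_number(v_by_k, n=101))
```

In this made-up example the first three breaks each explain far more variance than a random series would offer, while the fourth does not, so the search stops at three breaks.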
