
Adjustment of Temperature Trends In Landstations After Homogenization ATTILAH


(1)

Adjustment of Temperature Trends In Landstations After Homogenization ATTILAH

Uriah Heat

Unavoidably Remaining Inaccuracies After Homogenization Heedfully Estimating the Adjustments of Temperature trends

(2)

Break and Noise Variance

Homogenization

To homogenize, we consider the difference time series between two neighboring stations.

The dominating natural variance is cancelled out, because it is very similar at both stations.

The relative break variance is increased and we have a realistic chance to detect the breaks.

General procedure

Random combinations of test breaks are inserted. The combination that explains the maximum variance is taken to indicate the true breaks.
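The procedure above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used for the slides: the helper names `explained_variance` and `random_search`, the example series and the number of trials are all assumptions made here.

```python
import random

def explained_variance(x, breaks):
    """Relative variance explained by the segment means.

    A break at index b (0 < b < len(x)) means a new segment starts at x[b].
    """
    m = len(x)
    mean = sum(x) / m
    total = sum((v - mean) ** 2 for v in x)
    edges = [0] + sorted(breaks) + [m]
    external = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        seg = x[lo:hi]
        seg_mean = sum(seg) / len(seg)
        external += len(seg) * (seg_mean - mean) ** 2
    return external / total

def random_search(x, k, trials, rng):
    """Try `trials` random sets of k test breaks and keep the best one."""
    best_v, best_breaks = -1.0, None
    for _ in range(trials):
        breaks = sorted(rng.sample(range(1, len(x)), k))
        v = explained_variance(x, breaks)
        if v > best_v:
            best_v, best_breaks = v, breaks
    return best_v, best_breaks

# Hypothetical noise-free step series with true breaks at 30 and 70.
series = [0.0] * 30 + [1.0] * 40 + [-0.5] * 30
v_true = explained_variance(series, [30, 70])
v_best, b_best = random_search(series, k=2, trials=2000, rng=random.Random(0))
```

Placing the test breaks exactly on the true breaks explains the full variance of the noise-free series; a random search can only approach that value, which is why an exhaustive method is preferable in practice.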

Technical application

Dynamic Programming with Stop criterion
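A minimal sketch of such a dynamic programming search, assuming the number of breaks k is given in advance. The stop criterion mentioned in the slide title is not specified in the source and is therefore not implemented; the function name `optimal_breaks` is hypothetical.

```python
def optimal_breaks(x, k):
    """Return (sse, breaks): the k breaks minimizing the internal sum of squares.

    Minimizing the internal (within-segment) variance is equivalent to
    maximizing the variance explained by the segment means.
    """
    m = len(x)
    # Prefix sums give the sum of squared deviations of any x[i:j] in O(1).
    s1, s2 = [0.0], [0.0]
    for v in x:
        s1.append(s1[-1] + v)
        s2.append(s2[-1] + v * v)

    def sse(i, j):
        s = s1[j] - s1[i]
        return (s2[j] - s2[i]) - s * s / (j - i)

    inf = float("inf")
    # cost[b][j]: minimal internal sum of squares of x[:j] using b breaks.
    cost = [[inf] * (m + 1) for _ in range(k + 1)]
    prev = [[0] * (m + 1) for _ in range(k + 1)]
    for j in range(1, m + 1):
        cost[0][j] = sse(0, j)
    for b in range(1, k + 1):
        for j in range(b + 1, m + 1):
            for i in range(b, j):  # i is the position of the last break
                c = cost[b - 1][i] + sse(i, j)
                if c < cost[b][j]:
                    cost[b][j] = c
                    prev[b][j] = i
    # Trace the break positions back from the end of the series.
    breaks, j = [], m
    for b in range(k, 0, -1):
        j = prev[b][j]
        breaks.append(j)
    return cost[k][m], sorted(breaks)

# Hypothetical noise-free step series with true breaks at 30 and 70.
series = [0.0] * 30 + [1.0] * 40 + [-0.5] * 30
best_sse, best_breaks = optimal_breaks(series, 2)
```

The cost of this sketch grows with k·m², which is far cheaper than testing all combinations of k break positions.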

[Figure: noise variance and break variance]

(3)

Dipdoc Seminar – 23.10.2017

Trend bias


If the positive and negative jumps do not cancel each other out, they introduce a trend bias.

Underestimation of trend bias

It is impossible to isolate the full break signal from the noise. Thus, only a certain part of it can be corrected. A small fraction remains (which has to be corrected after homogenization).

(4)

Underestimation of jump height

The two thick horizontal lines indicate the true jump height.

Errors occur when the noise randomly (and erroneously) raises the data above the middle line.

Then a part of Segment 2 is (erroneously) assigned to Segment 1.

Correct detection

x1 and x2 are determined as segment averages. x1 is nearly correct, but x2 is too high.

Incorrect detection

x1’ and x2’ are determined as segment averages. x2’ is nearly correct, but x1’ is too low.

In both cases the jump height is underestimated.

(5)


Obviously, this systematic underestimation depends on the interaction between noise and break variance.

To quantify this effect, the statistical properties of both break and noise variance have to be known.

Nomenclature

k: Number of test breaks (here: 3)

n: Number of true breaks (here: 7)

m: Total length (here: 100)

l: Test segment length (here: 14, 4, etc.)

(6)

Statistical Characteristic of the Noise Variance

Beta distributed

(7)


Example for noise variance

We insert k = 3 random test breaks and check the variance they are able to explain.

Since we have pure noise, the test segments’ means are very close to zero.

However, there is a small random variation: This is the explained variance.

(8)

Statistic for Noise Variance

We insert k = 3 test breaks at random positions into a random noise time series and calculate the explained variance.

This procedure is repeated 1000 × 1000 times (1000 time series, each with 1000 random segmentations, i.e. 3000 test break positions per series).
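The experiment can be reproduced with a small Monte Carlo sketch. It assumes Gaussian white noise and uses far fewer repetitions than the 1000 × 1000 quoted above; the function names and the seed are hypothetical. The comparison value k/(m-1) is the mean explained variance stated for noise on a later slide.

```python
import random

def explained_variance(x, breaks):
    """Relative variance explained by the means of the test segments."""
    m = len(x)
    mean = sum(x) / m
    total = sum((v - mean) ** 2 for v in x)
    edges = [0] + sorted(breaks) + [m]
    external = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        seg = x[lo:hi]
        seg_mean = sum(seg) / len(seg)
        external += len(seg) * (seg_mean - mean) ** 2
    return external / total

def mean_explained_noise(m, k, n_series, n_segmentations, rng):
    """Average explained variance of k random test breaks in pure noise."""
    acc = 0.0
    for _ in range(n_series):
        x = [rng.gauss(0.0, 1.0) for _ in range(m)]
        for _ in range(n_segmentations):
            breaks = rng.sample(range(1, m), k)
            acc += explained_variance(x, breaks)
    return acc / (n_series * n_segmentations)

m, k = 100, 3
v_mean = mean_explained_noise(m, k, n_series=200, n_segmentations=50,
                              rng=random.Random(1))
expected = k / (m - 1)  # mean explained variance for noise
print(f"simulated {v_mean:.4f}, expected {expected:.4f}")
```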

Relative explained variance:

$$v = \frac{\text{explained var}}{\text{total var}}$$

(9)

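Beta distribution

Probability density:

$$p(v) = \frac{v^{\alpha-1}\,(1-v)^{\beta-1}}{B(\alpha,\beta)}$$

with Beta function

$$B(\alpha,\beta) = \frac{\Gamma(\alpha)\,\Gamma(\beta)}{\Gamma(\alpha+\beta)}$$

For noise the shape parameters are:

$$\alpha = \frac{k}{2}, \qquad \beta = \frac{m-1-k}{2}$$

where k denotes the number of test breaks and m the total length.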

(10)

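Behavior of Noise

The mean of a Beta distribution is given by:

$$\bar{v} = \frac{\alpha}{\alpha+\beta}$$

Mean explained variance:

$$\bar{v} = \frac{k}{m-1}$$

[Figure: explained noise variance for the optimum segmentation (maximum) and for random segmentations (mean)]

Remember: $\bar{v} = k/(m-1)$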

(11)

Statistical Characteristic of the Break Variance

1. Heuristic approach

2. Empirical approach

3. Theoretical approach

(12)

First Approach

For true breaks, constant periods exist. Tested segment averages are the (weighted) means of such (few) constant periods.

This is much the same situation as for random scatter, except that fewer independent data are involved.

Obviously, the number of breaks n plays the same role as the time series length m did before for noise.

Thus, the first approximation is:

$$v = \frac{k}{n}$$

(13)

Second Approach (1/3)

However, this would lead to $v = 1$ for $k = n$.

This is obviously only true when all real breaks are actually matched by the test breaks (which is not the case for random trials).

Consider k = 3, n = 7 and count the number of platforms in each test segment. Altogether, there are 11 “independents”, in general n+k+1.
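The counting argument can be checked with a short sketch. The break positions below are hypothetical and chosen only so that the four test segments contain 4, 2, 4 and 1 platforms; the function name is also an assumption.

```python
def platforms_per_segment(m, true_breaks, test_breaks):
    """Count the constant platforms inside each test segment.

    A break at index b means a new platform (or test segment) starts at b.
    """
    edges = [0] + sorted(test_breaks) + [m]
    counts = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        # Every true break strictly inside the segment opens one more platform.
        inner = sum(1 for b in true_breaks if lo < b < hi)
        counts.append(inner + 1)
    return counts

m = 100
true_breaks = [10, 20, 30, 50, 60, 70, 80]   # n = 7 (hypothetical positions)
test_breaks = [35, 55, 90]                   # k = 3 (hypothetical positions)
counts = platforms_per_segment(m, true_breaks, test_breaks)
print(counts, sum(counts))
```

The total equals n+k+1 as long as no test break coincides with a true break; every coincidence reduces the count by one.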

[Figure: time series with n = 7 true breaks and k = 3 test breaks; the four test segments contain 4, 2, 4 and 1 platforms]

(14)

Second Approach (2/3)

For noise, we had:

$$v = \frac{k}{m-1}$$

Now we have n+k+1 “independents”, thus:

$$v = \frac{k}{(n+k+1)-1} = \frac{k}{n+k}$$

(15)

Second Approach (3/3)

This would lead to $v = 1/2$ for $n = k$.

This is rather reasonable, because for n = k the situation is approximately as follows: each test segment contains one true break, thus two independents, which are then averaged. This leads to a reduction of the variance by a factor of 2.

However, so far we did not take into account that the HSPs (homogeneous subperiods) have different lengths.

The effective number of true breaks must be smaller than the nominal one.

(16)

Effective observation number

If we generate i = 1…N random time series of length m (elements j = 1…m), with each element being

$$x_{ij} \sim N(0,1),$$

only a fraction of (m-1)/m of the variance can be found within the time series (because a fraction of 1/m is “lost” due to the variance of the time series means).

How large is this effect if a step function with n breaks is considered?

(17)

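Sketch of derivation (1/2)

Consider a step function of length $m$ with $n$ breaks, i.e. $n+1$ platforms with lengths $l_i$ and independent heights $x_i$ of unit variance.

The mean of each time series is:

$$\bar{x} = \frac{1}{m}\sum_{i=1}^{n+1} l_i\,x_i$$

The “lost” variance is:

$$\mathrm{Var}(\bar{x}) = E[\bar{x}\,\bar{x}] = E\left[\left(\frac{1}{m}\sum_{i=1}^{n+1} l_i\,x_i\right)^{2}\right] = \frac{1}{m^2}\sum_{i=1}^{n+1}\sum_{j=1}^{n+1} E[l_i\,l_j\,x_i\,x_j]$$

Which can be reduced to the sum over mean squared lengths:

$$\mathrm{Var}(\bar{x}) = \frac{1}{m^2}\sum_{i=1}^{n+1} E[l_i^2]\,E[x_i^2] = \frac{1}{m^2}\sum_{i=1}^{n+1} E[l_i^2]$$

Which is equal to the weighted sum over $l^2$:

$$\mathrm{Var}(\bar{x}) = \frac{n+1}{m^2}\,\frac{1}{\binom{m-1}{n}}\sum_{l=1}^{m-n}\binom{m-l-1}{n-1}\,l^2$$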

(18)

Sketch of derivation (2/2)

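The sum of a product of two binomial coefficients can be solved with Vandermonde’s identity:

$$\sum_{i=0}^{N}\binom{i}{j}\binom{N-i}{K-j} = \binom{N+1}{K+1}$$

Which leads to a solution for the $l^2$ sum:

$$\sum_{l=1}^{m-n} l^2\binom{m-l-1}{n-1} = 2\binom{m}{n+2} + \binom{m}{n+1}$$

Inserted into the original expression, we obtain for the “lost” external variance:

$$\mathrm{Var}(\bar{x}) = \frac{2m-n}{m\,(n+2)} \approx \frac{2}{n+2}$$

The remaining internal variance is then:

$$1 - \mathrm{Var}(\bar{x}) = \frac{(m+1)\,n}{m\,(n+2)} \approx \frac{n}{n+2}$$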
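The closed form can be checked with exact rational arithmetic. The sketch below assumes that all placements of the n breaks are equally likely and that platform heights are independent with unit variance; the function names are hypothetical, and the closed-form expressions are written as (2m-n)/(m(n+2)) for the lost variance and 2C(m,n+2) + C(m,n+1) for the l² sum.

```python
from fractions import Fraction
from math import comb

def lost_variance_exact(m, n):
    """Variance of the series mean, from the weighted sum over l^2 (n >= 1)."""
    denom = comb(m - 1, n)  # number of ways to place n breaks in m - 1 gaps
    mean_sq_len = sum(
        Fraction(l * l * comb(m - l - 1, n - 1), denom)
        for l in range(1, m - n + 1)
    )
    return (n + 1) * mean_sq_len / (m * m)

def lost_variance_closed(m, n):
    """Closed form obtained after applying Vandermonde's identity."""
    return Fraction(2 * m - n, m * (n + 2))

def l2_sum(m, n):
    """Left-hand side of the l^2 sum."""
    return sum(l * l * comb(m - l - 1, n - 1) for l in range(1, m - n + 1))

m, n = 100, 7
lost = lost_variance_exact(m, n)
remaining = 1 - lost
print(lost, remaining, Fraction(2, n + 2))
```

For m = 100 and n = 7 the exact lost variance is close to, but slightly below, the large-m approximation 2/(n+2).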

(19)

Third approach

The relative unexplained variance of a test segment:

$$1 - v = \frac{i}{i+2}\cdot\frac{n+2}{n}$$

with i: number of breaks within a test segment and n: number of breaks within the entire time series

i = l/m n:

$$1 - v = \frac{n+2}{n + 2m/l}$$

m = l(k+1):

$$1 - v = \frac{n+2}{n + 2(k+1)} = \frac{n/2 + 1}{n/2 + k + 1} = \frac{n^*}{k + n^*}$$

$$v = \frac{k}{k + n^*}$$

with n* = n/2 +1

Similar to the second approach, but n counts only half.

Remember:

$$v = \frac{\text{explained var}}{\text{total var}}$$
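To compare the heuristic approximations side by side, here is a small sketch. It assumes the three approaches read v = k/n, v = k/(k+n) and v = k/(k+n*) with n* = n/2 + 1; the function names are hypothetical.

```python
def v_first(k, n):
    """First approach: n takes the role that m - 1 played for noise."""
    return k / n

def v_second(k, n):
    """Second approach: n + k + 1 'independents' replace m."""
    return k / (k + n)

def v_third(k, n):
    """Third approach: as the second, but n counts only half (n* = n/2 + 1)."""
    return k / (k + n / 2 + 1)

n = 7
for k in range(1, n + 1):
    print(k, round(v_first(k, n), 3), round(v_second(k, n), 3),
          round(v_third(k, n), 3))
```

For k = n the first approach claims that the full variance is explained, while the second gives one half, as argued above.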

(20)

Statistical Characteristic of the Break Variance

1. Heuristic approach

2. Empirical approach

3. Theoretical approach

(21)

Empirical Var(k,n)


Empirical test with 1000 random segmentations (fixed k) of 1000 time series (fixed n).

Calculate the mean relative explained variance v from these 1,000,000 permutations.

Repeat this procedure for all combinations of k = 1, …, 20 and n = 1, …, 20.

This yields 20 functions v(k), one for each n.
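A reduced version of this experiment, as a hedged sketch: the series length m = 100, the Gaussian platform heights, the sample sizes and all function names are assumptions made here, and the repetition count is far below the 1,000,000 permutations used for the slides.

```python
import random

def explained_variance(x, breaks):
    """Relative variance explained by the means of the test segments."""
    m = len(x)
    mean = sum(x) / m
    total = sum((v - mean) ** 2 for v in x)
    edges = [0] + sorted(breaks) + [m]
    external = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        seg = x[lo:hi]
        seg_mean = sum(seg) / len(seg)
        external += len(seg) * (seg_mean - mean) ** 2
    return external / total

def random_step_series(m, n, rng):
    """Noise-free step function with n random breaks and N(0,1) platform heights."""
    edges = [0] + sorted(rng.sample(range(1, m), n)) + [m]
    x = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        x.extend([rng.gauss(0.0, 1.0)] * (hi - lo))
    return x

def mean_explained_breaks(m, n, k, n_series, n_segmentations, rng):
    """Mean explained variance of k random test breaks for n true breaks."""
    acc = 0.0
    for _ in range(n_series):
        x = random_step_series(m, n, rng)
        for _ in range(n_segmentations):
            acc += explained_variance(x, rng.sample(range(1, m), k))
    return acc / (n_series * n_segmentations)

rng = random.Random(2)
v_by_k = {k: mean_explained_breaks(100, 7, k, 200, 20, rng) for k in (1, 3, 5)}
print(v_by_k)
```

Looping the same call over k = 1, …, 20 and n = 1, …, 20 would give the family of functions v(k) described above.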

(22)

Stepwise Fitting (1/3)

v/(1-v) is proportional to k.

The slope is a function of n.

(Numbers and lines do not cross.)

The slope is certainly not proportional, but rather reciprocal to n (slp(1) is large, slp(20) is small).

Thus, it is better to plot 1/slp(n).

$$\frac{v}{1-v} = \mathrm{slp}(n)\,k$$

(23)

Stepwise Fitting (2/3)

We expect horizontal lines if the reciprocal slope is really independent of k.

This is largely confirmed.

Averaging over k then gives one value for each n. These values appear to be rather linear in n. Thus, plot these averages as a function of n.

$$\frac{k\,(1-v)}{v} = \frac{1}{\mathrm{slp}(n)} = \text{const}$$

(24)

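Stepwise Fitting (3/3)

$$\frac{1}{\mathrm{slp}(n)} = a\,n + b$$

with a = 0.629 and b = 1.855

$$\frac{k\,(1-v)}{v} = \frac{1}{\mathrm{slp}(n)}$$

solve for v:

$$v = \frac{k}{k + \dfrac{1}{\mathrm{slp}(n)}}$$

and insert an+b:

$$v = \frac{k}{k + n^*}, \quad \text{with } n^* = 0.629\,n + 1.855$$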
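The fitted relation can be written as a small function. The coefficients are the ones quoted on the slide; the function names are hypothetical, and the sketch only illustrates that the linearized quantity k(1-v)/v recovers n* regardless of k.

```python
A, B = 0.629, 1.855  # fit coefficients quoted on the slide

def n_star(n):
    """Effective number of breaks from the linear fit a*n + b."""
    return A * n + B

def v_fit(k, n):
    """Mean explained break variance for k test breaks and n true breaks."""
    return k / (k + n_star(n))

def reciprocal_slope(k, v):
    """Linearized quantity k(1-v)/v, which should not depend on k."""
    return k * (1 - v) / v

for k in (1, 5, 20):
    v = v_fit(k, 7)
    print(k, round(v, 3), round(reciprocal_slope(k, v), 3))
```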

(25)

Application of findings

Summarizing the stepwise fitting:

$$v = \frac{k}{k + n^*}, \quad \text{with } n^* = 0.629\,n + 1.855$$

The direct fit in the v/k space yields:

[coefficients of $n^*$ not legible in the source]

The best heuristic approach was:

$$n^* = 0.5\,n + 1$$

(26)

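Method of moments

So far we developed an empirical equation for v, the mean explained variance.

The same procedure is applied to derive equations for α and β, the shape parameters describing the distribution of v. These coefficients are determined by the method of moments.

The mean of a Beta distribution is:

$$\bar{v} = \frac{\alpha}{\alpha+\beta}$$

The variance of a Beta distribution is:

$$\sigma^2 = \frac{\alpha\,\beta}{(\alpha+\beta+1)\,(\alpha+\beta)^2}$$

Which can be solved for α and β:

$$\alpha = \left(\frac{\bar{v}\,(1-\bar{v})}{\sigma^2} - 1\right)\bar{v}$$

$$\beta = \left(\frac{\bar{v}\,(1-\bar{v})}{\sigma^2} - 1\right)(1-\bar{v})$$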
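The method of moments for a Beta distribution takes only a few lines. The sketch below uses the standard formulas for mean and variance; the function names are hypothetical, and the round trip uses the noise case with k = 3 and m = 100, assuming shape parameters k/2 and (m-1-k)/2.

```python
def beta_moments(alpha, beta):
    """Mean and variance of a Beta(alpha, beta) distribution."""
    s = alpha + beta
    return alpha / s, alpha * beta / (s * s * (s + 1))

def beta_from_moments(mean, var):
    """Shape parameters recovered from mean and variance."""
    common = mean * (1 - mean) / var - 1
    return mean * common, (1 - mean) * common

# Round trip for the assumed noise parameters with k = 3, m = 100.
k, m = 3, 100
alpha0, beta0 = k / 2, (m - 1 - k) / 2
mean, var = beta_moments(alpha0, beta0)
alpha1, beta1 = beta_from_moments(mean, var)
print(mean, var, alpha1, beta1)
```

In practice, `mean` and `var` would be the sample mean and sample variance of the simulated explained variances.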

(27)

Empirical values for α and β

Again 1000 by 1000 permutations for fixed values of n and k are performed.

The explained variance is calculated (for 1,000,000 permutations).

From the mean and the variance of the resulting distribution, α and β are determined by the method of moments.

This procedure is repeated for combinations of k = 1…20 and n = 1…20.

The result is plotted against k.

α is strongly dependent on k, obviously converging to α = k/2 for large n.

β is strongly dependent on n, obviously converging to β = n*/2 for large k.

(28)

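Alpha/k and Beta/n*

$$\bar{v} = \frac{\alpha}{\alpha+\beta} = \frac{k}{k+n^*} = \frac{f\,k}{f\,k + f\,n^*}$$

$$\alpha = f\,k, \qquad \beta = f\,n^*$$

$$f = \frac{\alpha}{k} = \frac{\beta}{n^*}$$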

(29)

1/f

f is neither a linear function of k nor of n; it is more promising to depict the reciprocal 1/f.

1/f is rather linear in k, with a slope reciprocal to n and an intercept of 2.

$$\frac{1}{f} = \frac{k}{\alpha} = \frac{n^*}{\beta}$$

$$\frac{1}{f} = c\,\frac{k}{n} + 2$$

(30)

Determination of c

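For a more detailed determination of c, we solve

$$\frac{1}{f} = c\,\frac{k}{n} + 2$$

for c:

$$c = \frac{n}{k}\left(\frac{1}{f} - 2\right)$$

and plot the result against k.

[fitted expression for c not legible in the source]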

(31)

Resulting fits for Alpha and Beta

$$\alpha = \frac{k}{c\,k/n + 2}, \qquad \beta = \frac{n^*}{c\,k/n + 2}$$

[fitted expression for c not legible in the source]

(32)

Conclusion

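The explained noise variance is Beta distributed with:

$$\alpha = \frac{k}{2}, \qquad \beta = \frac{m-1-k}{2}$$

The explained break variance is Beta distributed with:

$$\alpha = f\,k, \qquad \beta = f\,n^*$$

with

$$\frac{1}{f} = c\,\frac{k}{n} + 2$$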
