Using the distribution of increments in Composite Reference difference time series to determine the temperature trend bias caused by inhomogeneities

(1)


Ralf Lindau

(2)

Inhomogeneities

Climate records are affected by breaks resulting from relocations or changes in the measuring techniques.

For detection, differences between neighboring stations are considered in order to reduce the dominating natural variance.

Homogenization algorithms identify breaks by searching for the maximum external variance (explained by the jumps).

(3)

Composite Reference method

Networks of neighboring stations are built (about 10 stations each).

Their average time series is computed (the composite reference).

This is subtracted from each candidate station.

As a result, we obtain about 10 difference time series:

with the full break signal from the candidate,

with a weakened break signal from the composite,

without the climate signal (assumed to be equal within the network; desired),

without the trend bias (because it is simply subtracted out; not desired).
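The construction of the difference time series can be sketched in a few lines of Python. This is a minimal illustration, not the implementation behind the slides: the function name `difference_series`, the toy data, and the choice to exclude the candidate from its own composite reference are assumptions.

```python
def difference_series(network):
    """Return, for every station, the series 'candidate minus composite reference'.

    network: list of station time series (lists of equal length).
    Assumption: the composite reference of a candidate is the plain
    average of all OTHER stations in the network.
    """
    n_stations = len(network)
    n_times = len(network[0])
    # Sum over all stations at each time step, computed once.
    totals = [sum(series[t] for series in network) for t in range(n_times)]
    diffs = []
    for candidate in network:
        reference = [(totals[t] - candidate[t]) / (n_stations - 1)
                     for t in range(n_times)]
        diffs.append([candidate[t] - reference[t] for t in range(n_times)])
    return diffs


# Toy network: three stations share one climate signal; station C has a 1 K break.
climate = [0.0, 0.1, 0.2, 0.3]
station_a = list(climate)
station_b = list(climate)
station_c = [c + j for c, j in zip(climate, [0.0, 0.0, 1.0, 1.0])]

diffs = difference_series([station_a, station_b, station_c])
print(diffs[2])  # candidate C: full break signal, climate signal removed
print(diffs[0])  # candidate A: mirrored break of C, attenuated by 1/2
```

In this toy case the common climate signal cancels in every difference series, the break of station C appears at full size in its own series, and it appears with opposite sign and reduced size in the series of the other stations.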

(4)

Trend bias

Only if the jumps are non-zero on average do they introduce a trend bias (which is really harmful).

Otherwise they introduce only some additional scatter into the data.

Thus, we concentrate on the trend bias induced by inhomogeneities.

(5)

CR Approach Fails (1/2)

Panel (a) shows the step function of a candidate station and the idealized composite reference.

Panel (b) shows the saw-shaped difference time series between the two, together with the averages of the detected subperiods (thick).

Panel (c) shows the corrected (thick) and the original step function.

Some additional steps are inserted but the trend is not corrected.


(6)

CR Approach Fails (2/2)

In the last example, we assumed only a common network bias and no inter-station variance of the breaks.

Normally, the latter effect is large compared to the bias, and the two are superimposed.

The method finds and corrects these station-specific breaks.

As they dominate the variance, the procedure seems to perform well.

However, the bias itself remains uncorrected.

Nonetheless, the CR approach can be used to correct the trend bias.

(7)

Increment distribution

A few large jumps, positive on average.

A negative trend in between, consisting of many small jumps, negative on average (or vice versa).

The mean of all increments within a network is exactly zero, because each break of a station appears with a mirrored sign n times but attenuated by 1/n in the reference.

However, the median is negative (when the trend bias is positive).
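A single time step is enough to see both statements. The sketch below assumes a network of n + 1 = 11 stations in which exactly one station jumps by 1 K and the others do not change; the helper name `difference_increments` is hypothetical.

```python
import statistics


def difference_increments(increments):
    """Increment of each station minus the mean increment of all other stations."""
    n_ref = len(increments) - 1
    total = sum(increments)
    return [x - (total - x) / n_ref for x in increments]


# One station jumps by +1 K, the ten others stay constant (assumed toy case).
jumps = [1.0] + [0.0] * 10
d = difference_increments(jumps)

mean_increment = statistics.fmean(d)     # zero up to rounding
median_increment = statistics.median(d)  # negative
print(mean_increment, median_increment)
```

The one large positive increment (+1 K) is balanced by ten small negative ones (–0.1 K each), so the mean vanishes while the median sits on the small negative increments.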

(8)

Main classes and subclasses

Increments from modelled data (random Poisson-distributed):

Networks: 1000

Stations: 11

Months: 1001

Main class 0: no break in candidate

Main class 1: break in candidate

Subclasses k: k breaks in reference

Subclass means:



(9)

Some statistics (mean)

Frequency of main class 1:

Frequency of main class 0:

Relative frequency of the subclasses k (Binomial distribution):

Mean of class 0:

Mean of class 1:

The overall mean is actually zero
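The statistics listed on this slide can be written out as follows. This is a reconstruction under stated assumptions, not necessarily the slide's own notation: each of the n reference stations breaks independently with probability p per time step, jumps have mean B, and x₀ and x₁ denote the means of main class 0 and 1. The resulting class means agree with the modelled values reported for p = 0.05 and B = 1 K (–0.050 and 0.950).

```latex
\begin{align*}
P(\text{class } 1) &= p, \qquad P(\text{class } 0) = 1 - p \\
P(k) &= \binom{n}{k}\, p^{k} (1-p)^{n-k} \\
x_{0,k} &= -\frac{kB}{n}, \qquad x_{1,k} = B - \frac{kB}{n} \\
x_0 &= \sum_{k=0}^{n} P(k)\, x_{0,k} = -pB \\
x_1 &= \sum_{k=0}^{n} P(k)\, x_{1,k} = (1-p)\,B \\
(1-p)\,x_0 + p\,x_1 &= -(1-p)\,pB + p\,(1-p)\,B = 0
\end{align*}
```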

(10)

Some statistics (variance)

The slightly different means of each subclass impose a small additional variance to the main class. However, as the bias is small, this is negligible. Then:

Variance of class 0:

Variance of class 1:

The variances s₀² and s₁² are closely connected to the signal-to-noise ratio SNR.

Approx. noise variance

Approx. noise plus break variance

(11)

Two main classes

Model input:

n = 10, p = 0.05, B = 1 K, s_d = 1 K, s_e = 1 K

Model output:

Main class    0             1
Number:       10,450,296    549,704
Mean:         –0.050        0.950
Variance:     1.091         2.107
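A small Monte Carlo run illustrates how such a two-class table can be produced. This is a sketch under assumptions, not the model used for the slide: noise is drawn independently per increment with standard deviation s_e, jumps are Gaussian with mean B and standard deviation s_d, the reference is the mean of the n other stations, and the sample is much smaller than the one on the slide. The function name `simulate_increments` is hypothetical, and the variances depend on the assumed noise model, so they need not match the slide's values.

```python
import random
import statistics


def simulate_increments(n_networks=200, n_ref=10, n_steps=200,
                        p=0.05, B=1.0, s_d=1.0, s_e=1.0, seed=0):
    """Return difference-series increments sorted into main class 0 and 1."""
    rng = random.Random(seed)
    classes = {0: [], 1: []}
    for _ in range(n_networks):
        for _ in range(n_steps):
            # Each of the n_ref + 1 stations breaks with probability p.
            broke = [rng.random() < p for _ in range(n_ref + 1)]
            incr = [rng.gauss(0.0, s_e) + (rng.gauss(B, s_d) if b else 0.0)
                    for b in broke]
            total = sum(incr)
            for x, b in zip(incr, broke):
                # Candidate increment minus mean increment of the other stations.
                classes[int(b)].append(x - (total - x) / n_ref)
    return classes


classes = simulate_increments()
all_increments = classes[0] + classes[1]

freq_1 = len(classes[1]) / len(all_increments)
mean_0 = statistics.fmean(classes[0])
mean_1 = statistics.fmean(classes[1])
var_0 = statistics.pvariance(classes[0])
var_1 = statistics.pvariance(classes[1])
overall_mean = statistics.fmean(all_increments)
overall_median = statistics.median(all_increments)

print(f"class 1 frequency: {freq_1:.4f}")
print(f"means:     {mean_0:.3f} {mean_1:.3f}")
print(f"variances: {var_0:.3f} {var_1:.3f}")
print(f"overall mean {overall_mean:.2e}, median {overall_median:.4f}")
```

With a positive B, the class means should come out close to –pB and (1 – p)B, the overall mean should vanish up to rounding, and the overall median should be slightly negative.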

(12)

Median of two Gaussian

We assume a positive bias.

We integrate the distribution up to x₀, the mean of class 0. The median is not yet reached there, because half of the class 0 data lies below x₀, but not yet half of the class 1 data.

1. Wide horizontally striped: class 0 data

2. Wide vertically striped: class 1 data

3. Narrow horizontally striped: increment necessary to reach the median (class 0)

4. Black (negligible): increment necessary to reach the median (class 1)



(13)

Four terms (1/2)

I: We stand at x₀ and consider class 0: Half of class 0 is reached, but this class contains only a fraction of 1 – p.

II: We stand at x₀ and consider class 1: Half of class 1 is reached at x₁, but we are –B/s₁ away from x₁.

III: We proceed by dx and stand approximately at 0 (class 0): We are pB/s₀ away from x₀. The normalized increment is dx/s₀.

IV: We proceed by dx and stand approximately at 0 (class 1): We are (1 – p)B/s₁ away from x₁. The normalized increment is dx/s₁.


(14)

Four terms (2/2)

Neglect IV while replacing 1-p by 1 in III:

Solve for dx:

with:

q is approximately 1, because in both cases we are very near to 0 in the distribution.
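The four terms I to IV and the solution for dx can be written out as below. This is a reconstruction from the verbal description of the terms, not a copy of the slide's formulas: Φ and φ are assumed to be the standard normal distribution function and density, both classes are assumed Gaussian with means x₀ = –pB and x₁ = (1 – p)B, and the explicit expression for q is an assumption chosen so that q tends to 1 when B/s₁ and pB/s₀ are small.

```latex
\begin{align*}
\underbrace{\frac{1-p}{2}}_{\text{I}}
+ \underbrace{p\,\Phi\!\left(-\frac{B}{s_1}\right)}_{\text{II}}
+ \underbrace{(1-p)\,\varphi\!\left(\frac{pB}{s_0}\right)\frac{dx}{s_0}}_{\text{III}}
+ \underbrace{p\,\varphi\!\left(\frac{(1-p)B}{s_1}\right)\frac{dx}{s_1}}_{\text{IV}}
&= \frac{1}{2} \\[4pt]
\varphi\!\left(\frac{pB}{s_0}\right)\frac{dx}{s_0}
&= p\left[\Phi\!\left(\frac{B}{s_1}\right) - \frac{1}{2}\right] \\[4pt]
dx &= q\,pB\,\frac{s_0}{s_1},
\qquad
q = \frac{\Phi(B/s_1) - \tfrac{1}{2}}{(B/s_1)\,\varphi(pB/s_0)} \\[4pt]
\text{median} = x_0 + dx &\approx -pB\left(1 - \frac{s_0}{s_1}\right)
\quad \text{for } q \approx 1
\end{align*}
```

The second line follows from the first by dropping term IV, replacing 1 – p by 1 in term III, and using Φ(–z) = 1 – Φ(z).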


(15)

Approximation works

Formula with q = 1 (thin)

Model results (thick) with p = 0.05, s₀ = 0.5 K, s_d = 0.1 – 0.9 K

The median is a linear function of the product pB.

This product is equal to the total temperature change caused by the break bias.

The slope depends on the quotient s₀/s₁.
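The quality of the approximation can also be checked without a simulation by solving for the median of the two-Gaussian mixture numerically. The sketch below is an illustration under assumptions: the median formula is taken as –pB(1 – s₀/s₁) with q = 1, the class variances are linked by s₁² = s₀² + s_d², and B = 0.1 K is an arbitrary example value; all function names are hypothetical.

```python
import math


def norm_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def mixture_median(p, B, s0, s1):
    """Median of (1-p) N(-pB, s0^2) + p N((1-p)B, s1^2), found by bisection."""
    x0, x1 = -p * B, (1.0 - p) * B

    def cdf(x):
        return ((1.0 - p) * norm_cdf((x - x0) / s0)
                + p * norm_cdf((x - x1) / s1))

    lo = -10.0 * (s0 + s1 + abs(B))
    hi = -lo
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if cdf(mid) < 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def approx_median(p, B, s0, s1):
    """Approximation with q = 1: median = -pB (1 - s0/s1)."""
    return -p * B * (1.0 - s0 / s1)


p, B, s0 = 0.05, 0.1, 0.5           # B is an assumed example value
results = []
for s_d in (0.1, 0.5, 0.9):
    s1 = math.sqrt(s0 ** 2 + s_d ** 2)  # assumed relation between s0, s1, s_d
    exact = mixture_median(p, B, s0, s1)
    approx = approx_median(p, B, s0, s1)
    results.append((s_d, exact, approx))
    print(f"s_d = {s_d:.1f} K: exact {exact:.6f} K, approx {approx:.6f} K")
```

For these parameter values the two numbers should differ by only a small fraction of pB, which is the sense in which the approximation works.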

(16)

Real data

2°-by-2° grid box in USA

Fitting two functions for the inner and the outer distribution provides an estimate for the slope 1 – s₀/s₁ (0.6).

Together with the median (0.0028 K), we obtain a trend bias of –0.46 K/cty for this particular grid box.
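The arithmetic behind this estimate can be retraced as follows. The sketch assumes that the median equals –pB(1 – s₀/s₁), and that there are 100 increments per century (yearly data); neither the number of increments per century nor the function name `trend_bias_per_century` is stated on the slide.

```python
def trend_bias_per_century(median, slope, steps_per_century=100):
    """Invert median = -pB * slope and scale pB to one century.

    slope is the fitted value of 1 - s0/s1; steps_per_century is an
    assumption (100 corresponds to yearly increments).
    """
    bias_per_step = -median / slope
    return bias_per_step * steps_per_century


bias = trend_bias_per_century(median=0.0028, slope=0.6)
print(f"{bias:.3f} K/cty")
```

With the rounded inputs quoted on the slide this gives about –0.47 K/cty, in line with the –0.46 K/cty reported there, which was presumably computed from unrounded values.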

(17)

66 grid boxes

In 66 2°-by-2° grid boxes in the USA, we find a mean trend bias of –0.0515 K/cty with a standard deviation of 0.6996 K/cty.

The standard error of the mean, 0.6996 / 8.12 ≈ 0.086 K/cty (with 8.12 ≈ √66), exceeds 0.0515 K/cty.

Not significant.
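The significance check amounts to comparing the mean with its standard error. A minimal sketch, using only the three numbers given on the slide (the variable names are illustrative):

```python
import math

mean_bias = -0.0515   # K/cty, mean over the grid boxes
stddev = 0.6996       # K/cty, standard deviation between grid boxes
n_boxes = 66

standard_error = stddev / math.sqrt(n_boxes)
# Same comparison as on the slide: is the mean larger than one standard error?
significant = abs(mean_bias) > standard_error

print(f"standard error: {standard_error:.4f} K/cty, significant: {significant}")
```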

(18)

Modelled data, one gridbox

Input:

n = 10, p = 0.10, s_d = 1.0 K, s_e = 0.5 K

s₀ = 0.522 K, s₁ = 1.128 K

Output:

s₀ = 0.552 K, s₁ = 1.108 K

Remember:

s₀² = ((n+1)/n) s_e²

s₁² = ((n+1)/n) s_e² + s_d²

(19)

Modelled data, 900 grid boxes

900 networks containing 10+1 stations

Realistic circumstances:

s₀ = 0.522 K, s₁ = 1.128 K, p = 10 cty^-1, B = 0.05 K

→ Trend bias = 0.5 K/cty

Finding:

Mean deviation from inserted network trend: –0.017 K/cty

RMS error: 0.790 K/cty

Unbiased method to determine
