

2 BASIC METHODS AND DATA

2.1 STOCHASTIC PROCESSES

2.1.1 Basic definitions

As outlined in Chap. 1, we use data analysis methods to retrieve information about hydrological processes. This methodology is not restricted to assessing observed data; it is also capable of reproducing empirical data, which is useful either for prediction purposes or for simulation studies that assess uncertainty issues. In time series analysis, stochastic processes are used to model the data. These processes are driven by randomness to model fluctuations of empirical data, rather than by non-linearity (which is mostly utilised by dynamical models built up by, e.g., differential equations). In the following we refer to standard concepts and definitions of stochastic processes as given, for example, in Priestley (1981). In Chaps. 3 and 4 these concepts are used to build up our trend assessment frameworks.

A random variable $X$ is a function from a sample space $S$ into the real numbers. With every random variable, we associate a function called the cumulative distribution function (cdf) of $X$, which we denote $F_X(x)$. It is defined by

$$F_X(x) = P_X(X \le x) \quad \text{for all } x \,. \qquad (2.1)$$

The expected value or mean of a random variable $X$, denoted by $E(X)$, is

$$E(X) = \mu = \int x\, f_X(x)\, \mathrm{d}x \,, \qquad (2.2)$$

where $f_X(x)$ is the probability density function (pdf) of $X$. $E(X)$ is the first moment of $X$, and the $n$-th centralised moment is defined by $\mu_n = E[(X-\mu)^n]$. The second centralised moment is the variance of a random variable. The moments are important characteristics of a distribution, but they do not need to exist.
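As an illustration (our own sketch, not part of the original text), the following Python fragment estimates the mean of Eq. (2.2) and higher centralised moments from a sample; the distribution parameters and sample size are arbitrary choices.

```python
# Minimal sketch: empirical mean and n-th centralised moments of a sample.
import numpy as np

def central_moment(x, n):
    """Empirical n-th centralised moment mu_n = E[(X - mu)^n]."""
    return np.mean((x - np.mean(x)) ** n)

rng = np.random.default_rng(42)
x = rng.normal(loc=2.0, scale=1.5, size=100_000)  # sample from N(2, 1.5^2)

print(np.mean(x))            # first moment, ~2.0
print(central_moment(x, 2))  # second centralised moment (variance), ~2.25
print(central_moment(x, 3))  # third centralised moment, ~0 for a Gaussian
```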

A sequence of random variables in time $\{X_t\} \equiv X_1, X_2, \ldots$ is called a stochastic process, or random process. A sample of observations $x_1, x_2, \ldots, x_n$ can be interpreted as a realisation of a stochastic process. In this case, the values of these observations cannot be predicted precisely beforehand, but probabilities can be specified for each of the different possible values at any particular time. These probabilities are determined by the marginal cdf of each random variable $X_t$.


Let $F_{X_1, X_2, \ldots, X_n}(x_1, x_2, \ldots, x_n)$ denote the joint distribution of $X_1, X_2, \ldots, X_n$. A stochastic process $\{X_t\}$ is completely stationary in case

$$F_{X_{t_1}, X_{t_2}, \ldots, X_{t_n}}(x_1, x_2, \ldots, x_n) \equiv F_{X_{t_1+k}, X_{t_2+k}, \ldots, X_{t_n+k}}(x_1, x_2, \ldots, x_n) \qquad (2.3)$$

holds for any admissible $t_1, t_2, \ldots, t_n \in \mathbb{R}$ and any $k$.

This basically means that for any set of time points $t_1, t_2, \ldots, t_n$ the joint probability distribution of $X_{t_1}, X_{t_2}, \ldots, X_{t_n}$ must remain unaltered if each time point is shifted by the same amount $k$. Complete stationarity implies independence from time for all moments (in case they exist). This requirement can be relaxed to stationarity up to an order $m$. Here it is only demanded that the main features of the distributions of $X_{t_1}$ and $X_{t_1+k}$ are similar, i.e. that their moments up to order $m$ have to be the same. Definitions of weak stationarity only assume time independence of moments up to order 2 (for further details see Priestley 1981). A normal distribution is determined by its first two moments; thus, for a Gaussian time series, weak stationarity is equivalent to strict stationarity.
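As an informal illustration (our own sketch, not a procedure from the text), one can compare the first two sample moments across disjoint windows of a series; under weak stationarity they should be roughly constant in time. All parameter values below are illustrative.

```python
# Crude diagnostic sketch: mean and variance per window should be roughly
# constant for a weakly stationary series.  The windowing heuristic is our
# illustration, not a formal stationarity test.
import numpy as np

def windowed_moments(x, n_windows=4):
    """Mean and variance of each of n_windows disjoint segments of x."""
    windows = np.array_split(np.asarray(x, dtype=float), n_windows)
    return [(w.mean(), w.var()) for w in windows]

rng = np.random.default_rng(0)
stationary = rng.normal(size=4000)               # iid noise
trended = stationary + 0.001 * np.arange(4000)   # mean drifts with time

print(windowed_moments(stationary))  # means ~0, variances ~1 in all windows
print(windowed_moments(trended))     # window means increase: not stationary
```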

The auto-correlation function $\rho(\tau)$ of weakly stationary processes depends only on the difference $\tau = t_i - t_j$, which results in

$$\rho(\tau) = \frac{E\big([X_t - E(X_t)][X_{t+\tau} - E(X_t)]\big)}{E(X_t^2) - [E(X_t)]^2} = \frac{E(X_t X_{t+\tau}) - [E(X_t)]^2}{\mathrm{VAR}(X_t)} \,. \qquad (2.4)$$
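A sample analogue of Eq. (2.4) can be written in a few lines. The following is a hedged sketch: using a single sample mean and variance for the whole series presumes weak stationarity, and the biased estimator is our choice.

```python
# Sketch of the sample auto-correlation function of Eq. (2.4).
import numpy as np

def sample_acf(x, max_lag):
    """rho(tau) = E[(X_t - mu)(X_{t+tau} - mu)] / VAR(X_t), estimated from x."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                  # remove the common mean
    var = np.mean(x ** 2)             # common variance
    return np.array([np.mean(x[: len(x) - tau] * x[tau:]) / var
                     for tau in range(max_lag + 1)])

rng = np.random.default_rng(1)
print(sample_acf(rng.normal(size=5000), max_lag=5))  # ~[1, 0, 0, ...] for iid
```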

2.1.2 Autoregressive moving average processes

Autoregressive moving average (ARMA) processes are linear stochastic processes which model the auto-correlation structure of a data series. In our setting, the domain over which these random functions are defined is a time interval. The auto-correlation structure can be seen as the “memory” of a time series: due to auto-correlation, events in the past influence events in the present or future, despite the randomness which drives stochastic processes. The intensity of this memory depends on the influence of factors such as weather or soil conditions. Precipitation, for example, often does not have a memory at all and is therefore said to be purely stochastic. Hipel and McLeod (1994) present an overview of the physical justification for using ARMA models to represent river flow data. ARMA processes are well-known discrete parameter models incorporating auto-correlations of the random variable itself (AR part) and a noise part (MA part). Assuming $\mu = E(X_t) = 0$, an ARMA(p,q) process is defined by

$$\phi(B) X_t = \psi(B) \epsilon_t \qquad (2.5)$$

with $B$ denoting the back-shift operator $B X_t = X_{t-1}$ and $\epsilon_t$ being independent and identically distributed (iid) normal random variables with zero expectation and variance $\sigma_\epsilon^2$, and

$$\phi(z) = 1 - \phi_1 z - \cdots - \phi_p z^p \quad \text{and} \quad \psi(z) = 1 + \psi_1 z + \cdots + \psi_q z^q \qquad (2.6)$$

are the autoregressive (AR) and moving average (MA) polynomials of order $p$ and $q$, respectively.

An ARMA(p,q) process is stationary in case all solutions of $\phi(z) = 0$ lie outside the unit circle and $\phi(z)$ and $\psi(z)$ do not have common roots¹. It is invertible if all solutions of $\psi(z) = 0$ lie outside the unit circle.
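These root conditions can be checked numerically. As a sketch (parameter values are illustrative), statsmodels' ArmaProcess class does this directly; note its convention of passing the AR polynomial of Eq. (2.6) with signs included, i.e. as [1, -phi_1, ..., -phi_p].

```python
# Sketch: check the root conditions and draw a realisation of Eq. (2.5).
import numpy as np
from statsmodels.tsa.arima_process import ArmaProcess

phi1, psi1 = 0.8, 0.4                        # illustrative ARMA(1,1) parameters
proc = ArmaProcess(ar=np.array([1.0, -phi1]), ma=np.array([1.0, psi1]))

print(proc.arroots, proc.maroots)            # roots should lie outside |z| = 1
print(proc.isstationary, proc.isinvertible)  # True, True for these parameters

x = proc.generate_sample(nsample=10_000)     # realisation with iid normal eps_t
```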

For ARMA processes, the asymptotic decay of the correlations is exponential in the sense that there exists an upper bound for the auto-correlation function $\rho(k)$, i.e.,

$$|\rho(k)| \le b\, a^k \,, \qquad (2.7)$$

where $0 < b < \infty$ and $0 < a < 1$. Since $|a| < 1$ holds, the geometric series converges and we have $\sum_{k=0}^{\infty} |\rho(k)| \le \sum_{k=0}^{\infty} b\, a^k = b/(1-a) < \infty$. Stochastic processes with this kind of correlation structure are called short-range correlated processes. For further details see, e.g., Box and Jenkins (1976).

2.1.3 Fractional ARIMA processes

An autoregressive integrated moving average (ARIMA) process (Box and Jenkins 1976) is obtained by integrating an ARMA process. Thus, if Eq. (2.5) holds for the $\delta$-th difference $(1-B)^\delta X_t$, then $X_t$ is called an ARIMA(p,$\delta$,q) process for $\delta$ being an integer.
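The integration and differencing steps are easy to make concrete. A minimal sketch (parameters illustrative): integrating an ARMA series by a cumulative sum yields an ARIMA(p,1,q) series, and first differencing $(1-B)X_t$ recovers the stationary series.

```python
# Sketch: ARIMA(1,1,0) as the cumulative sum of an AR(1) series.
import numpy as np
from statsmodels.tsa.arima_process import ArmaProcess

arma = ArmaProcess(ar=[1.0, -0.5], ma=[1.0]).generate_sample(nsample=5000)
arima = np.cumsum(arma)      # integrate once: non-stationary ARIMA(1,1,0)

recovered = np.diff(arima)   # (1 - B) X_t: stationary again
print(np.allclose(recovered, arma[1:]))   # True: differencing inverts cumsum
```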

Fractional ARIMA (FARIMA) processes (Granger and Joyeux 1980) are an extension of these well-studied processes incorporating long-range dependence, also called long memory or long-term correlation. Long-range dependence is qualitatively different from the short-ranged AR or MA dependence. Auto-correlations of long-range dependent data decay more slowly than exponentially, and it is not possible to determine a specific time lag beyond which the correlations become negligible. To be more specific, a process has long-range dependence or long-range correlation if the auto-correlation function $\rho(k)$ decays algebraically in the limit of large time lags $k$:

$$\lim_{k \to \infty} \frac{\rho(k)}{c\, k^{-\beta}} = 1 \,, \qquad (2.8)$$

with $\beta \in (0,1)$ and $c > 0$ being a finite constant. This implies that, contrary to the short-range dependent case, $\sum_{k=0}^{\infty} \rho(k) = \infty$ holds, that is, the correlations are not summable.

Assuming $\mu = E(X_t) = 0$, a FARIMA(p,$\delta$,q) process is defined by

$$\phi(B) (1-B)^\delta X_t = \psi(B) \epsilon_t \,, \qquad (2.9)$$

with $B$, $\epsilon_t$, $\phi(z)$ and $\psi(z)$ as introduced in Eq. (2.6), and $\delta \in \mathbb{R}$ being the fractional difference or long-memory parameter. A FARIMA(p,$\delta$,q) process is causal and stationary if $\delta < 0.5$ and all solutions of $\phi(z) = 0$ and $\psi(z) = 0$ lie outside the unit circle. A variety of non-stationary long-memory processes have stationary backwards differences, i.e. they can be made stationary by differencing. Thus, to obtain a stationary process out of a non-stationary one, where the non-stationarity is caused by the long-memory parameter, $X_t$ must be differenced $\delta_i$ times, where $\delta_i$ is the integer part of $\delta$. A random walk process can likewise be made stationary by differencing (Percival and Walden 2000). A FARIMA

¹ Roots are the solutions of a polynomial set equal to zero.

process exhibits long memory for $0 < \delta < 0.5$. FARIMA(p,$\delta$,q) models with $\delta < 0$ are said to have intermediate memory or to be “overdifferenced”. In practice, this case is rarely encountered (Beran 1994).

For any real number $\delta \in \mathbb{R}$, i.e. the FARIMA case, the difference operator can be expanded into an infinite power series

$$(1-B)^\delta = \sum_{k=0}^{\infty} \binom{\delta}{k} (-B)^k \quad \text{with} \quad \binom{\delta}{k} = \frac{\Gamma(\delta+1)}{\Gamma(k+1)\, \Gamma(\delta-k+1)} \,.$$

Here, $\Gamma(x)$ denotes the gamma function. This formula reduces to Eq. (2.10) for the ARIMA case (for negative integers the gamma function has poles, that is, the binomial coefficient is zero if $k > \delta$ and $\delta$ is an integer). For more details see Beran (1994), Ooms and Doornik (1999), and Sibbertsen (1999). Further details concerning the auto-correlation function $\rho(k)$ of a FARIMA process or its spectrum (the Fourier transform of $\rho(k)$) are given in appendix A-1.
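As a sketch of how this expansion is used in practice, the coefficients of $(1-B)^\delta$ can be built by a simple recursion instead of evaluating gamma functions directly; the truncation length below is our choice.

```python
# Sketch: coefficients w_k = (-1)^k * binom(delta, k) of (1 - B)^delta,
# computed by the recursion w_k = w_{k-1} * (k - 1 - delta) / k, w_0 = 1.
import numpy as np

def fracdiff_weights(delta, n_terms):
    """First n_terms coefficients of (1 - B)^delta = sum_k w_k B^k."""
    w = np.empty(n_terms)
    w[0] = 1.0
    for k in range(1, n_terms):
        w[k] = w[k - 1] * (k - 1 - delta) / k
    return w

print(fracdiff_weights(0.4, 5))  # [1, -0.4, -0.12, ...] for delta = 0.4
print(fracdiff_weights(1.0, 5))  # [1, -1, 0, 0, 0]: integer case truncates
```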

Three of the simplest models out of the FARIMA(p,$\delta$,q) class are:

1. FARIMA(1,0,0) or AR(1)
2. FARIMA(0,$\delta$,0) or FD($\delta$)
3. FARIMA(1,$\delta$,0)

The first model consists of only a short-range correlated component with parameter $\phi_1$, the second model of a long-range correlated component only, with long-memory parameter $\delta$. The third model combines the previous two in the sense that for $\delta = 0$ or $\phi_1 = 0$ it recovers the first or the second model, respectively.

The assessment of long-range dependence first became prominent with the work of the hydrologist Hurst (1951), who was interested in modelling the storage capacity of reservoirs of the River Nile. By studying the flow of the Nile, he formulated a power law in which the famous Hurst coefficient $H$ is used. $H$ is related to $\delta$ by $H = \delta + 0.5$ for stationary processes (for non-stationary long-memory processes $H$ is not defined).

As an example, Fig. 2.1 depicts a realisation of an FD($\delta$) process with parameter $\delta = 0.4$ and of an AR(1) process with parameter $\phi_1 = 0.8$, together with their auto-correlation functions. Both processes have mean 0 and standard deviation 1. The auto-correlation function of the AR(1) process exhibits an exponential decay and is summable (depicted in green in the right panel). On the other hand, $\rho(k)$ of the FD($\delta$) process decays much more slowly, namely algebraically.
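The contrast in Fig. 2.1 can be reproduced numerically with closed-form auto-correlation functions. In the sketch below, $\rho(k) = \phi_1^k$ for the AR(1) process, while the FD($\delta$) expression $\rho(k) = \Gamma(1-\delta)\Gamma(k+\delta) / (\Gamma(\delta)\Gamma(k+1-\delta))$ is quoted from the literature (e.g. Beran 1994), not derived in this text.

```python
# Sketch: theoretical acfs of AR(1) and FD(delta), as contrasted in Fig. 2.1.
import numpy as np
from scipy.special import gammaln

def acf_ar1(phi1, lags):
    return phi1 ** lags                       # exponential decay

def acf_fd(delta, lags):
    # log-gamma form of the FD(delta) acf avoids overflow for large lags
    return np.exp(gammaln(1 - delta) - gammaln(delta)
                  + gammaln(lags + delta) - gammaln(lags + 1 - delta))

lags = np.arange(1, 41)
print(acf_ar1(0.8, lags)[-1])  # ~1.3e-4 at lag 40: exponential decay
print(acf_fd(0.4, lags)[-1])   # ~0.32 at lag 40: slow algebraic decay
```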

[Figure 2.1: left panel, time series of the FD($\delta$) and AR(1) realisations; right panel, auto-correlation functions (acf) against lag, with legend “acf FD($\delta$), algebraic decay” and “acf AR(1), exponential decay”.]

Figure 2.1: Example of an FD($\delta$) and an AR(1) series and their auto-correlation functions. Left: Realisation of an FD($\delta$) process with $\delta = 0.4$ (black) and of an AR(1) process with $\phi_1 = 0.8$ (blue). The standard deviation of both processes is 1 and the mean zero, though the AR(1) process is shifted to become visible. Right: Auto-correlation function of the FD($\delta$) process (black) and the AR(1) process (blue). The auto-correlation function of the AR(1) process decays exponentially, i.e. $\rho(k) \le b\, \phi_1^{|k|}$ (which can be rewritten as $\exp\{\ln(b) + |k| \ln(\phi_1)\}$). This function is depicted in green. The auto-correlation of the FD($\delta$) process decays much more slowly, i.e. algebraically, according to $c\, |k|^{-\beta}$, which is shown in red. Hereby $b > 0$ and $c > 0$ are constants.