Discussion of the Choice of Sequential Data Assimilation Methods . 46

3.4 Sequential Data Assimilation

3.4.4 Discussion of the Choice of Sequential Data Assimilation Methods . 46

weight model predictions and measurements, and therefore to improve the data assimila-tion results (Evensen, 2007, p. 265). An example for a non-linear ensemble method is the particle filter (PF; Pham, 2001) that is already successfully applied to low-dimensional systems. The model forecast step of the PF is performed by evaluating the non-linear model equations with an ensemble of model states, such as in the EnKF/EnKS prediction step. In the update step of the PF, the full PDF of the model predictions and observations are considered to obtain corrected model updates. In contrast to EnKF/EnKS, the PDF of each ensemble member (the so-called “particle”) is updated in the PF approach rather than the states of the particle. Thus, for non-linear problems the application of PF might be more desirable. However, so far the PF or other non-linear ensemble methods (see e.g., references in Evensen, 2007, p. 265) could not be practically used for high-dimensional systems as it is the case for this thesis.

3.4. Sequential Data Assimilation 47

optimal initial conditions. The dynamic processes are described using physical equations, i.e. nonlinear partial differential equations, which are numerically solved by applying, e.g., spectral methods or finite element methods. The stochastics of the system are represented by the physical and energetic equations which are, e.g., described by stochastically per-turbed physics tendencies and stochastic kinetic energy backscatter schemes in ECMWF’s integrated forecasting system (IFS; Shutts et al., 2007, Palmer et al., 2009). (2) In contrast, hydrological models are driven by forcing input fields including precipitation, tempera-ture, long- and short-wavelength radiation. These boundary conditions are not perfectly known but subject to measurement and interpolation errors. In addition, uncertain model parameters highly influence the simulation accuracy. The stochastics lie in the forcing data and background information such as topography and land cover. Therefore, it is necessary to ingest observation data into the model as soon as they are available to constrain the current state of the system. Hence, sequential data assimilation is a suitable method in hydrological science. (3) Due to the simple form of the EnKF, ease of implementation, and affordable computational requirements (Evensen, 2007, p. 38), as well as already success-ful applications to oceanography, atmosphere, and recently to hydrology, the EnKF and two of its extensions, the SQRA according to Evensen (2004) and the SEIK filter (Pham et al., 1998), are implemented in this thesis and applied to a simple hydrological model in chapter 4 and to WGHM in chapters 7 and 8. In addition, the EnKS is implemented and applied to the simple hydrological model.

4. Ensemble Kalman Filter Methods

In this thesis, a new calibration and data assimilation (C/DA) framework is introduced.

This framework has been designed based on the classical ensemble Kalman filter (EnKF;

Evensen, 1994, Burgers et al., 1998, Evensen, 2003). After first successful applications (Schumacher, 2012, Eicker et al., 2014), the framework has been extended in Schumacher et al. (2016b) by the square root analysis scheme (SQRA; Evensen, 2004) and the singu-lar evolutive interpolated Kalman filter (SEIK; Pham et al., 1998). The first motivation for considering variants of the filter algorithm is that the classical EnKF approach uses an observation ensemble that introduces an additional source of sampling errors to the algorithm (Evensen, 2004). Whitaker and Hamill (2002) showed that for small ensemble sizes the sampling errors decrease by applying SQRA (Tippett et al., 2003, and refer-ences therein) as will be shown later in this chapter. The second motivation is to reduce computation time in the update step. The update step of the SEIK filter is performed in the ensemble space instead of the observation space, unlike for the EnKF and SQRA methods. Therefore, especially the assimilation of large numbers of observations, i.e. much larger than the ensemble size, is usually better handled by the SEIK filter. Thus, the SEIK filter allows to establish a flexible framework for larger river basins and for integrating additional observations of, e.g., river discharge, lake level, soil moisture or snow water equivalent in future work. Additionally, the ensemble Kalman smoother (EnKS; Evensen and van Leeuwen, 2000) is introduced for smoothing possibly occurring jumps in the EnKF update due to introduced observations.

In this chapter, the two-step procedure of the C/DA is described: (i) the ensemble predic-tion step (secpredic-tion 4.1), i.e. the forward integrapredic-tion of the model for each ensemble mem-ber, which is basically independent of the applied filter algorithm, and (ii) the update (or analysis) step that merges model states and observations (section 4.2). The algorithms of EnKF (section 4.2.1) as well as SQRA (section 4.2.2), SEIK (section 4.2.3) and EnKS (section 4.2.4) are discussed and related to the EnKF. A range of tuning techniques for optimizing the C/DA results, e.g., improved initial sampling (Evensen, 2004), variance inflation factors (Hamill and Snyder, 2002) and localization (Houtekamer and Mitchell, 2001, Hamill et al., 2001) are addressed in section 4.3. Furthermore, throughout the chap-ter a simple example is used to illustrate the filchap-tering procedure and to highlight selected issues. Therefore, a simple hydrological model was introduced in section 2.2.1.

4.1 Model Prediction

The model forward integration is implemented by evaluating the non-linear dynamical model equations, denoted by f(.),

xk =f(x_k−1,uk,p) +q_k−1. (4.1)

The model states x_k of the current time step k depend non-linearly on the model states x_k−1 of the previous time step (k−1), time dependent input forcing fieldsu_k and constant

model parametersp, as well as on unknown model errorsq_k−1 of any probability distribu-tion, i.e. not necessarily of the Gaussian distribution. Note that in addition to Eq. (3.21) forcing fields (u_k) and model parameters (p) are considered. In linear approaches, the error covariance matrix of the model is obtained from error propagation of the previous model state covariance matrix C(x_k−1) to the current time step, as given in Eq. (3.27), for which the model error covariance matrix Q_k−1 =E(q_k−1q^T_k−1) should be given.

In ensemble based data assimilation, the model equations are evaluated for each of the i= 1, . . . , N_e ensemble members (e.g., Evensen, 2007), i.e.

x⁽ⁱ⁾⁻_k =f(x⁽ⁱ⁾_k−1,u⁽ⁱ⁾_k ,p⁽ⁱ⁾). (4.2)

The model states x⁽ⁱ⁾⁻_k of the current time step k, referred to as model predictions, are denoted with the superscript ”−“. In this work, q_k−1 is neglected, i.e. no explicit real-izations of the model errors are generated, due to the difficulty in specifying the matrix Q_k−1. However, an alternative strategy to consider these errors is introduced in section 4.3.

Uncertainties in model statesxk−1, input forcing fieldsu_k, and model parameterspare in-troduced by generatingN_e ensemble members. In Eq. (4.2), only model states are written to the model prediction vector, i.e. x⁽ⁱ⁾⁻_k =w⁽ⁱ⁾⁻_k (see Eq. (3.30)).

For a simultaneous C/DA, Eq. (4.2) is reformulated to x⁽ⁱ⁾⁻_k = w⁽ⁱ⁾⁻_k

p⁽ⁱ⁾⁻_k

= f(w⁽ⁱ⁾_k−1,u⁽ⁱ⁾_k ,p⁽ⁱ⁾_k−1) p⁽ⁱ⁾_k−1

. (4.3)

Here, the time indices k−1 and k are added to the model calibration parameters p. In the model prediction step, the values of the calibration parameters are constant. Then, in the ensemble filter update the calibration parameters are simultaneously updated along with the model states w.

The error statistics of the model prediction are represented by the ensemble mean x_k =

1 Ne

PNe

i=1x⁽ⁱ⁾⁻_k and the empirical error covariance matrix (e.g., Ripley, 1987) C^e(x⁻_k) = 1

Ne−1∆X⁻_k(∆X⁻_k)^T, (4.4)

determined from the ensemble spread. Here, the matrix∆X⁻_k stores the ensemble pertur-bations ∆x⁽ⁱ⁾⁻_k = x⁽ⁱ⁾⁻_k −x_k in its columns. It is possible to define ∆X⁻_k = X⁻_kW with X⁻_k = (x⁽¹⁾⁻_k , . . . ,x^(N_k ^e⁾⁻) and the idempotent (N_e×N_e)-projection matrix W with ele-ments equal to1−N_e⁻¹ on its diagonal and −N_e⁻¹ as off-diagonal entries. By introducing W in the mentioned way, the rank of the matrix is (N_e−1) and the formulation of the model covariance matrix results in

C^e(x⁻_k) = 1

N_e−1X⁻_kW(X⁻_k)^T. (4.5)

In the following, five examples are designed to describe the details of the sequential en-semble data assimilation methods (box 1-5).

4.1. Model Prediction 51

Box 1: Prediction with a Simple Model and WGHM

Let us assume that we aim to improve the simulation of the total water storage (TWS), denoted by S, of the simple model presented in section 2.2.1, and simultaneously calibrating the model parameter K by introducing observations Y of TWS. In the model prediction step, an ensemble of model states is first initialized. In this example, N_e = 30 realizations of K₀ (dimensionless), of the initial water state S₀ (m³) and of the precipitation minus evapotranspiration time series (P −E)k (m³) are generated based on arbitrarily chosen PDFs (Tab. 4.1).

Table 4.1:Details to generate an initial ensemble of model runs by introducing uncertainties of the initial water stateS0, the model parameterK, and the input forcing field(P−E), i.e.

net precipitation. The model prediction is then performed using the equations of the simple model which was presented in section 2.2.1.

Ensemble of Variable Simple Model Initial Values name in in Eqs. (2.27)

(Eq. (4.3)) and (2.28)

model state w S S₀⁽ⁱ⁾ ∈ [2, 8]

calibration parameter p K K₀⁽ⁱ⁾∈ [0.01 0.99]

forcing field u (P −E) (P −E)⁽ⁱ⁾_k =m⁽ⁱ⁾·(P −E)k

with m⁽ⁱ⁾ ∈ [0.8, 1.2]

For K₀, a minimum and maximum limit of 0.01 and 0.99 was defined, respectively.

Subsequently, 30 uniformly distributed ensemble membersK₀⁽ⁱ⁾ were generated within these limits. The same procedure was applied to S₀ while considering 2 m³ and 8 m³ as limits. In order to account for uncertainties in the forcing field, i.e. net precipitation in this example, a multiplicative error centered at one and with maximum limits of 0.8 and 1.2 was assumed for (P−E)_k. Thus, larger uncertainties were associated with higher net precipitation values. It is worth mentioning that the ensemble of(P−E)_k is only used to represent uncertainties of the forcing field and, therefore, to realistically represent uncertainties of the model simulation. It is not updated in the ensemble filter. The model in Eq. (4.3) depends on w⁽ⁱ⁾_k−1, which is the water storage S_k−1⁽ⁱ⁾ of the previous time step k−1 for sample i, u⁽ⁱ⁾_k (the i-th realization of (P −E)⁽ⁱ⁾_k at time step k), and p⁽ⁱ⁾_k−1 (associated with the parameter K_k−1⁽ⁱ⁾ ). The initial ensemble mean and error covariance matrix of S0 and K0, which both should be improved, are calculated using their respective ensembles and given by

x₀ = S₀

K₀

4.88 0.47

, and C^e(x₀) = σ_S²

0 σ_S₀_K₀ σ_K₀_S₀ σ_K²₀

3.28 −0.05

−0.05 0.09

. By evaluating Eqs. (2.27) and (2.28) for each ensemble member over 24 time steps, the open loop (OL) simulations, i.e. no observations of S are assimilated, in Fig. 4.1 are determined.

0 5 10 15 20 0

2 4 6 8 10

time

P-E

0 5 10 15 20

0 5 10 15 20 25

time

Ensemble mean Ensemble members

0 5 10 15 20

0 5 10 15 20 25

time

a) b) c)

Figure 4.1:An overview ofNe= 30 ensemble members of the generated daily a) accumulated net precipitation P −E (in m³), b) the open loop simulations of storage S (in m³), and c) accumulated runoff R (in m³).

For data assimilation, the model prediction is performed until time step k, at which observations are available. The model simulations of each ensemble member are written into a prediction vector

x⁽ⁱ⁾⁻_k =

w⁽ⁱ⁾⁻_k p⁽ⁱ⁾⁻_k

S_k⁽ⁱ⁾ K_k⁽ⁱ⁾

(1−K_k−1⁽ⁱ⁾ )S_k−1⁽ⁱ⁾ + (P −E)⁽ⁱ⁾_k−1 K_k−1⁽ⁱ⁾

# ,

with m=2 rows, which is the number of states and parameters that will be updated, and one column. These vectors are collected in the model prediction matrix

X⁻_k =

S_k⁽¹⁾ S_k⁽²⁾ . . . S_k^(N^e⁾ K_k⁽¹⁾ K_k⁽²⁾ . . . K_k^(N^e⁾

# ,

withm=2 rows andN_e columns. The second order error statistics of the model predic-tion are represented by the ensemble meanx⁻_k and the empirical model error covariance matrix C^e(x⁻_k), which are given at time stepk =1 by

x⁻₁ = S₁

K₁

3.47 0.47

, and C^e(x⁻₁) =

σ_S²₁ σ_S₁_K₁ σ_K₁_S₁ σ_K²₁

2.46 −0.24

−0.24 0.09

. The variance of the TWS S is ∼27 times larger than the variance of the model pa-rameter K, since the minimum and maximum limits form a much larger interval for TWS. This depends obviously on the selected units for S. Since the parameter is not observed, the variance of K or its ratio to the variance of S will not affect the C/DA results. Instead, the parameter K will be calibrated via the dimensionless correlation to the state S. The model state and parameter have a covariance of -0.24 and a cor-relation coefficient of -0.52. The corcor-relation coefficient does not depend on the unit of S and, thus, the calibration of K is independent of the selected unit.

To transfer this example to the more complex WaterGAP Global Hydrology Model (WGHM), the model prediction vector is extended in three ways: (1) In WGHM, not only one TWS but ten individual water storage compartments are defined (see section 2.2.2). Therefore, the simulated values of each compartment are written to the model prediction vector. (2) WGHM does not only consider one storage for a

Im Dokument 3. Data Assimilation (Seite 66-73)