
RAINFALL-RUNOFF MODELS WITH PARZEN REGRESSION

R. Wójcik, P. Torfs, P. Warmerdam

Wageningen University, Department of Environmental Science, Sub-Department Water Resources, Nieuwe Kanaal 11, 6709 PA, Wageningen, The Netherlands

ABSTRACT

In this paper a black box technique for rainfall-runoff modelling on a daily scale will be discussed.

Three aspects will be highlighted: the incorporation of memory, periodicity and non-linearity.

An application to the catchment of the Beerze in the Netherlands will be shown. It will be demonstrated that it is feasible to model the runoff of a catchment with black box models, using rainfall data only.

Keywords: non-linear black box models, rainfall-runoff process, Parzen densities, non-stationary time series

INTRODUCTION

The rainfall-runoff process of a catchment involves “hydrological” memory, as rainfall (P(n)) is not immediately transferred into discharge (Q(n)). A classical black box approach towards incorporating this memory into a rainfall-runoff model (see e.g. Minns and Hall, 1996 and references therein) is to apply the concept of delay reconstruction and then build the model using the following equation:

Q(n) = f(Q(n-τQ), Q(n-2τQ), …, Q(n-KτQ), P(n), P(n-τP), P(n-2τP), …, P(n-LτP)) (1)

where τQ, τP, K and L are delay reconstruction parameters and f (·) stands for a black box model.
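As a minimal sketch of how the inputs of (1) could be assembled, the following Python function builds the lagged input matrix and the target vector from two daily series. The function name `delay_design_matrix` and the default parameter values are illustrative assumptions, not part of the original study.

```python
import numpy as np

def delay_design_matrix(q, p, tau_q=1, tau_p=1, K=2, L=2):
    """Assemble inputs and targets for the delay-reconstruction model (1).

    Columns of X: Q(n - tau_q), ..., Q(n - K*tau_q), P(n), P(n - tau_p), ..., P(n - L*tau_p).
    The target y is Q(n).
    """
    q = np.asarray(q, dtype=float)
    p = np.asarray(p, dtype=float)
    start = max(K * tau_q, L * tau_p)   # first time step with all lags available
    n = np.arange(start, len(q))
    q_lags = [q[n - k * tau_q] for k in range(1, K + 1)]
    p_lags = [p[n - l * tau_p] for l in range(0, L + 1)]
    X = np.column_stack(q_lags + p_lags)
    y = q[n]
    return X, y

# Toy series (made-up numbers, for illustration only)
q_toy = np.arange(10.0)
p_toy = 10.0 * np.arange(10.0)
X_toy, y_toy = delay_design_matrix(q_toy, p_toy, K=2, L=1)
```

The number of columns is K + L + 1, which shows how quickly the input dimension grows with the chosen lags.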

This methodology, recently cast into the broader framework of non-linear systems theory (see Takens, 1981 and Casdagli, 1992 for theoretical background and Porporato and Ridolfi, 1997; 2001 or Sivakumar et al., 2002 for hydrological applications), suffers from several technical flaws. Firstly, there is no straightforward algorithmic way to find the optimal delay reconstruction for a noisy non-linear time series, so several subjective decisions concerning the choice of τQ, τP, K and L usually have to be made. Secondly, if the number of inputs on the right-hand side of (1) is large and the sample size is limited (as frequently happens in practice), there may not be enough data points to populate the reconstructed input-output space; statistical models f (·), both linear and non-linear, will then show large uncertainty in the estimated parameters and tend to overfit. In the statistical literature this phenomenon is referred to as the curse of dimensionality (Hastie et al., 2002). Finally, it is important to note that the discharge estimate Q(n) is always conditioned on antecedent discharges. This restriction stems from the fact that in (1) rainfall inputs alone would be insufficient to calculate Q(n) with reasonable accuracy, as pointed out by Minns and Hall (1996). So (1) is actually a discharge prediction model, which in the literature is sometimes mistakenly referred to as a rainfall-runoff model. A considerable part of its predictive power (especially at daily and sub-daily time resolution) is due to the strong correlation of Q(n) with Q(n-τQ).

In this paper a new methodology that solves the above-mentioned problems is presented. It will be shown that combining simple conceptual models with a black box model is an effective way of computing rainfall-runoff transformation based on rainfall information only.

INCORPORATING MEMORY INTO BLACK BOX MODELS – A CONCEPTUAL APPROACH

The problem of accounting for memory in statistical rainfall-runoff models can be tackled by using two parallel linear reservoir models S1 and S2 of the form:

Si(n+1) = αi Si(n) + P(n) (2)

where Si(n) is the storage of the i-th reservoir and the coefficients αi are identified by investigating the recession curves of the discharge. The first reservoir accounts for slow processes involved in runoff formation, such as fluctuations in groundwater flow and changes in soil moisture, while the second reservoir accounts for fast processes such as surface runoff. Combining the measured rainfall input P(n) with the storage functions of the reservoirs S1(n) and S2(n) leads to a black box model of the following type:

Q(n) = f(P(n), S1(n), S2(n)) (3)
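The recursion (2) and the input vector of model (3) can be sketched in Python as follows. The coefficients 0.98 and 0.6, the zero initial storage, the toy rainfall values and the helper name `reservoir_storage` are assumptions made for illustration; in practice the αi would come from the recession-curve analysis described above.

```python
import numpy as np

def reservoir_storage(p, alpha, s0=0.0):
    """Return S(0), ..., S(N-1) for S(n+1) = alpha * S(n) + P(n), with S(0) = s0."""
    p = np.asarray(p, dtype=float)
    s = np.empty(len(p))
    s_current = s0
    for n in range(len(p)):
        s[n] = s_current                       # storage at time n
        s_current = alpha * s_current + p[n]   # update to time n + 1
    return s

# Toy daily rainfall in mm (made-up numbers)
rain = np.array([0.0, 12.0, 3.0, 0.0, 0.0, 7.5])
s_slow = reservoir_storage(rain, alpha=0.98)   # assumed slow-reservoir coefficient
s_fast = reservoir_storage(rain, alpha=0.6)    # assumed fast-reservoir coefficient

# Inputs of model (3): one row [P(n), S1(n), S2(n)] per day
X = np.column_stack([rain, s_slow, s_fast])
```

Because each storage is a discounted sum of all earlier rainfall, three inputs carry the memory that (1) needs K + L + 1 lagged inputs to represent.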

Catchments in temperate climate systems also show a clear yearly periodic pattern. This is mainly due to the annual cycle in temperature, land use and evapotranspiration. If that extra information is not directly used in the model (as we assume in this study), the basic approach would therefore be to fit a model (3) for every day:

Q(n) = f_{d(n)}(P(n), S1(n), S2(n)) (4)

where d(n) stands for the day of the year corresponding to time step n. This multiplies the number of parameters by a factor of 365, which makes the approach impractical. Therefore, the use of an artificial input is proposed:

t(n) = sin(ω (n+nf)) (5)

where ω = 2π /365 and nf is an appropriately chosen phase constant. Inclusion of this input into (3) yields:

Q(n) = f(P(n), S1(n), S2(n), t(n)) (6)

If f (·) is a universal function approximator, the exact form of t(n) is not very important; other forms that account for the periodic components present in the data can be used as well.
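A short Python sketch of the artificial periodic input (5) is given below. The function name `seasonal_input` and the default phase constant of zero are assumptions for illustration; the study does not state the value of nf that was used.

```python
import numpy as np

def seasonal_input(n, n_f=0.0, period=365.0):
    """Artificial periodic input t(n) = sin(omega * (n + n_f)), omega = 2*pi / period."""
    omega = 2.0 * np.pi / period
    return np.sin(omega * (np.asarray(n, dtype=float) + n_f))

# Two years of daily time steps
steps = np.arange(730)
t = seasonal_input(steps)
```

The resulting series can be appended as a fourth column to the inputs P(n), S1(n) and S2(n) to obtain the input vector of (6).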

In this paper, a rainfall-runoff model of the type described by (6) will be shown to be an attractive alternative to the type described by (1). As f (·) we will use the Parzen regression technique. Fig 1 illustrates this new way of rainfall-runoff modelling schematically.

Fig 1: A new methodology for incorporating “hydrological” memory into statistical rainfall-runoff models.

PARZEN REGRESSION

Parzen (Parzen, 1962; see also Silverman, 1986 and Wand and Jones, 1995) used a sum of Gaussian densities to approximate arbitrary densities, as shown in Fig 2. To fit a Parzen density to the joint input-output sample, the maximum likelihood principle was used, with the addition of a penalty term that prevents degeneration of the Parzen density and controls the locality of the fit. The fitting criterion Λ to be maximised can then be written as:

Λ = L – γ dKL(P;G) (7)

where L denotes the likelihood function, γ is the locality constant and dKL(P;G) is the average Kullback-Leibler distance (see Kullback, 1959) between components of the Parzen density P and reference Gaussian density G. Once the optimal value of γ is found by a validation procedure and the density is fitted to the inputs and outputs, conditional means and standard deviation bands can be calculated, as depicted in Fig 3.
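The step from a fitted joint density to a conditional mean and standard deviation can be illustrated with a simplified Python sketch. Note that this is not the penalised maximum likelihood fit of (7): it assumes a classical fixed-bandwidth Parzen estimator with one isotropic Gaussian kernel per data point, and the bandwidths `h_x` and `h_y` and the function name `parzen_conditional` are hypothetical choices. For such a density the conditional density of the output given an input x is again a sum of Gaussians centred on the observed outputs, with weights that depend on the distance of x to each observed input.

```python
import numpy as np

def parzen_conditional(x_query, X, y, h_x=1.0, h_y=1.0):
    """Conditional mean and standard deviation of y given x under a
    fixed-bandwidth Gaussian Parzen density on the joint sample (X, y).

    X has shape (N, d), y has shape (N,), x_query has shape (d,).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_query = np.asarray(x_query, dtype=float)
    # Weight of each kernel: Gaussian in the input distance
    d2 = np.sum((X - x_query) ** 2, axis=1)
    log_w = -0.5 * d2 / h_x ** 2
    w = np.exp(log_w - log_w.max())   # subtract the maximum for numerical stability
    w /= w.sum()
    # Moments of the mixture sum_i w_i * N(y; y_i, h_y^2)
    mean = np.sum(w * y)
    var = h_y ** 2 + np.sum(w * y ** 2) - mean ** 2
    return mean, np.sqrt(var), w

# Two toy samples with one input each (made-up numbers)
X_toy = np.array([[0.0], [1.0]])
y_toy = np.array([0.0, 2.0])
m, s, w = parzen_conditional(np.array([0.5]), X_toy, y_toy, h_x=1.0, h_y=1.0)
```

In this toy case the query point lies midway between the two samples, so both kernels receive equal weight and the conditional mean is the average of the two outputs.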

Fig 2: One- and two-dimensional examples of Parzen densities (in both cases a sum of three Gaussian components).

Fig 3: The steps involved in Parzen regression: first a Parzen density is fitted to the data, then conditional densities are estimated, from which mean and e.g. confidence intervals may be extracted.

DATA AND RESULTS

The rainfall and runoff data used for regression experiments in this study are a sequence of daily values registered at the Beerze catchment in the Netherlands over a period of 8 years (January 1980-December 1987). The catchment area covers 240 km² and the mean annual rainfall is about 800 mm. The data set was divided into three parts: a calibration set, a validation set and a testing set. The parts consisted of 3 years, 2 years and 3 years of daily records respectively. The calibration set was used to fit regression models, the validation set was used for selection of model parameters, and the testing set was used for assessment of the final chosen models. Two types of models f(·) in (6) were considered: a linear regression model and a non-linear Parzen regression model.
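A split of this kind could be set up as in the following Python sketch. It assumes that the three parts are taken in chronological order (1980-1982, 1983-1984, 1985-1987); the text gives only the lengths of the parts, so the exact year boundaries are an assumption.

```python
import numpy as np

# Daily time axis for January 1980 to December 1987
days = np.arange("1980-01-01", "1988-01-01", dtype="datetime64[D]")
years = days.astype("datetime64[Y]").astype(int) + 1970

# Assumed chronological split: 3 + 2 + 3 years
calibration = years <= 1982
validation = (years >= 1983) & (years <= 1984)
testing = years >= 1985

n_cal, n_val, n_test = calibration.sum(), validation.sum(), testing.sum()
```

The boolean masks can be used to index the rainfall, storage and discharge arrays, so that the reservoir storages are still computed once over the full continuous record.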

Fig 4: Performance of the non-linear and linear regression techniques on the testing set.

Fig 4 shows the testing results for both types of models. It can clearly be seen that the non-linear model performed better. Apart from the visual judgement, we also calculated the normalised mean squared error for the testing set. This error was always lower for Parzen regression. A strong point of the non-linear model is that, apart from discharge estimates, extra information in the form of conditional Parzen densities is available.
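The text does not state which normalisation was used for the error measure. A common convention, assumed in the Python sketch below, is to divide the mean squared error by the variance of the observed discharge, so that a model that always returns the observed mean scores 1 and a perfect model scores 0. The function name `nmse` and the toy numbers are illustrative.

```python
import numpy as np

def nmse(observed, simulated):
    """Mean squared error divided by the variance of the observations
    (assumed normalisation, not confirmed by the original text)."""
    observed = np.asarray(observed, dtype=float)
    simulated = np.asarray(simulated, dtype=float)
    return np.mean((observed - simulated) ** 2) / np.var(observed)

# Toy discharge series (made-up numbers)
q_obs = np.array([1.0, 2.0, 4.0, 3.0, 2.0])
q_mean_model = np.full_like(q_obs, q_obs.mean())

score_perfect = nmse(q_obs, q_obs)
score_mean = nmse(q_obs, q_mean_model)
```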

Fig 5 shows a segment of the testing set for which several such conditional densities are plotted together with the conditional mean (estimated discharge), standard deviation bands, and measured discharge. It is interesting to see that there are three cases (designated as A, B and C) for which the conditional density is bimodal. The reason for this behaviour might be that the mapping of rainfall into runoff is not exactly functional, i.e. the same values of rainfall are sometimes associated with different values of runoff. Therefore, the mean and the standard deviation have to be interpreted through these two modes.
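The effect can be reproduced with a toy example in Python. The sketch assumes a fixed-bandwidth Gaussian Parzen estimator (not the penalised fit used in this study), two made-up samples with identical input but different output, and hypothetical helper names `conditional_density` and `count_modes`. The conditional mean then falls between the two modes, in a region where the conditional density is low.

```python
import numpy as np

def conditional_density(y_grid, x_query, X, y, h_x, h_y):
    """Conditional density p(y | x) on y_grid for a fixed-bandwidth
    Gaussian Parzen density fitted to the joint sample (X, y)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    y_grid = np.asarray(y_grid, dtype=float)
    d2 = np.sum((X - np.asarray(x_query, dtype=float)) ** 2, axis=1)
    log_w = -0.5 * d2 / h_x ** 2
    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    z = (y_grid[:, None] - y[None, :]) / h_y
    components = np.exp(-0.5 * z ** 2) / (h_y * np.sqrt(2.0 * np.pi))
    return components @ w

def count_modes(density):
    """Number of strict interior local maxima of a sampled density."""
    mid = density[1:-1]
    return int(np.sum((mid > density[:-2]) & (mid > density[2:])))

# Same input value, two different outputs (made-up numbers)
X_toy = np.array([[5.0], [5.0]])
y_toy = np.array([1.0, 4.0])
y_grid = np.linspace(-2.0, 7.0, 901)

dens = conditional_density(y_grid, np.array([5.0]), X_toy, y_toy, h_x=1.0, h_y=0.5)
n_modes = count_modes(dens)
cond_mean = np.sum(y_grid * dens) * (y_grid[1] - y_grid[0])
```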

Fig 5: X-Y plane: Parzen regression on a segment of the testing set with error bands given by standard deviations; Z-plane: conditional densities.

CONCLUSIONS

1. The use of conceptual reservoir models efficiently solves the problem of incorporating memory into statistical rainfall-runoff models.

2. The Parzen regression model performed better as a rainfall-runoff transformation tool than the linear regression model.

3. In addition to runoff estimates, Parzen regression offers other results that can be useful for the modeller: standard deviation error bands and the full conditional density. All these concepts have their standard probabilistic interpretation.

REFERENCES

Casdagli, M. (1992) A dynamical systems approach to modelling input-output systems. In: Casdagli, M., Eubank, S. (eds.), Nonlinear Modelling and Forecasting. Santa Fe Institute Studies in the Science of Complexity. Addison-Wesley, Redwood City.

Hastie, T., Tibshirani, R., Friedman, J.H. (2002) The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer-Verlag, London.

Kullback, S. (1959) Information Theory and Statistics. Dover Publications, New York.

Minns, A.W., Hall, M.J. (1996) Artificial neural networks as rainfall-runoff models. Hydrological Sciences Journal, 41, 399-417.

Parzen, E. (1962) On estimation of a probability density function and mode. Annals of Mathematical Statistics, 33, 1065-1076.

Porporato, A., Ridolfi, L. (1997) Non-linear analysis of river flow time sequences. Water Resources Research, 33, 1352-1367.

Porporato, A., Ridolfi, L. (2001) Multivariate nonlinear prediction of river flow. Journal of Hydrology, 248, 109-122.

Sivakumar, B., Jayawardena, A.W., Fernando, T.M.K.G. (2002) River flow forecasting: use of phase-space reconstruction and artificial neural networks approaches. Journal of Hydrology, 265, 225-245.

Silverman, B.W. (1986) Density Estimation for Statistics and Data Analysis. Chapman and Hall, New York.

Takens, F. (1981) Detecting strange attractors in turbulence. In: Rand, D.A., Young, L.S. (eds.), Dynamical Systems and Turbulence. Lecture Notes in Mathematics, vol. 898. Springer-Verlag, New York.

Wand, M.P., Jones, M.C. (1995) Kernel Smoothing. Chapman & Hall, London.
