
4.3 Use of autoregressive models for short-term prediction

4.3.3 Setup and development of the optimal autoregressive models

The objective of seasonal to inter-annual prediction is to estimate the future states or values of climate variables from previously known climate predictors (Goddard et al. 2001). The term short-term prediction, as used here, refers to seasonal prediction of the near-future state based on the previous evolution of the climate series. In this section, AR- and ARIMA-models are established for monthly climate time series. The non-seasonal and seasonal orders of the models are determined and the optimal models are fitted to the observed climate data. Finally, climate predictions of up to one year ahead are carried out for the study area.

4.3.3.1 Fitting of an AR-model

The autoregressive (AR) model is a linear prediction function that predicts the future output of a system from its previous output. The setup and development of appropriate autoregressive models (AR and ARIMA) is a non-trivial task and basically consists of four steps (e.g. Box and Jenkins 1976, Hipel and McLeod 1994, Box et al. 2008):

(1) Model identification

(2) Estimation of the model parameters

(3) Diagnostic checking of the appropriateness of the identified model

(4) Application of the final model (validation and forecasting)

Step (1) is probably the most difficult one, as it essentially requires the determination of the order p of the AR-process in Eq. (4.1). This is done by visual inspection of the plots of the regular autocorrelation function (ACF) and of the partial autocorrelation function (PACF), as shown in the previous section.
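The quantities inspected in the identification step can be illustrated with a minimal sketch. It assumes the usual biased sample ACF estimator and the Durbin-Levinson recursion for the PACF; the function names `acf` and `pacf` and the synthetic series are illustrative and are not taken from the R analysis of this study.

```python
import math

def acf(x, max_lag):
    """Sample autocorrelations r_0 .. r_max_lag (biased estimator, divisor n)."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev) / n
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / n / c0
            for k in range(max_lag + 1)]

def pacf(x, max_lag):
    """Partial autocorrelations for lags 0 .. max_lag (Durbin-Levinson recursion)."""
    r = acf(x, max_lag)
    phi_prev = []            # AR coefficients of the order k-1 model
    out = [1.0]
    for k in range(1, max_lag + 1):
        num = r[k] - sum(phi_prev[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den   # partial autocorrelation at lag k
        phi_prev = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j]
                    for j in range(k - 1)] + [phi_kk]
        out.append(phi_kk)
    return out

# Synthetic monthly series with a pure 12-month cycle (illustrative data only)
series = [math.sin(2 * math.pi * t / 12) for t in range(240)]
r = acf(series, 12)
```

For such a series the ACF is strongly positive at lag 12 and strongly negative at lag 6, which is the kind of pattern that points to a 12-month seasonal cycle.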

Once the optimal model order is selected, i.e. the major structure of the underlying AR-model is determined, the model coefficients in Eq. (4.1) are estimated in step (2) by linear regression, which amounts to the solution of the so-called Yule-Walker equations (e.g. Hipel and McLeod 1994). As the model order p cannot always be uniquely determined, the order of the model is further fine-tuned by optimizing the AIC, following the AR-fitting algorithm of Plummer (2012). The common AIC has been described in Eq. (2.4). When applied to the most general autoregressive (ARIMA or SARIMA) model (see following section), as defined in Eq. (4.4), the AIC is expressed as (Hyndman and Khandakar 2008):

$$\mathrm{AIC} = -2 \ln(L) + 2\,(p + q + P + Q + r) \qquad (4.9)$$

where p, q, P and Q are the model orders of the SARIMA-model, r is the number of error terms, and L is the maximized likelihood of the model, fitted to the non-seasonally (to remove non-stationarities in the time series) and seasonally differenced time series, i.e. the series differenced with the orders d and D in Eq. (4.4). For the regular AR-model for a stationary and non-seasonal time series, obviously, only the parameter p enters Eq. (4.9).
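The estimation and order fine-tuning described above can be sketched as follows. This is a minimal illustration, not the fitting routine used in this study: it solves the Yule-Walker equations with the Levinson-Durbin recursion and, as an assumption, replaces the full likelihood by the Gaussian approximation AIC ≈ n ln(σ²) + 2p, where σ² is the innovation variance of the AR(p) fit. All function names are hypothetical.

```python
import math

def autocov(x, max_lag):
    """Sample autocovariances c_0 .. c_max_lag (divisor n)."""
    n = len(x)
    m = sum(x) / n
    d = [v - m for v in x]
    return [sum(d[t] * d[t + k] for t in range(n - k)) / n
            for k in range(max_lag + 1)]

def yule_walker_all_orders(x, max_order):
    """Solve the Yule-Walker equations for p = 0 .. max_order.

    Returns a list indexed by p of (coefficients, innovation variance)."""
    c = autocov(x, max_order)
    phi, var = [], c[0]
    fits = [(phi, var)]
    for k in range(1, max_order + 1):
        refl = (c[k] - sum(phi[j] * c[k - 1 - j] for j in range(k - 1))) / var
        phi = [phi[j] - refl * phi[k - 2 - j] for j in range(k - 1)] + [refl]
        var = var * (1.0 - refl * refl)
        fits.append((phi, var))
    return fits

def select_order_by_aic(x, max_order):
    """Return (best order, its coefficients, AIC values for p = 0 .. max_order)."""
    n = len(x)
    fits = yule_walker_all_orders(x, max_order)
    # Assumed Gaussian approximation of the AIC, up to an additive constant
    aic = [n * math.log(v) + 2 * p for p, (_, v) in enumerate(fits)]
    best = min(range(len(aic)), key=aic.__getitem__)
    return best, fits[best][0], aic
```

Because the recursion produces the fits for all orders up to `max_order` in one pass, scanning the AIC over a wide range of orders is inexpensive.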

The resulting model configuration is diagnostically checked and selected in step (3). The optimal autoregressive order p is determined by selecting the model that gives the best prediction skill (smallest AIC).

Finally, in step (4), the structure and configuration of the model are validated and applied for short-term forecasting of the time series, k time steps ahead, using present values.
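The forecasting in step (4) can be illustrated with a short sketch of the recursive multi-step AR forecast, in which each predicted value is fed back as input for the next step. The function name `ar_forecast` and its arguments are hypothetical; the model is assumed to be written in terms of deviations from the series mean, and the history must contain at least as many values as there are coefficients.

```python
def ar_forecast(history, phi, mean, steps):
    """Forecast `steps` values ahead with an AR(p) model.

    phi[j] multiplies the deviation from `mean` observed j+1 steps back."""
    p = len(phi)
    dev = [v - mean for v in history[-p:]] if p else []
    forecasts = []
    for _ in range(steps):
        nxt = sum(phi[j] * dev[-1 - j] for j in range(p))
        forecasts.append(mean + nxt)
        dev.append(nxt)      # the forecast becomes input for the next step
    return forecasts
```

With monthly data, `steps=12` would correspond to the one-year-ahead prediction considered here.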

The development of the various autoregressive models is carried out here in the R programming environment, using the “stats” and “forecast” packages (R Development Core Team 2008).


In the following analysis, all AR-models have been validated, i.e. calibrated and verified, based on two validation schemes (see Table 4.4):

1) Calibration with 1971-1985 data, verification for the following year 1986 (scheme vrf1)

2) Calibration with 1971-1999 data, verification for the following year 2000 (scheme vrf2)

The ACF- and PACF-plots of the previous section indicate a seasonal cycle of 12 months in the climate time series. Therefore, the further optimization of the AR-model, based on the AIC-values, is carried out with an increasing order p up to p = 60. The models providing the lowest AIC are then used for prediction in the subsequent verification periods, which provides another way of selecting the optimal model.
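The two validation schemes can be expressed as a simple split of a monthly series into a calibration and a verification part. The sketch below assumes a gap-free monthly series starting in January of the first year; the function name `split_scheme` is hypothetical.

```python
def split_scheme(monthly, first_year, cal_last_year, vrf_year):
    """Split a gap-free monthly series into calibration and verification parts."""
    def january_index(year):
        return (year - first_year) * 12

    calibration = monthly[january_index(first_year):january_index(cal_last_year + 1)]
    verification = monthly[january_index(vrf_year):january_index(vrf_year + 1)]
    return calibration, verification

# Placeholder values standing in for 30 years of monthly data (1971-2000)
series = list(range(30 * 12))
cal1, vrf1 = split_scheme(series, 1971, 1985, 1986)   # scheme vrf1
cal2, vrf2 = split_scheme(series, 1971, 1999, 2000)   # scheme vrf2
```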

For the calibration with the 1971-1985 precipitation time series at station 459201, Figure 4.2a shows the AIC as a function of the number p of parameters, ranging from 1 to 60 months. Local minima of the AIC are found for p = 12, 24, 26, 48 and 58, i.e. they scatter around multiples of the 12-month cycle, as indicated by the points b) to f) in Figure 4.2a. The AR-models with these orders p are then used to predict the 12-month rainfall for the following year 1986 (verification scheme vrf1), as shown in the corresponding panels of Figure 4.2b to Figure 4.2f.
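Locating the candidate orders on such an AIC curve amounts to a search for local minima. A minimal sketch, assuming that the AIC values are stored in a list whose first element belongs to p = 1 and that only strict interior minima are of interest (the function name is hypothetical):

```python
def local_minima_orders(aic_by_order):
    """Return the orders p at strict interior local minima of the AIC curve.

    aic_by_order[i] is assumed to hold the AIC of the AR(i + 1) model."""
    orders = []
    for i in range(1, len(aic_by_order) - 1):
        if aic_by_order[i] < aic_by_order[i - 1] and aic_by_order[i] < aic_by_order[i + 1]:
            orders.append(i + 1)
    return orders
```

The end points of the scanned range are not reported by this sketch, since they have only one neighbour.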

Figure 4.2. AIC as a function of the order p of the AR(p)-model, with the corresponding local minima indicated (a), and optimal models to predict rainfall at station 459201 for the year 1986, based on calibration for 1971-1985 (vrf1), for the different optimal orders p = 12, 24, 26, 48 and 58 (b-f). Also shown are the 80% and 95% confidence intervals of the forecast as well as the NS-coefficients below the charts, showing the model performance of the respective AR-model.

Panels: a) AICs varying with p; b) AR(12); c) AR(24); d) AR(26); e) AR(48); f) AR(58). Axis label of panels b) to f): PCP (mm/day).

Also shown in the panels are the 80% and 95% confidence intervals of the forecast as well as the NS-coefficients characterizing the model performance of the respective AR-model. One may notice that in this verification mode, the AR(12)-model of Figure 4.2b provides the best performance, with NS = 0.481. Consequently, this model is selected for the precipitation prediction at this meteorological station.
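The performance measure can be sketched as follows, under the assumption that the NS-coefficient denotes the Nash-Sutcliffe efficiency commonly used in hydrology; this reading and the function name are assumptions, not statements of the source.

```python
def nash_sutcliffe(observed, predicted):
    """Nash-Sutcliffe efficiency: 1 - SSE / (sum of squares about the observed mean)."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / sst
```

Under this definition a value of 1 indicates a perfect prediction, whereas a value of 0 means that the model performs no better than the mean of the observations.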

The AR-model performances are examined for all precipitation and temperature data sets to determine the best choice of model order for the whole meteorological network in the study region. The results for the two validation schemes are shown in Table 4.5, which indicates that for most of the stations p = 12 and 24 are the optimal orders. One also notes that for the year-2000 prediction (vrf2), the optimal order tends more towards p = 36.

Table 4.5. Orders p of the best fitting AR(p) model for the 4 temperature- and 24 precipitation stations in the study region for predicting the 1-year-ahead monthly climate, for the two validation schemes (see text for explanations).

Validation scheme; predictand; number of stations with p model parameters in AR(p)*

Orders p (columns): 12, 17, 24, 26, 28, 34, 36, 38, 46, 47, 48, 53, 56, 59

cal: 1971-1985, vrf1: 1986

Tmax: 2 1 1

Tmin: 1 2 1

PCP: 14 8 1 1

cal: 1971-1999, vrf2: 2000

Tmax: 1 2 1

Tmin: 2 1 1

PCP: 11 5 1 2 2 1 1 1

Count per order p: 30 (p=12), 1 (17), 17 (24), 1 (26), 2 (28), 1 (34), 5 (36), 1 (38), 1 (46), 1 (47), 1 (48), 1 (53), 1 (56), 1 (59)

* The numbers of stations add up to 4 for Tmin and Tmax and to 24 for PCP

4.3.3.2 Fitting of an ARIMA-model

As discussed previously, the autoregressive integrated moving average (ARIMA) model and its seasonal homologue, the SARIMA-model, are the most complete models to describe non-stationary, seasonal stochastic processes. Fitting an ARIMA-model is considerably more complex than fitting an AR-model and is undertaken in this section.

Similar to that of an AR-model, the fitting process of an ARIMA(p,d,q)-model also follows the four steps of Box and Jenkins (1976). To formulate the initial model, the orders p and q of the AR- and the MA-process, respectively, are generally selected by inspecting the ACF- and PACF-series as a function of the lag time (Hyndman and Khandakar 2008). More specifically, the order p of the AR-process is determined from the largest lag time k for which the PACF is non-zero, whereas the order q of the MA-process is determined from the corresponding lag of the ACF (Hipel and McLeod 1994).
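The cut-off rule described above can be sketched numerically. As an assumption that is not stated in the source, "non-zero" is interpreted here as exceeding the approximate 95% white-noise bound of ±1.96/√n; the function name is hypothetical.

```python
import math

def largest_significant_lag(values, n_obs, z=1.96):
    """Largest lag k >= 1 whose ACF or PACF value lies outside +/- z / sqrt(n_obs).

    values[k] is the (partial) autocorrelation at lag k; returns 0 if none is significant."""
    bound = z / math.sqrt(n_obs)
    significant = [k for k, v in enumerate(values) if k > 0 and abs(v) > bound]
    return max(significant) if significant else 0
```

Applied to the PACF values, the result gives a trial order p, and applied to the ACF values, a trial order q.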

A more quantitative way of selecting p and q in a trial ARIMA(p,d,q)-model is based on the Ljung-Box test (Ljung and Box 1978), which tests the residuals of the fitted model for remaining autocorrelation. Normally, the model configuration is chosen by selecting the lowest AIC value. However, the AIC values computed by Eq. (4.9) cannot be compared across different levels of differencing d and, in the case of seasonality, D (Smith and Yadav 1994, Smith and Yadav 1995, Hyndman and Khandakar 2008). Therefore, the likelihood of the whole model is used to evaluate the optimal model directly.
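For reference, the Ljung-Box statistic in its standard form is given below. The symbols are assumptions introduced here for illustration: n is the length of the residual series, h the number of lags tested, and \hat{r}_k the sample autocorrelation of the residuals at lag k.

```latex
Q = n\,(n+2) \sum_{k=1}^{h} \frac{\hat{r}_k^{2}}{n-k}
```

Under the null hypothesis of no remaining autocorrelation, Q approximately follows a chi-square distribution, with h - p - q degrees of freedom when the residuals stem from a fitted ARIMA(p,d,q)-model, so that large values of Q indicate an inadequate model.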

The order d of the differencing needed to make a non-stationary time series stationary is determined by means of the KPSS-test (Kwiatkowski et al. 1992), which tests a series for stationarity. In fact, the whole ARIMA-setup process is based on four alternative optimizing approaches, formed as combinations of step-wise optimization and KPSS-tests (Kwiatkowski et al. 1992):

op1) stepwise fitting of p and q, with d taken from the lowest AIC

op2) non-stepwise fitting of p and q, with d taken from the lowest AIC

op3) stepwise fitting of p and q, with the KPSS-test for choosing d

op4) non-stepwise fitting of p and q, with the KPSS-test for choosing d
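The role of the differencing orders in these options can be illustrated with a minimal sketch of regular (order d) and seasonal (order D) differencing. The function name and the default season length of 12 months are assumptions for illustration; the KPSS-test itself is not implemented here.

```python
def difference(x, d=0, D=0, season=12):
    """Apply D seasonal differences (lag `season`) followed by d regular differences."""
    out = list(x)
    for _ in range(D):
        out = [out[t] - out[t - season] for t in range(season, len(out))]
    for _ in range(d):
        out = [out[t] - out[t - 1] for t in range(1, len(out))]
    return out

# Illustrative series: linear trend plus a repeating 12-month pattern
series = [t + (t % 12) for t in range(48)]
seasonally_differenced = difference(series, D=1)
```

Each difference shortens the series, by `season` values for a seasonal difference and by one value for a regular difference, which is one reason why likelihood-based criteria are not directly comparable across different values of d and D.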

These four optimization options are investigated by examining the ARIMA-model residuals at all climate observation stations in the study region for the two verification periods mentioned. The results are listed in Table 4.6, which shows that op4 is the optimal method for fitting the climate time series in the study area by means of an appropriate ARIMA-model.
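The residual error used for this comparison is the root-mean-square error (RMSE). A minimal sketch of its standard definition follows; the function name is hypothetical.

```python
import math

def rmse(observed, predicted):
    """Root-mean-square error between two series of equal length."""
    if len(observed) != len(predicted):
        raise ValueError("series must have the same length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))
```

The RMSE carries the unit of the predictand, so values for precipitation and temperature are not directly comparable with each other.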

Table 4.6. Model performance in predicting monthly climate series for the two validation schemes vrf1 and vrf2, using four methods of ARIMA-model optimization (see text for explanations).

Validation scheme; predictand; unit; residual error (RMSE)¹: calibration², verification³

In: The prediction of seasonal and inter-annual climate variations and their impacts on the water resources in the eastern seaboard of Thailand (pages 163-166)