
Dynamic Linear Models

From the document Applied Time Series Analysis (pages 189-194)


Once the equations are set up, it is straightforward to derive the matrices. With state vector X_t = (X_t, X_{t-1})^T, we have

G = [ α1  α2 ; 1  0 ],   F = (1, 0),

with innovation W_t = (E_t, 0)^T in the state equation and V_t as the measurement noise in the observation equation.

Similar to the example above, we could now simulate from an AR(2) process, add some artificial measurement noise and then try to uncover the signal using the Kalman filter. This is left as an exercise.
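The exercise can be sketched with a hand-coded Kalman filter in base R, without any add-on package. The AR coefficients, noise level and seed below are illustrative assumptions, not values from the text:

```r
## Sketch: simulate an AR(2) process, add measurement noise, and recover
## the signal with a hand-coded Kalman filter (all values illustrative).
set.seed(21)
n  <- 200
a1 <- 0.8; a2 <- -0.4                       # assumed AR(2) coefficients
x  <- as.numeric(arima.sim(n=n, model=list(ar=c(a1, a2))))
y  <- x + rnorm(n, sd=1)                    # observed = signal + noise

G  <- matrix(c(a1, 1, a2, 0), 2, 2)         # state transition [a1 a2; 1 0]
Ft <- matrix(c(1, 0), 1, 2)                 # observation matrix (1, 0)
W  <- diag(c(1, 0))                         # innovation enters 1st component only
V  <- matrix(1)                             # measurement noise variance

m <- c(0, 0); C <- diag(2)                  # initial state mean and variance
xhat <- numeric(n)
for (t in 1:n) {
  mp <- G %*% m                             # prediction step
  Cp <- G %*% C %*% t(G) + W
  K  <- Cp %*% t(Ft) %*% solve(Ft %*% Cp %*% t(Ft) + V)   # Kalman gain
  m  <- mp + K %*% (y[t] - Ft %*% mp)       # update step with observation y[t]
  C  <- Cp - K %*% Ft %*% Cp
  xhat[t] <- m[1]
}
```

The filtered series `xhat` tracks the true signal `x` more closely than the raw observations `y` do, which can be checked by comparing the mean squared deviations.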

11.3 Dynamic Linear Models

A specific, but very useful application of state space models is to generalize linear regression such that the coefficients can vary over time. We consider a very simple example where the sales manager in a house building company uses the following model: the company's house sales at time t, denoted by S_t, depend on the general level of sales in that area, L_t, and on the company's pricing policy, P_t.

S_t = L_t + β_t·P_t + V_t

This is a linear regression model with the price as predictor and the general level as intercept. The assumption is that their influence varies over time, but generally only in small increments. We can use the following notation:

L_t = L_{t-1} + ΔL_t
β_t = β_{t-1} + Δβ_t,

where ΔL_t and Δβ_t are random perturbations with mean zero that are independent over time. While we assume independence of ΔL_t and Δβ_t, we could also allow for correlation among the two. The relative magnitudes of these perturbations are accounted for by the variances in the matrices V_t and W_t of the state space formulation. Note that if we set W_t = 0, then we are in the case of plain OLS regression with constant parameters. Hence, we can also formulate any regression model in state space form. Here, we have:

X_t = (L_t, β_t)^T,   G_t = I_2,   F_t = (1, P_t),

so that the state equation is X_t = X_{t-1} + W_t and the observation equation is S_t = F_t·X_t + V_t.
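The remark that W_t = 0 brings us back to plain OLS can be made concrete: without state innovation, the Kalman filter update is just recursive least squares, and the final filtered state essentially coincides with the OLS estimate. A minimal sketch in base R, where the data, the diffuse prior and the observation variance are assumed for illustration:

```r
## With Wt = 0 the state (the regression coefficients) stays constant, and
## the Kalman filter reduces to recursive least squares (values illustrative).
set.seed(1)
n <- 50
x <- runif(n)
y <- 2 + 3*x + rnorm(n, sd=0.5)           # assumed true intercept 2, slope 3
X <- cbind(1, x)                          # design matrix

m <- matrix(0, 2, 1)                      # state estimate (intercept, slope)
C <- diag(2) * 1e6                        # diffuse prior variance
V <- 0.25                                 # observation variance (sd = 0.5)
for (t in 1:n) {
  Ft <- X[t, , drop=FALSE]                # 1x2 observation matrix at time t
  K  <- C %*% t(Ft) %*% solve(Ft %*% C %*% t(Ft) + V)  # Kalman gain
  m  <- m + K %*% (y[t] - Ft %*% m)       # update step
  C  <- C - K %*% Ft %*% C                # no Wt added: constant coefficients
}
## final filtered state vs. the batch OLS solution
cbind(kalman = as.numeric(m), ols = as.numeric(coef(lm(y ~ x))))
```

Under the diffuse prior, the two columns agree to several decimal places, which illustrates why any regression model can be written in state space form.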

Because we do not have any data for this sales example, we again rely on a simulation. Evidently, this has the advantage that we can evaluate the Kalman filter output versus the truth. Thus, we let

y_t = a + b·x_t + z_t,   with x_t = t²/10.

We simulate 30 data points for t = 1, ..., 30 and assume errors that are standard normally distributed, i.e. z_t ~ N(0,1). The regression coefficients are a = 4 and b = 2 for t = 1, ..., 15, and a = 5 and b = 1 for t = 16, ..., 30. We will fit a straight line with time-varying coefficients, as this is the model that matches what we found for the sales example above.

> ## Simulation
> ## (the simulation code was truncated in the source; the values below
> ##  follow the description in the text, the Wmat/C0 choices are illustrative)
> nn <- 30
> tt <- 1:nn
> xx <- tt^2/10
> aa <- c(rep(4,15), rep(5,15))
> bb <- c(rep(2,15), rep(1,15))
> y1 <- aa + bb*xx + rnorm(nn)
> x.mat <- matrix(c(rep(1,nn), xx), nrow=nn, ncol=2)
> y.mat <- matrix(y1, nrow=nn, ncol=1)
>
> ## State Space Formulation
> ssf <- SS(y=y.mat, x=x.mat,
+           Fmat=function(tt,x,phi) return(t(x[tt,,drop=FALSE])),
+           Gmat=function(tt,x,phi) return(diag(2)),
+           Wmat=function(tt,x,phi) return(diag(2)),
+           Vmat=function(tt,x,phi) return(matrix(1)),
+           m0=matrix(c(4,2),1,2), C0=diag(2))
> fit <- kfilter(ssf)
> par(mfrow=c(1,2))
> plot(fit$m[,1], type="l", xlab="Time", ylab="")
> title("Kalman Filtered Intercept")
> plot(fit$m[,2], type="l", xlab="Time", ylab="")
> title("Kalman Filtered Slope")

The plots show the Kalman filter output for intercept and slope. The estimates pick up the true values very quickly, even after the change in the regime. It is worth noting that in this example, we had a very clear signal with relatively little noise, and we favored recovering the truth by specifying the state space formulation with the true error variances that are generally unknown in practice.

Example: Batmobile

We here consider another regression problem where time-varying coefficients may be necessary. The description of the practical situation is as follows: In April 1979, the Albuquerque Police Department began a special enforcement program aimed at countering driving-while-intoxicated accidents. The program was composed of a squad of police officers, breath alcohol testing (BAT) devices, and a van named batmobile, which housed a BAT device and was used as a mobile station. The data were collected by the Division of Governmental Research of the University of New Mexico under a contract with the National Highway Traffic Safety Administration of the Department of Transportation to evaluate the batmobile program. Source: http://lib.stat.cmu.edu/DASL/Datafiles/batdat.html

The data comprise quarterly figures of traffic accidents and fuel consumption in the Albuquerque area, the latter serving as a proxy for the mileage driven. The first 29 quarters form the control period, and observations 30 to 52 were recorded during the experimental (batmobile) period. We would naturally assume a time series regression model for the number of accidents:

ACC_t = β0 + β1·FUEL_t + β2·1_[Q2(t)] + β3·1_[Q3(t)] + β4·1_[Q4(t)] + E_t

The accidents depend on the mileage driven, and there is a seasonal effect. In the above model, the intercept β0 is assumed to be constant. In the light of the BAT program, we might well replace it by a time-dependent level L_t, i.e. some general level of accidents that varies over time. Let us first perform the regression and check the residuals.

[Figure: Kalman Filtered Intercept (left) and Kalman Filtered Slope (right) versus time, t = 1, ..., 30]

> ## Regression and Time Plot of Residuals

> fit <- lm(ACC ~ FUEL + season, data=regdat)

> plot(ts(resid(fit)), main="Time Plot of Residuals")

>

> ## Fitting a Loess Smoother to the Residuals

> times <- 1:52

> fit.loess <- loess(resid(fit) ~ times, span=0.65, degree=2)

> lines(times, fitted(fit.loess), col="red")

The time series plot of the residuals shows very clear evidence of temporal dependence. In contrast to what the regression model with constant coefficients suggests, the level of accidents seems to rise during the control period and then drops markedly after the BAT program was introduced. The conclusion is that our regression model above is not adequate and needs to be enhanced. However, just adding an indicator variable that codes for the times before and after the introduction of the BAT program will not solve the issue: the level of the residuals is evidently not constant before the program started, and it does not suddenly drop to a constant lower level thereafter.

The alternative is to formulate a state space model and estimate it with the Kalman filter. We (conceptually) assume that all regression parameters are time dependent, and rewrite the model as:

ACC_t = L_t + β1t·FUEL_t + β2t·1_[Q2(t)] + β3t·1_[Q3(t)] + β4t·1_[Q4(t)] + E_t

Our main interest lies in the estimation of the modified intercept term Lt, which we now call the level. We expect it to drop after the introduction of the BAT program, but let’s see if this materializes. The state vector Xt we are using contains the regression coefficients, and the state equation which describes their evolution over time is as follows:

X_t = G·X_{t-1} + W_t,   with X_t = (L_t, β1t, β2t, β3t, β4t)^T, G = I_5 (the 5×5 identity matrix), and W_t ~ N(0, diag(σ_L², 0, 0, 0, 0)).

[Figure: Time Plot of Residuals, showing ts(resid(fit)) versus time for t = 1, ..., 52, with the loess smoother]

As we can see, we only allow for some small, random perturbations in the level L_t, but not in the other regression coefficients. The observation equation then describes the regression problem, i.e.

ACC_t = F_t·X_t + V_t,   with F_t = (1, FUEL_t, 1_[Q2(t)], 1_[Q3(t)], 1_[Q4(t)]).

The observation error variance is taken from the canonical regression fit. Also the starting values for Kalman filtering, as well as the variances of these initial states, are taken from there. Hence, the code for the state space formulation, as well as for Kalman filtering, is as follows:

> y.mat <- as.matrix(regdat$ACC)

> x.mat <- model.matrix(fit)

> ssf <- SS(y=y.mat, x=x.mat,
+           Fmat=function(tt,x,phi) return(t(x[tt,,drop=FALSE])),
+           Gmat=function(tt,x,phi) return(diag(5)),
+           Wmat=function(tt,x,phi) return(diag(c(10,0,0,0,0))),
+           Vmat=function(tt,x,phi) return(matrix(600)),
+           m0=matrix(c(0,5,50,50,25),1,5),
+           C0=diag(c(900,1,100,100,100)))
> fit.kal <- kfilter(ssf)

The filtered output is in object m in the output list. We can extract the estimates for the mean of the state vector at time t, which we display versus time.

[Figure: Kalman filtered level L_t versus time, t = 1, ..., 52]

Indeed, the level L_t shows a pronounced drop from t = 35 to t = 45. Hence, the BAT program shows an effect, but there is some delay after the intervention, and it takes some time until it has full effect. Finally, we track the estimates for the seasonal coefficients.

The estimates for these coefficients wiggle around slightly over time. However, they do not seem to change systematically, in line with what we had previously assumed.

[Figure: Kalman filtered seasonal coefficients versus time, one panel each for Q2, Q3 and Q4, t = 1, ..., 52]
