

4.2.6.6 Comparison

When comparing the different bivariate random walks discussed in the previous paragraphs, Kronecker sums of univariate random walks and the approximation to the biharmonic differential operator are the most promising alternatives for defining priors on the parameters of tensor product splines. Both the local quadratic fit and the Kronecker product of penalty matrices lead to problems which make them unsuitable in the present setting. An empirical comparison of bivariate random walks will be performed in a simulation study in Section 8.

Considering differences between first and second order random walks, the latter have larger null spaces, which include different types of planes, and are therefore better able to approximate effects that are in fact planes. Generally, second order random walks seem to be more appropriate when estimating smooth surfaces, while first order random walks may be more suitable for wigglier functions. This hypothesis will also be investigated further in the simulation study in Section 8.
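The difference in null-space dimension can be checked numerically. The following sketch (not part of the original text) builds a Kronecker-sum penalty as in Section 4.2.6.2 for an assumed 10 × 10 grid and computes the dimension of its null space:

```python
import numpy as np

def rw_penalty(d, order):
    """Penalty matrix K = D'D of a univariate random walk of the given order."""
    D = np.diff(np.eye(d), n=order, axis=0)              # difference matrix
    return D.T @ D

d = 10                                                    # grid size (illustrative)
for order in (1, 2):
    K1 = rw_penalty(d, order)
    # Kronecker-sum penalty of two univariate random walks on a d x d grid
    K = np.kron(np.eye(d), K1) + np.kron(K1, np.eye(d))
    null_dim = K.shape[0] - np.linalg.matrix_rank(K)
    print(f"order {order}: null-space dimension = {null_dim}")
# expected: order 1 -> 1 (constants only); order 2 -> 4 (constant, both linear trends
# in the two covariates, and their product)
```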

5 Inference

When performing Bayesian inference, all inferential conclusions are based on the posterior of the model. In an empirical Bayes approach to structured additive regression, no hyperpriors are assigned to the hyperparameters, i. e. the variances $\tau_j^2$ are treated as fixed.

In this case, the specific form of the posterior only depends on the parameterization of the regression terms in the model. If we choose the original parameterization in terms of the $\xi_j$ as discussed in the previous sections, the posterior is given by

$$p(\xi_1, \ldots, \xi_p, \gamma \mid y) \propto L(y, \xi_1, \ldots, \xi_p, \gamma) \prod_{j=1}^{p} p(\xi_j \mid \tau_j^2), \qquad (5.1)$$

where $L(\cdot)$ denotes the likelihood, which is the product of the individual likelihood contributions defined by the exponential family density in (4.1), and $p(\xi_j \mid \tau_j^2)$ is the prior for the regression coefficients $\xi_j$ as given in (4.9). Note that posterior (5.1) has the form of a penalized likelihood, where the penalty terms correspond to the (log-) priors of the regression coefficients. Hence, empirical Bayes estimates derived by maximizing (5.1) can also be considered penalized likelihood estimates.
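Taking logarithms makes the penalized likelihood form explicit: with the Gaussian form of prior (4.9), maximizing (5.1) is equivalent, up to an additive constant, to maximizing

$$l(y, \xi_1, \ldots, \xi_p, \gamma) - \sum_{j=1}^{p} \frac{1}{2\tau_j^2}\, \xi_j' K_j \xi_j,$$

where $l(\cdot)$ denotes the log-likelihood, so that $1/(2\tau_j^2)$ plays the role of the smoothing parameter of the $j$-th term.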

Since the variances shall also be estimated from the data, an appropriate estimation procedure has to be provided. In principle, methodology developed for mixed models could be used, since from a frequentist perspective, prior (4.9) merely defines $\xi_j$ to be a random effect with a correlated random effects distribution. However, the fact that some of the priors are improper prevents direct usage of mixed model estimation techniques. An increasingly popular idea to solve this problem is to reparametrize models with penalties as mixed models with i. i. d. random effects. This approach goes back to Green (1987) for smoothing splines and has been used in a variety of settings in recent years. Lin & Zhang (1999) consider models for longitudinal data consisting of nonparametric effects modeled by smoothing splines and random effects to account for correlations caused by the longitudinal data structure. Zhang (2004) further extends this model and includes varying coefficient terms with smoothing splines as effect modifiers. Mixed model approaches to penalized splines based on truncated power series representations have been described in Wand (2003) and Ruppert et al. (2003). Kammann & Wand (2003) added spatial effects to these models using (low rank) Kriging terms. Penalized splines with B-spline bases were considered in Currie & Durbán (2002).

In either case, the mixed model representation yields a variance components model, and techniques for estimating the variance parameters are readily available or can be easily adapted. Probably the most common approach is to employ restricted maximum likelihood, also termed marginal likelihood in the literature. Kauermann (2004) provides some theoretical results on (restricted) maximum likelihood estimates for smoothing parameters in penalized spline models.

Before addressing the estimation of regression coefficients and variance parameters in a mixed model in detail, we will first show how to rewrite structured additive regression models as variance components models.

5.1 Mixed model representation

In order to reformulate structured additive regression models as mixed models, we first take a closer look at the general prior (4.9) for the regression coefficients $\xi_j$. This prior specifies a multivariate Gaussian distribution. However, in most cases the precision matrix $K_j$ is rank deficient, rendering (4.9) an improper distribution. Assuming that $K_j$ is known and does not depend on further parameters to be estimated, we can re-express $\xi_j$ via a one-to-one transformation in terms of a parameter vector $\beta_j$ with a flat prior and a parameter vector $b_j$ with an i. i. d. Gaussian prior. While $\beta_j$ captures the part of a function $f_j$ that is not penalized by $K_j$, $b_j$ captures the orthogonal deviation from this unpenalized part.

The dimensions of both vectors depend on the rank of the penalty matrix $K_j$. If $K_j$ has full rank, the unpenalized part vanishes completely, and by choosing $b_j = K_j^{1/2} \xi_j$ we immediately obtain $b_j \sim N(0, \tau_j^2 I)$.

For the general case of a rank deficient $K_j$, things are somewhat more complicated. If we assume that the $j$-th parameter vector has dimension $d_j$ and the corresponding penalty matrix has rank $k_j$, the decomposition of $\xi_j$ into a penalized and an unpenalized part is of the form

$$\xi_j = \tilde X_j \beta_j + \tilde Z_j b_j \qquad (5.2)$$

with a $d_j \times (d_j - k_j)$ matrix $\tilde X_j$ and a $d_j \times k_j$ matrix $\tilde Z_j$. Requirements for decomposition (5.2) are:

(i) The composed matrix $(\tilde X_j \; \tilde Z_j)$ has full rank, making the transformation in (5.2) a one-to-one transformation. This also implies that both $\tilde X_j$ and $\tilde Z_j$ have full column rank.

(ii) $\tilde X_j$ and $\tilde Z_j$ are orthogonal, i. e. $\tilde X_j' \tilde Z_j = 0$.

(iii) $\tilde X_j' K_j \tilde X_j = 0$, resulting in $\beta_j$ being unpenalized by $K_j$.

(iv) $\tilde Z_j' K_j \tilde Z_j = I_{k_j}$, resulting in an i. i. d. Gaussian prior for $b_j$.

In general, the matrices defining (5.2) can be set up as follows: Establish $\tilde X_j$ as a $(d_j - k_j)$-dimensional basis of the null space of $K_j$. As a consequence, requirement (iii) is automatically fulfilled. The matrix $\tilde Z_j$ can be constructed as $\tilde Z_j = L_j (L_j' L_j)^{-1}$, where the full column rank $d_j \times k_j$ matrix $L_j$ is determined by the factorization of the penalty matrix $K_j$ into $K_j = L_j L_j'$. This ensures requirements (i) and (iv). If we furthermore choose $L_j$ such that $L_j' \tilde X_j = 0$ and $\tilde X_j' L_j = 0$ hold, requirement (ii) is satisfied too. The factorization of the penalty matrix can be based on the spectral decomposition $K_j = \Gamma_j \Omega_j \Gamma_j'$, where the $k_j \times k_j$ diagonal matrix $\Omega_j$ contains the positive eigenvalues $\omega_{jm}$, $m = 1, \ldots, k_j$, of $K_j$ in descending order, i. e. $\Omega_j = \operatorname{diag}(\omega_{j1}, \ldots, \omega_{jk_j})$, and the $d_j \times k_j$ orthogonal matrix $\Gamma_j$ is formed of the corresponding eigenvectors. From the spectral decomposition we can choose $L_j = \Gamma_j \Omega_j^{1/2}$.

Note that the factor $L_j$ is not unique and in many cases numerically superior factorizations exist. For instance, for P-splines a more favorable choice for $L_j$ is given by $L_j = D'$, where $D$ is the difference matrix used to construct the penalty matrix. For random walks a weighted version of the difference matrix can be used, i. e. $L_j = D' W^{1/2}$, and for seasonal effects the factor $L_j = D'$, where $D$ is defined in Section 4.2.2.4, is again an alternative choice.
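A minimal, self-contained sketch (again assuming a second-order random walk penalty purely for illustration) shows that $L_j = D'$ reproduces $K_j$ without any eigendecomposition:

```python
import numpy as np

# Sketch (illustrative): for a second-order random walk penalty K_j = D'D,
# the factor L_j = D' reproduces K_j directly, without an eigendecomposition.
d = 10
D = np.diff(np.eye(d), n=2, axis=0)            # second-order difference matrix
K = D.T @ D

L = D.T                                         # L_j = D'
Z_tilde = L @ np.linalg.inv(L.T @ L)            # Z~_j = L_j (L_j' L_j)^{-1} as before

assert np.allclose(L @ L.T, K)                                   # K_j = L_j L_j'
assert np.allclose(Z_tilde.T @ K @ Z_tilde, np.eye(d - 2))       # requirement (iv)
```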

From the general prior (4.9) for $\xi_j$ and decomposition (5.2), it follows that $p(\beta_j) \propto \text{const}$ and

$$b_j \sim N(0, \tau_j^2 I_{k_j}), \qquad (5.3)$$

since

$$\frac{1}{\tau_j^2}\, \xi_j' K_j \xi_j = \frac{1}{\tau_j^2}\, b_j' b_j.$$

This is a special version of the general decomposition result discussed for IGMRFs in Section 4.2.

Defining the vectors $x_{ij}' = v_{ij}' \tilde X_j$ and $z_{ij}' = v_{ij}' \tilde Z_j$ allows us to rewrite the predictor (4.7) as

$$\begin{aligned}
\eta_i &= \sum_{j=1}^{p} v_{ij}' \xi_j + u_i' \gamma \\
&= \sum_{j=1}^{p} \left( v_{ij}' \tilde X_j \beta_j + v_{ij}' \tilde Z_j b_j \right) + u_i' \gamma \\
&= \sum_{j=1}^{p} \left( x_{ij}' \beta_j + z_{ij}' b_j \right) + u_i' \gamma \\
&= x_i' \beta + z_i' b.
\end{aligned}$$

The design vector $z_i$ and the vector $b$ are composed of the vectors $z_{ij}$ and $b_j$, respectively. More specifically, we obtain $z_i = (z_{i1}', z_{i2}', \ldots, z_{ip}')'$ and the stacked vector $b = (b_1', \ldots, b_p')'$. Similarly, the vector $x_i$ and the vector $\beta$ are given by $x_i = (x_{i1}', x_{i2}', \ldots, x_{ip}', u_i')'$ and $\beta = (\beta_1', \ldots, \beta_p', \gamma')'$. In matrix notation, proceeding in a similar way yields

$$\begin{aligned}
\eta &= \sum_{j=1}^{p} V_j \xi_j + U \gamma \\
&= \sum_{j=1}^{p} \left( X_j \beta_j + Z_j b_j \right) + U \gamma \\
&= X \beta + Z b, \qquad (5.4)
\end{aligned}$$

where the matrices $X$ and $Z$ are composed in complete analogy to the vector-based presentation, i. e. $X = (X_1, \ldots, X_p, U)$ with $X_j = V_j \tilde X_j$, and $Z = (Z_1, \ldots, Z_p)$ with $Z_j = V_j \tilde Z_j$.

Finally, we obtain a GLMM with fixed effects $\beta$ and random effects $b \sim N(0, Q)$, where $Q = \operatorname{blockdiag}(\tau_1^2 I_{k_1}, \ldots, \tau_p^2 I_{k_p})$. Hence, we can utilize GLMM methodology for the simultaneous estimation of the functions $f_j$ and the variance parameters $\tau_j^2$; see the subsequent sections.
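The assembly of the stacked design matrices and of $Q$ can be sketched as follows; the function and argument names are illustrative assumptions, not part of the original text:

```python
import numpy as np

def mixed_model_design(V_list, Xt_list, Zt_list, U, tau2_list):
    """Assemble X, Z and Q from the per-term design matrices V_j and the
    reparametrization matrices of (5.2). Names are illustrative only."""
    X_parts = [V @ Xt for V, Xt in zip(V_list, Xt_list)]    # X_j = V_j X~_j
    Z_parts = [V @ Zt for V, Zt in zip(V_list, Zt_list)]    # Z_j = V_j Z~_j
    X = np.hstack(X_parts + [U])                             # fixed-effects design (beta, gamma)
    Z = np.hstack(Z_parts)                                   # random-effects design
    # Q = blockdiag(tau_1^2 I_{k_1}, ..., tau_p^2 I_{k_p})
    Q = np.diag(np.concatenate(
        [t2 * np.ones(Zt.shape[1]) for t2, Zt in zip(tau2_list, Zt_list)]))
    return X, Z, Q
```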

Due to the flat prior of $\beta$, posterior (5.1) transforms to

$$p(\beta, b \mid y) \propto L(y, \beta, b)\, \exp\!\left(-\frac{1}{2}\, b' Q^{-1} b\right) \qquad (5.5)$$

and the log-posterior is given by

$$l_p(\beta, b \mid y) = l(y, \beta, b) - \sum_{j=1}^{p} \frac{1}{2\tau_j^2}\, b_j' b_j, \qquad (5.6)$$

where $l(y, \beta, b)$ denotes the log-likelihood corresponding to $L(y, \beta, b)$.

The decomposition of $\xi_j$ also leads to a similar decomposition of $f_j(\nu_{ij})$ into a penalized and an unpenalized part:

$$\begin{aligned}
f_j(\nu_{ij}) &= v_{ij}' \tilde X_j \beta_j + v_{ij}' \tilde Z_j b_j \\
&= x_{ij}' \beta_j + z_{ij}' b_j. \qquad (5.7)
\end{aligned}$$

This decomposition will form the basis for the construction of pointwise credible intervals for $f_j(\nu_{ij})$ (see Section 5.2.1).
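For concreteness, the penalized log-likelihood (5.6) can be sketched in code, assuming (purely for illustration) a Poisson response with log link; argument names are not from the original text:

```python
import numpy as np
from scipy.special import gammaln

def log_posterior(beta, b, y, X, Z, tau2_blocks):
    """Penalized log-likelihood (5.6), sketched for an assumed Poisson response
    with log link; tau2_blocks is a list of (tau_j^2, k_j) pairs matching the
    blocks of b."""
    eta = X @ beta + Z @ b                              # eta = X beta + Z b, cf. (5.4)
    mu = np.exp(eta)
    loglik = np.sum(y * eta - mu - gammaln(y + 1.0))    # Poisson log-likelihood
    penalty, start = 0.0, 0
    for tau2, k in tau2_blocks:                         # sum_j b_j' b_j / (2 tau_j^2)
        bj = b[start:start + k]
        penalty += bj @ bj / (2.0 * tau2)
        start += k
    return loglik - penalty
```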

The mixed model representation furthermore allows for a different perspective on the identification problem inherent to nonparametric regression models. For each of the model components with an improper prior (except varying coefficient terms), the matrix $X_j$ representing the deterministic part of $f_j$ contains a column of ones corresponding to the mean level of the respective function. Provided that there is at least one such term and that an intercept is included in the model, linear dependencies occur in the design matrix $X$ of the fixed effects. To get around this, we delete all the vectors of ones except for the intercept, which has an effect similar to centering the functions $f_j(\nu_{ij})$.
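A possible implementation of this column-deletion step (names and details are assumptions, not from the text) simply removes constant columns from the unpenalized blocks and keeps one overall intercept:

```python
import numpy as np

def drop_redundant_intercepts(X_parts):
    """Keep a single intercept column and delete the constant ('mean level')
    column from every unpenalized block X_j (illustrative sketch)."""
    n = X_parts[0].shape[0]
    cleaned = [np.ones((n, 1))]                                # overall intercept
    for Xj in X_parts:
        keep = [c for c in range(Xj.shape[1])
                if not np.allclose(Xj[:, c], Xj[0, c])]        # drop constant columns
        cleaned.append(Xj[:, keep])
    return np.hstack(cleaned)
```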

A similar problem occurs in models with varying coefficient terms if a covariate, $u$ say, is included both in a parametric way and as an interaction variable of a varying coefficient term, i. e. if a model with a predictor of the form

$$\eta = \ldots + u \cdot \gamma + u \cdot f(w) + \ldots$$

shall be estimated. In this case, $\gamma$ is not identifiable since the design matrix for the unpenalized part resulting from the decomposition of $f(w)$ contains a column of ones. This column of ones is multiplied by $u$ and, thus, $u$ is represented twice in the overall design matrix of the fixed effects. This in turn results in a singular coefficient matrix when estimating the regression coefficients. As a consequence, covariates have to be included either as parametric effects or as interaction variables, but not as both.

For models with interaction surfaces, there are also modeling restrictions that have to be respected to ensure identifiability. If we consider a regression model of the form

$$\eta = \ldots + f_1(u) + f_2(w) + f_{1,2}(u, w) + \ldots,$$

where $f_1$ and $f_2$ shall be modeled as univariate penalized splines and $f_{1,2}$ is to be included as a bivariate P-spline, we already know from the above considerations that the mean levels of these functions are not identifiable. Moreover, if second order random walks are employed as priors for both the univariate terms and the interaction term, further linear dependencies in $X$ are introduced. With either a bivariate second order random walk based on a Kronecker sum or an approximation to the biharmonic differential operator (see Sections 4.2.6.2 and 4.2.6.4), the design matrix $X$ contains columns which are multiples of the original covariate vectors $u$ and $w$. The same observation holds for the reparametrizations of the univariate terms. Hence, linear dependencies caused by these vectors arise and the resulting model is not identifiable. As a consequence, care has to be taken to choose a suitable formulation for the estimation problem at hand.

However, the mixed model approach has the advantage that identifiability problems can be diagnosed from the rank of the design matrix $X$, while in fully Bayesian inference based on MCMC they are not necessarily recognizable from the output of the estimation procedure.
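Such a diagnosis amounts to a simple rank check on $X$, e.g. (illustrative sketch, not from the original text):

```python
import numpy as np

def is_identifiable(X):
    """Rank check on the fixed-effects design matrix X: the fixed effects are
    identifiable only if X has full column rank (illustrative sketch)."""
    return np.linalg.matrix_rank(X) == X.shape[1]
```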