Penalized splines for cross-sectional and panel data

2.2.1 Penalized splines in the cross-sectional context

We start our considerations by discussing penalized spline specifications for cross-sectional data. Consider the additive regression model

$$y_i = \beta_0 + \sum_{h=1}^{H} f_h(x_{hi}) + u_i, \quad u_i \sim N(0, \sigma_u^2), \quad i = 1, \dots, n, \tag{2.1}$$

where $y_i$ is the response variable of interest, $\beta_0$ is an overall intercept term, $f_1(x_{1i}), \dots, f_H(x_{Hi})$ are smooth functions representing the potentially nonlinear effects of $H$ deterministic covariates, and the $u_i$ are independent and identically distributed normal error terms with variance $\sigma_u^2$.² To approximate the potentially nonlinear effects $f_h$, we use a weighted sum of $d_h$ B-spline basis functions, $B_{h1}, \dots, B_{hd_h}$, such that

$$f_h(x_{hi}) \approx \sum_{j=1}^{d_h} B_{hj}(x_{hi})\,\beta_{hj} = z_h^T(x_{hi})\,\beta_h, \tag{2.2}$$

where $\beta_h$ is a $d_h$-dimensional column vector of basis coefficients and $z_h(x_{hi})$ is the $d_h$-dimensional column vector containing the evaluations of the basis functions at the observed covariate value $x_{hi}$. The number of basis functions and coefficients, $d_h$, is determined by the number of knots that divide the domain of the covariate. The bias introduced by the spline representation of a smooth function converges to zero as the number of knots grows; see Claeskens et al. (2009) for details. We assume this bias to be negligible by using sufficiently many knots and subsequently postulate equality between an arbitrary smooth function and its spline representation, which leads to the expression

$$f_h(x_h) = Z_h \beta_h$$

in compact matrix notation, where

$$Z_h = \begin{pmatrix} B_{h1}(x_{h1}) & \cdots & B_{hd_h}(x_{h1}) \\ \vdots & \ddots & \vdots \\ B_{h1}(x_{hn}) & \cdots & B_{hd_h}(x_{hn}) \end{pmatrix}$$

is a design matrix of dimension $n \times d_h$, assumed to be of full rank. To avoid overfitting, each smooth function is assigned a matrix $K_h$ that penalizes too much variability between adjacent coefficients in the coefficient vector $\beta_h$, resulting in the penalized least squares criterion

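$$\left( y - \beta_0 1_n - \sum_{h=1}^{H} Z_h \beta_h \right)^T \left( y - \beta_0 1_n - \sum_{h=1}^{H} Z_h \beta_h \right) + \sum_{h=1}^{H} \lambda_h \beta_h^T K_h \beta_h, \tag{2.3}$$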

where $y$ denotes the $n$-dimensional response vector, $1_n$ is an $n$-dimensional column vector of ones, and $\lambda_h$ is a smoothing parameter determining the impact of the penalty on the minimization criterion. The $d_h \times d_h$-dimensional matrix $K_h$ of first-order differences, that is, the matrix penalizing differences of directly contiguous coefficients, has the form

$$K_h = \begin{pmatrix} 1 & -1 & & & \\ -1 & 2 & -1 & & \\ & \ddots & \ddots & \ddots & \\ & & -1 & 2 & -1 \\ & & & -1 & 1 \end{pmatrix}.$$

² For notational simplicity, we refrain from adding stochastic covariates and covariates with strictly parametric effects. However, as can be seen in Section 2.5, semiparametric partially linear models can also be handled easily within our framework.

Difference matrices of higher orders can be easily constructed. Details on such penalties for B-spline functions can be found, for example, in Eilers and Marx (1996).
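As a minimal sketch of how such penalties can be constructed, the following pure-Python snippet builds the difference matrix $D$ of a given order and the penalty $K = D^T D$, the usual construction for difference penalties. The function names `difference_matrix` and `penalty_matrix` are hypothetical and chosen only for illustration; for order 1 the result reproduces the tridiagonal form shown above.

```python
def difference_matrix(d, order=1):
    """Return the (d - order) x d matrix of order-th differences as nested lists."""
    # Start from the d x d identity matrix.
    D = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
    for _ in range(order):
        # Each pass replaces the rows by differences of consecutive rows.
        D = [[D[i + 1][j] - D[i][j] for j in range(d)] for i in range(len(D) - 1)]
    return D


def penalty_matrix(d, order=1):
    """Return K = D^T D, the d x d difference penalty of the given order."""
    D = difference_matrix(d, order)
    return [[sum(row[i] * row[j] for row in D) for j in range(d)] for i in range(d)]


K1 = penalty_matrix(5, order=1)  # first-order penalty (tridiagonal)
K2 = penalty_matrix(5, order=2)  # second-order penalty (pentadiagonal)
```

Every row of such a penalty matrix sums to zero, so a constant coefficient vector is never penalized; a second-order penalty additionally leaves coefficients that are linear in their index unpenalized.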

Now let $x_h$ be an arbitrary value in the domain of the $h$-th covariate. Defining the smoothing matrix $L_h(x_h)$ as

$$L_h(x_h) = (I_n - S_{-h})\, Z_h \left[ Z_h^T (I_n - S_{-h}) Z_h + \lambda_h K_h \right]^{-1} z_h(x_h) \tag{2.4}$$

with $I_n$ denoting the identity matrix of dimension $n \times n$, $S_{-h} = Z_{-h} (Z_{-h}^T Z_{-h} + \lambda_{-h} K_{-h})^{-1} Z_{-h}^T$ with $\lambda_{-h} = (\lambda_1, \dots, \lambda_{h-1}, \lambda_{h+1}, \dots, \lambda_H)$, $Z_{-h} = (Z_1, \dots, Z_{h-1}, Z_{h+1}, \dots, Z_H)$, $K_{-h} = (K_1, \dots, K_{h-1}, K_{h+1}, \dots, K_H)$, and $z_h(x_h)$ defined as in (2.2), the estimator of each $f_h(x_h)$ can be written as

$$\hat{f}_h(x_h) = L_h^T(x_h)\, y.$$

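It follows that

$$\operatorname{Var}\left[ \hat{f}_h(x_h) \right] = \operatorname{Var}\left[ L_h^T(x_h)\, y \right] = L_h^T(x_h) \operatorname{Var}(y)\, L_h(x_h) = L_h^T(x_h)\, \sigma_u^2 I_n\, L_h(x_h) \tag{2.5}$$

holds for homoscedastic and independent errors.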
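To make the linear form of the estimator more concrete, consider the simplest special case as a worked sketch: a single smooth component and no intercept, so that there are no other components to adjust for and $S_{-h} = 0$. This simplification is an assumption made here for illustration only; the subscript $h$ is dropped.

```latex
\begin{align*}
\hat{\beta} &= \arg\min_{\beta}\; (y - Z\beta)^T (y - Z\beta) + \lambda\, \beta^T K \beta, \\
0 &= -2 Z^T (y - Z\hat{\beta}) + 2 \lambda K \hat{\beta}
  \;\;\Longrightarrow\;\; \hat{\beta} = (Z^T Z + \lambda K)^{-1} Z^T y, \\
\hat{f}(x) &= z^T(x)\, \hat{\beta}
  = \bigl[ Z (Z^T Z + \lambda K)^{-1} z(x) \bigr]^T y = L^T(x)\, y .
\end{align*}
```

The last step uses the symmetry of $Z^T Z + \lambda K$, and the inverse exists because $Z$ has full rank and $K$ is positive semidefinite. The resulting $L(x)$ has the structure of (2.4) with $S_{-h} = 0$.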

Naturally, the smoothing parameters $\lambda_h$ are unknown. One way to estimate them is to exploit the mixed model representation of penalized splines. In particular, it is always possible to rewrite

$$Z_h \beta_h = Z_h (F_{hf}\, \alpha_{hf} + F_{hr}\, \alpha_{hr}) = X_{hf}\, \alpha_{hf} + X_{hr}\, \alpha_{hr},$$

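where $(F_{hf}, F_{hr})$ is of full rank, $F_{hf}^T F_{hr} = F_{hr}^T F_{hf} = F_{hf}^T K_h F_{hf} = 0$ and $F_{hr}^T K_h F_{hr} = I_{d_h - q}$, with $q$ denoting the order of the difference penalty.³ It follows that $X_{hf}$ is of dimension $n \times q$ and $X_{hr}$ is of dimension $n \times (d_h - q)$.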

Then, $\alpha_{hf}$ contains $q$ fixed coefficients and $\alpha_{hr}$ is a vector of $(d_h - q)$ virtually penalized random coefficients, which are assumed to be mutually independent and normally distributed with constant variance $\sigma_{hr}^2$ and independent of the errors $u_i$. In this mixed model formulation, we obtain estimates of both the coefficients (fixed and random) and the smoothing parameters from a single (restricted) maximum likelihood estimation. The smoothing parameter estimators $\hat{\lambda}_h = \hat{\sigma}_u^2 / \hat{\sigma}_{hr}^2$ are then given as ratios of two (estimated) variances. For details we refer the reader to Ruppert and Wand (2003) or Fahrmeir et al. (2013). In Section 2.3 we will make use of the mixed model formulation to construct confidence bands.
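The link between the variance ratio and the smoothing parameter can be sketched as follows. This is a standard argument, written here for a single smooth component without intercept and for fixed variances; it additionally assumes that $K_h$ is positive semidefinite (as difference penalties are), so that $F_{hf}^T K_h F_{hf} = 0$ implies $K_h F_{hf} = 0$.

```latex
\begin{align*}
\beta_h^T K_h \beta_h
  &= (F_{hf}\alpha_{hf} + F_{hr}\alpha_{hr})^T K_h\, (F_{hf}\alpha_{hf} + F_{hr}\alpha_{hr})
   = \alpha_{hr}^T F_{hr}^T K_h F_{hr}\, \alpha_{hr}
   = \alpha_{hr}^T \alpha_{hr}, \\
-2 \log p(y, \alpha_{hr})
  &= \frac{1}{\sigma_u^2}\, \lVert y - X_{hf}\alpha_{hf} - X_{hr}\alpha_{hr} \rVert^2
   + \frac{1}{\sigma_{hr}^2}\, \alpha_{hr}^T \alpha_{hr} + \text{const}, \\
\sigma_u^2 \cdot \bigl( -2 \log p(y, \alpha_{hr}) \bigr)
  &= \lVert y - Z_h \beta_h \rVert^2
   + \frac{\sigma_u^2}{\sigma_{hr}^2}\, \beta_h^T K_h \beta_h + \text{const}.
\end{align*}
```

Comparing the last line with the criterion (2.3) shows that the variance ratio $\sigma_u^2 / \sigma_{hr}^2$ plays the role of $\lambda_h$.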

Asymptotic properties of the penalized spline estimator have been studied, among others, by Claeskens et al. (2009), Kauermann et al. (2009), Wang et al. (2011), Antoniadis et al. (2012), Yoshida and Naito (2012) and Yoshida and Naito (2014). Under mild conditions, consistency of the estimator is shown by Claeskens et al. (2009) for a univariate model with i.i.d. errors. Antoniadis et al. (2012), Yoshida and Naito (2012) and Yoshida and Naito (2014) discuss the asymptotic properties for additive models and derive consistency within different frameworks, always including the case of i.i.d. errors. As we will show, our models can be transformed in such a way that they fit into the class of additive models with i.i.d. errors.

³ One way to obtain such a decomposition is described in Wood (2006, pp. 316-317).

It should be noted that each row of the initial design matrix $Z_h$ (i.e., before applying the mixed model reformulation) sums to one for each covariate, that is, $\sum_{j=1}^{d_h} B_{hj}(x_{hi}) = 1$ for all $i = 1, \dots, n$. This leads to an identification problem in an additive model with an intercept or multiple smooth components.
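The row-sum property can be checked numerically. The following sketch evaluates a B-spline basis with the Cox-de Boor recursion on equidistant knots and builds a small design matrix. The helper names `bspline_basis` and `equidistant_knots`, the knot layout, and the covariate values are assumptions made for illustration and are not taken from the text.

```python
def bspline_basis(x, knots, degree):
    """Evaluate all B-spline basis functions of the given degree at x (Cox-de Boor)."""
    # Degree-0 basis functions are indicators of the half-open knot intervals.
    B = [1.0 if knots[j] <= x < knots[j + 1] else 0.0 for j in range(len(knots) - 1)]
    for k in range(1, degree + 1):
        B_next = []
        for j in range(len(knots) - k - 1):
            left = right = 0.0
            if knots[j + k] > knots[j]:
                left = (x - knots[j]) / (knots[j + k] - knots[j]) * B[j]
            if knots[j + k + 1] > knots[j + 1]:
                right = (knots[j + k + 1] - x) / (knots[j + k + 1] - knots[j + 1]) * B[j + 1]
            B_next.append(left + right)
        B = B_next
    return B


def equidistant_knots(lower, upper, n_intervals, degree):
    """Equidistant knots on [lower, upper], extended by `degree` knots on each side."""
    h = (upper - lower) / n_intervals
    return [lower + (i - degree) * h for i in range(n_intervals + 2 * degree + 1)]


knots = equidistant_knots(0.0, 1.0, n_intervals=4, degree=3)
x_obs = [0.05, 0.30, 0.50, 0.77, 0.99]          # illustrative covariate values
Z = [bspline_basis(x, knots, degree=3) for x in x_obs]  # 5 x 7 design matrix
row_sums = [sum(row) for row in Z]               # each entry equals 1 up to rounding
```

With 4 intervals and cubic splines the basis has 4 + 3 = 7 functions, so each row of `Z` has seven entries. Because every row sums to one, the column space of `Z` contains the constant vector, which is exactly what makes the intercept and the level of each smooth function indistinguishable.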

This issue can be solved by imposing a centering constraint on each function $f_h$ such that

$$\sum_{i=1}^{n} f_h(x_{hi}) = \sum_{i=1}^{n} z_h^T(x_{hi})\, \beta_h = 0$$

holds for all $h = 1, \dots, H$. Following the ideas of Wood (2006, pp. 167-168), this can be achieved by constructing appropriate matrices $W_h$ of dimension $d_h \times (d_h - 1)$ with orthogonal columns, leading to a reparameterized model with design matrices $\tilde{Z}_h = Z_h W_h$ and penalty matrices $\tilde{K}_h = W_h^T K_h W_h$. If the mixed model framework is used to determine the smoothing parameters as described above, the reparameterization that ensures identifiability is carried out before the mixed model reformulation of the model.
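One possible way to construct such a matrix is sketched below: a single Householder reflection maps the constraint vector $c = Z_h^T 1_n$ onto a multiple of the first unit vector, and the remaining columns of the reflection matrix are orthonormal and orthogonal to $c$. This is an illustrative construction under the assumption $c \neq 0$, not necessarily the exact procedure of Wood (2006); the names `centering_basis`, `matmul` and `transpose` and the small example matrices are hypothetical.

```python
import math


def matmul(A, B):
    """Multiply two matrices given as nested lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def centering_basis(Z):
    """Return W (d x (d-1)) with orthonormal columns such that 1^T Z W = 0."""
    d = len(Z[0])
    c = [sum(row[j] for row in Z) for j in range(d)]      # c = Z^T 1_n (column sums)
    norm_c = math.sqrt(sum(v * v for v in c))
    # Householder vector v = c + sign(c_1) * ||c|| * e_1.
    v = c[:]
    v[0] += math.copysign(norm_c, c[0])
    vv = sum(a * a for a in v)
    # H = I - 2 v v^T / (v^T v) is symmetric and orthogonal, and H c is a multiple of e_1.
    H = [[(1.0 if i == j else 0.0) - 2.0 * v[i] * v[j] / vv for j in range(d)]
         for i in range(d)]
    # Dropping the first column leaves d - 1 orthonormal columns orthogonal to c.
    return [row[1:] for row in H]


Z = [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5], [0.25, 0.5, 0.25], [0.1, 0.6, 0.3]]
K = [[1.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 1.0]]
W = centering_basis(Z)
Z_tilde = matmul(Z, W)                              # reparameterized design matrix
K_tilde = matmul(matmul(transpose(W), K), W)        # reparameterized penalty matrix
col_sums = [sum(col) for col in zip(*Z_tilde)]      # zero up to rounding
```

Since the columns of `Z_tilde` sum to zero, the fitted function values sum to zero over the observations for any coefficient vector, which is the centering constraint stated above.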

2.2.2 Penalized splines for panel data: A first-difference estimator

In contrast to the cross-sectional data underlying model (2.1) in the previous section, we now consider $N$ individuals (e.g., persons) observed at $T$ consecutive points in time.⁴ We therefore consider the additive panel data model

$$y_{it} = \gamma_i + \sum_{h=1}^{H} f_h(x_{hit}) + u_{it}, \quad i = 1, \dots, N, \quad t = 1, \dots, T, \tag{2.6}$$

where the $u_{it}$ are assumed to be independent and normally distributed errors with constant variance and the $\gamma_i$ are individual-specific, time-invariant fixed effects that are allowed to be correlated with other covariates. As model (2.6) holds for each point in time, we obtain

$$y_{i,t-1} = \gamma_i + \sum_{h=1}^{H} f_h(x_{hi,t-1}) + u_{i,t-1} \tag{2.7}$$

for a one-period time lag. To cancel out the individual-specific effects $\gamma_i$, we subtract (2.7) from (2.6) and obtain

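$$\begin{aligned}
\Delta y_{it} = y_{it} - y_{i,t-1} &= \gamma_i - \gamma_i + \sum_{h=1}^{H} \left[ f_h(x_{hit}) - f_h(x_{hi,t-1}) \right] + u_{it} - u_{i,t-1} \\
&= \sum_{h=1}^{H} \left[ \sum_{j=1}^{d_h} B_{hj}(x_{hit})\, \beta_{hj} - \sum_{j=1}^{d_h} B_{hj}(x_{hi,t-1})\, \beta_{hj} \right] + \Delta u_{it} \\
&= \sum_{h=1}^{H} \left[ z_h(x_{hit}) - z_h(x_{hi,t-1}) \right]^T \beta_h + \Delta u_{it} \\
&= \sum_{h=1}^{H} \left[ \Delta z_h(x_{hit}) \right]^T \beta_h + \Delta u_{it},
\end{aligned} \tag{2.8}$$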

where equation (2.2) is used for the second and third equality and $\Delta$ denotes the first-difference operator over time. Note that only $T - 1$ observations per individual are retained after differencing.

⁴ The only reason to refrain from incorporating different observation horizons between persons is notational convenience. As can be seen in Section 2.4 and Section 2.5, unbalanced panels can be handled without any difficulties in our framework.

Accordingly, as the $NT \times d_h$-dimensional design matrix $Z_h$ of the evaluated basis functions is now given by

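$$Z_h = \begin{pmatrix}
B_{h1}(x_{h11}) & \cdots & B_{hd_h}(x_{h11}) \\
\vdots & \ddots & \vdots \\
B_{h1}(x_{h1T}) & \cdots & B_{hd_h}(x_{h1T}) \\
\vdots & \ddots & \vdots \\
B_{h1}(x_{hN1}) & \cdots & B_{hd_h}(x_{hN1}) \\
\vdots & \ddots & \vdots \\
B_{h1}(x_{hNT}) & \cdots & B_{hd_h}(x_{hNT})
\end{pmatrix}, \tag{2.9}$$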

we obtain

$$\Delta y = \sum_{h=1}^{H} \Delta Z_h\, \beta_h + \Delta u \tag{2.10}$$

in compact matrix notation, where $\Delta y = (y_{12} - y_{11}, \dots, y_{1T} - y_{1,T-1}, \dots, y_{N2} - y_{N1}, \dots, y_{NT} - y_{N,T-1})^T$ is an $N(T-1)$-dimensional column vector, $\Delta u$ is defined analogously, and the $N(T-1) \times d_h$-dimensional matrix $\Delta Z_h$ is obtained as the difference between the matrix $Z_h$ in (2.9) and its one-period-lagged counterpart:

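$$\Delta Z_h = \begin{pmatrix}
B_{h1}(x_{h12}) & \cdots & B_{hd_h}(x_{h12}) \\
\vdots & \ddots & \vdots \\
B_{h1}(x_{h1T}) & \cdots & B_{hd_h}(x_{h1T}) \\
\vdots & \ddots & \vdots \\
B_{h1}(x_{hN2}) & \cdots & B_{hd_h}(x_{hN2}) \\
\vdots & \ddots & \vdots \\
B_{h1}(x_{hNT}) & \cdots & B_{hd_h}(x_{hNT})
\end{pmatrix}
-
\begin{pmatrix}
B_{h1}(x_{h11}) & \cdots & B_{hd_h}(x_{h11}) \\
\vdots & \ddots & \vdots \\
B_{h1}(x_{h1,T-1}) & \cdots & B_{hd_h}(x_{h1,T-1}) \\
\vdots & \ddots & \vdots \\
B_{h1}(x_{hN1}) & \cdots & B_{hd_h}(x_{hN1}) \\
\vdots & \ddots & \vdots \\
B_{h1}(x_{hN,T-1}) & \cdots & B_{hd_h}(x_{hN,T-1})
\end{pmatrix}.$$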

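Additionally taking into account penalization, a first-difference penalized spline estimator for all $\beta_h$ can be obtained by minimizing the penalized least squares criterion

$$\left[ \Delta y - \sum_{h=1}^{H} (\Delta Z_h)\, \beta_h \right]^T \left[ \Delta y - \sum_{h=1}^{H} (\Delta Z_h)\, \beta_h \right] + \sum_{h=1}^{H} \lambda_h \beta_h^T K_h \beta_h. \tag{2.11}$$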

Since the smoothing parameters are unknown, one can again exploit the mixed model representation and use (restricted) maximum likelihood estimation as discussed in the previous subsection. Note that the framework is similar to the cross-sectional case, since only the vector of the dependent variable and the design matrices differ between equations (2.3) and (2.11). The major difference in comparison to cross-sectional data is the problem of autocorrelated errors, which are often encountered in panel data contexts. Krivobokova and Kauermann (2007) show that the restricted maximum likelihood based estimation of a smoothing parameter is robust to modest forms of autocorrelation. Moreover, further adjustments of the design matrices and the dependent variable to address serial correlation are also possible; see Section 2.3 for an elaboration of this topic.
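One source of serial correlation is the differencing step itself. As a short worked calculation, suppose the original errors $u_{it}$ are independent over time with constant variance, written here as $\sigma_u^2$ (a symbol borrowed from the cross-sectional model for illustration). Then the differenced errors of one individual satisfy:

```latex
\begin{align*}
\operatorname{Var}(\Delta u_{it}) &= \operatorname{Var}(u_{it}) + \operatorname{Var}(u_{i,t-1}) = 2\sigma_u^2, \\
\operatorname{Cov}(\Delta u_{it}, \Delta u_{i,t-1}) &= \operatorname{Cov}(u_{it} - u_{i,t-1},\; u_{i,t-1} - u_{i,t-2}) = -\sigma_u^2, \\
\operatorname{Cov}(\Delta u_{it}, \Delta u_{i,t-s}) &= 0 \quad \text{for } s \geq 2 .
\end{align*}
```

Under these assumptions, adjacent differenced errors have correlation $-1/2$, while differenced errors more than one period apart, or belonging to different individuals, are uncorrelated.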

We briefly return to the identification problem in the case of multiple smooth model components. Our aim is to estimate the functions $f_h$, $h = 1, \dots, H$. Hence, model (2.6) should be identified such that

$$\sum_{i=1}^{N} \sum_{t=1}^{T} f_h(x_{hit}) \approx Z_h \beta_h = 0$$

holds for all $h = 1, \dots, H$. Therefore, we rewrite the design matrices of the evaluated basis functions given in (2.9) and the penalty matrices such that $\tilde{Z}_h = Z_h W_h$ and $\tilde{K}_h = W_h^T K_h W_h$, proceeding as described in the previous subsection. Furthermore, the identification restriction implies that the one-period-lagged design matrix is constructed directly from $\tilde{Z}_h$ by taking its one-period-lagged rows. After building the difference between each $\tilde{Z}_h$ and its respective lagged counterpart, the resulting matrices $\Delta \tilde{Z}_h$ and the penalty matrices $\tilde{K}_h$ are plugged into (2.11) to obtain estimators for $\beta_h$ and thus for $f_h$.

Another common approach in fixed effects panel data models is time-demeaning, that is, removing the individual-specific effects $\gamma_i$ by building the mean over time for each individual in equation (2.6) and subtracting the resulting equation from (2.6). Using the information above, this variant is straightforward to derive.
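A sketch of the time-demeaning derivation, using the spline representation (2.2), reads as follows. The bar notation for individual-specific time averages, for example $\bar{y}_i = T^{-1} \sum_{t=1}^{T} y_{it}$, is introduced here for illustration and does not appear in the text above.

```latex
\begin{align*}
\bar{y}_i &= \gamma_i + \sum_{h=1}^{H} \frac{1}{T} \sum_{t=1}^{T} f_h(x_{hit}) + \bar{u}_i
           = \gamma_i + \sum_{h=1}^{H} \bar{z}_{hi}^T \beta_h + \bar{u}_i,
           \qquad \bar{z}_{hi} = \frac{1}{T} \sum_{t=1}^{T} z_h(x_{hit}), \\
y_{it} - \bar{y}_i &= \sum_{h=1}^{H} \left[ z_h(x_{hit}) - \bar{z}_{hi} \right]^T \beta_h + (u_{it} - \bar{u}_i).
\end{align*}
```

As with first differences, $\gamma_i$ cancels, and the demeaned design matrices would take the place of $\Delta Z_h$ in a criterion analogous to (2.11). In contrast to differencing, all $T$ observations per individual are retained.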

2.3 Simultaneous confidence bands for penalized

In: Causality, Prediction, and Replicability in Applied Statistics: Advanced Models and Practices (pp. 24-28)