
Since we will compare mixed model based empirical Bayes estimates with their fully Bayesian counterparts in several simulation studies in the following, we give a short overview of MCMC inference in structured additive regression. In this case, no reparametrization is needed and inference can be performed directly for the parameters γ, ξ_1, …, ξ_p. A nice introduction to MCMC inference is given, for example, in Green (2001).

More details on MCMC inference in structured additive regression models can be found in Fahrmeir & Lang (2001a), Lang & Brezger (2004) and Brezger & Lang (2005).

In a fully Bayesian approach, parameter estimates are generated by drawing random samples from the posterior (5.1) via MCMC simulation techniques. The variance parameters τ_j^2 can be estimated simultaneously with the regression coefficients ξ_j by assigning additional hyperpriors to them. The most common assumption is that the τ_j^2 are independently inverse gamma distributed, i.e. τ_j^2 ∼ IG(a_j, b_j), with hyperparameters a_j and b_j specified a priori. A standard choice is a_j = b_j = 0.001. In some data situations (e.g. for small sample sizes), the estimated nonlinear functions f_j may depend considerably on the particular choice of hyperparameters. It is therefore good practice to estimate all models under consideration using a (small) number of different choices for a_j and b_j to assess the dependence of results on minor changes in the prior assumptions.

Suppose first that the distribution of the response variable is Gaussian, i.e. y_i | η_i, σ^2 ∼ N(η_i, σ^2/ω_i), i = 1, …, n, or y | η, σ^2 ∼ N(η, σ^2 Ω^{-1}), where Ω = diag(ω_1, …, ω_n) is a known weight matrix. In this case an additional hyperprior for the scale parameter σ^2 has to be specified. As for the variances of the regression coefficients, an inverse gamma distribution σ^2 ∼ IG(a_0, b_0) is a convenient choice.

For Gaussian responses the full conditionals for the fixed effects as well as for the nonlinear functions f_j are multivariate Gaussian. Thus, a Gibbs sampler can be used where posterior samples are drawn directly from the multivariate Gaussian distributions. The full conditional γ|· for fixed effects with diffuse priors is Gaussian with mean

E(γ|·) = (U'ΩU)^{-1} U'Ω(y − η̃)   (5.36)

and covariance matrix

Cov(γ|·) = σ^2 (U'ΩU)^{-1},   (5.37)

where U is the design matrix of the fixed effects and η̃ = η − Uγ is the part of the additive predictor associated with the other effects in the model. Similarly, the full conditional for the regression coefficients ξ_j of a function f_j is Gaussian with mean

m_j = E(ξ_j|·) = ( (1/σ^2) V_j'ΩV_j + (1/τ_j^2) K_j )^{-1} (1/σ^2) V_j'Ω(y − η_{-j})   (5.38)

and covariance matrix

P_j^{-1} = Cov(ξ_j|·) = ( (1/σ^2) V_j'ΩV_j + (1/τ_j^2) K_j )^{-1},   (5.39)

where η_{-j} = η − V_jξ_j. Although the full conditional is Gaussian, drawing random samples efficiently is not trivial, since linear systems of equations with a high-dimensional precision matrix P_j must be solved in every iteration of the MCMC scheme. Following Rue (2001), drawing random numbers from p(ξ_j|·) can be conducted as follows: First, the Cholesky decomposition P_j = LL' is computed. Solving L'ξ_j = z, where z is a vector of independent standard normally distributed random variables, yields ξ_j ∼ N(0, P_j^{-1}). Afterwards, the mean m_j is computed by solving P_j m_j = (1/σ^2) V_j'Ω(y − η_{-j}). This is achieved by first solving Lν = (1/σ^2) V_j'Ω(y − η_{-j}) by forward substitution, followed by backward substitution of L'm_j = ν. Finally, adding m_j to the previously simulated ξ_j yields ξ_j ∼ N(m_j, P_j^{-1}).
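This sampling scheme can be sketched in a few lines with dense Cholesky factors; the following is an illustrative reimplementation with numpy/scipy, not the banded-solver code used in BayesX, and the function name is our own:

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular

def sample_gaussian_precision(P, b, rng):
    """Draw from N(m, P^{-1}) with m = P^{-1} b, following Rue (2001).

    P: precision matrix; b: right-hand side, here (1/sigma^2) V_j' Omega (y - eta_{-j}).
    """
    L = cholesky(P, lower=True)            # P = L L'
    z = rng.standard_normal(P.shape[0])    # z ~ N(0, I)
    # Backward solve L' x = z  =>  x ~ N(0, P^{-1})
    x = solve_triangular(L, z, lower=True, trans='T')
    # Mean m solves P m = b: forward solve L nu = b, then backward solve L' m = nu
    nu = solve_triangular(L, b, lower=True)
    m = solve_triangular(L, nu, lower=True, trans='T')
    return m + x                           # ~ N(m, P^{-1})
```

In BayesX the same three solves are carried out with band-matrix (envelope) Cholesky routines, so the cost per iteration stays linear in the number of parameters.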

For all effects considered in Section 4.2 except GRFs, the posterior precision matrices P_j can be brought into a band-matrix-like structure with bandwidth depending on the prior.

If f_j corresponds to a Markov random field, the posterior precision matrix is usually a sparse matrix but not a band matrix (compare Section 4.2.3.1). In this case the regions of the geographical map must be reordered, for example using the reverse Cuthill-McKee algorithm, to obtain a band-matrix-like precision matrix. Random samples from the full conditional can then be drawn very efficiently using Cholesky decompositions for band matrices or band-matrix-like matrices (see, for example, the envelope method for matrices with local bandwidths described in George & Liu 1981).
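As an illustration of the reordering step, the reverse Cuthill-McKee algorithm is available, e.g., in scipy; the adjacency matrix below is a made-up five-region map (a ring of regions), not data from the thesis:

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import reverse_cuthill_mckee

def bandwidth(A):
    """Largest distance of a nonzero entry from the diagonal."""
    i, j = np.nonzero(A)
    return int(np.abs(i - j).max())

# Hypothetical neighborhood matrix of five regions arranged in a ring
adj = np.array([[0, 1, 0, 0, 1],
                [1, 0, 1, 0, 0],
                [0, 1, 0, 1, 0],
                [0, 0, 1, 0, 1],
                [1, 0, 0, 1, 0]])

perm = reverse_cuthill_mckee(csr_matrix(adj), symmetric_mode=True)
reordered = adj[np.ix_(perm, perm)]  # for this ring, bandwidth drops from 4 to 2
```

Applying the same permutation to the regions before setting up the Markov random field prior gives a precision matrix whose Cholesky factor can be computed with banded routines.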

The full conditionals for the variance parameters τ_j^2, j = 1, …, p, and σ^2 are all inverse gamma distributions. For τ_j^2, the parameters are

a_j' = a_j + rank(K_j)/2 and b_j' = b_j + (1/2) ξ_j'K_jξ_j.   (5.40)

For σ^2, we obtain

a_0' = a_0 + n/2 and b_0' = b_0 + (1/2)(y − η)'(y − η).   (5.41)

If more general responses from an exponential family are given, a Metropolis-Hastings algorithm based on iteratively weighted least squares (IWLS) proposals can be used.
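The Gibbs update for a variance parameter then amounts to a single inverse gamma draw. A minimal sketch (the function name is ours, and the hyperparameter values in the example are illustrative):

```python
import numpy as np

def draw_variance(rng, xi, K, a=0.001, b=0.001):
    """Gibbs update for tau_j^2 from IG(a', b') as in (5.40).

    a' = a + rank(K_j)/2,  b' = b + xi_j' K_j xi_j / 2.
    If X ~ Gamma(shape=a', rate=b'), then 1/X ~ IG(a', b').
    """
    a_post = a + np.linalg.matrix_rank(K) / 2.0
    b_post = b + 0.5 * float(xi @ K @ xi)
    return 1.0 / rng.gamma(shape=a_post, scale=1.0 / b_post)
```

The update for σ^2 is analogous, with a_0' = a_0 + n/2 and b_0' built from the residuals y − η.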

The idea of IWLS updates was introduced by Gamerman (1997) for the estimation of generalized linear mixed models and adapted to the present situation of structured additive regression by Brezger & Lang (2005).

Suppose we want to update the regression coefficients ξ_j of the j-th function f_j with current value ξ_j^c of the chain. Then, according to IWLS, a new value ξ_j^p is proposed by drawing a random number from the multivariate Gaussian proposal distribution q(ξ_j^c, ξ_j^p) with precision matrix and mean

P_j = V_j'W(ξ_j^c)V_j + (1/τ_j^2) K_j and m_j = P_j^{-1} V_j'W(ξ_j^c)(ỹ − η_{-j}).   (5.42)

The working weights W and working observations ỹ are defined in complete analogy to the discussion in Section 5.2, and the vector η_{-j} = η − V_jξ_j is the part of the predictor associated with all remaining effects in the model. The proposed vector ξ_j^p is accepted as the new state of the chain with probability

α(ξ_j^c, ξ_j^p) = min( 1, [p(ξ_j^p|·) q(ξ_j^p, ξ_j^c)] / [p(ξ_j^c|·) q(ξ_j^c, ξ_j^p)] ),

where p(ξ_j|·) is the full conditional for ξ_j (i.e. the conditional distribution of ξ_j given all other parameters and the data y).

A fast implementation requires efficient sampling from the Gaussian proposal distributions. Using the same algorithms as for Gaussian responses, the number of calculations required to draw random numbers from the proposal distribution is linear in the number of parameters and observations. The computation of the acceptance probabilities is also linear in the number of observations.
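For numerical stability, the acceptance probability α(ξ_j^c, ξ_j^p) is best evaluated on the log scale; a generic sketch of the accept/reject step (the function and argument names are illustrative, not BayesX internals):

```python
import numpy as np

def mh_accept(rng, logp_prop, logp_cur, logq_back, logq_fwd):
    """Accept/reject step for alpha(xi^c, xi^p), computed on the log scale.

    logp_prop = log p(xi^p | .), logp_cur = log p(xi^c | .),
    logq_back = log q(xi^p, xi^c), logq_fwd = log q(xi^c, xi^p).
    """
    log_alpha = min(0.0, logp_prop - logp_cur + logq_back - logq_fwd)
    # Accept if log(u) < log(alpha) for u ~ Uniform(0, 1)
    return np.log(rng.uniform()) < log_alpha
```

Working with log densities avoids overflow when the full conditionals involve exponential family likelihoods evaluated over many observations.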

The full conditionals for the variance parameters τ_j^2 remain inverse gamma with parameters

a_j' = a_j + rank(K_j)/2 and b_j' = b_j + (1/2) ξ_j'K_jξ_j,

and updating can be done by simple Gibbs steps, drawing random numbers directly from the inverse gamma densities.

Convergence of the Markov chains to their stationary distributions can be assessed by inspecting sampling paths and autocorrelation functions of sampled parameters. In the majority of cases, however, the IWLS updating scheme has excellent mixing properties and convergence problems do not occur.
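The autocorrelation diagnostic mentioned above can be computed directly from a sampled chain; a minimal sketch (the function name is ours):

```python
import numpy as np

def acf(chain, max_lag=20):
    """Empirical autocorrelation function of an MCMC sampling path."""
    x = np.asarray(chain, dtype=float)
    x = x - x.mean()
    denom = float(x @ x)  # normalizes so that acf at lag 0 equals 1
    n = len(x)
    return np.array([float(x[: n - k] @ x[k:]) / denom
                     for k in range(max_lag + 1)])
```

Slowly decaying autocorrelations indicate poor mixing, in which case thinning the chain or increasing the number of iterations may be advisable.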

From a theoretical point of view, fully Bayesian inference allows for inference in structured additive regression models without the Laplace approximation involved in mixed model based inference. Furthermore, a frequently claimed advantage of full over empirical Bayes inference is that only the former appropriately accounts for the variability caused by the estimation of the hyperparameters. However, in our experience this effect is negligible in most situations, at least if the data contain enough information.

On the other hand, MCMC inference introduces hyperpriors for the variance parameters, and no generally applicable rule for the choice of the hyperparameters is available. While the influence of these hyperparameters is usually small, it may become relevant in sparse data situations. Furthermore, mixing and convergence of several Markov chains have to be monitored when performing MCMC inference to achieve accurate sampling based approximations to the quantities of interest. Mostly this is achieved by visual inspection of the sampling paths, since no generally accepted measure is available to determine actual convergence.

Although a theoretical comparison of empirical and fully Bayesian inference reveals several differences between the two inferential concepts, it should be noted that in many applications, and also in most of our simulations, the differences were comparatively small. If larger differences are observed, this may indicate a weakly identified model or other problems caused by the model formulation. Therefore it may be advisable to compare the results of both approaches, if possible.

6 BayesX

The mixed model methodology presented in the previous sections, and all extensions that will be discussed in the following parts, have been implemented in the public domain software package BayesX (Brezger, Kneib & Lang 2005a) as part of this thesis. The program not only comprises tools for performing empirical Bayes inference but also allows for fully Bayesian inference based on MCMC and for the estimation of Gaussian and non-Gaussian directed acyclic graphs. Functions for handling and manipulating data sets and geographical maps, as well as for visualizing results, are included for convenient use.

In this section, we mainly give an overview of the general usage of BayesX (Section 6.1) and describe the different types of objects existing in BayesX and their specific methods (Section 6.2). Instructions for downloading the program are given in the concluding Section 6.3. A complex example on childhood undernutrition in Zambia will be used for a tutorial-like introduction in Section 7.

6.1 Usage of BayesX

After having started BayesX, a main window divided into four sub-windows appears on the screen. These sub-windows are a command window for entering and executing commands, an output window for displaying results, a review window for easy access to past commands, and an object-browser that displays all objects currently available.

BayesX is object-oriented, although the concept is limited: inheritance and other concepts of object-oriented languages like C++ or S-Plus are not supported. For every object type, a number of object-specific methods may be applied to a particular object. To estimate Bayesian regression models, we need a dataset object to incorporate, handle and manipulate data, a remlreg object to estimate semiparametric regression models, and a graph object to visualize estimation results. If spatial effects are to be estimated, we additionally need map objects. Map objects mainly serve as auxiliary objects for remlreg objects and are used to read the boundary information of geographical maps and to compute the neighborhood matrix and the weights associated with the neighbors. The syntax for generating a new object in BayesX is

> objecttype objectname

where objecttype is the type of the object, e.g. dataset, and objectname is the arbitrarily chosen name of the new object. In the following section, we give an overview of the most important methods of the object types required to estimate Bayesian structured additive regression models.
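For instance, to set up the objects needed for a semiparametric regression analysis with a spatial effect, one could create (the object names here are arbitrary):

> dataset d

> map m

> remlreg r

> graph g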