
Part I. Bayesian regularization in the AFT model

6. MCMC inference for the extended AFT model

6.2. Algorithmic variants

6.2.1. Standardization of the baseline error distribution

As outlined in Subsection 2.3 we require constraints on the transformed weights $\alpha_0$ to achieve a standardized baseline error distribution. These constraints enforce the interpretation of the predictor $\eta_i$ as the mean $\mathrm{E}(Y_i \mid \theta) = \mu_{Y_i}$ and of the scale parameter $\sigma^2$ as the variance $\mathrm{Var}(Y_i \mid \theta) = \sigma^2_{Y_i}$ of the conditional distribution $Y_i \mid \theta$, compare (2.12) and (2.14). In this case the trace plots of the samples of the global intercept parameter $\gamma_0$, the scale parameter $\sigma^2$ and the samples of the mixture weights $w(\alpha_0)$ of the corresponding baseline error distribution $Y_0 \mid \gamma_0, \sigma$ indicate the desired convergence.

With an unconstrained estimation of the transformed mixture weights $\alpha_0$, the mean $\mu_\varepsilon = \sum_{k=1}^{g_0} w_k(\alpha_0)\, m_k$ and the variance $\sigma^2_\varepsilon = \sum_{k=1}^{g_0} w_k(\alpha_0)(m_k^2 + s_k^2) - \mu_\varepsilon^2$ of the error distribution $\varepsilon \mid \alpha_0$, (2.8), can take any values $\mu_\varepsilon \in \mathbb{R}$ and $\sigma^2_\varepsilon > 0$. The corresponding mean and variance of the baseline error distribution $Y_0 \mid \gamma_0, \sigma$ are given by $\mu_{Y_0} = \gamma_0 + \sigma \mu_\varepsilon$ and $\sigma^2_{Y_0} = \sigma^2 \sigma^2_\varepsilon$, compare (2.13), and the interpretation of the parameters $\gamma_0$ and $\sigma^2$ as global intercept and variance of the conditional distribution $Y_i \mid \theta$ is not feasible. As a further consequence of the unconstrained update of $\alpha_0$, the parameters $\gamma_0$ and $\mu_\varepsilon$ of the mean $\mu_{Y_0} = \gamma_0 + \sigma \mu_\varepsilon$ as well as the parameters $\sigma^2$ and $\sigma^2_\varepsilon$ of the variance $\sigma^2_{Y_0} = \sigma^2 \sigma^2_\varepsilon$ of the conditional distribution $Y_0 \mid \gamma_0, \sigma$ are not identifiable. This is due to the fact that for any scalar $c \in \mathbb{R}$, $\tilde\gamma_0 = \gamma_0 + \sigma c$ in combination with $\tilde\mu_\varepsilon = \mu_\varepsilon - c$, and (for $c \neq 0$) $\tilde\sigma^2 = c^2 \sigma^2$ in combination with $\tilde\sigma^2_\varepsilon = \sigma^2_\varepsilon / c^2$, does not change the mean $\mu_{Y_0}$ and the variance $\sigma^2_{Y_0}$. It follows that only the trace plots of the mean $\mu_{Y_0}$ and variance $\sigma^2_{Y_0}$ of the baseline error distribution show the desired stationarity; stationarity is not indicated by the trace plots of the single components of these expressions and the mixture weights $w(\alpha_0)$.
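The invariance behind this identifiability problem can be checked numerically. The following minimal Python sketch (with illustrative parameter values; the actual implementation is in R) verifies that the reparametrizations above leave $\mu_{Y_0}$ and $\sigma^2_{Y_0}$ unchanged:

```python
# Numerical check of the identifiability problem: shifting gamma_0 by
# sigma*c while shifting mu_eps by -c leaves mu_Y0 unchanged; scaling
# sigma^2 by c^2 while scaling sigma2_eps by 1/c^2 leaves sigma2_Y0
# unchanged. All values are illustrative.

def baseline_moments(gamma0, sigma2, mu_eps, sigma2_eps):
    """Mean and variance of Y_0 | gamma_0, sigma, cf. (2.13)."""
    sigma = sigma2 ** 0.5
    mu_Y0 = gamma0 + sigma * mu_eps
    sigma2_Y0 = sigma2 * sigma2_eps
    return mu_Y0, sigma2_Y0

gamma0, sigma2, mu_eps, sigma2_eps = 1.5, 2.0, -0.3, 0.8
c = 0.7

m0 = baseline_moments(gamma0, sigma2, mu_eps, sigma2_eps)
# mean reparametrization: gamma_0 + sigma*c combined with mu_eps - c
m1 = baseline_moments(gamma0 + sigma2 ** 0.5 * c, sigma2, mu_eps - c, sigma2_eps)
# variance reparametrization: c^2 * sigma^2 combined with sigma2_eps / c^2
m2 = baseline_moments(gamma0, c ** 2 * sigma2, mu_eps, sigma2_eps / c ** 2)
```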

The unconstrained estimation also requires an adjustment of the hyperparameters of the intercept $\gamma_0$ and the scale parameter $\sigma^2$ in the algorithm, if informative instead of diffuse priors are used for these parameters. Consider as an example the scale parameter $\sigma^2$. In the case of a standardized error distribution $\varepsilon \sim (0,1)$, our prior knowledge of the baseline variance $\mathrm{Var}(Y_0 \mid \gamma_0, \sigma)$ of the conditional distribution $Y_0 \mid \gamma_0, \sigma$ is reflected by the hyperparameters $h_{1,\sigma}$ and $h_{2,\sigma}$ of an inverse gamma distribution, compare (5.18). For an unstandardized baseline error distribution our knowledge concerns the product of the scale parameter and the variance of the baseline error distribution, i.e. $\mathrm{Var}(Y_0 \mid \gamma_0, \sigma) = \sigma^2 \sigma^2_\varepsilon \sim \mathrm{InvGamma}(h_{1,\sigma}, h_{2,\sigma})$. Due to the identification problem of the variance, the single components $\sigma^2_\varepsilon$ and $\sigma^2$ can take any positive value in each iteration, and we have to adjust the hyperparameters of the prior for the scale parameter $\sigma^2$ with respect to the current value of $\sigma^2_\varepsilon$ according to $\sigma^2 \sim \mathrm{InvGamma}(h_{1,\sigma}, h_{2,\sigma}/\sigma^2_\varepsilon)$. The same argumentation holds for the intercept and leads to the adjustment $\gamma_0 \sim \mathrm{N}(h_{1,\gamma_0} - \sigma \mu_\varepsilon, h^2_{2,\gamma_0})$ in every iteration of the sampler.
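The hyperparameter adjustment for the scale parameter can be verified by Monte Carlo simulation. The following Python sketch (illustrative hyperparameter values, not part of the R implementation) uses the scaling property of the inverse gamma distribution: drawing $\sigma^2 \sim \mathrm{InvGamma}(h_{1,\sigma}, h_{2,\sigma}/\sigma^2_\varepsilon)$ makes the product $\sigma^2\sigma^2_\varepsilon$ follow $\mathrm{InvGamma}(h_{1,\sigma}, h_{2,\sigma})$, whose mean is $h_{2,\sigma}/(h_{1,\sigma}-1)$:

```python
# Monte Carlo check of the hyperparameter adjustment for the scale
# parameter: if sigma^2 ~ InvGamma(h1, h2 / sigma2_eps), then the product
# sigma^2 * sigma2_eps follows InvGamma(h1, h2), which encodes the prior
# knowledge about Var(Y_0 | gamma_0, sigma). Values are illustrative.
import random

random.seed(1)
h1, h2 = 4.0, 6.0      # hyperparameters for the product sigma^2 * sigma2_eps
sigma2_eps = 2.5       # current variance of the unstandardized error

def rinvgamma(shape, scale):
    """Draw from InvGamma(shape, scale) as the reciprocal of a gamma draw."""
    return 1.0 / random.gammavariate(shape, 1.0 / scale)

draws = [rinvgamma(h1, h2 / sigma2_eps) * sigma2_eps for _ in range(200_000)]
mc_mean = sum(draws) / len(draws)
target_mean = h2 / (h1 - 1.0)   # E[InvGamma(h1, h2)] = h2 / (h1 - 1)
```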

We advocate the following strategy to obtain a standardized baseline error distribution, which avoids the direct implementation of constraints in the update of the (transformed) mixture weights. Let $\mu_\varepsilon = \sum_{k=1}^{g_0} w_k(\alpha_0)\, m_k$ and $\sigma^2_\varepsilon = \sum_{k=1}^{g_0} w_k(\alpha_0)(m_k^2 + s_k^2) - \mu_\varepsilon^2$ denote the mean and the variance of the unstandardized error distribution with density $f(\varepsilon \mid \alpha_0) = \sum_{k=1}^{g_0} w_k(\alpha_0)\, \phi(\cdot \mid m_k, s_k^2)$ from (2.7). The density of the standardized baseline error distribution $\tilde\varepsilon_0 \mid \alpha_0$ is obtained via the transformation $\tilde\varepsilon_0 = (\varepsilon - \mu_\varepsilon)/\sigma_\varepsilon$ as $f(\tilde\varepsilon_0 \mid \alpha_0) = \sum_{k=1}^{g_0} w_k(\alpha_0)\, \phi(\cdot \mid \tilde m_{0,k}, \tilde s^2_{0,k})$ with basis knots $\tilde m_{0,k}$ and basis variances $\tilde s^2_{0,k}$ given by

$$\tilde m_{0,k} = \frac{m_k - \mu_\varepsilon}{\sigma_\varepsilon}, \qquad \tilde s^2_{0,k} = \frac{s_k^2}{\sigma^2_\varepsilon}, \qquad k = 1, \ldots, g_0. \qquad (6.31)$$
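The transformation (6.31) can be illustrated with a small Gaussian mixture. The following Python sketch (with illustrative knots, variances and weights; the actual implementation is the R function baftpgm()) verifies that the transformed basis yields a standardized error distribution while the weights stay fixed:

```python
# Sketch of the standardization (6.31): shift and scale the basis knots
# and variances of a Gaussian mixture so that the error distribution has
# zero mean and unit variance; the mixture weights w_k are untouched.
# All numeric values are illustrative.

def mixture_moments(w, m, s2):
    """Mean and variance of sum_k w_k * N(m_k, s2_k)."""
    mu = sum(wk * mk for wk, mk in zip(w, m))
    var = sum(wk * (mk**2 + s2k) for wk, mk, s2k in zip(w, m, s2)) - mu**2
    return mu, var

w  = [0.2, 0.5, 0.3]      # mixture weights w_k(alpha_0)
m  = [-1.0, 0.5, 2.0]     # basis knots m_k
s2 = [0.4, 0.4, 0.4]      # basis variances s_k^2

mu_eps, var_eps = mixture_moments(w, m, s2)
sd_eps = var_eps ** 0.5

# transformed knots and variances, cf. (6.31)
m_tilde  = [(mk - mu_eps) / sd_eps for mk in m]
s2_tilde = [s2k / var_eps for s2k in s2]

mu_std, var_std = mixture_moments(w, m_tilde, s2_tilde)  # mean 0, variance 1
```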

This transformation only shifts and scales the basis knots and variances to match the zero-mean and unit-variance condition, but leaves the mixture weights unchanged. To avoid a change of the posterior we have to ensure that the mean $\mu_{Y_0} = \gamma_0 + \sigma \mu_\varepsilon$ and the variance $\sigma^2_{Y_0} = \sigma^2 \sigma^2_\varepsilon$ of the baseline error distribution $Y_0 \mid \gamma_0, \sigma$, now expressed in terms of the standardized error distribution $\tilde\varepsilon_0 \mid \tilde\alpha_0$, do not change. From the relationship $Y_0 = \gamma_0 + \sigma \varepsilon = \gamma_0 + \sigma \mu_\varepsilon + \sigma \sigma_\varepsilon \tilde\varepsilon_0 = \tilde\gamma_0 + \tilde\sigma \tilde\varepsilon_0$ it follows that we have to adjust the intercept and the scale parameter according to

$$\tilde\gamma_0 = \gamma_0 + \sigma \mu_\varepsilon, \qquad \tilde\sigma^2 = \sigma^2 \sigma^2_\varepsilon. \qquad (6.32)$$
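A short numerical check of (6.32), again with illustrative values, confirms that the adjusted parameters reproduce the baseline moments when combined with the standardized error (mean 0, variance 1):

```python
# Sketch of the parameter adjustment (6.32): after standardizing the
# error distribution, intercept and scale are adjusted so that the mean
# and variance of Y_0 | gamma_0, sigma are preserved. Values illustrative.

gamma0, sigma2 = 1.2, 1.8
mu_eps, var_eps = 0.65, 1.5025   # moments of the unstandardized error

# baseline moments before standardization, cf. (2.13)
mu_Y0 = gamma0 + sigma2 ** 0.5 * mu_eps
var_Y0 = sigma2 * var_eps

# adjusted parameters, cf. (6.32)
gamma0_t = gamma0 + sigma2 ** 0.5 * mu_eps
sigma2_t = sigma2 * var_eps

# baseline moments after standardization (standardized error: mean 0, var 1)
mu_Y0_new = gamma0_t + sigma2_t ** 0.5 * 0.0
var_Y0_new = sigma2_t * 1.0
```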

To reformulate the standardized baseline error density in terms of the initial basis knots $m_k$ and basis variances $s_k^2$, we have to recompute the weights at the basis knots $m_k$ by solving the linear equation system

$$\sum_{k=1}^{g_0} \tilde w_k\, \phi(m_l \mid m_k, s_k^2) = f(\tilde\varepsilon_0 = m_l \mid \alpha_0), \qquad l = 1, \ldots, g_0, \qquad (6.33)$$

with respect to the constraints $\sum_{k=1}^{g_0} \tilde w_k = 1$ and $\tilde w_k > 0$. The standardized baseline error density is then given by $f(\tilde\varepsilon_0 \mid \tilde\alpha_0) = \sum_{k=1}^{g_0} w_k(\tilde\alpha_0)\, \phi(\cdot \mid m_k, s_k^2)$, where $\tilde\alpha_0$ denotes the corresponding transformed mixture weights.
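The linear system (6.33) can be sketched in Python. In the following example (illustrative knots and weights, not part of the R implementation) the target density is itself a mixture on the same basis, so the unconstrained solve recovers the known weights; in general, the positivity and sum-to-one constraints mentioned above must be enforced in addition:

```python
# Sketch of the weight recomputation (6.33): evaluate the target mixture
# density at the basis knots m_l and solve
#   sum_k w~_k * phi(m_l | m_k, s_k^2) = f(m_l),   l = 1,...,g_0,
# for the new weights. Here the target is a mixture on the same basis,
# so the solve recovers the known weights up to rounding error.
import math

def phi(x, m, s2):
    """Gaussian density phi(x | m, s2)."""
    return math.exp(-0.5 * (x - m) ** 2 / s2) / math.sqrt(2 * math.pi * s2)

def solve(A, b):
    """Plain Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            fac = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= fac * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

m  = [-2.0, -1.0, 0.0, 1.0, 2.0]    # basis knots m_k (illustrative)
s2 = [0.5] * 5                      # basis variances s_k^2
w_true = [0.1, 0.2, 0.4, 0.2, 0.1]  # weights to be recovered

A = [[phi(ml, mk, s2k) for mk, s2k in zip(m, s2)] for ml in m]
f_at_knots = [sum(wk * phi(ml, mk, s2k) for wk, mk, s2k in zip(w_true, m, s2))
              for ml in m]
w_tilde = solve(A, f_at_knots)      # recovers w_true up to rounding
```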

In the simulations we try two alternatives. First, we run the MCMC iterations without standardization of the baseline error density within the algorithm described in Section 5.2.3 and standardize the error density in a post-processing step, applying the corrections (6.32) to the samples of the involved parameters. Alternatively, we standardize the baseline error density according to (6.32) within the sampling algorithm after every update of the mixture weights, but leave the weights unchanged: recomputing the weights according to (6.33) could cause the next iteration to sample from unfavorable regions of the conditional posterior. In this version the knot positions and basis variances change in each iteration of the sampler and need to be stored in addition to the parameter samples. The within-sampler standardization is selected in the function baftpgm() with the argument scalebasis=TRUE. The weights are optionally recomputed after the simulation to show the convergence also in terms of the mixture weights.

6.2.2. Varying the update order of the transformed mixture weights

If we specify the methods method.alpha="mcondstep" or method.alpha="slice" in the function baftpgm(), we have the following options to vary the order of the update of the transformed mixture weights in every loop of the sampler (we assume e.g. $\alpha_{0,g_0} := 0$ for identifiability):

order.alpha="fix1". The order of the indices is fixed to $(1, 2, 3, \ldots, g_0 - 1)$.

order.alpha="fix2". The fixed order of the coefficients is chosen such that the coefficient $j$ that is currently updated does not depend on the coefficients used for the update of the previous coefficient $j-1$. If $d_0$ denotes the used difference order, the update order is $j \to j + (d_0 + 1) \to j + 2(d_0 + 1) \to \ldots$ for $j = 1, \ldots, d_0 + 1$. This is the default setting.

order.alpha="random1". In each update step a random permutation of the indices $(1, 2, 3, \ldots, g_0 - 1)$ is used.

order.alpha="random2". The order of the option order.alpha="fix2" is used, but in each step one random cut exchanges the two resulting parts of the update order.
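The "fix2" order can be made concrete with a few lines of Python; the function name is illustrative and not part of baftpgm():

```python
# Sketch of the "fix2" update order: the coefficient indices 1,...,g0-1
# are visited in strides of d0+1, so that consecutive updates do not use
# neighbouring coefficients. Function name illustrative.

def fix2_order(n_coef, d0):
    """Order j, j+(d0+1), j+2(d0+1), ... for j = 1, ..., d0+1."""
    step = d0 + 1
    return [i for j in range(1, step + 1) for i in range(j, n_coef + 1, step)]

order = fix2_order(9, 2)   # 9 coefficients, difference order d0 = 2
```

For instance, with $g_0 - 1 = 9$ coefficients and difference order $d_0 = 2$, the update order is 1, 4, 7, 2, 5, 8, 3, 6, 9.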

6.2.3. Varying the update of the component labels

We have several alternatives to classify the observations to the mixture components.

method.rlabel="gibbs": Random assignment by sampling the labels with the probabilities $p_{ik}$ from (6.29), which is the default option.

method.rlabel="fix-maxprob": Hard assignment to the class with maximum probability $p_{i,\max} = \max\{p(r_i = k) : k = 1, \ldots, g_0\}$.

method.rlabel="fix-interval": Hard assignment to intervals $I_j$ around the knots that build a partition of the domain of the error distribution. E.g. for equidistant knots and homoscedastic basis variances these intervals are $I_j = (0.5(m_{j-1} + m_j),\, 0.5(m_j + m_{j+1})]$, $j = 1, \ldots, g_0$, with $m_0 := -\infty$ and $m_{g_0+1} := \infty$.

6.2.4. Scale dependent implementation

In addition, a variant with scale-dependent covariance matrices in the priors of the regularized predictor components (5.6) and (5.16) is implemented, i.e. $\Sigma_\beta = \sigma^2 D_{\tau_\beta}$ and $\Sigma_{\alpha_j} = \sigma^2 \tau_j^2 K_j$. With this parametrization the values of the scale parameter strengthen or relax the regularization. The derivation of the associated full conditionals is straightforward. We can select this option with the argument scaledpri=TRUE.
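The effect of the scale-dependent parametrization can be illustrated with a minimal sketch, assuming a diagonal $D_{\tau_\beta}$ and illustrative values: a larger scale parameter inflates the prior covariance and thereby relaxes the regularization.

```python
# Sketch of the scale-dependent prior covariance Sigma_beta = sigma^2 * D_tau:
# the prior variance of each coefficient is sigma^2 * tau_j^2, so a larger
# scale parameter weakens the shrinkage. Diagonal case, values illustrative.

def prior_cov_diag(sigma2, tau2):
    """Diagonal of Sigma_beta = sigma^2 * D_tau with D_tau = diag(tau_j^2)."""
    return [sigma2 * t2 for t2 in tau2]

tau2 = [0.5, 1.0, 2.0]                 # illustrative variance parameters
cov_small = prior_cov_diag(0.5, tau2)  # small sigma^2: strong regularization
cov_large = prior_cov_diag(4.0, tau2)  # large sigma^2: weak regularization
```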