
Part I. Bayesian regularization in the AFT model

6. MCMC inference for the extended AFT model

6.2. Algorithmic variants

6.2.1. Standardization of the baseline error distribution

As outlined in Subsection 2.3 we require constraints on the transformed weights $\alpha_0$ to achieve a standardized baseline error distribution. These constraints enforce the interpretation of the predictor $\eta_i$ as the mean $\mathrm{E}(Y_i \mid \theta) = \mu_{Y_i}$ and of the scale parameter $\sigma^2$ as the variance $\mathrm{Var}(Y_i \mid \theta) = \sigma^2_{Y_i}$ of the conditional distribution $Y_i \mid \theta$, compare (2.12) and (2.14). In this case the trace plots of the samples of the global intercept parameter $\gamma_0$, the scale parameter $\sigma^2$ and the samples of the mixture weights $w(\alpha_0)$ of the corresponding baseline error distribution $Y_0 \mid \gamma_0, \sigma$ indicate the desired convergence.

With an unconstrained estimation of the transformed mixture weights $\alpha_0$, the mean $\mu_\varepsilon = \sum_{k=1}^{g_0} w_k(\alpha_0)\, m_k$ and the variance $\sigma^2_\varepsilon = \sum_{k=1}^{g_0} w_k(\alpha_0)(m_k^2 + s_k^2) - \mu_\varepsilon^2$ of the error distribution $\varepsilon \mid \alpha_0$, (2.8), can take any values $\mu_\varepsilon \in \mathbb{R}$ and $\sigma^2_\varepsilon > 0$. The corresponding mean and variance of the baseline error distribution $Y_0 \mid \gamma_0, \sigma$ are given by $\mu_{Y_0} = \gamma_0 + \sigma \mu_\varepsilon$ and $\sigma^2_{Y_0} = \sigma^2 \sigma^2_\varepsilon$, compare (2.13), and the interpretation of the parameters $\gamma_0$ and $\sigma^2$ as global intercept and variance of the conditional distribution $Y_i \mid \theta$ is not feasible. As a further consequence of the unconstrained update of $\alpha_0$, the parameters $\gamma_0$ and $\mu_\varepsilon$ of the mean $\mu_{Y_0} = \gamma_0 + \sigma \mu_\varepsilon$ as well as the parameters $\sigma^2$ and $\sigma^2_\varepsilon$ of the variance $\sigma^2_{Y_0} = \sigma^2 \sigma^2_\varepsilon$ of the conditional distribution $Y_0 \mid \gamma_0, \sigma$ are not identifiable. This is due to the fact that for any scalar $c \in \mathbb{R}$, $\tilde\gamma_0 = \gamma_0 + \sigma c$ in combination with $\tilde\mu_\varepsilon = \mu_\varepsilon - c$, and (for $c \neq 0$) $\tilde\sigma^2 = c^2 \sigma^2$ in combination with $\tilde\sigma^2_\varepsilon = \sigma^2_\varepsilon / c^2$, does not change the mean $\mu_{Y_0}$ and the variance $\sigma^2_{Y_0}$. It follows that only the trace plots of the mean $\mu_{Y_0}$ and variance $\sigma^2_{Y_0}$ of the baseline error distribution show the desired stationarity; stationarity is not indicated by the trace plots of the single components of these expressions and the mixture weights $w(\alpha_0)$.
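The invariance behind this identifiability problem can be checked numerically. The following minimal Python sketch (with illustrative parameter values; the actual implementation is in R) verifies that the reparametrizations above leave $\mu_{Y_0}$ and $\sigma^2_{Y_0}$ unchanged:

```python
# Numerical check of the identifiability problem: shifting gamma_0 by
# sigma*c while shifting mu_eps by -c leaves mu_Y0 unchanged; scaling
# sigma^2 by c^2 while scaling sigma2_eps by 1/c^2 leaves sigma2_Y0
# unchanged. All values are illustrative.

def baseline_moments(gamma0, sigma2, mu_eps, sigma2_eps):
    """Mean and variance of Y_0 | gamma_0, sigma, cf. (2.13)."""
    sigma = sigma2 ** 0.5
    mu_Y0 = gamma0 + sigma * mu_eps
    sigma2_Y0 = sigma2 * sigma2_eps
    return mu_Y0, sigma2_Y0

gamma0, sigma2, mu_eps, sigma2_eps = 1.5, 2.0, -0.3, 0.8
c = 0.7

m0 = baseline_moments(gamma0, sigma2, mu_eps, sigma2_eps)
# mean reparametrization: gamma_0 + sigma*c combined with mu_eps - c
m1 = baseline_moments(gamma0 + sigma2 ** 0.5 * c, sigma2, mu_eps - c, sigma2_eps)
# variance reparametrization: c^2 * sigma^2 combined with sigma2_eps / c^2
m2 = baseline_moments(gamma0, c ** 2 * sigma2, mu_eps, sigma2_eps / c ** 2)
```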

The unconstrained estimation also requires an adjustment of the hyperparameters of the intercept $\gamma_0$ and the scale parameter $\sigma^2$ in the algorithm, if informative instead of diffuse priors are used for these parameters. Consider as an example the scale parameter $\sigma^2$. In the case of a standardized error distribution $\varepsilon \sim (0,1)$, our prior knowledge of the baseline variance $\mathrm{Var}(Y_0 \mid \gamma_0, \sigma)$ of the conditional distribution $Y_0 \mid \gamma_0, \sigma$ is reflected by the hyperparameters $h_{1,\sigma}$ and $h_{2,\sigma}$ of an inverse gamma distribution, compare (5.18). For an unstandardized baseline error distribution our knowledge concerns the product of the scale parameter and the variance of the baseline error distribution, i.e. $\mathrm{Var}(Y_0 \mid \gamma_0, \sigma) = \sigma^2 \sigma^2_\varepsilon \sim \mathrm{InvGamma}(h_{1,\sigma}, h_{2,\sigma})$. Due to the identification problem of the variance, the single components $\sigma^2_\varepsilon$ and $\sigma^2$ can take any positive value in each iteration, and we have to adjust the hyperparameters of the prior for the scale parameter $\sigma^2$ with respect to the current value of $\sigma^2_\varepsilon$ according to $\sigma^2 \sim \mathrm{InvGamma}(h_{1,\sigma}, h_{2,\sigma}/\sigma^2_\varepsilon)$. The same argumentation holds for the intercept and leads to the adjustment $\gamma_0 \sim \mathrm{N}(h_{1,\gamma_0} - \sigma \mu_\varepsilon, h^2_{2,\gamma_0})$ in every iteration of the sampler.
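The hyperparameter adjustment for the scale parameter can be verified by Monte Carlo simulation. The following Python sketch (illustrative hyperparameter values, not part of the R implementation) uses the scaling property of the inverse gamma distribution: drawing $\sigma^2 \sim \mathrm{InvGamma}(h_{1,\sigma}, h_{2,\sigma}/\sigma^2_\varepsilon)$ makes the product $\sigma^2\sigma^2_\varepsilon$ follow $\mathrm{InvGamma}(h_{1,\sigma}, h_{2,\sigma})$, whose mean is $h_{2,\sigma}/(h_{1,\sigma}-1)$:

```python
# Monte Carlo check of the hyperparameter adjustment for the scale
# parameter: if sigma^2 ~ InvGamma(h1, h2 / sigma2_eps), then the product
# sigma^2 * sigma2_eps follows InvGamma(h1, h2), which encodes the prior
# knowledge about Var(Y_0 | gamma_0, sigma). Values are illustrative.
import random

random.seed(1)
h1, h2 = 4.0, 6.0      # hyperparameters for the product sigma^2 * sigma2_eps
sigma2_eps = 2.5       # current variance of the unstandardized error

def rinvgamma(shape, scale):
    """Draw from InvGamma(shape, scale) as the reciprocal of a gamma draw."""
    return 1.0 / random.gammavariate(shape, 1.0 / scale)

draws = [rinvgamma(h1, h2 / sigma2_eps) * sigma2_eps for _ in range(200_000)]
mc_mean = sum(draws) / len(draws)
target_mean = h2 / (h1 - 1.0)   # E[InvGamma(h1, h2)] = h2 / (h1 - 1)
```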

We advocate the following strategy to obtain a standardized baseline error distribution, which avoids the direct implementation of constraints in the update of the (transformed) mixture weights. Let $\mu_\varepsilon = \sum_{k=1}^{g_0} w_k(\alpha_0)\, m_k$ and $\sigma^2_\varepsilon = \sum_{k=1}^{g_0} w_k(\alpha_0)(m_k^2 + s_k^2) - \mu_\varepsilon^2$ denote the mean and the variance of the unstandardized error distribution with density $f(\varepsilon \mid \alpha_0) = \sum_{k=1}^{g_0} w_k(\alpha_0)\, \phi(\cdot \mid m_k, s_k^2)$ from (2.7). The density of the standardized baseline error distribution $\tilde\varepsilon_0 \mid \alpha_0$ is obtained via the transformation $\tilde\varepsilon_0 = (\varepsilon - \mu_\varepsilon)/\sigma_\varepsilon$ as $f(\tilde\varepsilon_0 \mid \alpha_0) = \sum_{k=1}^{g_0} w_k(\alpha_0)\, \phi(\cdot \mid \tilde m_{0,k}, \tilde s^2_{0,k})$ with basis knots $\tilde m_{0,k}$ and basis variances $\tilde s^2_{0,k}$ given by

$$\tilde m_{0,k} = \frac{m_k - \mu_\varepsilon}{\sigma_\varepsilon}, \qquad \tilde s^2_{0,k} = \frac{s_k^2}{\sigma^2_\varepsilon}, \qquad k = 1, \ldots, g_0. \qquad (6.31)$$
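The transformation (6.31) can be illustrated with a small Gaussian mixture. The following Python sketch (with illustrative knots, variances and weights; the actual implementation is the R function baftpgm()) verifies that the transformed basis yields a standardized error distribution while the weights stay fixed:

```python
# Sketch of the standardization (6.31): shift and scale the basis knots
# and variances of a Gaussian mixture so that the error distribution has
# zero mean and unit variance; the mixture weights w_k are untouched.
# All numeric values are illustrative.

def mixture_moments(w, m, s2):
    """Mean and variance of sum_k w_k * N(m_k, s2_k)."""
    mu = sum(wk * mk for wk, mk in zip(w, m))
    var = sum(wk * (mk**2 + s2k) for wk, mk, s2k in zip(w, m, s2)) - mu**2
    return mu, var

w  = [0.2, 0.5, 0.3]      # mixture weights w_k(alpha_0)
m  = [-1.0, 0.5, 2.0]     # basis knots m_k
s2 = [0.4, 0.4, 0.4]      # basis variances s_k^2

mu_eps, var_eps = mixture_moments(w, m, s2)
sd_eps = var_eps ** 0.5

# transformed knots and variances, cf. (6.31)
m_tilde  = [(mk - mu_eps) / sd_eps for mk in m]
s2_tilde = [s2k / var_eps for s2k in s2]

mu_std, var_std = mixture_moments(w, m_tilde, s2_tilde)  # mean 0, variance 1
```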

This transformation only shifts and scales the basis knots and variances to match the zero-mean and unit-variance condition, but leaves the mixture weights unchanged. To avoid a change of the posterior we have to ensure that the mean $\mu_{Y_0} = \gamma_0 + \sigma \mu_\varepsilon$ and the variance $\sigma^2_{Y_0} = \sigma^2 \sigma^2_\varepsilon$ of the baseline error distribution $Y_0 \mid \gamma_0, \sigma$, now expressed in terms of the standardized error distribution $\tilde\varepsilon_0 \mid \tilde\alpha_0$, do not change. From the relationship $Y_0 = \gamma_0 + \sigma \varepsilon = \gamma_0 + \sigma \mu_\varepsilon + \sigma \sigma_\varepsilon \tilde\varepsilon_0 = \tilde\gamma_0 + \tilde\sigma \tilde\varepsilon_0$ it follows that we have to adjust the intercept and the scale parameter according to

$$\tilde\gamma_0 = \gamma_0 + \sigma \mu_\varepsilon, \qquad \tilde\sigma^2 = \sigma^2 \sigma^2_\varepsilon. \qquad (6.32)$$
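A short numerical check of (6.32), again with illustrative values, confirms that the adjusted parameters reproduce the baseline moments when combined with the standardized error (mean 0, variance 1):

```python
# Sketch of the parameter adjustment (6.32): after standardizing the
# error distribution, intercept and scale are adjusted so that the mean
# and variance of Y_0 | gamma_0, sigma are preserved. Values illustrative.

gamma0, sigma2 = 1.2, 1.8
mu_eps, var_eps = 0.65, 1.5025   # moments of the unstandardized error

# baseline moments before standardization, cf. (2.13)
mu_Y0 = gamma0 + sigma2 ** 0.5 * mu_eps
var_Y0 = sigma2 * var_eps

# adjusted parameters, cf. (6.32)
gamma0_t = gamma0 + sigma2 ** 0.5 * mu_eps
sigma2_t = sigma2 * var_eps

# baseline moments after standardization (standardized error: mean 0, var 1)
mu_Y0_new = gamma0_t + sigma2_t ** 0.5 * 0.0
var_Y0_new = sigma2_t * 1.0
```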

To reformulate the standardized baseline error density in terms of the initial basis knots $m_k$ and basis variances $s_k^2$, we have to recompute the weights at the basis knots $m_k$ by solving the linear equation system

$$\sum_{k=1}^{g_0} \tilde w_k\, \phi(m_l \mid m_k, s_k^2) = f(\tilde\varepsilon_0 = m_l \mid \alpha_0), \qquad l = 1, \ldots, g_0, \qquad (6.33)$$

with respect to the constraints $\sum_{k=1}^{g_0} \tilde w_k = 1$ and $\tilde w_k > 0$. The standardized baseline error density is then given by $f(\tilde\varepsilon_0 \mid \tilde\alpha_0) = \sum_{k=1}^{g_0} w_k(\tilde\alpha_0)\, \phi(\cdot \mid m_k, s_k^2)$, where $\tilde\alpha_0$ denotes the corresponding transformed mixture weights.
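The linear system (6.33) can be sketched in Python. In the following example (illustrative knots and weights, not part of the R implementation) the target density is itself a mixture on the same basis, so the unconstrained solve recovers the known weights; in general, the positivity and sum-to-one constraints mentioned above must be enforced in addition:

```python
# Sketch of the weight recomputation (6.33): evaluate the target mixture
# density at the basis knots m_l and solve
#   sum_k w~_k * phi(m_l | m_k, s_k^2) = f(m_l),   l = 1,...,g_0,
# for the new weights. Here the target is a mixture on the same basis,
# so the solve recovers the known weights up to rounding error.
import math

def phi(x, m, s2):
    """Gaussian density phi(x | m, s2)."""
    return math.exp(-0.5 * (x - m) ** 2 / s2) / math.sqrt(2 * math.pi * s2)

def solve(A, b):
    """Plain Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            fac = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= fac * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

m  = [-2.0, -1.0, 0.0, 1.0, 2.0]    # basis knots m_k (illustrative)
s2 = [0.5] * 5                      # basis variances s_k^2
w_true = [0.1, 0.2, 0.4, 0.2, 0.1]  # weights to be recovered

A = [[phi(ml, mk, s2k) for mk, s2k in zip(m, s2)] for ml in m]
f_at_knots = [sum(wk * phi(ml, mk, s2k) for wk, mk, s2k in zip(w_true, m, s2))
              for ml in m]
w_tilde = solve(A, f_at_knots)      # recovers w_true up to rounding
```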

In the simulations we try two alternatives. First, we run the MCMC iterations without standardization of the baseline error density within the algorithm described in Section 5.2.3 and standardize the error density in a post-processing step, applying the corrections (6.32) to the samples of the involved parameters. Alternatively, we standardize the baseline error density according to (6.32) within the sampling algorithm after every update of the mixture weights, but leave the weights unchanged: recomputing the weights according to (6.33) could cause the next iteration to sample from unfavorable regions of the conditional posterior. In this version the knot positions and basis variances change in each iteration of the sampler and need to be stored in addition to the parameter samples. The within-sampler standardization is selected in the function baftpgm() with the argument scalebasis=TRUE. The weights are optionally recomputed after the simulation to show the convergence also in terms of the mixture weights.

6.2.2. Varying the update order of the transformed mixture weights

If we specify the methods method.alpha="mcondstep" or method.alpha="slice" in the function baftpgm(), we have the following options to vary the order of the update of the transformed mixture weights in every loop of the sampler (we assume e.g. $\alpha_{0,g_0} := 0$ for identifiability):

order.alpha="fix1". The order of the indices is fixed to $(1, 2, 3, \ldots, g_0 - 1)$.

order.alpha="fix2". The fixed order of the coefficients is chosen such that the coefficient $j$ that is currently updated does not depend on the coefficients used for the update of the previous coefficient $j-1$. If $d_0$ denotes the used difference order, the update order is $j \to j + (d_0 + 1) \to j + 2(d_0 + 1) \to \ldots$ for $j = 1, \ldots, d_0 + 1$. This is the default setting.

order.alpha="random1". In each update step a random permutation of the indices $(1, 2, 3, \ldots, g_0 - 1)$ is used.

order.alpha="random2". The order of the option order.alpha="fix2" is used, but in each step one random cut exchanges the two resulting parts of the update order.
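The "fix2" order can be made concrete with a few lines of Python; the function name is illustrative and not part of baftpgm():

```python
# Sketch of the "fix2" update order: the coefficient indices 1,...,g0-1
# are visited in strides of d0+1, so that consecutive updates do not use
# neighbouring coefficients. Function name illustrative.

def fix2_order(n_coef, d0):
    """Order j, j+(d0+1), j+2(d0+1), ... for j = 1, ..., d0+1."""
    step = d0 + 1
    return [i for j in range(1, step + 1) for i in range(j, n_coef + 1, step)]

order = fix2_order(9, 2)   # 9 coefficients, difference order d0 = 2
```

For instance, with $g_0 - 1 = 9$ coefficients and difference order $d_0 = 2$, the update order is 1, 4, 7, 2, 5, 8, 3, 6, 9.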

6.2.3. Varying the update of the component labels

We have several alternatives to classify the observations to the mixture components.

method.rlabel="gibbs": Random assignment by sampling the labels with the probabilities $p_{ik}$ from (6.29), which is the default option.

method.rlabel="fix-maxprob": Hard assignment to the class with maximum probability $p_{i,\max} = \max\{p(r_i = k) : k = 1, \ldots, g_0\}$.

method.rlabel="fix-interval": Hard assignment to intervals $I_j$ around the knots that build a partition of the domain of the error distribution. E.g. for equidistant knots and homoscedastic basis variances these intervals are $I_j = (0.5(m_{j-1} + m_j),\, 0.5(m_j + m_{j+1})]$, $j = 1, \ldots, g_0$, with $m_0 := -\infty$ and $m_{g_0+1} := \infty$.

6.2.4. Scale dependent implementation

In addition, a variant with scale-dependent covariance matrices in the priors of the regularized predictor components (5.6) and (5.16) is implemented, i.e. $\Sigma_\beta = \sigma^2 D_{\tau_\beta}$ and $\Sigma_{\alpha_j} = \sigma^2 \tau_j^2 K_j$. With this parametrization the values of the scale parameter strengthen or relax the regularization. The derivation of the associated full conditionals is straightforward. We can select this option with the argument scaledpri=TRUE.
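The effect of the scale-dependent parametrization can be illustrated with a minimal sketch, assuming a diagonal $D_{\tau_\beta}$ and illustrative values: a larger scale parameter inflates the prior covariance and thereby relaxes the regularization.

```python
# Sketch of the scale-dependent prior covariance Sigma_beta = sigma^2 * D_tau:
# the prior variance of each coefficient is sigma^2 * tau_j^2, so a larger
# scale parameter weakens the shrinkage. Diagonal case, values illustrative.

def prior_cov_diag(sigma2, tau2):
    """Diagonal of Sigma_beta = sigma^2 * D_tau with D_tau = diag(tau_j^2)."""
    return [sigma2 * t2 for t2 in tau2]

tau2 = [0.5, 1.0, 2.0]                 # illustrative variance parameters
cov_small = prior_cov_diag(0.5, tau2)  # small sigma^2: strong regularization
cov_large = prior_cov_diag(4.0, tau2)  # large sigma^2: weak regularization
```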