

11.5. Adaptation of the Bayesian NMIG selection criterion

                    px = 20             px = 60             px = 160            px = 200
                β̂≠0,β≠0  β̂=0,β=0   β̂≠0,β≠0  β̂=0,β=0   β̂≠0,β≠0  β̂=0,β=0   β̂≠0,β≠0  β̂=0,β=0
BEST              0.5      0.5       0.5      0.5       0.5      0.5       0.5      0.5
CPL.PenL          0.445    0.288     0.411    0.332     0.299    0.380     0.220    0.416
CPL.BL-HS.STD     0.393    0.396     0.358    0.406     0.187    0.477     0.120    0.486
CPL.BR-HS.STD     0.404    0.348     0.383    0.346     0.238    0.437     0.196    0.450
CPL.BN-HS.STD     0.287    0.492     0.229    0.485     0.142    0.473     0.103    0.474
CPL.BL-HS.CRI     0.284    0.490     0.232    0.488     0.045    0.499     0.011    0.499
CPL.BR-HS.CRI     0.301    0.483     0.264    0.470     0.068    0.498     0.031    0.499
CPL.BN-HS.CRI     0.208    0.500     0.162    0.497     0.094    0.490     0.060    0.490
CPL.BN-HS.IND     0.301    0.489     0.241    0.484     0.151    0.468     0.111    0.470

Table 11.3: Average fraction of correctly classified coefficients for the CRR models after variable selection with increasing number of covariates. Here, β̂≠0, β≠0 denotes the case that the estimated effect is nonzero (β̂≠0) when the corresponding true effect is nonzero (β≠0), and β̂=0, β=0 denotes the case that the estimated effect is zero (β̂=0) when the corresponding true effect is zero (β=0).
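The two fractions can be computed directly from the estimated and true coefficient vectors. Below is a minimal sketch, assuming the fractions are taken over all px covariates (so each column attains at most 0.5 when half of the true effects are nonzero, as the BEST row suggests); the arrays are hypothetical illustration data.

import numpy as np

beta_true = np.array([1.0, -0.5, 0.8, 0.0, 0.0, 0.0])  # true coefficients
beta_hat  = np.array([0.9,  0.0, 0.7, 0.0, 0.1, 0.0])  # estimates after selection

# Fraction of coefficients correctly kept (nonzero estimate, nonzero truth)
frac_nonzero = np.mean((beta_hat != 0) & (beta_true != 0))
# Fraction of coefficients correctly removed (zero estimate, zero truth)
frac_zero    = np.mean((beta_hat == 0) & (beta_true == 0))
print(frac_nonzero, frac_zero)  # both 2/6 in this toy example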

Li and Lin (2010) utilize the receiver operating characteristic (ROC) curve in the context of the Bayesian elastic net prior to adapt the α-level of the credible interval in the HS.CRI criterion. They improve the variable selection accuracy by plotting the correct inclusion rate (sensitivity) against the false inclusion rate (1 - specificity) along the range of α in simulations, and suggest using α = 0.5 in practice, because a higher level of α results in a higher sensitivity but a lower specificity with the elastic net prior. Besides adjusting our HS.CRI region, this ROC-based method also provides another way to determine the HS.IND threshold, but we have not investigated this topic so far.
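A minimal sketch of such a ROC sweep is given below, assuming beta_samples is an (n_mcmc, p) array of posterior draws and nonzero_true a boolean vector of the simulation truth (both hypothetical names); a covariate is selected when its (1 - α) credible interval excludes zero.

import numpy as np

def roc_over_alpha(beta_samples, nonzero_true, alphas=np.linspace(0.05, 0.95, 19)):
    sens, fpr = [], []
    for alpha in alphas:
        # (1 - alpha) credible interval; a larger alpha gives a narrower
        # interval and hence more inclusions (higher sensitivity, lower specificity)
        lo = np.quantile(beta_samples, alpha / 2, axis=0)
        hi = np.quantile(beta_samples, 1 - alpha / 2, axis=0)
        selected = (lo > 0) | (hi < 0)
        sens.append(np.mean(selected[nonzero_true]))   # correct inclusion rate
        fpr.append(np.mean(selected[~nonzero_true]))   # false inclusion rate
    return np.array(alphas), np.array(sens), np.array(fpr)

Plotting sens against fpr over the grid of α then yields the ROC curve from which a working α-level can be read off.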

Results

Figure 11.30 and Figure 11.31 show the impact of these modifications on the MSE of the estimated regression coefficients. Figure 11.30 summarizes the results for models CRR 1 to CRR 3 from Subsection 11.1, and Figure 11.31 those for the higher-dimensional predictor from Subsection 11.4.

As expected, none of the modified selection rules improves the MSE performance in simulation model CRR 1, with its clearly separable large and small effects and estimated posterior inclusion probabilities close to 1 and 0 (compare the left panel of Figure 11.30). For the remaining simulations CRR 2 (middle panel) and CRR 3 (right panel), the largest improvements are achieved with the first modification, HS.IND.1, i.e. by lowering the selection threshold to 0.1.
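For reference, a minimal sketch of the HS.IND rule is shown below, assuming ind_samples is an (n_mcmc, p) array of sampled NMIG indicators taking the values v0 (spike) or v1 (slab), and that the posterior inclusion probability is estimated by the relative frequency of v1 draws (hypothetical names).

import numpy as np

def hs_ind_select(ind_samples, v1, threshold=0.5):
    # Estimated posterior inclusion probability P(I_j = v1 | data) per covariate
    incl_prob = np.mean(ind_samples == v1, axis=0)
    # HS.IND uses threshold 0.5; HS.IND.1 lowers it to 0.1
    return incl_prob > threshold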

[Figure 11.30: boxplots of MSE Beta for CPL.BN, CPL.BN-HS.STD, CPL.BN-HS.CRI, CPL.BN-HS.IND and CPL.BN-HS.IND.1 to CPL.BN-HS.IND.3; panels CRR1, CRR2, CRR3.]

Figure 11.30: Mean squared errors of the regression coefficient estimates, MSE(β̂), under the Bayesian NMIG prior and the associated variable selection methods in the simulation models CRR 1 to CRR 3. The additional boxes show the results under the modified HS.IND selection rule. HS.IND.1: selection threshold 0.1. HS.IND.2: selection threshold 0.5, with the values of the nonzero regression coefficient estimates computed using the subsample where the indicator equals v₁. HS.IND.3: combination of HS.IND.1 and HS.IND.2.
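The HS.IND.2 estimate amounts to a conditional posterior mean. A sketch, continuing the hypothetical arrays from above: for a selected covariate j, the draws of β_j are averaged only over the iterations in which the indicator I_j equals v₁; unselected effects are set to zero.

import numpy as np

def hs_ind2_estimate(beta_samples, ind_samples, v1, selected):
    p = beta_samples.shape[1]
    beta_hat = np.zeros(p)
    for j in np.where(selected)[0]:
        slab = ind_samples[:, j] == v1              # draws from the slab component
        beta_hat[j] = beta_samples[slab, j].mean()  # conditional posterior mean
    return beta_hat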

With a decreasing value of the HS.IND threshold, the MSE of the resulting final model moves toward the MSE of the model CPL.BN, which includes all covariates in the predictor. In models CRR 2 and CRR 3, where the effects are smaller and not clearly separated, we have seen that applying the HS.IND criterion clearly worsens the MSE performance. In such situations, adapting the threshold to the smaller observed posterior inclusion probabilities turns out to be a reasonable strategy to improve the predictive performance. With this approach it is possible to obtain sparse final models with performance comparable to the CPL.BN model. However, the improvement from adapting the HS.IND threshold is always limited by the MSE of the full CPL.BN model, and in particular in models CRR 2 and CRR 3 we obtain smaller MSE values with other regularization methods, such as ridge regularization. In the high-dimensional simulations (compare Figure 11.31), we obtain a similar result as for models CRR 2 and CRR 3. Due to the decreased estimated inclusion probabilities, the MSE of the CPL.BN-HS.IND model is clearly increased in comparison to the CPL.BN model, and decreasing the threshold moves the MSE of CPL.BN-HS.IND toward the MSE of the full CPL.BN model.

[Figure 11.31: boxplots of MSE Beta for CPL.BN, CPL.BN-HS.STD, CPL.BN-HS.CRI, CPL.BN-HS.IND and CPL.BN-HS.IND.1 to CPL.BN-HS.IND.3; panels p = 20, p = 60, p = 160, p = 200.]

Figure 11.31: Mean squared errors of the regression coefficient estimates, MSE(β̂), under the Bayesian NMIG prior and the associated variable selection methods in the CRR model with increasing number of covariates. The additional boxes show the results under the modified HS.IND selection rule. HS.IND.1: selection threshold 0.2. HS.IND.2: selection threshold 0.5, with the values of the nonzero regression coefficient estimates computed using the subsample where the indicator equals v₁. HS.IND.3: combination of HS.IND.1 and HS.IND.2.

The absolute values of the coefficient estimates constructed under the HS.IND.2 and HS.IND.3 modifications are in general larger, since the samples with associated indicator value I_j = v₀ are ignored. Neither modification improves the MSE, and the MSE resulting from the HS.IND.2 criterion is clearly increased in almost all models (except CRR 1). We therefore note that the shrinkage of the “larger” effects also improves the predictive performance, in particular in the higher-dimensional cases. We refer again to the application in Section 14, which shows similar results from the practical perspective.

Final remarks

In summary, the Bayesian NMIG prior performs best in sparse models where the covariates have mainly “small” and “large” effects, as in model type CRR 1. In the higher dimensions, the reduced shrinkage of “larger” effects, if present, also improves the predictive performance. In models with “moderate” or “smaller” effect sizes, like model types CRR 2 and CRR 3, the Bayesian ridge or lasso priors achieve the best performance. We have seen that in models with various effect sizes the posterior inclusion probabilities of the covariates, as provided by the NMIG prior, reflect the importance of the covariates very well. But, as previously observed in the AFT simulations, variable selection guided by the induced ranking of the covariates generally does not improve the predictive performance, even in models of CRR 1 type. We have also seen that variable selection may improve the predictive performance, but full models often yield comparable or better performance than sparse final models.