Number of boosting iterations mstop

The number of boosting iterations mstop is the most important parameter of the boosting algorithm since it controls the variable selection and overfitting behaviour of the algorithm, including the amount of shrinkage and the smoothness of the estimators.

However, the danger of overfitting is in general relatively small for boosting algorithms when weak base learners with small degrees of freedom and small step lengths are used (Bühlmann and Hothorn, 2007). Stopping the boosting algorithm early enough (early stopping) is nevertheless crucial to induce shrinkage of the estimators towards zero. Shrinkage is desirable since shrunken estimates yield more accurate and stable predictions due to their reduced variance (see, e.g., Hastie et al., 2009). In addition, early stopping is important to exploit the inherent variable selection and model choice abilities of boosting (which we will further discuss in Section 4.4).

The optimal number of boosting iterations mstop for STAQ models can be determined by cross-validation techniques, such as k-fold cross-validation, bootstrap or subsampling. With each of these techniques, the data is split into two parts: a training and a test sample. Boosting estimation is then carried out on the training sample with a very large initial number of iterations, while the empirical risk is evaluated on the test sample (out-of-bag risk) for each boosting iteration. The optimal mstop is finally obtained as the iteration that minimizes the aggregated empirical out-of-bag risk.
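A minimal sketch of this procedure with the R package mboost (see the Software paragraph below) could look as follows; the data frame dat, the covariates x1 and x2, and all tuning values are purely illustrative, and cvrisk() carries out the bootstrap-based evaluation of the out-of-bag risk:

library(mboost)
set.seed(1)
dat <- data.frame(y = rnorm(200), x1 = runif(200), x2 = runif(200))  ## illustrative data

## quantile boosting fit with a deliberately large initial number of iterations
mod <- gamboost(y ~ bbs(x1) + bbs(x2), data = dat,
                family = QuantReg(tau = 0.5),
                control = boost_control(mstop = 1000, nu = 0.1))

## empirical out-of-bag risk on 25 bootstrap samples
cvr <- cvrisk(mod, folds = cv(model.weights(mod), type = "bootstrap", B = 25))

mstop(cvr)       ## optimal number of boosting iterations
mod[mstop(cvr)]  ## reduce the fitted model to the optimal mstop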

To save computational effort, Mayr et al. (2012b) recently proposed a sequential and fully data-driven approach for the search of the optimal mstop. This approach also avoids the need for the user to specify the initial number of boosting iterations.

By using quantile boosting, the flexibility in estimating nonlinear effects is considerably increased, since the differentiability of the nonlinear effects remains part of the model specification and is not determined by the estimation method itself.

Estimator properties and inference

Boosting with early stopping is a shrinkage method with an implicit penalty. As a result, boosting estimators will be biased for finite samples, but typically the bias vanishes with increasing sample size (Bühlmann and Hothorn, 2007). The number of iterations mstop can be regarded as a smoothing parameter that controls the bias-variance trade-off (Bühlmann and Yu, 2003), and the resulting shrinkage property of boosting estimators is beneficial with respect to prediction accuracy.

Regarding consistency of boosting estimators, Bühlmann and Yu (2003) showed that for an L2 loss function the optimal minimax rate is achieved by component-wise boosting with smoothing splines as base learners. Zhang and Yu (2005) studied consistency and convergence of boosting with early stopping in general. They showed that models fitted by boosting with early stopping attain the Bayes risk. Unfortunately, their results are not directly applicable to quantile regression since the check function is not twice continuously differentiable with respect to η. Thus, an approximation by means of a continuously differentiable function, as for example given by the expectile loss function (see Section 3.5.1), would have to be applied.

Since boosting yields only point estimators, resampling strategies, such as the bootstrap, have to be applied to obtain standard errors of the estimators. However, this is no fundamental drawback compared to other estimation approaches for STAQ models, since most of these approaches also rely on the bootstrap to obtain standard errors.
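A simple nonparametric bootstrap sketch for the standard errors of linear effects could look as follows; the data frame dat is the illustrative one from above, the covariate names are hypothetical, and mstop is fixed at a single value for brevity (in practice it would be re-optimized, e.g. by cvrisk(), on each bootstrap sample):

## one bootstrap replication: refit the model on a resampled data set
boot_fit <- function() {
  idx <- sample(nrow(dat), replace = TRUE)
  bmod <- glmboost(y ~ x1 + x2, data = dat[idx, ],
                   family = QuantReg(tau = 0.5),
                   control = boost_control(mstop = 500))
  cf <- coef(bmod)  ## only coefficients of selected covariates are returned
  sapply(c("x1", "x2"), function(v) if (v %in% names(cf)) cf[[v]] else 0)
}

boot_coefs <- replicate(100, boot_fit())
apply(boot_coefs, 1, sd)  ## bootstrap standard errors of the linear effects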

Similar to the majority of the other estimation approaches, quantile boosting does not prevent quantile crossing since the estimation is performed separately for different quantile parameters.

Variable selection

Boosting with early stopping is accompanied by an inherent and data-driven mechanism for variable selection since only the best-performing covariate is updated in each boosting step.

By stopping the algorithm early, less important covariates are not updated and are therefore effectively excluded from the final model.

For example, suppose that a large number of covariates is available in a particular application.

Then the boosting algorithm will start by picking the most influential covariates first, as those allow for a better fit to the negative gradient residuals. When the boosting algorithm is stopped after an appropriate number of iterations, spurious non-informative covariates are unlikely to be selected.

Thus, boosting combines parameter estimation and variable selection into a single model estimation procedure. When the estimation is additionally conducted on bootstrap samples, not only the variability of the effect estimates but also the variability of the variable selection process itself can be assessed.
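In mboost the outcome of this selection process can be inspected directly; for example, using the illustrative model mod from the sketch above, the following lines show which base learners were updated and how often:

sel <- selected(mod)  ## index of the base learner updated in each boosting iteration
table(sel)            ## selection frequencies; never-selected base learners do not appear
names(coef(mod))      ## names of the base learners that entered the final model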

Boosting also allows for model choice when considering competing modelling possibilities. In this context, the decomposition of a nonlinear functional effect into separate base learners for the linear part and the nonlinear deviation is particularly important, since the decision between linearity and nonlinearity of an effect can then be made in a fully data-driven way.
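In mboost such a decomposition is typically specified by combining an unpenalized linear base learner with a centered P-spline base learner of low degrees of freedom for the nonlinear deviation; a sketch for a hypothetical covariate x1 (with dat as above) could be:

## linear part and centered nonlinear deviation compete as separate base learners
mod2 <- gamboost(y ~ bols(x1) + bbs(x1, center = TRUE, df = 1),
                 data = dat, family = QuantReg(tau = 0.5))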

Furthermore, component-wise boosting can be applied in p >> n cases, i.e., for high-dimensional data with more covariates than observations, since a single base learner typically relies on one covariate only and is fitted separately from other base learners. Moreover, problems with multicollinearity, which in particular arise in high-dimensional data, do not have a negative effect on the estimation accuracy.

Regarding consistency of the variable selection procedure, Bühlmann (2006) studied boosting for linear models with simple linear models as base learners, pointed out connections to the lasso, and showed that boosting yields consistent estimates for high-dimensional problems.

However, there are no similar results available for additive models to the best of our knowledge.

For additive models, an alternative formal variable selection procedure is offered by stability selection (Meinshausen and Bühlmann, 2010), which leads to consistent variable selection and controls the family-wise error rate.
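Stability selection is available for mboost models through the stabsel() function of the stabs package; a brief sketch (again using the illustrative model mod from above, with purely illustrative tuning values) could look as follows:

library(stabs)
## stability selection: the user specifies the selection-frequency cutoff and a
## bound on the per-family error rate (PFER); q is derived from these values
stab <- stabsel(mod, cutoff = 0.75, PFER = 1)
stab$selected  ## base learners judged to be stable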

To sum up, boosting provides a unique framework for variable selection in STAQ models. This can be seen as a major advantage of quantile boosting over other estimation approaches, which in the majority of cases only poorly address variable selection issues.

Software

The R package mboost (Hothorn et al., 2010, 2012) provides an excellent implementation of the generic functional gradient descent boosting algorithm presented in Section 4.1, and one can choose between a large variety of different loss functions and base learners.

Quantile regression is applied when specifying the argument family = QuantReg() with the two arguments tau for the quantile parameter and qoffset for the offset quantile. Code examples for estimating STAQ models with mboost will be given in Chapters 6.1 and 7.1.
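As a brief foretaste of those examples, a sketch of such a call (with a hypothetical data frame dat and covariates x1 and x2) could be:

## 90% quantile regression; the offset is initialized at the empirical 50% quantile
mod_q90 <- gamboost(y ~ bols(x1) + bbs(x2), data = dat,
                    family = QuantReg(tau = 0.9, qoffset = 0.5))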

To our knowledge, mboost is currently the only software that allows the full variety of different effect types from the structured additive predictor to be fitted. In comparison with the R package quantreg, which has become established as a standard tool for fitting linear quantile regression models, more complex models with individual-specific and spatial effects, varying-coefficient terms and a larger number of smooth nonlinear effects can be fitted with mboost.