
4.2 Surface Approximation and Penalized Model Selection Criteria

To illustrate the challenges of choosing an optimal model in the sense of AIC, we discuss two approaches: with and without a penalty term on the number of coefficients. We start with a data set of size n_obs, which we approximate with an LR B-spline surface by setting, e.g., the tolerance, the maximum number of iterations, the refinement strategy, and the polynomial bidegree of the spline. We call the result of the fitting a model and consider k possible models. The vector c_k contains the estimated coefficients and has length ncp_k. Each model has its own likelihood L(c_k), which assigns a numerical value to the question of how "likely" the model is given the observations. It is convenient to work with the log-likelihood function for the model with the estimates c_k, defined as l(c_k) = log(L(c_k)). The likelihood is a measure of goodness of fit and is meaningful only when it is compared with the likelihood computed for another model.
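To make the log-likelihood concrete, the following minimal sketch computes l(c) from the fitting residuals, assuming i.i.d. Gaussian errors with the maximum-likelihood variance estimate RSS/n_obs; this error model is an assumption for illustration and is not prescribed by the text.

```python
import numpy as np

def gaussian_log_likelihood(residuals):
    """Log-likelihood l(c) of a fitted model, assuming i.i.d. Gaussian
    residuals and the ML variance estimate sigma^2 = RSS / n_obs
    (an illustrative assumption, not the book's prescription)."""
    residuals = np.asarray(residuals, dtype=float)
    n = residuals.size
    sigma2 = np.sum(residuals ** 2) / n          # ML estimate of the variance
    return -0.5 * n * (np.log(2.0 * np.pi) + np.log(sigma2) + 1.0)
```

As the text notes, the value on its own carries no meaning; only the comparison between two models matters, e.g. smaller residuals yield a larger l(c).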


Approach 1 without penalty term: Case 1

When performing a surface approximation, one could search for the optimal refinement level, i.e., the iteration step at which the algorithm should be stopped because the optimal model has been found. Here we call model 1 the approximation at level 1, model 2 the approximation at level 2, and so on. To each model is associated a likelihood, computed from the parameter vector of estimated coefficients. As the iteration step increases, the length of this vector increases accordingly, but the corresponding likelihood may increase only slightly. Searching for the maximum of the likelihood without penalizing for the number of coefficients could lead to overfitting and ripples in the approximated surface.

Approach 1 without penalty term: Case 2

If we make a first approximation of a scattered point cloud with a tolerance of 0.01, we obtain a parameter vector c_1 of length ncp_1; the approximation has a likelihood L(c_1). In parallel, we can compute a second model by changing the tolerance to 0.005. Its likelihood is L(c_2), with c_2 of length ncp_2 > ncp_1. For both models, we stop the refinement after 5 iterations. Usually L(c_1) ≠ L(c_2), and we could state that L(c_1) < L(c_2). This would lead to the conclusion that the second model is more appropriate to fit the data, as its likelihood is higher. This statement is only partially true:

The number of coefficients for the second model is much higher than for the first one.

This difference may be unfavorable (i) from a computational point of view, (ii) if overfitting should be avoided due to the presence of noise in the data, or (iii) if a lean model is preferred for storage or subsequent use. Too high a number of coefficients should be avoided, as ripples and oscillations may occur in the fitted surface.

The penalized criteria address the drawbacks raised in the first approach. In their simple form they are called the Bayesian Information Criterion (BIC) [Sch78] or the Akaike Information Criterion (AIC) [Aka73]. The two criteria are defined as:

BIC_k = -2 l(c_k) + log(n_obs) ncp_k and    (4.1)

AIC_k = -2 l(c_k) + 2 ncp_k,    (4.2)

respectively. They can be seen as statistical alternatives to more usual heuristic considerations: the first term in Eqs. 4.1 and 4.2 is the log-likelihood, i.e., a measure of the goodness of fit to the data. The second term is a penalty term, which accounts for the increase in complexity. When k models are compared with each other, the model with the smallest IC is chosen. Choosing the best model within the framework of IC can be seen as finding a balance between these two quantities. The reader is referred to [Bur02] for the detailed derivation of the IC. In the following, we come back to the two cases with the second approach, which accounts for a penalty term.

Approach 2 with penalty term: Case 1

For case 1, we can assume that the likelihood will saturate after a given number of iterations, while the number of coefficients still increases strongly with each iteration step. It is likely that a minimum of the BIC and/or the AIC occurs, balancing the two terms.
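This suggests a simple stopping rule for the refinement: refine as long as the BIC of Eq. 4.1 decreases, and stop at its first minimum. The sketch below assumes a hypothetical callback `fit_at_level(k)` returning the log-likelihood and number of coefficients after k iterations; both the callback and the synthetic values used to exercise it are invented for illustration.

```python
import math

def refine_until_bic_minimum(fit_at_level, n_obs, max_iter=10):
    """Stop refining at the first local minimum of the BIC (Eq. 4.1).

    `fit_at_level(k)` is a hypothetical callback returning
    (log_likelihood, ncp) for the model after k refinement iterations.
    """
    best_k, best_bic = None, math.inf
    for k in range(1, max_iter + 1):
        log_lik, ncp = fit_at_level(k)
        bic = -2.0 * log_lik + math.log(n_obs) * ncp
        if bic >= best_bic:
            break  # the likelihood gain no longer pays for the extra coefficients
        best_k, best_bic = k, bic
    return best_k, best_bic

# synthetic behavior: likelihood saturates, ncp doubles per level
best_k, _ = refine_until_bic_minimum(
    lambda k: (-500.0 / k - 200.0, 40 * 2 ** (k - 1)), n_obs=1000)
```

A first local minimum is only a heuristic stopping point; if the IC curve is expected to be non-monotone, all levels up to the iteration budget should be scored and the global minimum taken.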

Approach 2 with penalty term: Case 2

For case 2, only two models are compared with each other; the choice of the optimal model is easy if both AIC_2 < AIC_1 and BIC_2 < BIC_1, i.e., one can conclude the superiority of model 2 over model 1: a tolerance of 0.005 is preferable to 0.01 for approximating the data at hand, within the context of model selection with IC.

Potentially, the BIC and the AIC may come to two different conclusions, e.g., AIC_2 < AIC_1 and BIC_2 > BIC_1. For case 1, this could mean that the 3rd refinement step is optimal for the BIC and the 5th for the AIC. It is often stated that the BIC underestimates the optimal number of parameters to estimate. In contrast, the assumption behind the AIC is that the true model is unknown and unknowable. The AIC is asymptotically equivalent to cross-validation, whereas the BIC aims at consistent model selection. In case of disagreement between the two criteria, other measures of goodness of fit should be considered within the context of surface fitting, such as the MAE, the maximum error Maxerr_k, or nout_k.

In the following, we skip the subscript k for the sake of readability. We refer to Chap. 3 and recall that the following indicators to judge the goodness of fit will be used additionally:

1. The mean absolute error (MAE), defined as MAE = (1/n_obs) Σ_{j=1}^{n_obs} |z_j − ẑ_j|, with z = {z_j}_{j=1}^{n_obs} and ẑ = {ẑ_j}_{j=1}^{n_obs}, where ẑ is the estimated z-component of the point cloud obtained after the kth iteration.

2. The maximum error, given by Maxerr = max_j |ẑ_j − z_j|,

3. The number of points outside a given tolerance: nout,

4. The degrees of freedom, i.e., the number of control points ncp estimated for a given iteration step of the refinement,

5. The computational time CT. We have used a stationary desktop with 64 GB of DDR4-2666 RAM and an i9-9900K CPU with 8 cores and 16 threads, but a single-core implementation is used in the experiments.

4.3 Improving Information Criterion for Surface Approximation