

with $\|Ef\|_{\mathcal{N}_G(\mathbb{R}^d)} \leq \|f\|_{\mathcal{N}_G(\Omega)}$. Thus we have for $f = Ef|_{\Omega} \in \mathcal{N}_G(\Omega) \subset W_2^k(\Omega)$ and all non-negative $k$

$$\|f\|_{W_2^k(\Omega)} \leq \|Ef\|_{W_2^k(\mathbb{R}^d)} \leq C_E^k k^{k/2}\, \|Ef\|_{\mathcal{N}_G(\mathbb{R}^d)} \leq C_E^k k^{k/2}\, \|f\|_{\mathcal{N}_G(\Omega)} . \qquad (4.4.1)$$

Similar considerations apply to the inverse multiquadrics

$$K_M(x) = \left( c^2 + \|x\|_2^2 \right)^{-\beta} , \qquad \text{for } \beta > d/2 .$$

Essentially the same argument as above (see Theorem 4.4.2) leads to $\mathcal{N}_M(\mathbb{R}^d) \subset W_2^k(\mathbb{R}^d)$ and

$$\|f\|_{W_2^k(\Omega)} \leq C_E^k k^{k}\, \|f\|_{\mathcal{N}_M(\Omega)} . \qquad (4.4.2)$$

Applying Corollary 4.2.6 immediately yields

Corollary 4.4.3 Under the assumptions of Theorem 4.2.4 and with the constant for the Gaussian $E^{(G)}(k) = C_E^k k^{k/2}$ (see Theorem 4.4.1), we obtain for all $u \in \mathcal{N}_G(\Omega)$

$$\|u\|_{L_q(\Omega)} \leq e^{c \log(ch)/h}\, \|u\|_{\mathcal{N}_G(\Omega)} + c^{1/h}\, \|u|_X\|_{\ell_\infty(X)} .$$

Analogously, with $E^{(M)}(k) \leq C_E^k k^{k}$ and $s = 1$ (see Appendix, Theorem 4.4.2), we find for all $u \in \mathcal{N}_M(\Omega)$

$$\|u\|_{L_q(\Omega)} \leq e^{-c/h}\, \|u\|_{\mathcal{N}_M(\Omega)} + c^{1/h}\, \|u|_X\|_{\ell_\infty(X)} .$$

4.5 Applications to Smoothed Interpolation

We shall apply our general results in the case of (possibly regularized) kernel-based interpolation, as introduced in Section 2.4. To start with, we briefly recall the situation of symmetric reconstruction methods, as outlined in Section 2.3.2. One is given centers $X = \{x_1, \ldots, x_N\} \subset \mathbb{R}^d$ and data $(f_1, \ldots, f_N)^T \in \mathbb{R}^N$ generated by an unknown function $u \in \mathcal{N}_K(\Omega)$. One has to solve the system

$$(K + \lambda \, \mathrm{Id})\, b = f|_X , \qquad (4.5.1)$$

with $K = \left( K(x_i - x_j) \right)_{i,j=1,\ldots,N}$ to build an approximant

$$s_{\lambda,X,K}(f)(\cdot) := \sum_{x_j \in X} b_j\, K(\cdot, x_j) .$$
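To make the construction concrete, the following minimal Python sketch assembles a Gaussian kernel matrix, solves the regularized system (4.5.1), and evaluates the approximant. The kernel width, the test function, and all numerical values are illustrative assumptions, not part of the analysis above.

```python
# Minimal sketch of smoothed kernel interpolation (assumptions: Gaussian kernel
# with an arbitrary width, Omega = [0,1], and an illustrative test function).
import numpy as np

def gaussian_kernel(X, Y, width=0.2):
    """K(x, y) = exp(-||x - y||_2^2 / width^2)."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / width**2)

rng = np.random.default_rng(1)
X = np.sort(rng.uniform(0.0, 1.0, size=(25, 1)), axis=0)    # centers x_1, ..., x_N
f_X = np.exp(X[:, 0])                                        # data f|_X of a test function

lam = 1e-8                                                   # smoothing parameter lambda
K = gaussian_kernel(X, X)                                    # (K(x_i - x_j))_{i,j}
b = np.linalg.solve(K + lam * np.eye(len(X)), f_X)           # (K + lambda Id) b = f|_X

def s_lam(t):
    """Evaluate s_{lambda,X,K}(f)(t) = sum_j b_j K(t, x_j)."""
    return gaussian_kernel(np.atleast_2d(t).T, X) @ b

t = np.linspace(0.0, 1.0, 200)
print("max error on [0,1]:", np.max(np.abs(s_lam(t) - np.exp(t))))
```

For $\lambda = 0$ the same code reproduces classical kernel interpolation, provided the kernel matrix is numerically invertible.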

We point out that the classical interpolant is a special case, namely for the choice $\lambda = 0$. It is known [67] that

$$\|s_{\lambda,X,K}(f)\|_{\mathcal{N}_K(\Omega)} \leq 2\, \|f\|_{\mathcal{N}_K(\Omega)} , \qquad \|s_{\lambda,X,K}(f)|_X - f|_X\|_{\ell_\infty(X)} \leq \sqrt{\lambda}\, \|f\|_{\mathcal{N}_K(\Omega)}$$

holds.

Theorem 4.5.1 If $\Omega$ is a compact cube, then there exist a constant $h_0 > 0$ and constants $c, \tilde{c}, \bar{c} > 0$ such that for all data sets $X \subset \Omega$ with fill distance $h \leq h_0$ we get for Gaussian kernels

$$\|f - s_{\lambda,X,K}(f)\|_{L_q(\Omega)} \leq 3 \left( e^{c \log(\tilde{c}h)/h} + \sqrt{\lambda}\, \bar{c}^{1/h} \right) \|f\|_{\mathcal{N}_G(\Omega)}$$

for all $f \in \mathcal{N}_G(\Omega)$ and $1 \leq q \leq \infty$. For the inverse multiquadrics we find

$$\|f - s_{\lambda,X,K}(f)\|_{L_q(\Omega)} \leq 3 \left( e^{-c/h} + \sqrt{\lambda}\, c^{1/h} \right) \|f\|_{\mathcal{N}_M(\Omega)}$$

for all $f \in \mathcal{N}_M(\Omega)$ and $1 \leq q \leq \infty$.

Remark 4.5.2 In the case $\lambda = 0$, i.e., standard interpolation, we obtain the well-known convergence orders for interpolation with Gaussian and inverse multiquadric kernels, respectively [65].

Proof of Theorem 4.5.1: For the Gaussian kernel we have

$$\|f - s_{\lambda,X,K}(f)\|_{L_q(\Omega)} \leq e^{c \log(\tilde{c}h)/h}\, \|f - s_{\lambda,X,K}(f)\|_{\mathcal{N}_G(\Omega)} + \bar{c}^{1/h}\, \|s_{\lambda,X,K}(f)|_X - f|_X\|_{\ell_\infty(X)} \leq 3 \left( e^{c \log(\tilde{c}h)/h} + \sqrt{\lambda}\, \bar{c}^{1/h} \right) \|f\|_{\mathcal{N}_G(\Omega)} ,$$

where we used $\|f - s_{\lambda,X,K}(f)\|_{\mathcal{N}_G(\Omega)} \leq \|f\|_{\mathcal{N}_G(\Omega)} + \|s_{\lambda,X,K}(f)\|_{\mathcal{N}_G(\Omega)} \leq 3\, \|f\|_{\mathcal{N}_G(\Omega)}$ and the residual bound stated above. Analogous considerations apply to the inverse multiquadrics. $\Box$

From here on we restrict ourselves to the case of Gaussian kernels, since all results carry over easily to inverse multiquadrics. We abbreviate $\mathcal{N}_K := \mathcal{N}_G$.

Corollary 4.5.3 For the choice $\lambda \leq e^{2 \left( c \log(\tilde{c}h) - \log(\bar{c}) \right)/h}$ we get

$$\|f - s_{\lambda,X,K}(f)\|_{L_q(\Omega)} \leq 3\, e^{c \log(\tilde{c}h)/h}\, \|f\|_{\mathcal{N}_K(\Omega)} .$$
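As a rough illustration of how restrictive this choice is, the following sketch evaluates the admissible bound on $\lambda$ for purely hypothetical values of the constants $c, \tilde{c}, \bar{c}$ (which are not specified here) and a given fill distance $h$.

```python
# Sketch of the admissible smoothing parameter from Corollary 4.5.3;
# the constants c, c_tilde, c_bar are hypothetical placeholders.
import numpy as np

c, c_tilde, c_bar = 1.0, 0.5, 2.0   # hypothetical constants
h = 0.05                            # fill distance of the data set X

lam_max = np.exp(2.0 * (c * np.log(c_tilde * h) - np.log(c_bar)) / h)
print(f"admissible lambda for h = {h}: {lam_max:.3e}")   # extremely small for small h
```

Even such an exponentially small $\lambda$ can still be far larger than the smallest eigenvalue of $K$ for badly distributed points, which is the point of the discussion following Corollary 4.5.5 below.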

Theorem 4.5.4 If the domain $\Omega$ satisfies an interior cone condition, there exist constants $A, B, C, c_\theta, h_0$ such that for all data sets $X \subset \Omega$ with fill distance $h \leq h_0$ and $1 \leq q \leq \infty$

$$\|D^{\alpha} \left( f - s_{\lambda,X,K}(f) \right)\|_{L_q(\Omega)} \leq \left( A\, e^{B \log(Ch)/h} + \sqrt{\lambda}\, c_\theta\, h^{-|\alpha|} \right) \|f\|_{\mathcal{N}_K(\Omega)} .$$

Here the constants $A, B, C, c_\theta$ depend only on $\theta, \alpha, d, q$.

Corollary 4.5.5 For the choice $\lambda \leq A\, e^{2B \log(Ch)/h}\, h^{2|\alpha|}$ we get

$$\|f - s_{\lambda,X,K}(f)\|_{L_q(\Omega)} \leq 3A\, e^{B \log(Ch)/h}\, \|f\|_{\mathcal{N}_K(\Omega)} .$$

This shows that we can improve the condition number of the system (4.5.1) at least to the value determined by $\lambda = A\, e^{2B \log(Ch)/h}\, h^{2|\alpha|}$ instead of $e^{-1/q^2}$ for the Gaussian kernel and still get good approximation orders. We point out that we get rid of the separation distance $q := q_X := \frac{1}{2} \min_{1 \leq i < j \leq N} \|x_j - x_i\|_2$, which can spoil the condition number in the case of badly distributed points.
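The following sketch illustrates this effect for a hypothetical set of centers containing two nearly coincident points, so that the separation distance $q$ is tiny: the unregularized Gaussian kernel matrix is extremely ill-conditioned, while the shift by $\lambda$ caps the condition number at roughly its largest eigenvalue divided by $\lambda$. The centers, kernel width, and $\lambda$ are illustrative values.

```python
# Sketch: regularization removes the dependence of the conditioning on the
# separation distance q (centers, width, and lambda are illustrative).
import numpy as np

def gaussian_kernel(X, Y, width=0.5):
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / width**2)

X = np.array([[0.0], [1e-6], [0.3], [0.7], [1.0]])   # two nearly coincident centers, q ~ 5e-7
K = gaussian_kernel(X, X)
lam = 1e-6

print("cond(K)          =", np.linalg.cond(K))                          # driven by q
print("cond(K + lam*Id) =", np.linalg.cond(K + lam * np.eye(len(X))))   # roughly lam_max(K)/lam
```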

In the next chapter we present a more general application of sampling inequalities in the deterministic error analysis of kernel based machine learning algorithms.

Chapter 5

Kernel Based Learning

Support Vector (SV) machines and related kernel-based algorithms are modern learning systems motivated by results of statistical learning theory [57]. The concept of SV machines is to provide a prediction function that is accurate on the given training data, and that is sparse in the sense that it can be written in terms of a typically small subset of all samples, called the support vectors [50]. Therefore, SV regression and classification algorithms are closely related to regularized problems from classical approximation theory [23], and techniques from functional analysis were applied to derive probabilistic error bounds for SV regression [17].

This chapter provides a theoretical framework to derive deterministic error bounds for some popular SV machines. We show how a sampling inequality from [67] can be used to bound the worst-case generalization error for the $\nu$- and the $\epsilon$-regression without making any statistical assumptions on the inaccuracy of the training data. In contrast to the literature, our error bounds explicitly depend on the pointwise noise in the data. Thus they can be used for any subsequent probabilistic analysis modelling certain assumptions on the noise distribution.

This chapter is organized as follows. Section 5.1 deals with regularized approximation problems in Hilbert spaces with reproducing kernels and outlines the connection to classical SV regression (SVR) algorithms. We provide a deterministic error analysis for the $\nu$- and the $\epsilon$-SVR for both exact and inexact training data. Our analytical results, showing optimal convergence orders in Sobolev spaces, are confirmed by numerical experiments.

5.1 Regularized Problems in Native Hilbert Spaces

In native Hilbert spaces for kernels we consider the following learning or recovery problem.

We assume that we are given (possibly only approximate) function values $y_1, \ldots, y_N \in \mathbb{R}$ of an unknown function $f \in \mathcal{N}_K$ on some scattered points $X := \left\{ \mathbf{x}^{(1)}, \ldots, \mathbf{x}^{(N)} \right\} \subset \Omega$, i.e., $f\left(\mathbf{x}^{(j)}\right) \approx y_j$ for $j = 1, \ldots, N$. We point out that we shall use a slightly different notation in this section, which comes from the machine learning literature. In particular, bold letters denote vectors, i.e., $\mathbf{v} = (v_1, \ldots, v_d)^T \in \mathbb{R}^d$. The character $C$ does not denote a generic constant but a fixed parameter of the optimization problems. Generic constants are denoted by $\tilde{C}$ or different letters.


To control accuracy and complexity of the reconstruction simultaneously, we use a regularized optimization problem of the form (5.1.1), where $C > 0$ is a positive parameter and $V_\epsilon$ denotes a positive function, which may be parametrized by a positive real number $\epsilon$. In this section $C$ always denotes a positive parameter rather than a constant. We point out that $V_\epsilon$ need not be a classical loss function.

Therefore, we shall give some proofs of results that are well known in the case of $V_\epsilon$ being a loss function [51].
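For orientation, one classical choice of $V_\epsilon$, used in standard SV regression, is the $\epsilon$-insensitive loss sketched below; as noted above, however, the framework does not require $V_\epsilon$ to be a loss function.

```python
# The epsilon-insensitive loss: zero inside the tube |t| <= eps, linear outside.
import numpy as np

def V_eps(t, eps=0.1):
    return np.maximum(np.abs(t) - eps, 0.0)
```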

Theorem 5.1.1 (Representer theorem) If $(s_{X,\mathbf{y}}, \epsilon)$ is a solution of the optimization problem (5.1.1), then there exists a vector $\mathbf{w} \in \mathbb{R}^N$ such that

$$s_{X,\mathbf{y}}(\cdot) = \sum_{j=1}^{N} w_j\, K\left(\cdot, \mathbf{x}^{(j)}\right) .$$

Proof: Every $s \in \mathcal{N}_K(\Omega)$ can be decomposed into two parts, $s = s_{\|} + s_{\perp}$, where $s_{\|}$ is contained in the linear span of $K\left(\mathbf{x}^{(1)}, \cdot\right), \ldots, K\left(\mathbf{x}^{(N)}, \cdot\right)$, and $s_{\perp}$ is contained in the orthogonal complement, i.e., $\left\langle s_{\|}, s_{\perp} \right\rangle_{\mathcal{N}_K} = 0$. By the reproducing property of the kernel $K$ in the native space we have

$$s\left(\mathbf{x}^{(j)}\right) = \left\langle s, K\left(\mathbf{x}^{(j)}, \cdot\right) \right\rangle_{\mathcal{N}_K} = \left\langle s_{\|}, K\left(\mathbf{x}^{(j)}, \cdot\right) \right\rangle_{\mathcal{N}_K} = s_{\|}\left(\mathbf{x}^{(j)}\right) , \qquad j = 1, \ldots, N .$$

Using this identity the problem (5.1.1) can be rewritten as a minimization over the decomposition $s = s_{\|} + s_{\perp}$: the objective depends on $s$ only through the values $s\left(\mathbf{x}^{(j)}\right)$, which are determined by $s_{\|}$ alone, and through $\|s\|_{\mathcal{N}_K(\Omega)}^2 = \|s_{\|}\|_{\mathcal{N}_K(\Omega)}^2 + \|s_{\perp}\|_{\mathcal{N}_K(\Omega)}^2$. Hence a nonzero component $s_{\perp}$ only increases the objective, so every solution satisfies $s_{\perp} = 0$ and lies in the span of $K\left(\mathbf{x}^{(1)}, \cdot\right), \ldots, K\left(\mathbf{x}^{(N)}, \cdot\right)$. $\Box$
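Since the exact form of (5.1.1) is not reproduced here, the following sketch only illustrates the practical consequence of Theorem 5.1.1: plugging the ansatz $s(\cdot) = \sum_j w_j K\left(\cdot, \mathbf{x}^{(j)}\right)$ into a regularized empirical-risk objective with the $\epsilon$-insensitive loss, with $\epsilon$ treated as a fixed parameter for simplicity (cf. the remark and corollary below), turns the infinite-dimensional problem over $\mathcal{N}_K(\Omega)$ into a finite-dimensional one over $\mathbf{w} \in \mathbb{R}^N$. The objective, the solver, and all parameter values are illustrative assumptions.

```python
# Finite-dimensional reformulation suggested by the representer theorem
# (illustrative objective; not the exact problem (5.1.1) of this section).
import numpy as np
from scipy.optimize import minimize

def gaussian_kernel(X, Y, width=0.3):
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / width**2)

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(30, 1))                            # scattered points x^(j)
y = np.sin(2 * np.pi * X[:, 0]) + 0.05 * rng.standard_normal(30)   # noisy values y_j

K = gaussian_kernel(X, X)
C, eps = 10.0, 0.05                                                # parameter C, fixed epsilon

def objective(w):
    residuals = y - K @ w                                          # y_j - s(x^(j))
    data_term = np.maximum(np.abs(residuals) - eps, 0.0).sum()     # eps-insensitive loss
    return C * data_term + 0.5 * w @ (K @ w)                       # + 1/2 ||s||_{N_K}^2

# L-BFGS-B handles the piecewise-linear loss only approximately; sufficient for a sketch.
w = minimize(objective, np.zeros(len(y)), method="L-BFGS-B").x

def s(t):                                                          # s_{X,y}(t) = sum_j w_j K(t, x^(j))
    return gaussian_kernel(np.atleast_2d(t).T, X) @ w
```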

Since the proof of Theorem 5.1.1 does not depend on the minimality with respect to $\epsilon$, this result also holds true if $\epsilon$ is a fixed parameter instead of a primal variable. To be precise, we state this result as a corollary.

Corollary 5.1.2 If $s_{X,\mathbf{y}}$ is a solution of the optimization problem

$$\min_{s \in \mathcal{N}_K(\Omega)}$$
