

with $\|Ef\|_{\mathcal{N}_G(\mathbb{R}^d)} \leq \|f\|_{\mathcal{N}_G(\Omega)}$. Thus we have for $f = Ef|_{\Omega} \in \mathcal{N}_G(\Omega) \subset W_2^k(\Omega)$ and all non-negative $k$

$$\|f\|_{W_2^k(\Omega)} \leq \|Ef\|_{W_2^k(\mathbb{R}^d)} \leq C_E^k k^{k/2}\, \|Ef\|_{\mathcal{N}_G(\mathbb{R}^d)} \leq C_E^k k^{k/2}\, \|f\|_{\mathcal{N}_G(\Omega)} . \qquad (4.4.1)$$

Similar considerations apply to the inverse multiquadrics

$$K_M(x) = \left( c^2 + \|x\|_2^2 \right)^{-\beta} , \qquad \text{for } \beta > d/2 .$$

Essentially the same argument as above (see Theorem 4.4.2) leads to $\mathcal{N}_M(\mathbb{R}^d) \subset W_2^k(\mathbb{R}^d)$ and

$$\|f\|_{W_2^k(\Omega)} \leq C_E^k k^{k}\, \|f\|_{\mathcal{N}_M(\Omega)} . \qquad (4.4.2)$$

Applying Corollary 4.2.6 immediately yields

Corollary 4.4.3 Under the assumptions of Theorem 4.2.4 and with the constant for the Gaussian $E^{(G)}(k) = C_E^k k^{k/2}$ (see Theorem 4.4.1), we obtain for all $u \in \mathcal{N}_G(\Omega)$

$$\|u\|_{L_q(\Omega)} \leq e^{c \log(ch)/h}\, \|u\|_{\mathcal{N}_G(\Omega)} + c^{1/h}\, \|u|_X\|_{\ell_\infty(X)} .$$

Analogously, with $E^{(M)}(k) \leq C_E^k k^{k}$ and $s = 1$ (see Appendix, Theorem 4.4.2), we find for all $u \in \mathcal{N}_M(\Omega)$

$$\|u\|_{L_q(\Omega)} \leq e^{-c/h}\, \|u\|_{\mathcal{N}_M(\Omega)} + c^{1/h}\, \|u|_X\|_{\ell_\infty(X)} .$$

4.5 Applications to Smoothed Interpolation

We shall apply our general results in the case of (possibly regularized) kernel-based interpolation, as introduced in Section 2.4. To start with, we briefly recall the situation of symmetric reconstruction methods, as outlined in Section 2.3.2. One is given centers $X = \{x_1, \ldots, x_N\} \subset \mathbb{R}^d$ and data $(f_1, \ldots, f_N)^T \in \mathbb{R}^N$ generated by an unknown function $u \in \mathcal{N}_K(\Omega)$. One has to solve the system

$$(K + \lambda \, \mathrm{Id})\, b = f|_X , \qquad (4.5.1)$$

with $K = \left( K(x_i - x_j) \right)_{i,j=1,\ldots,N}$ to build an approximant

$$s_{\lambda,X,K}(f)(\cdot) := \sum_{x_j \in X} b_j\, K(\cdot, x_j) .$$
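To make the construction concrete, the following minimal Python sketch assembles a Gaussian kernel matrix, solves the regularized system (4.5.1), and evaluates the approximant. The kernel width, the test function, and all numerical values are illustrative assumptions, not part of the analysis above.

```python
# Minimal sketch of smoothed kernel interpolation (assumptions: Gaussian kernel
# with an arbitrary width, Omega = [0,1], and an illustrative test function).
import numpy as np

def gaussian_kernel(X, Y, width=0.2):
    """K(x, y) = exp(-||x - y||_2^2 / width^2)."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / width**2)

rng = np.random.default_rng(1)
X = np.sort(rng.uniform(0.0, 1.0, size=(25, 1)), axis=0)    # centers x_1, ..., x_N
f_X = np.exp(X[:, 0])                                        # data f|_X of a test function

lam = 1e-8                                                   # smoothing parameter lambda
K = gaussian_kernel(X, X)                                    # (K(x_i - x_j))_{i,j}
b = np.linalg.solve(K + lam * np.eye(len(X)), f_X)           # (K + lambda Id) b = f|_X

def s_lam(t):
    """Evaluate s_{lambda,X,K}(f)(t) = sum_j b_j K(t, x_j)."""
    return gaussian_kernel(np.atleast_2d(t).T, X) @ b

t = np.linspace(0.0, 1.0, 200)
print("max error on [0,1]:", np.max(np.abs(s_lam(t) - np.exp(t))))
```

For $\lambda = 0$ the same code reproduces classical kernel interpolation, provided the kernel matrix is numerically invertible.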

We point out that the classical interpolant is a special case, namely for the choice $\lambda = 0$. It is known [67] that

$$\|s_{\lambda,X,K}(f)\|_{\mathcal{N}_K(\Omega)} \leq 2\, \|f\|_{\mathcal{N}_K(\Omega)} , \qquad \|s_{\lambda,X,K}(f)|_X - f|_X\|_{\ell_\infty(X)} \leq \sqrt{\lambda}\, \|f\|_{\mathcal{N}_K(\Omega)}$$

holds.

Theorem 4.5.1 If $\Omega$ is a compact cube, then there exist a constant $h_0 > 0$ and constants $c, \tilde{c}, \bar{c} > 0$ such that for all data sets $X \subset \Omega$ with fill distance $h \leq h_0$ we get for Gaussian kernels

$$\|f - s_{\lambda,X,K}(f)\|_{L_q(\Omega)} \leq 3 \left( e^{c \log(\tilde{c}h)/h} + \sqrt{\lambda}\, \bar{c}^{1/h} \right) \|f\|_{\mathcal{N}_G(\Omega)}$$

for all $f \in \mathcal{N}_G(\Omega)$ and $1 \leq q \leq \infty$. For the inverse multiquadrics we find

$$\|f - s_{\lambda,X,K}(f)\|_{L_q(\Omega)} \leq 3 \left( e^{-c/h} + \sqrt{\lambda}\, c^{1/h} \right) \|f\|_{\mathcal{N}_M(\Omega)}$$

for all $f \in \mathcal{N}_M(\Omega)$ and $1 \leq q \leq \infty$.

Remark 4.5.2 In the case $\lambda = 0$, i.e., standard interpolation, we obtain the well-known convergence orders for interpolation with Gaussian and inverse multiquadric kernels, respectively [65].

Proof of Theorem 4.5.1: For the Gaussian kernel we have

$$\|f - s_{\lambda,X,K}(f)\|_{L_q(\Omega)} \leq e^{c \log(\tilde{c}h)/h}\, \|f - s_{\lambda,X,K}(f)\|_{\mathcal{N}_G(\Omega)} + \bar{c}^{1/h}\, \|s_{\lambda,X,K}(f)|_X - f|_X\|_{\ell_\infty(X)} \leq 3 \left( e^{c \log(\tilde{c}h)/h} + \sqrt{\lambda}\, \bar{c}^{1/h} \right) \|f\|_{\mathcal{N}_G(\Omega)} ,$$

where we used $\|f - s_{\lambda,X,K}(f)\|_{\mathcal{N}_G(\Omega)} \leq \|f\|_{\mathcal{N}_G(\Omega)} + \|s_{\lambda,X,K}(f)\|_{\mathcal{N}_G(\Omega)} \leq 3\, \|f\|_{\mathcal{N}_G(\Omega)}$ and the residual bound stated above. Analogous considerations apply to the inverse multiquadrics. $\Box$

From here on we restrict ourselves to the case of Gaussian kernels, since all results carry over easily to inverse multiquadrics. We abbreviate $\mathcal{N}_K := \mathcal{N}_G$.

Corollary 4.5.3 For the choice $\lambda \leq e^{2 \left( c \log(\tilde{c}h) - \log(\bar{c}) \right)/h}$ we get

$$\|f - s_{\lambda,X,K}(f)\|_{L_q(\Omega)} \leq 3\, e^{c \log(\tilde{c}h)/h}\, \|f\|_{\mathcal{N}_K(\Omega)} .$$
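As a rough illustration of how restrictive this choice is, the following sketch evaluates the admissible bound on $\lambda$ for purely hypothetical values of the constants $c, \tilde{c}, \bar{c}$ (which are not specified here) and a given fill distance $h$.

```python
# Sketch of the admissible smoothing parameter from Corollary 4.5.3;
# the constants c, c_tilde, c_bar are hypothetical placeholders.
import numpy as np

c, c_tilde, c_bar = 1.0, 0.5, 2.0   # hypothetical constants
h = 0.05                            # fill distance of the data set X

lam_max = np.exp(2.0 * (c * np.log(c_tilde * h) - np.log(c_bar)) / h)
print(f"admissible lambda for h = {h}: {lam_max:.3e}")   # extremely small for small h
```

Even such an exponentially small $\lambda$ can still be far larger than the smallest eigenvalue of $K$ for badly distributed points, which is the point of the discussion following Corollary 4.5.5 below.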

Theorem 4.5.4 If the domain $\Omega$ satisfies an interior cone condition, there exist constants $A, B, C, c_\theta, h_0$ such that for all data sets $X \subset \Omega$ with fill distance $h \leq h_0$ and $1 \leq q \leq \infty$

$$\|D^{\alpha} \left( f - s_{\lambda,X,K}(f) \right)\|_{L_q(\Omega)} \leq \left( A\, e^{B \log(Ch)/h} + \sqrt{\lambda}\, c_\theta\, h^{-|\alpha|} \right) \|f\|_{\mathcal{N}_K(\Omega)} .$$

Here the constants $A, B, C, c_\theta$ depend only on $\theta, \alpha, d, q$.

Corollary 4.5.5 For the choice $\lambda \leq A\, e^{2B \log(Ch)/h}\, h^{2|\alpha|}$ we get

$$\|f - s_{\lambda,X,K}(f)\|_{L_q(\Omega)} \leq 3A\, e^{B \log(Ch)/h}\, \|f\|_{\mathcal{N}_K(\Omega)} .$$

This shows that we can improve the condition number of the system (4.5.1) at least to the value determined by $\lambda = A\, e^{2B \log(Ch)/h}\, h^{2|\alpha|}$ instead of $e^{-1/q^2}$ for the Gaussian kernel and still get good approximation orders. We point out that we get rid of the separation distance $q := q_X := \frac{1}{2} \min_{1 \leq i < j \leq N} \|x_j - x_i\|_2$, which can spoil the condition number in the case of badly distributed points.
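The following sketch illustrates this effect for a hypothetical set of centers containing two nearly coincident points, so that the separation distance $q$ is tiny: the unregularized Gaussian kernel matrix is extremely ill-conditioned, while the shift by $\lambda$ caps the condition number at roughly its largest eigenvalue divided by $\lambda$. The centers, kernel width, and $\lambda$ are illustrative values.

```python
# Sketch: regularization removes the dependence of the conditioning on the
# separation distance q (centers, width, and lambda are illustrative).
import numpy as np

def gaussian_kernel(X, Y, width=0.5):
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / width**2)

X = np.array([[0.0], [1e-6], [0.3], [0.7], [1.0]])   # two nearly coincident centers, q ~ 5e-7
K = gaussian_kernel(X, X)
lam = 1e-6

print("cond(K)          =", np.linalg.cond(K))                          # driven by q
print("cond(K + lam*Id) =", np.linalg.cond(K + lam * np.eye(len(X))))   # roughly lam_max(K)/lam
```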

In the next chapter we present a more general application of sampling inequalities in the deterministic error analysis of kernel based machine learning algorithms.

Chapter 5

Kernel Based Learning

Support Vector (SV) machines and related kernel-based algorithms are modern learning systems motivated by results of statistical learning theory [57]. The concept of SV machines is to provide a prediction function that is accurate on the given training data, and that is sparse in the sense that it can be written in terms of a typically small subset of all samples, called the support vectors [50]. Therefore, SV regression and classification algorithms are closely related to regularized problems from classical approximation theory [23], and techniques from functional analysis were applied to derive probabilistic error bounds for SV regression [17].

This chapter provides a theoretical framework to derive deterministic error bounds for some popular SV machines. We show how a sampling inequality from [67] can be used to bound the worst-case generalization error for the $\nu$- and the $\epsilon$-regression without making any statistical assumptions on the inaccuracy of the training data. In contrast to the literature, our error bounds explicitly depend on the pointwise noise in the data. Thus they can be used for any subsequent probabilistic analysis modelling certain assumptions on the noise distribution.

This chapter is organized as follows. Section 5.1 deals with regularized approximation problems in Hilbert spaces with reproducing kernels and outlines the connection to classical SV regression (SVR) algorithms. We provide a deterministic error analysis for the $\nu$- and the $\epsilon$-SVR for both exact and inexact training data. Our analytical results, showing optimal convergence orders in Sobolev spaces, are confirmed by numerical experiments.

5.1 Regularized Problems in Native Hilbert Spaces

In native Hilbert spaces for kernels we consider the following learning or recovery problem.

We assume that we are given (possibly only approximate) function values $y_1, \ldots, y_N \in \mathbb{R}$ of an unknown function $f \in \mathcal{N}_K$ on some scattered points $X := \left\{ \mathbf{x}^{(1)}, \ldots, \mathbf{x}^{(N)} \right\} \subset \Omega$, i.e., $f\left(\mathbf{x}^{(j)}\right) \approx y_j$ for $j = 1, \ldots, N$. We point out that we shall use a slightly different notation in this section, which comes from the machine learning literature. In particular, bold letters denote vectors, i.e., $\mathbf{v} = (v_1, \ldots, v_d)^T \in \mathbb{R}^d$. The character $C$ does not denote a generic constant but a fixed parameter of the optimization problems. Generic constants are denoted by $\tilde{C}$ or different letters.


To control accuracy and complexity of the reconstruction simultaneously, we use a regularized optimization problem of the form (5.1.1), where $C > 0$ is a positive parameter and $V_\epsilon$ denotes a positive function, which may be parametrized by a positive real number $\epsilon$. In this section $C$ always denotes a positive parameter rather than a constant. We point out that $V_\epsilon$ need not be a classical loss function.

Therefore, we shall give some proofs of results that are well known in the case of $V_\epsilon$ being a loss function [51].
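For orientation, one classical choice of $V_\epsilon$, used in standard SV regression, is the $\epsilon$-insensitive loss sketched below; as noted above, however, the framework does not require $V_\epsilon$ to be a loss function.

```python
# The epsilon-insensitive loss: zero inside the tube |t| <= eps, linear outside.
import numpy as np

def V_eps(t, eps=0.1):
    return np.maximum(np.abs(t) - eps, 0.0)
```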

Theorem 5.1.1 (Representer theorem) If $(s_{X,\mathbf{y}}, \epsilon)$ is a solution of the optimization problem (5.1.1), then there exists a vector $\mathbf{w} \in \mathbb{R}^N$ such that

$$s_{X,\mathbf{y}}(\cdot) = \sum_{j=1}^{N} w_j\, K\left(\cdot, \mathbf{x}^{(j)}\right) .$$

Proof: Every $s \in \mathcal{N}_K(\Omega)$ can be decomposed into two parts, $s = s_{\|} + s_{\perp}$, where $s_{\|}$ is contained in the linear span of $K\left(\mathbf{x}^{(1)}, \cdot\right), \ldots, K\left(\mathbf{x}^{(N)}, \cdot\right)$, and $s_{\perp}$ is contained in the orthogonal complement, i.e., $\left\langle s_{\|}, s_{\perp} \right\rangle_{\mathcal{N}_K} = 0$. By the reproducing property of the kernel $K$ in the native space we have

$$s\left(\mathbf{x}^{(j)}\right) = \left\langle s, K\left(\mathbf{x}^{(j)}, \cdot\right) \right\rangle_{\mathcal{N}_K} = \left\langle s_{\|}, K\left(\mathbf{x}^{(j)}, \cdot\right) \right\rangle_{\mathcal{N}_K} = s_{\|}\left(\mathbf{x}^{(j)}\right) , \qquad j = 1, \ldots, N .$$

Using this identity the problem (5.1.1) can be rewritten as a minimization over the decomposition $s = s_{\|} + s_{\perp}$: the objective depends on $s$ only through the values $s\left(\mathbf{x}^{(j)}\right)$, which are determined by $s_{\|}$ alone, and through $\|s\|_{\mathcal{N}_K(\Omega)}^2 = \|s_{\|}\|_{\mathcal{N}_K(\Omega)}^2 + \|s_{\perp}\|_{\mathcal{N}_K(\Omega)}^2$. Hence a nonzero component $s_{\perp}$ only increases the objective, so every solution satisfies $s_{\perp} = 0$ and lies in the span of $K\left(\mathbf{x}^{(1)}, \cdot\right), \ldots, K\left(\mathbf{x}^{(N)}, \cdot\right)$. $\Box$
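Since the exact form of (5.1.1) is not reproduced here, the following sketch only illustrates the practical consequence of Theorem 5.1.1: plugging the ansatz $s(\cdot) = \sum_j w_j K\left(\cdot, \mathbf{x}^{(j)}\right)$ into a regularized empirical-risk objective with the $\epsilon$-insensitive loss, with $\epsilon$ treated as a fixed parameter for simplicity (cf. the remark and corollary below), turns the infinite-dimensional problem over $\mathcal{N}_K(\Omega)$ into a finite-dimensional one over $\mathbf{w} \in \mathbb{R}^N$. The objective, the solver, and all parameter values are illustrative assumptions.

```python
# Finite-dimensional reformulation suggested by the representer theorem
# (illustrative objective; not the exact problem (5.1.1) of this section).
import numpy as np
from scipy.optimize import minimize

def gaussian_kernel(X, Y, width=0.3):
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / width**2)

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(30, 1))                            # scattered points x^(j)
y = np.sin(2 * np.pi * X[:, 0]) + 0.05 * rng.standard_normal(30)   # noisy values y_j

K = gaussian_kernel(X, X)
C, eps = 10.0, 0.05                                                # parameter C, fixed epsilon

def objective(w):
    residuals = y - K @ w                                          # y_j - s(x^(j))
    data_term = np.maximum(np.abs(residuals) - eps, 0.0).sum()     # eps-insensitive loss
    return C * data_term + 0.5 * w @ (K @ w)                       # + 1/2 ||s||_{N_K}^2

# L-BFGS-B handles the piecewise-linear loss only approximately; sufficient for a sketch.
w = minimize(objective, np.zeros(len(y)), method="L-BFGS-B").x

def s(t):                                                          # s_{X,y}(t) = sum_j w_j K(t, x^(j))
    return gaussian_kernel(np.atleast_2d(t).T, X) @ w
```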

Since the proof of Theorem 5.1.1 does not depend on the minimality with respect to $\epsilon$, this result also holds true if $\epsilon$ is a fixed parameter instead of a primal variable. To be precise, we state this result as a corollary.

Corollary 5.1.2 If $s_{X,\mathbf{y}}$ is a solution of the optimization problem

$$\min_{s \in \mathcal{N}_K(\Omega)}$$
