
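\[
  \|\Lambda\Psi^\top v_j\| \le m_j \|\Psi^+\|. \tag{4.39}
\]

Now, trivially, the 2-norm bounds each individual component of a vector, and $[\Lambda\Psi^\top v_j]_\ell = \lambda_\ell \psi_\ell(X)^\top v_j / \sqrt{n}$ (because $[\Lambda\Psi^\top]_{\ell i} = \lambda_\ell [\Psi]_{i\ell} = \lambda_\ell \psi_\ell(X_i)/\sqrt{n}$). Therefore,

\[
  \frac{1}{\sqrt{n}} |\lambda_\ell \psi_\ell(X)^\top v_j| \le m_j \|\Psi^+\|. \tag{4.40}
\]

The kernel matrix $K$ has at most rank $r$. Therefore, if $r < n$ and the columns of $V$ are sorted such that the eigenvalues are in non-increasing order, the eigenvalues $m_{r+1}, \dots, m_n$ are zero, and $v_{r+1}, \dots, v_n$ lie in the null space of $K$. These vectors are orthogonal to the image of $K$, which lies in the span of $v_1, \dots, v_r$. On the other hand, the image of $K$ is also spanned by the columns of $\Psi$. Therefore, $\Psi^\top v_j = 0$ for $r+1 \le j \le n$.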

We conclude this section by relating the size of the pseudo-inverse to $\|\Psi^\top\Psi - I\|$, which was called the relative error term $\|C\|$ in Chapter 3.

Recall that the norm of the pseudo-inverse $\Psi^+$ is the inverse of the smallest singular value of $\Psi$:

\[
  \|\Psi^+\| = 1/\sigma_n(\Psi). \tag{4.41}
\]

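The singular values are the square roots of the eigenvalues of $\Psi^\top\Psi$. First of all, note that $\|\Psi^\top\Psi - I\| = \max_i |\lambda_i(\Psi^\top\Psi) - 1|$, and therefore $\|\Psi^\top\Psi - I\| \to 0$ implies that $\lambda_i(\Psi^\top\Psi) \to 1$ for all $1 \le i \le n$.

Furthermore,

\[
  1 - \lambda_n(\Psi^\top\Psi) \le \max_{1 \le i \le n} |\lambda_i(\Psi^\top\Psi) - 1| \le \|\Psi^\top\Psi - I\|
  \;\Rightarrow\;
  \lambda_n(\Psi^\top\Psi) \ge 1 - \|\Psi^\top\Psi - I\|. \tag{4.42}
\]

Therefore, $\sigma_n(\Psi) = \sqrt{\lambda_n(\Psi^\top\Psi)} \ge \sqrt{1 - \|\Psi^\top\Psi - I\|}$, and it follows that

\[
  \|\Psi^+\| = \frac{1}{\sigma_n(\Psi)} \le \frac{1}{\sqrt{1 - \|\Psi^\top\Psi - I\|}}. \tag{4.43}
\]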

We have proven the following lemma:

Lemma 4.44 Under the conditions of the previous theorem, it holds that

\[
  \|\Psi^+\| \le \frac{1}{\sqrt{1 - \|\Psi^\top\Psi - I\|}}. \tag{4.45}
\]

Thus, since $\|\Psi^\top\Psi - I\| \to 0$ almost surely, it follows that $\|\Psi^+\| \to 1$.
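The bound of Lemma 4.44 can be checked numerically. The following is a minimal sketch, not part of the original derivation: the matrix `Psi` is a hypothetical stand-in with i.i.d. Gaussian entries of variance $1/n$ (so that $\Psi^\top\Psi \approx I$), rather than actual sample vectors of eigenfunctions, and the helper name `pinv_norm_and_bound` is chosen for illustration only. The bound is only meaningful when $\|\Psi^\top\Psi - I\| < 1$, which the helper checks explicitly.

```python
import numpy as np

def pinv_norm_and_bound(Psi):
    """Return (||Psi^+||, 1 / sqrt(1 - ||Psi^T Psi - I||)) in the spectral norm."""
    r = Psi.shape[1]
    # Relative error term ||Psi^T Psi - I|| (largest singular value).
    c = np.linalg.norm(Psi.T @ Psi - np.eye(r), 2)
    if c >= 1.0:
        raise ValueError("the bound requires ||Psi^T Psi - I|| < 1")
    pinv_norm = np.linalg.norm(np.linalg.pinv(Psi), 2)
    return pinv_norm, 1.0 / np.sqrt(1.0 - c)

rng = np.random.default_rng(0)
n, r = 2000, 5
# Assumption: Gaussian stand-in for Psi with nearly orthonormal columns.
Psi = rng.standard_normal((n, r)) / np.sqrt(n)
pinv_norm, bound = pinv_norm_and_bound(Psi)
print(f"||Psi^+|| = {pinv_norm:.4f} <= {bound:.4f}")
```

Increasing `n` for fixed `r` should drive both numbers towards 1, in line with the almost sure convergence stated above.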

4.8 Eigenvector Perturbations for General Kernel Functions

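The next step consists in relating the scalar products between sample vectors of eigenfunctions and the eigenvectors of the degenerate kernels $K^{[r]}$. In Lemma 4.28, we have seen that this is accomplished by multiplying these scalar products with the scalar products between the eigenvectors of $K$ and $K^{[r]}$:

\[
  \frac{1}{\sqrt{n}} |\lambda_\ell \psi_\ell(X)^\top u_i|
  \le \sum_{j=1}^{r} |u_i^\top v_j| \, \frac{1}{\sqrt{n}} |\lambda_\ell \psi_\ell(X)^\top v_j|. \tag{4.46}
\]

In Theorem 4.34, we have proved that

\[
  \frac{1}{\sqrt{n}} |\lambda_\ell \psi_\ell(X)^\top v_j| \le m_j \|\Psi^+\|. \tag{4.47}
\]

Therefore,

\[
  \sum_{j=1}^{r} |u_i^\top v_j| \, \frac{1}{\sqrt{n}} |\lambda_\ell \psi_\ell(X)^\top v_j|
  \le \|\Psi^+\| \sum_{j=1}^{r} |u_i^\top v_j| \, m_j. \tag{4.48}
\]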

The last sum is the expression we will study in this section.

Recall that $u_1, \dots, u_n$ are the eigenvectors of $K$ and $v_1, \dots, v_r$ are those of $K^{[r]}$. We interpret $K$ as being an additive perturbation of $K^{[r]}$, $K = K^{[r]} + E^r$. The vector

\[
  s = \left( u_i^\top v_1, \dots, u_i^\top v_r \right) \tag{4.49}
\]

contains the coefficients of $u_i$ with respect to the eigenbasis of $K^{[r]}$. Therefore, these scalar products measure the perturbation of $v_i$ to $u_i$ induced by the additive perturbation $E^r$. If $\|E^r\| = 0$, then $u_i = v_i$, and since the $v_j$ are orthogonal, $[s]_i = 1$ with all other entries being zero. For non-zero perturbations, $u_i$ will be perturbed away from $v_i$, which spreads the coefficients away from this configuration. We wish to study the amount of this perturbation and the effect it has on the sum $\sum_{j=1}^{r} |u_i^\top v_j| m_j$. The first question is addressed by a family of general results on the perturbation of eigenvectors, known as sin-theta theorems.

The following lemma is a special case of (Davis and Kahan, 1970, Theorem 6.2) (see also (Eisenstat and Ipsen, 1994; Stewart and Sun, 1990)).

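Lemma 4.50 Let $A$ be a symmetric $n \times n$ matrix with spectral decomposition $ULU^\top$. Let $U$ and $L$ be partitioned as follows:

\[
  U = [\,U_1 \;\; U_2\,], \qquad
  L = \begin{pmatrix} L_1 & 0 \\ 0 & L_2 \end{pmatrix}, \tag{4.51}
\]

where $U_1 \in M_{n,k}$, $L_1 \in M_k$, $U_2 \in M_{n,n-k}$, and $L_2 \in M_{n-k}$. Furthermore, let $E$ be another symmetric matrix and $\tilde A = A + E$. Let $\tilde l$ be an eigenvalue of $\tilde A$ and $\tilde x$ an associated unit-length eigenvector. Then,

\[
  \|U_2^\top \tilde x\| \le \frac{\|E\|}{\min_{k+1 \le i \le n} |\tilde l - l_i|}. \tag{4.52}
\]

Proof It holds that

\[
  (A + E)\tilde x = \tilde l \tilde x
  \;\Rightarrow\;
  E\tilde x = (\tilde l I - A)\tilde x. \tag{4.53}
\]

Therefore,

\[
  \|E\| \ge \|E\tilde x\| = \|(\tilde l I - A)\tilde x\| = \|(\tilde l I - ULU^\top)\tilde x\|. \tag{4.54}
\]

This norm can only become smaller when we consider only the last $n-k$ components of the resulting vector with respect to the basis $U$, that is, its orthogonal projection onto the span of $U_2$. Since $U_2^\top U_1 = 0$ and $U_2^\top U_2 = I$, this part is $U_2 U_2^\top (\tilde l I - ULU^\top)\tilde x = U_2(\tilde l I - L_2)U_2^\top \tilde x$. Therefore, we continue (4.54):

\[
  \|E\tilde x\| \ge \|U_2 U_2^\top(\tilde l I - ULU^\top)\tilde x\| = \|U_2(\tilde l I - L_2)U_2^\top \tilde x\|. \tag{4.55}
\]

Finally, since $U_2$ has orthonormal columns,

\[
  \|U_2(\tilde l I - L_2)U_2^\top \tilde x\| = \|(\tilde l I - L_2)U_2^\top \tilde x\|
  \ge \min_{k+1 \le i \le n} |\tilde l - l_i| \, \|U_2^\top \tilde x\|. \tag{4.56}
\]

Dividing by $\min_{k+1 \le i \le n} |\tilde l - l_i|$ (which we may assume to be non-zero, as the bound is void otherwise) concludes the proof of the lemma.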
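As an illustration, the inequality of Lemma 4.50 can be verified on a small random example. This is only a sketch under stated assumptions: the spectrum, the size of the perturbation, and the choice of the $n-k$ smallest eigenpairs of $A$ as the block $(L_2, U_2)$ are arbitrary and serve purely as a test case.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 7, 3

# Assumption: a symmetric test matrix A with a hand-picked spectrum.
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
spectrum = np.array([10.0, 9.0, 8.0, 1.0, 0.5, 0.2, 0.1])
A = (Q * spectrum) @ Q.T
A = (A + A.T) / 2  # remove rounding asymmetry

# Assumption: a small symmetric perturbation E.
G = rng.standard_normal((n, n))
E = 0.05 * (G + G.T) / 2

# eigh returns eigenvalues in ascending order; the n-k smallest
# eigenpairs of A play the role of (L2, U2).
l, U = np.linalg.eigh(A)
l2, U2 = l[: n - k], U[:, : n - k]

# Eigenpair of the perturbed matrix belonging to its largest eigenvalue.
lt, Xt = np.linalg.eigh(A + E)
l_tilde, x_tilde = lt[-1], Xt[:, -1]

lhs = np.linalg.norm(U2.T @ x_tilde)
rhs = np.linalg.norm(E, 2) / np.min(np.abs(l_tilde - l2))
print(f"||U2^T x|| = {lhs:.4e} <= {rhs:.4e}")
```

Because the top eigenvalue of $A + E$ is far from the eigenvalues collected in $L_2$, the component of $\tilde x$ in the span of $U_2$ is small.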

This lemma has a simple corollary for the case where one considers scalar products between individual eigenvectors of $A$ and $\tilde A$:

Corollary 4.57 Denote the eigenvalues of $K$ by $l_i$ and those of $K^{[r]}$ by $m_j$. Let the corresponding eigenvectors be $u_i$ and $v_j$, respectively. Then,

\[
  |u_i^\top v_j| \le \frac{\|E^r\|}{|l_i - m_j|} \wedge 1 =: \omega_{ij}, \tag{4.58}
\]

where $a \wedge b = \min(a, b)$. The numbers $\omega_{ij}$ will be called perturbation coefficients.

Proof The corollary follows from the previous lemma by setting $A = K^{[r]}$, $\tilde A = K$, $E = E^r$, and letting $U_2$ be the $n \times 1$ matrix consisting of $v_j$. The scalar product cannot become larger than 1 because $|u_i^\top v_j| \le \|u_i\| \|v_j\| = 1$, as $u_i$ and $v_j$ are unit-length vectors.
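The corollary can be checked entry by entry on a synthetic example. The sketch below rests on several assumptions: `A` is a random symmetric matrix with geometrically decaying eigenvalues standing in for $K^{[r]}$, `E` is a small random symmetric matrix standing in for $E^r$, and the helper name `perturbation_coefficients` is hypothetical.

```python
import numpy as np

def perturbation_coefficients(l, m, e_norm):
    """omega[i, j] = min(e_norm / |l[i] - m[j]|, 1), with 1 where l[i] == m[j]."""
    gap = np.abs(l[:, None] - m[None, :])
    with np.errstate(divide="ignore", invalid="ignore"):
        ratio = np.where(gap > 0, e_norm / gap, np.inf)
    return np.minimum(ratio, 1.0)

rng = np.random.default_rng(1)
n = 20

# Assumption: stand-in for K^{[r]} with eigenvalues 1, 1/2, 1/4, ...
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = (Q * 0.5 ** np.arange(n)) @ Q.T
A = (A + A.T) / 2  # remove rounding asymmetry

# Assumption: small symmetric perturbation standing in for E^r.
G = rng.standard_normal((n, n))
E = 1e-6 * (G + G.T) / 2

m, V = np.linalg.eigh(A)       # eigenpairs of the unperturbed matrix
l, U = np.linalg.eigh(A + E)   # eigenpairs of the perturbed matrix
e_norm = np.linalg.norm(E, 2)

omega = perturbation_coefficients(l, m, e_norm)
overlap = np.abs(U.T @ V)      # overlap[i, j] = |u_i^T v_j|
print("largest violation:", np.max(overlap - omega))
```

Up to rounding error, the largest violation should not be positive.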


[Figure 4.2, panel (a): The original and perturbed eigenvalues $m_j$ and $l_i$ for $\|E\| = 10^{-9}$; horizontal axis: index $i, j$; vertical axis: eigenvalue (logarithmic scale). Panel (b): The resulting perturbation coefficients $\omega_{ij}$ for $i = 5, 10, 20, 30$; horizontal axis: $j$; vertical axis: $\|E\|/|l_i - m_j|$ (logarithmic scale).]

Figure 4.2: Example plots for perturbation coefficients $\omega_{ij}$. If $\|E^r\|$ is small and the eigenvalues decay quickly, the large eigenvalues are well-separated such that the perturbation of the eigenvectors is negligibly small.

This is a classical result which is usually paraphrased as the perturbation being small if the eigenvalues are well-separated. In our case, we assume that the eigenvalues decay to zero, such that the eigenvalues become clustered around 0 and seem anything but well-separated. However, note that the separation is measured at the scale of $\|E^r\|$. In Chapter 3, we have seen that $\|E^r\| \to 0$ as $r$ increases, such that $\|E^r\|$ will typically be rather small, and eigenvalues which are close together can nevertheless be well-separated. Now, for $|l_i - m_j| > \|E^r\|$, we can rewrite

\[
  \omega_{ij} = \frac{1}{|l_i - m_j| / \|E^r\|}. \tag{4.59}
\]

Typically, $j \mapsto \omega_{ij}$ will have the following shape for fixed $i$ (see Figure 4.2). In Figure 4.2(b), each line describes the characteristics of the perturbation of a single eigenvector. The perturbation coefficient $\omega_{ij}$ will be 1 for eigenvalues $m_j$ which are closer than $\|E^r\|$ to $l_i$. For larger eigenvalues, $\omega_{ij}$ drops off fairly quickly, as it does for smaller eigenvalues, although there it eventually reaches a plateau and does not decay further (as $m_j \to 0$, the coefficient approaches $\|E^r\|/l_i$). Roughly stated, if $l_i$ is still much larger than $\|E^r\|$, and $l_i$ is isolated, then $\omega_{ij}$ will have a single peak of 1 at $\omega_{ii}$ and be negligibly small for $\omega_{i1}, \dots, \omega_{i,i-1}, \omega_{i,i+1}, \dots, \omega_{in}$. For small eigenvalues $l_i$, the perturbation can be rather severe, although we see that the perturbation will occur mostly in the direction of eigenspaces belonging to comparably small eigenvalues.
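The qualitative shape described above (a single peak of 1, a fast drop-off, and a plateau for small eigenvalues) can be reproduced with a few lines of code. This is a sketch under simplifying assumptions: the spectrum is a hypothetical geometric sequence, and $l_i$ is simply taken equal to $m_i$.

```python
def omega(l_i, m_j, e_norm):
    """Perturbation coefficient min(e_norm / |l_i - m_j|, 1)."""
    gap = abs(l_i - m_j)
    return 1.0 if gap <= e_norm else e_norm / gap

# Assumption: geometrically decaying eigenvalues m_j = 2^(-j).
m = [0.5 ** j for j in range(40)]
e_norm = 1e-9
i = 5

# Row i of the perturbation coefficients, with l_i := m_i for simplicity.
row = [omega(m[i], m_j, e_norm) for m_j in m]

peak = row[i]                          # coefficient at j = i
off_peak = max(row[:i] + row[i + 1:])  # largest coefficient for j != i
plateau = row[-1]                      # coefficient for the smallest m_j
print(peak, off_peak, plateau, e_norm / m[i])
```

Here `plateau` is close to `e_norm / m[i]`, the value that $\omega_{ij}$ approaches as $m_j \to 0$.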

Now, we return to the sum

\[
  \sum_{j=1}^{r} \omega_{ij} m_j. \tag{4.60}
\]

We will show that outside of a relatively small set around $l_i$, $\omega_{ij} m_j$ will be of the order of $\|E^r\|$.

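Lemma 4.61 Consider

\[
  \omega_{ij} m_j = \left( \frac{\|E^r\|}{|l_i - m_j|} \wedge 1 \right) m_j. \tag{4.62}
\]

Then,

\[
  m_j \ge 2 l_i \;\Rightarrow\; \omega_{ij} m_j \le 2\|E^r\|, \tag{4.63}
\]

\[
  m_j \le \tfrac{1}{2} l_i \;\Rightarrow\; \omega_{ij} m_j \le \|E^r\|. \tag{4.64}
\]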

Proof For this proof, we will drop the superscript $r$ on $E^r$ for convenience. First, note that for $m_j = 2 l_i$,

\[
  \frac{\|E\| m_j}{|l_i - m_j|} = \frac{2\|E\| l_i}{2 l_i - l_i} = 2\|E\|. \tag{4.65}
\]

Furthermore, for $m_j > l_i$, the map $m_j \mapsto \|E\| m_j/(m_j - l_i)$ decreases monotonically as $m_j$ increases.

For the second inequality, observe that for $m_j = \tfrac{1}{2} l_i$,

\[
  \frac{\|E\| m_j}{|l_i - m_j|} = \frac{\tfrac{1}{2}\|E\| l_i}{l_i - \tfrac{1}{2} l_i} = \|E\|, \tag{4.66}
\]

and if $m_j < l_i$, the map $m_j \mapsto \|E\| m_j/(l_i - m_j)$ decreases monotonically as $m_j$ decreases.
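One way to see the two monotonicity claims, assuming $l_i > 0$ and $m_j \ge 0$, is to split off the constant part of each quotient:

\[
  \frac{m_j}{m_j - l_i} = 1 + \frac{l_i}{m_j - l_i} \quad (m_j > l_i),
  \qquad
  \frac{m_j}{l_i - m_j} = \frac{l_i}{l_i - m_j} - 1 \quad (m_j < l_i).
\]

In the first expression, the denominator $m_j - l_i$ grows as $m_j$ increases, so the quotient decreases. In the second expression, the denominator $l_i - m_j$ grows as $m_j$ decreases, so the quotient decreases as well. Since $\omega_{ij} \le \|E\|/|l_i - m_j|$, the boundary values $2\|E\|$ at $m_j = 2 l_i$ and $\|E\|$ at $m_j = \tfrac{1}{2} l_i$ therefore bound $\omega_{ij} m_j$ on the respective ranges.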

Based on the last lemma, we can bound the sum (4.60) as follows:

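Lemma 4.67 Define the set

\[
  J(l_i) = \left\{ j \in \{1, \dots, r\} \;\middle|\; \tfrac{1}{2} l_i \le m_j \le 2 l_i \right\}. \tag{4.68}
\]

Then, with $C(l_i) = |J(l_i)|$,

\[
  \sum_{j=1}^{r} \omega_{ij} m_j \le 2 l_i C(l_i) + 2 r \|E^r\|. \tag{4.69}
\]

Proof It holds that

\[
  \sum_{j=1}^{r} \omega_{ij} m_j = \sum_{j \in J(l_i)} \omega_{ij} m_j + \sum_{j \notin J(l_i)} \omega_{ij} m_j. \tag{4.70}
\]

For $j \in J(l_i)$, $\omega_{ij} m_j \le m_j \le 2 l_i$, and for $j \notin J(l_i)$, $\omega_{ij} m_j \le 2\|E\|$ by the previous lemma.

Therefore,

\[
  \sum_{j=1}^{r} \omega_{ij} m_j \le \sum_{j \in J(l_i)} 2 l_i + \sum_{j \notin J(l_i)} 2\|E\| = 2 l_i C(l_i) + 2\,(r - C(l_i))\,\|E\|. \tag{4.71}
\]

Since $r - C(l_i) \le r$, the second term is at most $2 r \|E\|$, which yields (4.69). As $C(l_i)$ will typically be rather small, little is lost by omitting the second occurrence of $C(l_i)$. This completes the proof of the lemma.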

Note that of the two terms in (4.69), only the first term $2 l_i C(l_i)$ does not scale with $\|E^r\|$. This term relates to the number of eigenvalues which cluster around $l_i$. Therefore, we see that the perturbation is basically confined to the cluster around $l_i$.
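To get a feeling for the two terms in (4.69), the sum and its bound can be evaluated for a concrete spectrum. The following sketch assumes a hypothetical geometric spectrum $m_j = 2^{-j}$, takes $l_i$ equal to one of the $m_j$, and uses an arbitrary value for the perturbation size; the helper names are chosen for illustration.

```python
def omega(l_i, m_j, e_norm):
    """Perturbation coefficient min(e_norm / |l_i - m_j|, 1)."""
    gap = abs(l_i - m_j)
    return 1.0 if gap <= e_norm else e_norm / gap

def sum_and_bound(l_i, m, e_norm):
    """Return (sum_j omega_ij * m_j, C(l_i), 2 * l_i * C(l_i) + 2 * r * e_norm)."""
    total = sum(omega(l_i, m_j, e_norm) * m_j for m_j in m)
    # J(l_i): indices whose eigenvalue lies within a factor 2 of l_i.
    cluster_size = sum(1 for m_j in m if 0.5 * l_i <= m_j <= 2.0 * l_i)
    bound = 2.0 * l_i * cluster_size + 2.0 * len(m) * e_norm
    return total, cluster_size, bound

# Assumption: geometric spectrum with r = 30 and ||E^r|| = 1e-9.
m = [0.5 ** j for j in range(30)]
e_norm = 1e-9
total, cluster_size, bound = sum_and_bound(m[5], m, e_norm)
print(total, cluster_size, bound)
```

For this spectrum the cluster around $l_i = m_5$ consists of the three indices $j = 4, 5, 6$, and the sum is dominated by the single term with $\omega_{ij} = 1$, while the contribution of all indices outside the cluster is of the order of `e_norm`.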