(1)

Machine Learning Kernel-PCA

Dmitrij Schlesinger

WS2014/2015, 08.12.2014

(2)

Principal Component Analysis

Problem – (very) high dimensions of feature spaces:

Especially in Computer Vision

– Feature is a 5×5 image patch → feature space is $\mathbb{R}^{25}$
– SIFT: 16 histograms of 8 directions → vector in $\mathbb{R}^{128}$

Idea: the feature space is represented in another basis.

Assumption:

the directions of small variances correspond to noise and can be neglected

Approach:

project the feature space onto a linear subspace so that the variances in the projected space are maximal

(3)

Principal Component Analysis

A simplified example: the data are centered and the subspace is one-dimensional, i.e. it is represented by a normalized vector with $\|e\|^2 = 1$. The projection of an $x$ onto $e$ is $\langle x, e\rangle$. Hence, the task is

$$\sum_l \langle x_l, e\rangle^2 \to \max_e \quad\text{s.t.}\quad \|e\|^2 = 1$$

Lagrangian:

$$\sum_l \langle x_l, e\rangle^2 - \lambda\,\big(\|e\|^2 - 1\big) \to \min_\lambda \max_e$$

Gradient with respect to $e$:

$$\sum_l 2\langle x_l, e\rangle\, x_l - 2\lambda e = 2\Big(\sum_l x_l x_l^\top\Big) e - 2\lambda e = 0 \quad\Longrightarrow\quad \mathrm{cov}\cdot e = \lambda e$$
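This can be checked numerically. A small sketch with synthetic 2-D data (not part of the slides; all names are illustrative): among unit vectors, the eigenvector belonging to the largest eigenvalue of $\mathrm{cov}$ maximizes $\sum_l \langle x_l, e\rangle^2$.

```python
import numpy as np

rng = np.random.default_rng(0)
# Centered toy data with one elongated direction
X = rng.normal(size=(200, 2)) @ np.array([[3.0, 0.0], [0.0, 0.5]])
X -= X.mean(axis=0)

cov = X.T @ X                        # cov = sum_l x_l x_l^T
eigvals, eigvecs = np.linalg.eigh(cov)
e = eigvecs[:, np.argmax(eigvals)]   # eigenvector of the largest eigenvalue

def objective(v):
    v = v / np.linalg.norm(v)        # enforce ||v|| = 1
    return np.sum((X @ v) ** 2)      # sum_l <x_l, v>^2

# The eigenvector attains the maximum among many random unit vectors
random_dirs = rng.normal(size=(1000, 2))
assert objective(e) >= max(objective(v) for v in random_dirs)
```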

(4)

Principal Component Analysis

$$\mathrm{cov}\cdot e = \lambda e$$

$e$ is an eigenvector and $\lambda$ is the corresponding eigenvalue of the covariance matrix. Which one?

For a pair $e$ and $\lambda$ the corresponding variance is

$$\sum_l \langle x_l, e\rangle^2 = e^\top \Big(\sum_l x_l x_l^\top\Big) e = e^\top \mathrm{cov}\; e = \lambda\,\|e\|^2 = \lambda$$

→ choose the eigenvector corresponding to the maximal eigenvalue.

Similar approach: project the feature space onto a subspace so that the summed squared distance between the points and their projections is minimal → the result is the same.

(5)

Principal Component Analysis

Summary (for a higher-dimensional subspace $\mathbb{R}^m$):

1. Compute the covariance matrix $\mathrm{cov} = \sum_l x_l x_l^\top$
2. Find all eigenvalues and eigenvectors
3. Sort the eigenvalues in decreasing order
4. Choose the $m$ eigenvectors for the $m$ first eigenvalues
5. The $n\times m$ projection matrix consists of $m$ columns, each one being the corresponding eigenvector
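A minimal NumPy sketch of these five steps (an illustration, not part of the slides; it assumes a data matrix `X` with one centered sample per row, and all names are arbitrary):

```python
import numpy as np

def pca_projection_matrix(X, m):
    """Steps 1-5 above: X is (l, n) with centered rows; returns the (n, m) projection matrix."""
    cov = X.T @ X                             # 1. covariance matrix sum_l x_l x_l^T
    eigvals, eigvecs = np.linalg.eigh(cov)    # 2. eigenvalues / eigenvectors (cov is symmetric)
    order = np.argsort(eigvals)[::-1]         # 3. sort eigenvalues in decreasing order
    return eigvecs[:, order[:m]]              # 4./5. eigenvectors of the m largest eigenvalues as columns

# Usage (illustrative):
# X_centered = X - X.mean(axis=0)
# E = pca_projection_matrix(X_centered, m=2)    # (n, 2)
# X_projected = X_centered @ E                  # (l, 2) coordinates in the subspace
```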

Are projections onto a linear subspace good? Alternatives?

(6)

Do it with scalar products

The optimal direction vector can be expressed as a linear combination of the data points, i.e. it is contained in the subspace that is spanned by the data points:

$$e = \sum_i \alpha_i x_i$$

Note: in high-dimensional spaces it may happen that all data points lie in a proper linear subspace, i.e. they do not span the whole space (e.g. when the dimension of the space is higher than the number of data points). This will be important later for the feature spaces.

Why is it so? Proof by contradiction: assume it is not the case – the optimal vector is not contained in the subspace spanned by the data points. Its component orthogonal to that subspace contributes nothing to any $\langle x_l, e\rangle$, so projecting $e$ onto the subspace and renormalizing to unit length can only increase every $\langle x_l, e\rangle^2$ – the objective becomes better, a contradiction.

(7)

Do it with scalar products

Let us make the task look a bit more complicated – consider projections of the direction vector onto all data vectors (instead of the vector itself):

$$\lambda\cdot e = \mathrm{cov}\cdot e \quad\Longrightarrow\quad \lambda\,(x_k^\top e) = x_k^\top\,\mathrm{cov}\cdot e \quad\forall k$$

The right side follows from the left one directly.

The opposite direction is less trivial (see the board). It holds only if $e$ can be represented as a linear combination of the $x_i$ – which is exactly the case here (see the previous slide).
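For completeness, one way to see the converse (the step referred to the board), under the assumption $e = \sum_i \alpha_i x_i$: let $d = \mathrm{cov}\cdot e - \lambda e$. The assumption $\lambda\,(x_k^\top e) = x_k^\top\,\mathrm{cov}\cdot e$ for all $k$ means $x_k^\top d = 0$ for all $k$, i.e. $d$ is orthogonal to $\mathrm{span}\{x_1,\ldots,x_l\}$. On the other hand,

$$\mathrm{cov}\cdot e = \frac{1}{l}\sum_j x_j\,(x_j^\top e) \in \mathrm{span}\{x_j\} \qquad\text{and}\qquad \lambda e = \lambda \sum_i \alpha_i x_i \in \mathrm{span}\{x_i\},$$

so $d$ also lies in that span. A vector that lies in a subspace and is orthogonal to it must be zero, hence $\mathrm{cov}\cdot e = \lambda e$.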

(8)

Do it with scalar products

Put it all together:

$$\mathrm{cov} = \frac{1}{l}\sum_j x_j x_j^\top, \qquad e = \sum_i \alpha_i x_i, \qquad \lambda\,(x_k^\top e) = x_k^\top\,\mathrm{cov}\cdot e \quad\forall k$$

$$\lambda\,\Big(x_k^\top \sum_i \alpha_i x_i\Big) = x_k^\top\cdot\frac{1}{l}\sum_j x_j x_j^\top\cdot\sum_i \alpha_i x_i \quad\forall k$$

$$\lambda\, l \sum_i \alpha_i\, x_k^\top x_i = \sum_i \alpha_i \sum_j x_k^\top x_j\, x_j^\top x_i \quad\forall k$$

Let $K$ be the matrix of pair-wise scalar products: $K_{ij} = x_i^\top x_j$. Then (in matrix form)

$$\lambda\, l\, K\alpha = K^2\alpha \quad\Longrightarrow\quad \lambda\, l\, \alpha = K\alpha$$

The PCA can be expressed using scalar products only!
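A small NumPy sketch of this eigenproblem (an illustration, not part of the slides; it assumes a data matrix `X` with one centered sample per row). With the plain scalar product as kernel, the direction recovered from $\alpha$ coincides with the ordinary PCA eigenvector:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
X -= X.mean(axis=0)                  # centered data, rows are samples
l = X.shape[0]

K = X @ X.T                          # K_ij = <x_i, x_j>
eigvals, eigvecs = np.linalg.eigh(K)
alpha = eigvecs[:, -1]               # eigenvector of the largest eigenvalue
lam = eigvals[-1] / l                # lambda in  lambda * l * alpha = K @ alpha

# Direction in input space recovered from the coefficients: e = sum_i alpha_i x_i
e = X.T @ alpha
e /= np.linalg.norm(e)

# It coincides (up to sign) with the top eigenvector of the covariance matrix
cov = X.T @ X / l
_, V = np.linalg.eigh(cov)
assert np.allclose(np.abs(e @ V[:, -1]), 1.0)
```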

(9)

Do it with scalar products

$$\lambda\, l\, \alpha = K\alpha$$

with an unknown vector $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_l)$

→ basically the same task: find eigenvalues (this time, however, of the matrix $K$ instead of the covariance matrix $\mathrm{cov}$).

Let the $\alpha$'s be known (already found). At test time, new data points are projected onto $e$; the length of the projection is

$$\langle x, e\rangle = \Big\langle x, \sum_i \alpha_i x_i\Big\rangle = \sum_i \alpha_i\,\langle x, x_i\rangle$$

⇒ the quantity of interest can be computed using scalar products only; the direction vector is not needed explicitly.

(10)

Using kernels

Now we would like to find a direction vector in the feature space.

Everything remains exactly the same, but with the kernel matrix

$$K_{ij} = \langle \Phi(x_i), \Phi(x_j)\rangle = \kappa(x_i, x_j)$$

The projection onto the optimal direction is

$$\langle \Phi(x), e\rangle = \Big\langle \Phi(x), \sum_i \alpha_i\,\Phi(x_i)\Big\rangle = \sum_i \alpha_i\,\kappa(x, x_i)$$

Kernel-PCA
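A compact NumPy sketch of Kernel-PCA with a Gaussian kernel (an illustration under stated assumptions, not the reference implementation from the paper; centering of the kernel matrix is omitted for brevity, and `gamma` as well as all names are arbitrary choices):

```python
import numpy as np

def rbf_kernel(A, B, gamma=0.5):
    """Gaussian kernel kappa(a, b) = exp(-gamma * ||a - b||^2)."""
    d2 = (np.sum(A**2, axis=1)[:, None]
          + np.sum(B**2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-gamma * d2)

def kernel_pca(X, m, kernel=rbf_kernel):
    """Coefficient vectors alpha for the m leading directions in feature space."""
    K = kernel(X, X)                          # K_ij = kappa(x_i, x_j)
    eigvals, eigvecs = np.linalg.eigh(K)      # solves  lambda * l * alpha = K alpha
    order = np.argsort(eigvals)[::-1][:m]     # m largest eigenvalues
    # For a unit-norm eigenvector alpha:  <e, e> = alpha^T K alpha = eigval,
    # so divide by sqrt(eigval) to give each direction e unit length.
    return eigvecs[:, order] / np.sqrt(eigvals[order])

def project(X_train, alphas, X_new, kernel=rbf_kernel):
    """Projections <Phi(x), e> = sum_i alpha_i * kappa(x, x_i) for new points."""
    return kernel(X_new, X_train) @ alphas

# Usage (illustrative):
# alphas = kernel_pca(X_train, m=2)
# Z = project(X_train, alphas, X_test)        # non-linear features of shape (n_test, 2)
```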

(11)

An illustration

$$\langle \Phi(x), e\rangle = \sum_i \alpha_i\,\kappa(x, x_i)$$

A linear function in the feature space ↔ a non-linear function in the input space.

(12)

Literature

Bernhard Schölkopf, Alexander Smola, Klaus-Robert Müller: Kernel Principal Component Analysis (1997)

In the paper, everything is presented directly for feature spaces and kernels.

Some further illustrations (panels: input space, polynomial kernel, Gaussian kernel)
