(1)

Machine Learning

Support Vector Machines

(2)

Linear Classifiers (recap)

A building block for almost all classifiers – a mapping $f\colon \mathcal{X} \to \{-1, +1\}$,

a partitioning of the input space into half-spaces that correspond to the classes.

Decision rule: $y = \operatorname{sign}(\langle w, x \rangle + b)$

$w$ is the normal to the hyperplane $\langle w, x \rangle + b = 0$
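A minimal sketch of this decision rule in Python; the weight vector w and bias b below are made-up illustrative values, not taken from the slides.

```python
import numpy as np

def linear_decision(x, w, b):
    """Classify x into {-1, +1} by the side of the hyperplane <w, x> + b = 0."""
    return 1 if np.dot(w, x) + b >= 0 else -1

w = np.array([1.0, -2.0])   # normal vector of the hyperplane (assumed values)
b = 0.5                     # bias (assumed value)
print(linear_decision(np.array([3.0, 1.0]), w, b))   # -> 1
```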

(3)

Two learning tasks

Let a training dataset be given with (i) data $x_1, \dots, x_\ell \in \mathbb{R}^n$ and (ii) classes $y_1, \dots, y_\ell \in \{-1, +1\}$.

The goal is to find a hyperplane that separates the data (correctly).

________________________________________________________

Now: The goal is to find a “corridor” (stripe) of maximal width that separates the data (correctly).

(4)

Linear SVM

Remember that the solution $(w, b)$ is defined only up to a common scale.

→ Use the canonical (with respect to the learning data) form in order to avoid ambiguity: $\min_i |\langle w, x_i \rangle + b| = 1$, i.e. $y_i(\langle w, x_i \rangle + b) \geq 1$ for all $i$, with equality for the points closest to the hyperplane.

The margin: $2 / \|w\|$

The optimization problem: $\min_{w, b} \tfrac{1}{2}\|w\|^2$ s.t. $y_i(\langle w, x_i \rangle + b) \geq 1$ for all $i$
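In practice this hard-margin problem can be approximated with scikit-learn's SVC by choosing a linear kernel and a very large penalty C; the toy data below are an assumption for illustration.

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[0.0, 0.0], [0.0, 1.0], [2.0, 0.0], [2.0, 1.0]])  # assumed toy data
y = np.array([-1, -1, 1, 1])                                    # linearly separable labels

clf = SVC(kernel="linear", C=1e6)   # huge C ~ (almost) hard margin
clf.fit(X, y)
w, b = clf.coef_[0], clf.intercept_[0]
print("w =", w, "b =", b, "margin =", 2.0 / np.linalg.norm(w))
```

The later sketches reuse X, y and the fitted clf from this toy example.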

(5)

Linear SVM

The Lagrangian of the problem: $L(w, b, \alpha) = \tfrac{1}{2}\|w\|^2 - \sum_i \alpha_i \bigl[ y_i(\langle w, x_i \rangle + b) - 1 \bigr]$, $\alpha_i \geq 0$

The meaning of the dual variables $\alpha_i$:

a) $y_i(\langle w, x_i \rangle + b) < 1$ (a constraint is broken) → maximization wrt. $\alpha_i$ gives $\alpha_i \to \infty$ (surely not a minimum)

b) $y_i(\langle w, x_i \rangle + b) > 1$ → maximization wrt. $\alpha_i$ gives $\alpha_i = 0$ → no influence on the Lagrangian

c) $y_i(\langle w, x_i \rangle + b) = 1$ → $\alpha_i$ does not matter, the vector is located “on the wall of the corridor” – Support Vector
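Case (c) can be checked numerically: for support vectors the constraint is active, $y_i(\langle w, x_i \rangle + b) = 1$, while for all other points it is strictly larger. This sketch reuses X, y, w, b and clf from the toy example above (an assumption).

```python
import numpy as np

margins = y * (X @ w + b)            # y_i (<w, x_i> + b) for every training point
print(np.round(margins, 3))          # ~1 for support vectors, > 1 otherwise
print("support vector indices:", clf.support_)
```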

(6)

Linear SVM

Lagrangian: $L(w, b, \alpha) = \tfrac{1}{2}\|w\|^2 - \sum_i \alpha_i \bigl[ y_i(\langle w, x_i \rangle + b) - 1 \bigr]$

Derivatives: $\partial L / \partial w = w - \sum_i \alpha_i y_i x_i = 0 \;\Rightarrow\; w = \sum_i \alpha_i y_i x_i$, $\qquad \partial L / \partial b = -\sum_i \alpha_i y_i = 0 \;\Rightarrow\; \sum_i \alpha_i y_i = 0$

→ The solution is a linear combination of the data points.
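This can be verified on the fitted toy model above: scikit-learn's dual_coef_ stores the products $\alpha_i y_i$ for the support vectors, so $w$ can be reconstructed from them (reusing clf from the earlier sketch is an assumption).

```python
import numpy as np

w_from_dual = clf.dual_coef_ @ clf.support_vectors_   # sum_i (alpha_i y_i) x_i
print(np.allclose(w_from_dual, clf.coef_))            # -> True
```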

(7)

Linear SVM

Substitute $w = \sum_i \alpha_i y_i x_i$ into the decision rule and obtain $y = \operatorname{sign}\bigl(\sum_i \alpha_i y_i \langle x_i, x \rangle + b\bigr)$

→ the vector $w$ is not needed explicitly !!!

The decision rule can be expressed as a linear combination of scalar products with support vectors.

Only the strictly positive $\alpha_i$ (i.e. those corresponding to the support vectors) are necessary for that.
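A sketch of evaluating the decision rule without $w$, reusing the fitted toy model from above (an assumption): the decision uses nothing but scalar products between the new point and the support vectors.

```python
import numpy as np

def decide(x_new):
    scores = clf.support_vectors_ @ x_new            # <x_i, x_new> for the support vectors
    return np.sign(clf.dual_coef_[0] @ scores + clf.intercept_[0])

print(decide(np.array([1.5, 0.5])))                  # same sign as clf.predict
```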

(8)

Linear SVM

Substitute $w = \sum_i \alpha_i y_i x_i$ into the Lagrangian

and obtain the dual task:

$\max_\alpha \; \sum_i \alpha_i - \tfrac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \langle x_i, x_j \rangle$ s.t. $\alpha_i \geq 0$, $\sum_i \alpha_i y_i = 0$
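The dual is a quadratic program; a minimal sketch solves it for the toy data above with a generic optimizer (real SVM implementations use specialized solvers such as SMO, so this is only for illustration).

```python
import numpy as np
from scipy.optimize import minimize

Q = (y[:, None] * X) @ (y[:, None] * X).T        # Q_ij = y_i y_j <x_i, x_j>

def neg_dual(alpha):                              # negated dual objective (to minimize)
    return 0.5 * alpha @ Q @ alpha - alpha.sum()

res = minimize(neg_dual, x0=np.zeros(len(y)), method="SLSQP",
               bounds=[(0, None)] * len(y),
               constraints=[{"type": "eq", "fun": lambda a: a @ y}])
print(np.round(res.x, 3))                         # the alpha_i; non-zero entries mark support vectors
```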

(9)

Feature spaces

1. The input space is mapped onto a feature space by a non-linear transformation $\Phi\colon \mathcal{X} \to \mathcal{H}$

2. The data are separated (classified) by a linear decision rule in the feature space

Example: quadratic classifier

The transformation is, e.g. for $x \in \mathbb{R}^2$, $\Phi(x) = (x_1^2, \sqrt{2}\,x_1 x_2, x_2^2)$

(the images are separable in the feature space)
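A sketch of such a quadratic feature map; the concrete map on the original slide is not shown, so this is the common choice that realises the kernel $\langle x, x' \rangle^2$ of the next slide.

```python
import numpy as np

def phi(x):
    """Map (x1, x2) to the quadratic monomials (x1^2, sqrt(2)*x1*x2, x2^2)."""
    x1, x2 = x
    return np.array([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])
```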

(10)

Linear SVM + Feature spaces = Kernels

The images $\Phi(x_i)$ are not explicitly necessary in order to find the separating plane in the feature space, but only their scalar products $\langle \Phi(x), \Phi(x') \rangle$.

For the example above: $\langle \Phi(x), \Phi(x') \rangle = \langle x, x' \rangle^2$

→ the scalar product can be computed in the input space; it is not necessary to map the data points onto the feature space explicitly.
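A quick numerical check of this identity, using the assumed feature map phi from the sketch above:

```python
import numpy as np

x, z = np.array([1.0, 2.0]), np.array([3.0, -1.0])
print(np.dot(phi(x), phi(z)))   # scalar product computed in the feature space
print(np.dot(x, z) ** 2)        # the same value, computed in the input space
```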

(11)

Kernels

A kernel is a function that implements the scalar product in a feature space: $k(x, x') = \langle \Phi(x), \Phi(x') \rangle$.

Neither the corresponding space nor the mapping need to be specified explicitly thereby → “Black Box”.

“Alternative” definition: if a function $k$ is a kernel, then there exists a mapping $\Phi$ such that $k(x, x') = \langle \Phi(x), \Phi(x') \rangle$. The corresponding feature space is called the Hilbert space induced by the kernel $k$. Let a function $k$ be given. Is it a kernel?

→ Mercer’s theorem.
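Mercer's theorem characterises kernels via positive semi-definiteness; as a numerical sanity check one can verify that the Gram matrix of a candidate function on a finite sample has no negative eigenvalues (a necessary condition on that sample, not a proof). The candidate function and the sample points below are assumptions.

```python
import numpy as np

def candidate(x, z):
    return np.exp(np.dot(x, z))      # exp of a scalar product; a valid kernel

pts = np.random.default_rng(0).normal(size=(20, 3))
K = np.array([[candidate(a, b) for b in pts] for a in pts])
print(np.linalg.eigvalsh(K).min() >= -1e-10)   # no (significantly) negative eigenvalues
```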

(12)

Combining Kernels

Let $k_1$ and $k_2$ be two kernels.

Then e.g. $k_1 + k_2$, $\lambda k_1$ (for $\lambda > 0$) and $k_1 \cdot k_2$ are kernels as well

(there are also other possibilities to build kernels from kernels).

Popular kernels (see the sketch after this list):

• Polynomial: $k(x, x') = (\langle x, x' \rangle + 1)^d$

• Sigmoid: $k(x, x') = \tanh(\kappa \langle x, x' \rangle + c)$

• Gaussian: $k(x, x') = \exp\bigl(-\|x - x'\|^2 / (2\sigma^2)\bigr)$ (interesting: the behaviour for $\sigma \to 0$)
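Minimal sketches of these popular kernels; the parameter names d, kappa, c and sigma are the usual conventions, not taken from the slide.

```python
import numpy as np

def k_poly(x, z, d=2, c=1.0):
    return (np.dot(x, z) + c) ** d

def k_sigmoid(x, z, kappa=1.0, c=0.0):
    return np.tanh(kappa * np.dot(x, z) + c)

def k_gauss(x, z, sigma=1.0):
    return np.exp(-np.linalg.norm(x - z) ** 2 / (2 * sigma ** 2))
```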

(13)

An example

The decision rule with a Gaussian kernel: $f(x) = \operatorname{sign}\Bigl(\sum_i \alpha_i y_i \exp\bigl(-\|x - x_i\|^2 / (2\sigma^2)\bigr) + b\Bigr)$
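A sketch of evaluating this rule; the support vectors, the coefficients $\alpha_i y_i$, the bias and the width sigma below are illustrative assumptions, not fitted values.

```python
import numpy as np

sv = np.array([[0.0, 0.0], [1.0, 1.0]])   # assumed support vectors
alpha_y = np.array([1.0, -1.0])           # assumed values of alpha_i * y_i
b, sigma = 0.0, 0.5

def decide_rbf(x):
    k = np.exp(-np.linalg.norm(sv - x, axis=1) ** 2 / (2 * sigma ** 2))
    return np.sign(alpha_y @ k + b)

print(decide_rbf(np.array([0.2, 0.1])))   # -> 1.0 (the point lies near the positive support vector)
```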

(14)

Conclusion

• SVM is a representative of discriminative learning – i.e. with all the corresponding advantages (power) and drawbacks (overfitting) – remember e.g. the Gaussian kernel with $\sigma \to 0$.

• The building block – linear classifiers. All formalisms can be expressed in terms of scalar products – the data are not needed explicitly.

• Feature spaces – make non-linear decision rules in the input space possible.

• Kernels – scalar products in feature spaces; the latter need not necessarily be defined explicitly.

• Note: this all works only if the data are separable !!!

Literature (names):

• Bernhard Schölkopf, Alex Smola ...
