(1)

Pattern Recognition

Neuron

(2)

Neuron

Human vs. Computer

(two nice pictures from Wikipedia)

(3)

Neuron (McCulloch and Pitts, 1943)

Input: $x = (x_1, \ldots, x_n) \in \mathbb{R}^n$

Weights: $w = (w_1, \ldots, w_n) \in \mathbb{R}^n$, bias $b \in \mathbb{R}$

Activation: $a = \langle w, x \rangle + b = \sum_{i=1}^{n} w_i x_i + b$

Output: $y = f(a)$

Step-function: $y = 1$ if $a \geq 0$, $y = 0$ otherwise

Sigmoid-function (differentiable!!!): $y = \dfrac{1}{1 + e^{-a}}$
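A minimal sketch of this model in Python (the names `neuron`, `step`, and `sigmoid` are illustrative, not from the slides):

```python
import numpy as np

def step(a):
    # Step activation: 1 if the activation is non-negative, else 0
    return 1.0 if a >= 0 else 0.0

def sigmoid(a):
    # Smooth, differentiable alternative to the step function
    return 1.0 / (1.0 + np.exp(-a))

def neuron(x, w, b, f=step):
    # Activation: inner product of weights and input, plus bias
    a = np.dot(w, x) + b
    return f(a)

x = np.array([0.5, -1.0])
w = np.array([2.0, 1.0])
print(neuron(x, w, 0.5))             # 1.0 (activation a = 0.5 >= 0)
print(neuron(x, w, 0.5, f=sigmoid))  # ~0.62
```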

(4)

Geometric interpretation

Let $w$ be normalized, i.e. $\|w\| = 1$. Then $\langle w, x \rangle$ is the length of the projection of $x$ onto $w$.

Separation plane: $\langle w, x \rangle + b = 0$

⇒ The neuron implements a linear classifier.
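A quick numeric check of the projection interpretation (toy vectors of mine, not from the slides):

```python
import numpy as np

w = np.array([0.6, 0.8])            # normalized: ||w|| = 1
x = np.array([2.0, 1.0])
proj_len = np.dot(w, x)             # length of the projection of x onto w
print(proj_len)                     # 2.0
print(np.dot(proj_len * w - x, w))  # residual is orthogonal to w: 0.0
```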

(5)

Special case − Boolean functions

Input: $x \in \{0, 1\}^n$

Output: $y \in \{0, 1\}$

Find $w$ and $b$ so that the neuron computes a given Boolean function, e.g. the conjunction $y = x_1 \wedge \ldots \wedge x_n$.

Disjunction and many other Boolean functions can be realized this way, but not XOR (it is not linearly separable).
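For instance (weights chosen by hand; a sketch, not slide content), a threshold neuron realizes AND and OR, while no choice of $w, b$ reproduces XOR:

```python
import numpy as np
from itertools import product

def fires(x, w, b):
    # Step-neuron: 1 iff <w, x> + b >= 0
    return int(np.dot(w, x) + b >= 0)

# Conjunction (AND): w = (1, 1), b = -1.5; disjunction (OR): b = -0.5
for name, b in [("AND", -1.5), ("OR", -0.5)]:
    table = {x: fires(np.array(x), np.array([1, 1]), b)
             for x in product((0, 1), repeat=2)}
    print(name, table)

# XOR (truth table 0,1,1,0) is not linearly separable: (0,0) and (1,1)
# demand <w,x>+b < 0 while (0,1) and (1,0) demand >= 0; summing the
# four constraints yields a contradiction.
```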

(6)

Learning

Given: training data $(x^l, y^l)$, $l = 1, \ldots, L$, with $x^l \in \mathbb{R}^n$, $y^l \in \{0, 1\}$

Find: $w$ and $b$ so that the neuron outputs $y^l$ for all $l$. For a step-neuron this is a system of linear inequalities:

$\langle w, x^l \rangle + b > 0$ if $y^l = 1$
$\langle w, x^l \rangle + b < 0$ if $y^l = 0$

The solution is not unique in general!

(7)

“Preparation 1”

Eliminate the bias. The trick − modify the training data: append a constant $1$ to every input and absorb $b$ into the weight vector,

$\tilde{x} = (x_1, \ldots, x_n, 1), \qquad \tilde{w} = (w_1, \ldots, w_n, b) \qquad \Rightarrow \qquad \langle \tilde{w}, \tilde{x} \rangle = \langle w, x \rangle + b$

Example in 1D: data that are not separable without the bias become separable without the bias after this augmentation.

(8)

“Preparation 2”

Remove the sign. The trick − the same, modify the training data:

keep $\tilde{x}^l$ for all $l$ with $y^l = 1$, replace $\tilde{x}^l$ by $-\tilde{x}^l$ for all $l$ with $y^l = 0$

All in all, the system becomes:

$\langle \tilde{w}, \tilde{x}^l \rangle > 0$ for all $l$
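Both preparations in code (a sketch; the helper name `prepare` and the toy data are mine):

```python
import numpy as np

def prepare(X, y):
    # Preparation 1: append a constant 1, so the bias becomes
    # the last component of the weight vector.
    X_aug = np.hstack([X, np.ones((len(X), 1))])
    # Preparation 2: flip the sign of the negative class, so that
    # every constraint reads <w~, x~> > 0.
    X_aug[y == 0] *= -1
    return X_aug

X = np.array([[-2.0], [-1.0], [1.0], [2.0]])   # 1D inputs
y = np.array([0, 0, 1, 1])
print(prepare(X, y))
# [[ 2. -1.]    <- negatives: augmented, then flipped
#  [ 1. -1.]
#  [ 1.  1.]    <- positives: augmented only
#  [ 2.  1.]]
```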

(9)

Perceptron Algorithm (Rosenblatt, 1958)

Solution of a system of linear inequalities $\langle w, x^l \rangle > 0$:

1. Search for an inequality that is not satisfied, i.e. an $x^l$ with $\langle w, x^l \rangle \leq 0$

2. If not found − stop; else update $w \leftarrow w + x^l$ and go to 1.

• The algorithm terminates if a solution exists (the training data are separable)

• The solution is a convex combination of the data points
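A direct transcription into Python (a sketch; it starts from $w = 0$, and the prepared toy data are the ones from the previous example):

```python
import numpy as np

def perceptron(X_tilde, max_iter=10000):
    # X_tilde: one prepared data point per row;
    # sought: w with <w, x> > 0 for every row x.
    w = np.zeros(X_tilde.shape[1])
    for _ in range(max_iter):
        violated = [x for x in X_tilde if np.dot(w, x) <= 0]
        if not violated:
            return w            # all inequalities satisfied
        w = w + violated[0]     # Rosenblatt update: add a violating point
    raise RuntimeError("no solution found (data may be non-separable)")

# Prepared 1D toy data (bias appended, negatives flipped):
X_tilde = np.array([[2., -1.], [1., -1.], [1., 1.], [2., 1.]])
print(perceptron(X_tilde))      # [ 2. -1.], i.e. the rule x > 0.5
```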

(10)

Proof of convergence

The idea: look for quantities that a) grow/decrease quite fast, b) are bounded.

Consider the length of $w$ at the $n$-th iteration:

$\|w^{(n)}\|^2 = \|w^{(n-1)} + x\|^2 = \|w^{(n-1)}\|^2 + 2\langle w^{(n-1)}, x \rangle + \|x\|^2$

with $\langle w^{(n-1)}, x \rangle \leq 0$, because $x$ was added by the algorithm (its inequality was violated). Hence

$\|w^{(n)}\|^2 \leq \|w^{(n-1)}\|^2 + \|x\|^2 \leq n \cdot \max_l \|x^l\|^2$

(11)

Proof of convergence

Another quantity − the projection of $w^{(n)}$ onto the solution $w^*$ (normalized, $\|w^*\| = 1$):

$\langle w^{(n)}, w^* \rangle = \langle w^{(n-1)}, w^* \rangle + \langle x, w^* \rangle \geq \langle w^{(n-1)}, w^* \rangle + \delta$

with $\delta = \min_l \langle x^l, w^* \rangle$ − the margin, $\delta > 0$ because $w^*$ is a solution. Hence

$\langle w^{(n)}, w^* \rangle \geq n \cdot \delta$

(12)

Proof of convergence

All together:

$n \cdot \delta \leq \langle w^{(n)}, w^* \rangle$ and $\|w^{(n)}\|^2 \leq n \cdot \max_l \|x^l\|^2$

But $\langle w^{(n)}, w^* \rangle \leq \|w^{(n)}\| \cdot \|w^*\| = \|w^{(n)}\|$ (Cauchy-Schwarz inequality). So

$n^2 \delta^2 \leq \|w^{(n)}\|^2 \leq n \cdot \max_l \|x^l\|^2$

and finally

$n \leq \frac{\max_l \|x^l\|^2}{\delta^2}$

If the solution exists, the algorithm converges after at most $\max_l \|x^l\|^2 / \delta^2$ steps.
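A quick numeric illustration (toy numbers of mine, not from the slides): for the prepared 1D points $(2,-1), (1,-1), (1,1), (2,1)$ the maximum-margin unit vector is $w^* = (1, 0)$ with margin $\delta = \min_l \langle x^l, w^* \rangle = 1$, and $\max_l \|x^l\|^2 = 5$, so the bound guarantees termination after at most $5 / 1^2 = 5$ updates; the run in the sketch above needed only one.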

(13)

An example problem.

Consider another decision rule for a real-valued feature $x \in \mathbb{R}$:

$y = 1$ if $\sum_{i=0}^{m} a_i x^i \geq 0$, $y = 0$ otherwise

It is not a linear classifier anymore but a polynomial one.

The task is again to learn the unknown coefficients $a_0, \ldots, a_m$ given the training data.

Is it also possible to do that in a “Perceptron-like” fashion?

(14)

An example problem.

The idea: reduce the given problem to the Perceptron task.

Observation: although the decision rule is not linear with respect to $x$, it is still linear with respect to the unknown coefficients $a_0, \ldots, a_m$.

The same trick again − modify the data:

$x \;\to\; \tilde{x} = (1, x, x^2, \ldots, x^m)$, so that $\sum_{i=0}^{m} a_i x^i = \langle a, \tilde{x} \rangle$

In general, it is very often possible to learn non-linear decision rules by the Perceptron algorithm using an appropriate transformation of the input space (further extension − SVM).
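The transformation in code (a sketch; the toy data and the separating polynomial $x^2 - 1$ are illustrative):

```python
import numpy as np

def poly_features(x, m):
    # Map a scalar x to (1, x, x^2, ..., x^m); the polynomial rule
    # sum_i a_i x^i becomes the linear rule <a, x~>.
    # (The constant term x^0 = 1 already plays the role of the bias.)
    return np.array([x**i for i in range(m + 1)])

# 1D data, positive iff |x| > 1: separable by x^2 - 1, but by no line
X = np.array([-2.0, -1.5, -0.5, 0.0, 0.5, 1.5, 2.0])
y = np.array([1, 1, 0, 0, 0, 1, 1])

X_tilde = np.array([poly_features(x, 2) for x in X])
X_tilde[y == 0] *= -1            # Preparation 2 as before

# a = (-1, 0, 1), i.e. the polynomial x^2 - 1, satisfies all inequalities:
a = np.array([-1.0, 0.0, 1.0])
print((X_tilde @ a > 0).all())   # True
```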

(15)

Kosinec Algorithm

The task: for the Perceptron there are many solutions in general − one has to choose one.

Idea: search for a “stripe of the maximal width” that separates the data.

width ↔ margin

“Maximum margin learning”

(16)

Kosinec Algorithm

After “Preparation 1 and 2” the task reads:

$w^* = \arg\max_{w} \min_{l} \frac{\langle w, x^l \rangle}{\|w\|}$

(compare with the Perceptron, which only requires $\langle w, x^l \rangle > 0$ for all $l$)

(17)

Kosinec Algorithm (1963?)

$\varepsilon$-precise algorithm (start with any data point, e.g. $w^{(0)} = x^1$):

1. Search for an $x^l$ with $\dfrac{\langle w^{(n)}, x^l \rangle}{\|w^{(n)}\|} \leq \|w^{(n)}\| - \varepsilon$

2. If not found − stop.

3. Search for the norm-minimizing point on the segment between $w^{(n)}$ and $x^l$: $\gamma^* = \arg\min_{\gamma \in [0,1]} \|(1-\gamma)\, w^{(n)} + \gamma\, x^l\|$

4. Update $w^{(n+1)} = (1-\gamma^*)\, w^{(n)} + \gamma^*\, x^l$ and go to 1.

The algorithm terminates after a finite number of steps for $\varepsilon > 0$ (proof similar to the Perceptron); for $\varepsilon = 0$ it does not always terminate.
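A sketch of the iteration under these assumptions (the closed form for $\gamma^*$ follows from minimizing the quadratic $\|(1-\gamma)w + \gamma x\|^2$; the toy data are the prepared points from before):

```python
import numpy as np

def kozinec(X_tilde, eps=1e-2, max_iter=100000):
    # w stays inside the convex hull of the data points; the
    # minimal-norm point of that hull realizes the maximum margin.
    w = X_tilde[0].copy()                  # start with any data point
    for _ in range(max_iter):
        norm_w = np.linalg.norm(w)
        viol = np.where(X_tilde @ w / norm_w <= norm_w - eps)[0]
        if len(viol) == 0:
            return w                       # eps-precise solution
        x = X_tilde[viol[0]]
        # Norm-minimizing point on the segment [w, x]:
        gamma = np.clip(np.dot(w, w - x) / np.dot(w - x, w - x), 0.0, 1.0)
        w = (1.0 - gamma) * w + gamma * x
    raise RuntimeError("did not terminate (data may be non-separable)")

# Prepared 1D toy data as before:
X_tilde = np.array([[2., -1.], [1., -1.], [1., 1.], [2., 1.]])
print(kozinec(X_tilde))                    # -> [1. 0.], the max-margin w
```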
