ANN: Perceptron Learning

Note: A perceptron assumes that the data is linearly separable.
Big Note: This is an assumption and not necessarily true!
But: In the case of linear separability, there are many "good" weight vectors $\vec{w}$.
Note: We are happy with one separating vector $\vec{w}$.
Question: How do we get the weights $\vec{w}$?
Observation: We look at $\vec{x} \cdot \vec{w}^T \geq 0$:
- if the output was 0 but should have been 1, increment the weights
- if the output was 1 but should have been 0, decrement the weights
- if the output was correct, don't change the weights

    $\vec{w} = \mathrm{rand}(1, \dots, d+1)$
    while ERROR do
        for $(\vec{x}_i, y_i) \in \mathcal{D}$ do
            $\vec{w} = \vec{w} + \alpha \cdot \vec{x}_i \cdot (y_i - \hat{f}(\vec{x}_i))$
        end for
    end while

Note: $\alpha \in \mathbb{R}_{>0}$ is a step size / learning rate.
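As a concrete illustration, here is a minimal Python sketch of this learning rule (not from the slides). The threshold activation, the bias handling via an appended constant 1, and the `max_epochs` cap on the while-loop are my assumptions:

```python
import numpy as np

def perceptron_train(X, y, alpha=0.1, max_epochs=100, seed=0):
    """Train a perceptron on X (shape (n, d)) with labels y in {0, 1}."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    Xb = np.hstack([X, np.ones((n, 1))])        # append constant 1 -> d+1 weights incl. bias
    w = rng.random(d + 1)                       # w = rand(1, ..., d+1)
    for _ in range(max_epochs):                 # "while ERROR do", capped for safety
        errors = 0
        for xi, yi in zip(Xb, y):               # for (x_i, y_i) in D
            f_hat = 1 if xi @ w >= 0 else 0     # threshold: x · w^T >= 0
            if f_hat != yi:
                w += alpha * xi * (yi - f_hat)  # the update rule from the pseudocode
                errors += 1
        if errors == 0:                         # no misclassifications: done
            break
    return w

# Usage: learn the (linearly separable) logical AND function
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])
print(perceptron_train(X, y))
```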
Update rule: $\vec{w}_{new} = \vec{w}_{old} + \alpha \cdot \vec{x}_i \cdot (y_i - \hat{f}_{old}(\vec{x}_i))$

Wrong classification:

Case 1: $y_i - \hat{f}_{old}(\vec{x}_i) = 1 \Rightarrow y_i = 1$, $\hat{f}_{old}(\vec{x}_i) = 0$

$\hat{f}_{new}(\vec{x}_i) = \vec{x}_i \cdot \vec{w}_{new}^T = \vec{x}_i \cdot (\vec{w}_{old} + \alpha \cdot 1 \cdot \vec{x}_i)^T = \vec{x}_i \cdot \vec{w}_{old}^T + \alpha \cdot \vec{x}_i \cdot \vec{x}_i^T = \vec{x}_i \cdot \vec{w}_{old}^T + \alpha \cdot \|\vec{x}_i\|^2$

→ $\vec{w}$ is incremented and the classification is moved towards 1 ✓

Case 2: $y_i - \hat{f}_{old}(\vec{x}_i) = -1 \Rightarrow y_i = 0$, $\hat{f}_{old}(\vec{x}_i) = 1$

$\hat{f}_{new}(\vec{x}_i) = \vec{x}_i \cdot \vec{w}_{new}^T = \vec{x}_i \cdot (\vec{w}_{old} - \alpha \cdot 1 \cdot \vec{x}_i)^T = \vec{x}_i \cdot \vec{w}_{old}^T - \alpha \cdot \vec{x}_i \cdot \vec{x}_i^T = \vec{x}_i \cdot \vec{w}_{old}^T - \alpha \cdot \|\vec{x}_i\|^2$

→ $\vec{w}$ is decremented and the classification is moved towards 0 ✓
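A quick numeric check of Case 1, using made-up values for $\vec{x}_i$, $\vec{w}_{old}$, and $\alpha$: the pre-activation rises by exactly $\alpha \cdot \|\vec{x}_i\|^2$.

```python
import numpy as np

x = np.array([2.0, 1.0, 1.0])        # made-up input (last component: bias)
w_old = np.array([-1.0, 0.5, 0.0])   # made-up weights that misclassify x
alpha = 0.1

pre_old = x @ w_old                  # -1.5 -> output 0, but the target y is 1
w_new = w_old + alpha * x * (1 - 0)  # Case 1 update: y_i - f_old(x_i) = 1
pre_new = x @ w_new                  # -0.9

# The pre-activation moved towards 1 by exactly alpha * ||x||^2 = 0.6
assert np.isclose(pre_new, pre_old + alpha * (x @ x))
```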
Update rule: $\vec{w}_{new} = \vec{w}_{old} + \alpha \cdot \vec{x}_i \cdot (y_i - \hat{f}_{old}(\vec{x}_i))$

Correct classification: $y_i - \hat{f}(\vec{x}_i) = 0$, so $\vec{w}_{new} = \vec{w}_{old}$, thus $\vec{w}$ is unchanged ✓

Rosenblatt (1958) showed:
- The algorithm converges if $\mathcal{D}$ is linearly separable.
- The algorithm may have exponential runtime.

Variation: Batch processing, i.e., update $\vec{w}$ only after testing all examples:

$\vec{w}_{new} = \vec{w}_{old} + \alpha \sum_{(\vec{x}_i, y_i) \in \mathcal{D}_{wrong}} \vec{x}_i \cdot (y_i - \hat{f}_{old}(\vec{x}_i))$

Usually: faster convergence, but more memory needed.
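A minimal sketch of the batch variant, assuming the same setup as the earlier perceptron_train sketch; the vectorized accumulation over $\mathcal{D}_{wrong}$ is my reading of the sum above:

```python
import numpy as np

def perceptron_train_batch(X, y, alpha=0.1, max_epochs=100, seed=0):
    """Batch perceptron: one accumulated weight update per pass over the data."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    Xb = np.hstack([X, np.ones((n, 1))])     # bias handled via appended constant 1
    w = rng.random(d + 1)
    for _ in range(max_epochs):
        f_hat = (Xb @ w >= 0).astype(float)  # classify all examples with w_old
        wrong = f_hat != y                   # D_wrong: the misclassified examples
        if not wrong.any():                  # converged
            break
        # w_new = w_old + alpha * sum over D_wrong of x_i * (y_i - f_hat_old(x_i))
        w += alpha * ((y - f_hat)[wrong] @ Xb[wrong])
    return w
```

Compared to the per-example loop, this classifies all examples with the same $\vec{w}_{old}$ before updating, which is what makes the accumulated sum (and the extra memory for it) necessary.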