(1)

Neural Networks

Stefan Edelkamp

(2)

1 Overview

- Introduction
- Perceptron
- Hopfield Nets

- Self-Organizing Maps

- Feed-Forward Neural Networks
- Backpropagation

(3)

2 Introduction

Idea: Mimic principle of biological neural networks with artificial neural networks

[Figure: example network of neurons 1-9]

- adopt solutions that nature has already settled on
- parallelization ⇒ high performance
- redundancy ⇒ tolerance for failures

(4)

Ingredients

What an artificial neural network needs:

• behavior of artificial neurons

• order of computation

• activation function

• structure of the net (topology)

• recurrent nets

• feed-forward nets

• integration into the environment

• learning algorithm

(5)

Perceptron Learning

. . . a very simple network with no hidden neurons

Inputs: x, weighted with w and summed up

Activation Function: Θ

Output: z, determined by computing Θ(w^T x)

Additionally: a weighted input representing the constant 1

(6)

Training

f : M ⊂ IR^d → {0,1} net function

1. initialize the counter i and the initial weight vector w_0 to 0

2. as long as there are vectors x with w_i^T x ≤ 0, set w_{i+1} := w_i + x and increase i by 1

3. return w_{i+1}
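
A minimal sketch of this loop in Python (the function name and the toy data are illustrative, not from the slides). It assumes the usual convention that a constant 1 is appended to every input and that negative examples are negated, so a separating w satisfies w^T x > 0 for every training vector:

```python
import numpy as np

def perceptron_train(X, max_updates=10_000):
    """Perceptron learning as in steps 1-3 above.  Each row of X is a
    training vector with the constant 1 appended; negative examples are
    assumed to be negated, so a separating w fulfils w @ x > 0 for all rows."""
    w = np.zeros(X.shape[1])                      # step 1: i = 0, w_0 = 0
    for i in range(max_updates):
        wrong = [x for x in X if w @ x <= 0]      # vectors with w_i^T x <= 0
        if not wrong:
            return w                              # step 3: return w_{i+1}
        w = w + wrong[0]                          # step 2: w_{i+1} = w_i + x
    raise RuntimeError("no separating weight vector found")

# illustrative toy data: one positive and one (already negated) negative example
X = np.array([[2.0, 1.0,  1.0],
              [1.0, 2.0, -1.0]])
print(perceptron_train(X))                        # converges after one update here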

(7)

Termination on Training Data

Assume w to be a final (separating) weight vector, normalized so that ‖w‖ = 1

- f = Θ((x,1)^T w); there are constants δ and γ with |(x,1)^T w| ≥ δ and ‖(x,1)‖ ≤ γ for all training vectors

- for the angle α_i between w_i and w we have 1 ≥ cos α_i = w_i^T w / ‖w_i‖

- w_{i+1}^T w = (w_i + x_i)^T w = w_i^T w + x_i^T w and x_i^T w ≥ δ ⇒ w_{i+1}^T w ≥ δ(i + 1)

- ‖w_{i+1}‖ = √((w_i + x_i)^T (w_i + x_i)) = √(‖w_i‖² + ‖x_i‖² + 2 w_i^T x_i) ≤ √(‖w_i‖² + γ²) ≤ γ √(i + 1)
  (induction: ‖w_i‖ ≤ γ √i, using w_i^T x_i ≤ 0 for the vector that triggered the update)

⇒ cos α_{i+1} ≥ δ √(i + 1) / γ, which would grow without bound as i → ∞; since cos α_{i+1} ≤ 1, the number of updates is bounded by (γ/δ)², so training terminates

(8)

3 Hopfield Nets

Neurons: 1 2 . . . d

Activations: x_1, x_2, . . . , x_d with x_i ∈ {0,1}

Connections: w_ij ∈ IR (1 ≤ i, j ≤ d) with w_ii = 0 and w_ij = w_ji ⇒ W := (w_ij) ∈ IR^{d×d}

Update: asynchronous & stochastic

x'_j := 0    if Σ_{i=1}^d x_i w_ij < 0
        1    if Σ_{i=1}^d x_i w_ij > 0
        x_j  otherwise

(9)

Example

[Figure: Hopfield net on neurons x_1, x_2, x_3 with weights w_12 = 1, w_13 = −2, w_23 = 3]

W = (  0   1  −2 )
    (  1   0   3 )
    ( −2   3   0 )

Use:

• associative memory

• computing Boolean functions

• combinatorial optimization

(10)

Energy of a Hopfield Net

Let x = (x_1, x_2, . . . , x_d)^T ⇒ E(x) := −½ x^T W x = −Σ_{i<j} x_i w_ij x_j be the energy of a Hopfield net

Theorem: Every update that changes the state of the Hopfield net reduces the energy.

Proof: Assume the update changes x_k into x'_k (so x'_j = x_j for j ≠ k) ⇒

E(x) − E(x') = −Σ_{i<j} x_i w_ij x_j + Σ_{i<j} x'_i w_ij x'_j
             = −Σ_{j≠k} x_k w_kj x_j + Σ_{j≠k} x'_k w_kj x_j
             = (x'_k − x_k) Σ_{j≠k} w_kj x_j > 0,

since x_k flips to 1 only if Σ_{j≠k} w_kj x_j > 0 and to 0 only if Σ_{j≠k} w_kj x_j < 0
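
A small sketch in Python (illustrative, not part of the lecture) of the asynchronous update rule and this energy function, using the example matrix W from slide (9); every update that changes the state should strictly decrease the energy:

```python
import numpy as np
rng = np.random.default_rng(0)

W = np.array([[ 0.0,  1.0, -2.0],     # example net from slide (9)
              [ 1.0,  0.0,  3.0],
              [-2.0,  3.0,  0.0]])

def energy(x, W):
    return -0.5 * x @ W @ x            # E(x) = -1/2 x^T W x

def update(x, W, j):
    """Asynchronous update of neuron j according to the rule on slide (8)."""
    s = x @ W[:, j]                    # sum_i x_i w_ij  (w_jj = 0)
    x = x.copy()
    if s < 0:
        x[j] = 0.0
    elif s > 0:
        x[j] = 1.0
    return x                           # unchanged if the sum is exactly 0

x = rng.integers(0, 2, size=3).astype(float)       # random start activation
for _ in range(20):                                # stochastic choice of neurons
    j = rng.integers(3)
    x_new = update(x, W, j)
    if not np.array_equal(x_new, x):
        assert energy(x_new, W) < energy(x, W)     # theorem: energy strictly drops
    x = x_new
print(x, energy(x, W))
```

Starting from a random activation, the loop settles into a stable state whose energy no further update can lower.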

(11)

Solving a COP

Input: Combinatorial Optimization Problem (COP)

Output: Solution for the COP

Algorithm:

• construct a Hopfield net whose weights encode the parameters of the COP and whose energy minima correspond to solutions

• start net with random activation

• compute a sequence of updates until the net stabilizes

• read off the solution from the stable activations

• test feasibility and optimality of solution

(12)

Multi-Flop Problem

Problem Instance: k, n ∈ IN, k < n

Feasible Solutions: x̃ = (x_1, . . . , x_n) ∈ {0,1}^n

Objective Function: P(x̃) = Σ_{i=1}^n x_i

Optimal Solution: solution x̃ with P(x̃) = k

Minimization Problem: d = n + 1, x_d = 1, x = (x_1, x_2, . . . , x_n, x_d)^T ⇒

E(x) = ( Σ_{i=1}^d x_i − (k + 1) )²
     = Σ_{i=1}^d x_i² + Σ_{i≠j} x_i x_j − 2(k + 1) Σ_{i=1}^d x_i + (k + 1)²        (with x_i² = x_i)
     = Σ_{i≠j} x_i x_j − (2k + 1) Σ_{i=1}^{d−1} x_i x_d + k²
     = −½ Σ_{i<j} x_i (−4) x_j − ½ Σ_{i<d} x_i (4k + 2) x_d + k²

(13)

Example

(n = 3, k = 1):

[Figure: Hopfield net for n = 3, k = 1: neurons x_1, x_2, x_3 pairwise connected with weight −2, each connected to x_4 (clamped to 1) with weight 1]
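
The construction generalizes to arbitrary n and k. The sketch below (illustrative Python, not the lecture's code) uses the weights of the example figure, −2 between the n decision neurons and 2k − 1 to the clamped neuron x_d = 1 (a scaled version of the weights read off the derivation on slide (12)), and runs asynchronous updates until a stable state is reached:

```python
import numpy as np
rng = np.random.default_rng(1)

def multiflop_weights(n, k):
    """Multi-flop net: the n decision neurons are pairwise coupled with -2,
    and each is coupled with 2k - 1 to the extra neuron x_d, which is
    clamped to 1 (for n = 3, k = 1 these are the weights of the figure)."""
    d = n + 1
    W = -2.0 * (np.ones((d, d)) - np.eye(d))
    W[:n, n] = W[n, :n] = 2 * k - 1
    return W

def settle(n, k):
    """Asynchronous updates until no free neuron wants to change."""
    W = multiflop_weights(n, k)
    x = np.append(rng.integers(0, 2, size=n).astype(float), 1.0)   # x_d = 1
    while True:
        changed = False
        for j in rng.permutation(n):              # never update the clamped neuron
            s = x @ W[:, j]
            new = 0.0 if s < 0 else (1.0 if s > 0 else x[j])
            if new != x[j]:
                x[j], changed = new, True
        if not changed:
            return x[:n]

x = settle(n=7, k=3)
print(x, "active neurons:", int(x.sum()))          # stable states have exactly k ones
```

In a stable state exactly k of the decision neurons are active, which is exactly the multi-flop condition.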

(14)

Traveling Salesperson Problem (TSP)

Problem Instance:

Cities: 1 2 . . . n

Distances: d_ij ∈ IR⁺ (1 ≤ i, j ≤ n) with d_ii = 0

Feasible Solutions: permutations π of (1, 2, . . . , n)

Objective Function: P(π) = Σ_{i=1}^n d_{π(i), π(i mod n + 1)}

Optimal Solution: feasible solution π with minimal P(π)

(15)

Encoding

Idea: Hopfield net with d = n² + 1 neurons:

[Figure: neurons arranged by city i and tour position π(i), connected with negative distances as weights (−d_12, −d_21, −d_23, −d_32, . . . )]

Problem: "size" of the weights to allow both feasible and good solutions

Trick: transition to a continuous Hopfield net with modified weights ⇒ good solutions of the TSP

(16)

4 Self-Organizing Maps (SOM)

Neurons:

Input: 1, 2, . . . , d for the components x_i

Map: 1, 2, . . . , m; a regular (linear, rectangular, or hexagonal) grid of positions r_i storing pattern vectors µ_i ∈ IR^d

Output: 1, 2, . . . , d for µ_c

Update:

L ⊂ IR^d learning set; at time t ∈ IN⁺, x ∈ L is chosen at random ⇒ the winner c ∈ {1, . . . , m} is determined by

‖x − µ_c‖ ≤ ‖x − µ_i‖   (∀i ∈ {1, . . . , m})

and all patterns are adapted: µ'_i := µ_i + h(c, i, t) (x − µ_i)   ∀i ∈ {1, . . . , m}

with h(c, i, t) a time-dependent neighborhood relation and h(c, i, t) → 0 for t → ∞, e.g.

h(c, i, t) = α(t) · exp( −‖r_c − r_i‖² / (2 σ(t)²) )
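
A compact Python sketch of this update loop (the grid size, the learning set, and the schedules α(t) and σ(t) are illustrative choices, not prescribed by the slide):

```python
import numpy as np
rng = np.random.default_rng(0)

# map: m = 10 x 10 neurons on a rectangular grid with positions r_i,
# pattern vectors mu_i in IR^2; learning set L is a random point cloud
grid = np.stack(np.meshgrid(np.arange(10), np.arange(10)), -1).reshape(-1, 2)
mu = rng.random((len(grid), 2))
L = rng.random((1000, 2))

T = 5000
for t in range(1, T + 1):
    x = L[rng.integers(len(L))]                          # x in L chosen at random
    c = np.argmin(np.linalg.norm(x - mu, axis=1))        # winner: ||x - mu_c|| minimal
    alpha = 0.5 * (1 - t / T)                            # learning rate alpha(t) -> 0
    sigma = 3.0 * (1 - t / T) + 0.5                      # neighborhood width sigma(t)
    h = alpha * np.exp(-np.sum((grid - grid[c]) ** 2, axis=1) / (2 * sigma ** 2))
    mu += h[:, None] * (x - mu)                          # mu_i' = mu_i + h(c,i,t)(x - mu_i)

print(mu[:3])                                            # a few adapted pattern vectors
```

After training, neighboring neurons on the grid hold similar pattern vectors, which is what the triangle examples on the following slides illustrate.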

(17)

Applications of SOMs

. . . include:

visualization and interpretation, dimensionality reduction, clustering, classification, COPs, . . .

(18)

A size-50 map adapts to a triangle

(19)

A 15 × 15 grid is adapted to a triangle

(20)
(21)
(22)

SOM for Combinatorial Optimization: TSP

Idea: use a growing ring (elastic band) of neurons

Tests with n ≤ 2392 show that the running time scales linearly and that the tours deviate from the optimum by less than 9 %
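
A much simplified sketch of the elastic-band idea in Python (a fixed-size ring instead of a growing one, and illustrative schedules): the winning neuron and its ring neighbors are pulled toward randomly chosen cities, so the ring gradually traces a tour:

```python
import numpy as np
rng = np.random.default_rng(0)

cities = rng.random((30, 2))                     # illustrative instance: 30 random cities
m = 3 * len(cities)                              # ring of m neurons (fixed size here)
angles = 2 * np.pi * np.arange(m) / m
ring = 0.5 + 0.3 * np.column_stack((np.cos(angles), np.sin(angles)))

T = 20_000
for t in range(1, T + 1):
    x = cities[rng.integers(len(cities))]        # random city
    c = np.argmin(np.linalg.norm(x - ring, axis=1))          # winning neuron
    d = np.abs(np.arange(m) - c)
    d = np.minimum(d, m - d)                     # distance along the ring
    sigma = max(m / 10 * (1 - t / T), 1.0)
    h = 0.8 * (1 - t / T) * np.exp(-d ** 2 / (2 * sigma ** 2))
    ring += h[:, None] * (x - ring)              # pull winner and ring neighbors to x

# read off the tour: visit the cities in the order of their nearest ring neuron
tour = np.argsort([np.argmin(np.linalg.norm(city - ring, axis=1)) for city in cities])
print(tour)
```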

(23)

SOM for Combinatorial Optimization

(24)

[Figures: tours obtained with 10 neurons and with 50 neurons]

(25)
(26)

SOM for Combinatorial Optimization

Tour with 2526 neurons:

(27)
(28)

5 Layered Feed-Forward Nets (MLP)

[Figure: layered feed-forward net with layers 1, 2, 3]

(29)

Formalization

An L-layered MLP (multi-layer perceptron)

Layers: S_0, S_1, . . . , S_{L−1}, S_L

Connections: from each neuron i in S_ℓ to each neuron j in S_{ℓ+1} with weight w_ij, except for the 1-neurons

Update: layer-wise synchronous

x'_j := ϕ( Σ_{i∈V(j)} x_i w_ij )

with ϕ differentiable, e.g. ϕ(a) = σ(a) = 1 / (1 + exp(−a))

[Plot: the sigmoid σ(a) for a ∈ [−5, 5]]
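
A minimal sketch of this layer-wise forward computation in Python (the layer sizes and random weights are illustrative; the constant 1-neuron of each layer is handled by appending a 1 to the activations):

```python
import numpy as np
rng = np.random.default_rng(0)

def sigma(a):
    return 1.0 / (1.0 + np.exp(-a))             # phi(a) = sigma(a)

def mlp_forward(x, weights):
    """Layer-wise synchronous update x'_j = phi(sum_i x_i w_ij).
    weights[l] has shape (|S_l| + 1, |S_{l+1}|); the extra row holds the
    weights of the constant 1-neuron of layer S_l."""
    for W in weights:
        x = sigma(np.append(x, 1.0) @ W)        # append the 1-neuron, then apply phi
    return x

# illustrative 2-layered MLP: 3 inputs -> 4 hidden neurons -> 2 outputs
weights = [rng.standard_normal((4, 4)), rng.standard_normal((5, 2))]
print(mlp_forward(np.array([0.2, -1.0, 0.5]), weights))
```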

(30)

Layered Feed-Forward Nets

Applications: function approximation, classification

Theorem: All Boolean functions can be computed with a 2-layered MLP (no proof)

Theorem: Continuous real functions and their derivatives can be jointly approximated to arbitrary precision on compact sets

(no proof)

(31)

Learning Parameters in MLP

Given: x_1, . . . , x_N ∈ IR^d and t_1, . . . , t_N ∈ IR^c, an MLP with d input and c output neurons,

w = (w_1, . . . , w_M) contains all weights, f(x, w) is the net function

Task: find the optimal w that minimizes the error

E(w) := ½ Σ_{n=1}^N Σ_{k=1}^c ( f_k(x_n, w) − t_{nk} )²

The partial derivatives of f exist with respect to the inputs and the parameters

⇒ any gradient-based optimization method can be used (conjugate gradient, . . . )

∇_w E(w) = Σ_{n=1}^N Σ_{k=1}^c ( f_k(x_n, w) − t_{nk} ) ∇_w f_k(x_n, w)
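
To make E(w) and its gradient concrete, here is a sketch for the simplest case of a net with a single sigmoid layer, f_k(x, w) = σ(Σ_i w_ik x_i) (an illustrative special case, not the general MLP of the slides); the analytic gradient from the formula above is checked against a finite difference:

```python
import numpy as np
rng = np.random.default_rng(0)

sigma = lambda a: 1.0 / (1.0 + np.exp(-a))

def f(X, W):                         # net function: one sigmoid layer, output shape N x c
    return sigma(X @ W)

def E(W, X, T):                      # E(w) = 1/2 sum_n sum_k (f_k(x_n, w) - t_nk)^2
    return 0.5 * np.sum((f(X, W) - T) ** 2)

def grad_E(W, X, T):                 # sum_n sum_k (f_k(x_n, w) - t_nk) grad_w f_k(x_n, w)
    F = f(X, W)
    return X.T @ ((F - T) * F * (1 - F))        # with d sigma / d a = sigma (1 - sigma)

X = rng.standard_normal((5, 3))      # N = 5 inputs x_n in IR^d, d = 3
T = rng.random((5, 2))               # targets t_n in IR^c, c = 2
W = rng.standard_normal((3, 2))      # all weights w

dW = np.zeros_like(W)
dW[0, 0] = 1e-6                                  # finite-difference check of one entry
numeric = (E(W + dW, X, T) - E(W, X, T)) / 1e-6
print(grad_E(W, X, T)[0, 0], numeric)            # the two values should agree closely
```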

(32)

Backpropagation

Basic Calculus (chain rule):

∂/∂t f(g(t)) |_{t=t_0} = ( ∂/∂s f(s) |_{s=g(t_0)} ) · ( ∂/∂t g(t) |_{t=t_0} )

Example: ϕ(a) := 9 − a², x = (1, 2)^T, w = (1, 1)^T, t = 2:

[Figure: computation graph for the example: x_1 and x_2 are multiplied by w_1 and w_2, added, passed through ϕ to give f, and compared with the target t via E = (f − t)² / 2]

(33)

∇_w E(w) |_{w=(1,1)^T} = ?   (worked out with the building blocks below; see the sketch after the list)

h(x, y) = x · y  ⇒ ∂/∂x h(x, y) = y
h(x, y) = x + y  ⇒ ∂/∂x h(x, y) = 1
h(x, y) = x − y  ⇒ ∂/∂x h(x, y) = 1
ϕ(x) = 9 − x²    ⇒ ∂/∂x ϕ(x) = −2x
h(x) = x²/2      ⇒ ∂/∂x h(x) = x
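
The example can be checked by coding the computation graph directly. The sketch below (illustrative Python) runs the forward pass, stores the intermediate values, and then applies the chain rule backwards through the building blocks listed above:

```python
import numpy as np

x = np.array([1.0, 2.0])            # x = (1, 2)^T
w = np.array([1.0, 1.0])            # w = (1, 1)^T
t = 2.0                             # target

# forward pass, keeping every intermediate value
a = w @ x                           # a = w_1 x_1 + w_2 x_2 = 3
f = 9.0 - a ** 2                    # f = phi(a) = 9 - a^2 = 0
E = 0.5 * (f - t) ** 2              # E = (f - t)^2 / 2 = 2

# backward pass: chain rule through the stored intermediates
dE_df = f - t                       # from h(x) = x^2/2 and h(x, y) = x - y
df_da = -2.0 * a                    # from phi(x) = 9 - x^2
dE_da = dE_df * df_da               # = (-2) * (-6) = 12
dE_dw = dE_da * x                   # da/dw_i = x_i  =>  gradient (12, 24)

print(E, dE_dw)                     # 2.0 [12. 24.]
```

This forward pass followed by a backward pass is exactly the algorithm summarized on the next slide.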

(34)

Backpropagation

Theorem: ∇_w E(w) can be computed in time O(N × M) if the network is of size O(M)

Algorithm:

∀n ∈ {1, . . . , N}

• compute the net function f(x_n, w) and the associated error E in the forward direction and store the intermediate values in the net

• compute the partial derivatives of E with respect to all intermediate values in the backward direction and sum the parts to obtain the total gradient
