Practical Course on Pattern Recognition
Version: 063006.10, Summer Semester 2006
Prof. Dr. Stefan Posch, Dipl.-Bioinform. André Gohr (andre.gohr@informatik.uni-halle.de)
Institute of Computer Science, University of Halle
Series 10
Exercise 10.1 (4 points)
Define the backprop learning rule for a multilayer perceptron that also allows connections (edges between neurons) between non-adjacent layers; all connections, however, remain feed-forward.
Solution 10.1
Notation: ${}^kN_i$ denotes the $i$-th neuron in layer $k$; layer $k$ contains $M_k$ neurons in total. $w_{{}^kN_i}({}^qN_j)$ denotes the weight of the edge going into neuron ${}^kN_i$ and coming out of neuron ${}^qN_j$, if such an edge exists. The multilayer perceptron consists of $L$ layers; layer $L$ is the output layer. $P_{{}^kN_i}$ denotes the set of "predecessor" neurons of ${}^kN_i$, i.e. the neurons having an outgoing edge directed towards neuron ${}^kN_i$. $D_{{}^kN_i}$ denotes the set of "direct descendant" neurons of ${}^kN_i$.

Furthermore, $\sigma(h_{{}^kN_i}) = y_{{}^kN_i}$ denotes the output of neuron ${}^kN_i$ in activation state $h_{{}^kN_i}$, and $x_{{}^kN_i}({}^qN_j) = y_{{}^qN_j}$ denotes the input of neuron ${}^kN_i$ coming from neuron ${}^qN_j$. The activation state $h_{{}^kN_i}$ of neuron ${}^kN_i$ in layer $k$ is determined by all weighted inputs (the weighted outputs of its predecessor neurons):
$$h_{{}^kN_i} \;=\; \sum_{{}^dN_z \in P_{{}^kN_i}} x_{{}^kN_i}({}^dN_z)\, w_{{}^kN_i}({}^dN_z) \;=\; \sum_{{}^dN_z \in P_{{}^kN_i}} y_{{}^dN_z}\, w_{{}^kN_i}({}^dN_z).$$
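Because connections may skip layers, the predecessor set of a neuron need not lie in the directly preceding layer. As a hypothetical illustration (not part of the original sheet): if neuron ${}^3N_1$ receives one edge from ${}^2N_1$ and one skip edge from ${}^1N_2$, then $P_{{}^3N_1} = \{{}^2N_1, {}^1N_2\}$ and
$$h_{{}^3N_1} = y_{{}^2N_1}\, w_{{}^3N_1}({}^2N_1) + y_{{}^1N_2}\, w_{{}^3N_1}({}^1N_2).$$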
$\vec{y}^{\,L} = (y_{{}^LN_1}, \ldots, y_{{}^LN_{M_L}})$ is the vector of outputs of the perceptron. The error function is denoted by
$$E(\vec{y}^{\,L}, \vec{t}, w) \;=\; \sum_{i=1}^{M_L} E(y_{{}^LN_i}, t_i, w).$$
$\vec{t} = (t_1, \ldots, t_{M_L})$ is the vector of target outputs the perceptron should produce for a given data set. $w$ denotes the set of all weights of the perceptron.
Weights of edges going into any neuron ${}^LN_i$ of layer $L$, with ${}^kN_j \in P_{{}^LN_i}$:
$$\begin{aligned}
\Delta w_{{}^LN_i}({}^kN_j) &= -\varepsilon\, \frac{\partial E(\vec{y}^{\,L}, \vec{t}, w)}{\partial w_{{}^LN_i}({}^kN_j)} \\
&= -\varepsilon \sum_{r=1}^{M_L} \frac{\partial E(y_{{}^LN_r}, t_r, w)}{\partial w_{{}^LN_i}({}^kN_j)} \\
&= -\varepsilon \sum_{r=1}^{M_L} \frac{\partial E(y_{{}^LN_r}, t_r, w)}{\partial y_{{}^LN_r}}\, \frac{\partial y_{{}^LN_r}}{\partial h_{{}^LN_r}}\, \frac{\partial h_{{}^LN_r}}{\partial w_{{}^LN_i}({}^kN_j)} \\
&= -\varepsilon \sum_{r=1}^{M_L} \frac{\partial E(y_{{}^LN_r}, t_r, w)}{\partial y_{{}^LN_r}}\, \sigma'(h_{{}^LN_r})\, \frac{\partial \sum_{{}^dN_z \in P_{{}^LN_r}} y_{{}^dN_z}\, w_{{}^LN_r}({}^dN_z)}{\partial w_{{}^LN_i}({}^kN_j)}
\end{aligned}$$
The derivative $\partial\bigl(\sum_{{}^dN_z \in P_{{}^LN_r}} y_{{}^dN_z}\, w_{{}^LN_r}({}^dN_z)\bigr) / \partial w_{{}^LN_i}({}^kN_j)$ is always equal to zero unless ${}^dN_z = {}^kN_j$ and ${}^LN_r = {}^LN_i$. Hence we get:
$$\Delta w_{{}^LN_i}({}^kN_j) = -\varepsilon\, \frac{\partial E(y_{{}^LN_i}, t_i, w)}{\partial y_{{}^LN_i}}\, \sigma'(h_{{}^LN_i})\, y_{{}^kN_j} = -\varepsilon\, \delta_{{}^LN_i}\, y_{{}^kN_j} \tag{1}$$
with $\delta_{{}^LN_i} = \frac{\partial E(y_{{}^LN_i}, t_i, w)}{\partial y_{{}^LN_i}}\, \sigma'(h_{{}^LN_i})$.
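As a concrete instance (an added illustration, not part of the original sheet): for the squared error per output neuron, $E(y_{{}^LN_i}, t_i, w) = \frac{1}{2}(y_{{}^LN_i} - t_i)^2$, equation (1) specializes to
$$\delta_{{}^LN_i} = (y_{{}^LN_i} - t_i)\, \sigma'(h_{{}^LN_i}), \qquad \Delta w_{{}^LN_i}({}^kN_j) = -\varepsilon\, (y_{{}^LN_i} - t_i)\, \sigma'(h_{{}^LN_i})\, y_{{}^kN_j}.$$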
Weights of edges going into any neuron ${}^{L-1}N_i$ of layer $L-1$, with ${}^kN_j \in P_{{}^{L-1}N_i}$:
$$\begin{aligned}
\Delta w_{{}^{L-1}N_i}({}^kN_j) &= -\varepsilon\, \frac{\partial E(\vec{y}^{\,L}, \vec{t}, w)}{\partial w_{{}^{L-1}N_i}({}^kN_j)} \\
&= -\varepsilon \sum_{r=1}^{M_L} \frac{\partial E(y_{{}^LN_r}, t_r, w)}{\partial y_{{}^LN_r}}\, \frac{\partial y_{{}^LN_r}}{\partial h_{{}^LN_r}}\, \frac{\partial h_{{}^LN_r}}{\partial w_{{}^{L-1}N_i}({}^kN_j)} \\
&= -\varepsilon \sum_{r=1}^{M_L} \frac{\partial E(y_{{}^LN_r}, t_r, w)}{\partial y_{{}^LN_r}}\, \sigma'(h_{{}^LN_r}) \sum_{{}^dN_z \in P_{{}^LN_r}} \frac{\partial\, y_{{}^dN_z}\, w_{{}^LN_r}({}^dN_z)}{\partial y_{{}^dN_z}}\, \frac{\partial y_{{}^dN_z}}{\partial w_{{}^{L-1}N_i}({}^kN_j)} \\
&= -\varepsilon \sum_{r=1}^{M_L} \delta_{{}^LN_r} \sum_{{}^dN_z \in P_{{}^LN_r}} w_{{}^LN_r}({}^dN_z)\, \frac{\partial y_{{}^dN_z}}{\partial h_{{}^dN_z}}\, \frac{\partial h_{{}^dN_z}}{\partial w_{{}^{L-1}N_i}({}^kN_j)} \\
&= -\varepsilon \sum_{r=1}^{M_L} \delta_{{}^LN_r} \sum_{{}^dN_z \in P_{{}^LN_r}} w_{{}^LN_r}({}^dN_z)\, \sigma'(h_{{}^dN_z}) \sum_{{}^sN_t \in P_{{}^dN_z}} \frac{\partial\, y_{{}^sN_t}\, w_{{}^dN_z}({}^sN_t)}{\partial w_{{}^{L-1}N_i}({}^kN_j)}
\end{aligned}$$
The derivative $\partial\bigl(y_{{}^sN_t}\, w_{{}^dN_z}({}^sN_t)\bigr) / \partial w_{{}^{L-1}N_i}({}^kN_j)$ is always equal to zero unless ${}^sN_t = {}^kN_j$ and ${}^dN_z = {}^{L-1}N_i$. Hence we get:
$$\Delta w_{{}^{L-1}N_i}({}^kN_j) = -\varepsilon \sum_{r=1}^{M_L} \delta_{{}^LN_r}\, w_{{}^LN_r}({}^{L-1}N_i)\, \sigma'(h_{{}^{L-1}N_i})\, y_{{}^kN_j} = -\varepsilon\, y_{{}^kN_j}\, \sigma'(h_{{}^{L-1}N_i}) \sum_{r=1}^{M_L} \delta_{{}^LN_r}\, w_{{}^LN_r}({}^{L-1}N_i)$$
All outgoing edges of neuron ${}^{L-1}N_i$ lead into neurons of layer $L$, since layer $L$ is the last layer. One may state this fact more generally: every outgoing edge of neuron ${}^{L-1}N_i$ leads into a neuron of $D_{{}^{L-1}N_i}$. Hence:
$$\Delta w_{{}^{L-1}N_i}({}^kN_j) = -\varepsilon\, y_{{}^kN_j}\, \sigma'(h_{{}^{L-1}N_i}) \sum_{{}^zN_v \in D_{{}^{L-1}N_i}} \delta_{{}^zN_v}\, w_{{}^zN_v}({}^{L-1}N_i) = -\varepsilon\, y_{{}^kN_j}\, \delta_{{}^{L-1}N_i} \tag{2}$$
with $\delta_{{}^{L-1}N_i} = \sigma'(h_{{}^{L-1}N_i}) \sum_{{}^zN_v \in D_{{}^{L-1}N_i}} \delta_{{}^zN_v}\, w_{{}^zN_v}({}^{L-1}N_i)$. Note that the factor $\sigma'(h_{{}^{L-1}N_i})$ is folded into $\delta_{{}^{L-1}N_i}$; this matches the definition of $\delta_{{}^LN_i}$ in (1) and is required for the recursion below, where the $\delta$ of a hidden descendant must already contain its own $\sigma'$ factor.
Weights of edges going into any neuron ${}^mN_i$ of an arbitrary layer $m$ follow in analogy to the previous derivation (especially equation (2)). Since all connections are feed-forward, every direct descendant of ${}^mN_i$ lies in a later layer, so its $\delta$ value is already available when layer $m$ is processed:
$$\Delta w_{{}^mN_i}({}^kN_j) = -\varepsilon\, y_{{}^kN_j}\, \delta_{{}^mN_i} \quad\text{with: } {}^kN_j \in P_{{}^mN_i} \tag{3}$$
with $\delta_{{}^mN_i} = \sigma'(h_{{}^mN_i}) \sum_{{}^zN_v \in D_{{}^mN_i}} \delta_{{}^zN_v}\, w_{{}^zN_v}({}^mN_i)$.
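To make the recursion concrete, here is a minimal sketch (an added illustration, not part of the original solution) of one backpropagation step for a feed-forward network with skip connections. It assumes the logistic activation $\sigma(h) = (1+e^{-h})^{-1}$, so $\sigma'(h) = y(1-y)$, squared error as above, and a hypothetical dictionary encoding of the network; the names `backprop_step`, `weights`, `topo`, `inputs`, and `targets` are inventions of this sketch.

```python
import math

def sigma(h):
    # Logistic activation; an assumption of this sketch.
    return 1.0 / (1.0 + math.exp(-h))

def backprop_step(topo, weights, inputs, targets, eps=0.1):
    """One gradient step. weights[j][i] is w_{N_j}(N_i) for the edge i -> j;
    topo lists all neurons in feed-forward (topological) order, which is
    what permits edges between non-adjacent layers."""
    y, h, delta = dict(inputs), {}, {}
    # Forward pass: h_j = sum_i y_i * w_j(i), y_j = sigma(h_j).
    for j in topo:
        if j in weights:                      # input neurons carry no weights
            h[j] = sum(y[i] * w for i, w in weights[j].items())
            y[j] = sigma(h[j])
    # Backward pass in reverse topological order.
    for j in reversed(topo):
        if j not in weights:
            continue
        if j in targets:                      # output neuron, eq. (1):
            upstream = y[j] - targets[j]      # dE/dy for E = 1/2 (y - t)^2
        else:                                 # hidden neuron, eq. (3):
            upstream = sum(delta[z] * wz[j]   # sum over direct descendants D
                           for z, wz in weights.items() if j in wz)
        delta[j] = upstream * y[j] * (1.0 - y[j])   # sigma'(h) = y (1 - y)
    # Weight update: Delta w_j(i) = -eps * delta_j * y_i, eqs. (1)-(3).
    for j, wj in weights.items():
        for i in wj:
            wj[i] -= eps * delta[j] * y[i]
    return y, delta

# Tiny example: the output neuron 'o' has a skip edge directly from input 'x1'.
weights = {"n1": {"x1": 0.5, "x2": -0.3},
           "o":  {"n1": 0.8, "x1": 0.2}}        # "x1" -> "o" skips layer 1
y, delta = backprop_step(["x1", "x2", "n1", "o"], weights,
                         inputs={"x1": 1.0, "x2": 0.0}, targets={"o": 1.0})
```

Scanning all of `weights` to find the direct descendants of a neuron is quadratic; a real implementation would precompute the descendant sets $D_{{}^kN_i}$ once. The point of the sketch is that the single $\delta$ recursion of equation (3) covers adjacent and skip connections alike, because it sums over all direct descendants regardless of their layer.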