(1)

Data analysis:
Statistical principles and computational methods

Statistical Learning in MRFs

Dmitrij Schlesinger, Carsten Rother

SS2014, 02.07.2014

(2)

Remember the model

(3)

Remember the model

Graph $G = (V, E)$, $K$ – label set, $F$ – observation set
$y \in \mathcal{Y}: V \to K$ – labeling, $x \in \mathcal{X}: V \to F$ – observation
An elementary event is a pair $(x, y)$. Its energy:

$$E(x, y) = \sum_{ij \in E} \psi_{ij}(y_i, y_j) + \sum_{i \in V} \psi_i(x_i, y_i)$$

Its probability:

$$p(x, y) = \frac{1}{Z} \exp\bigl[-E(x, y)\bigr]$$

with the partition function

$$Z = \sum_{x \in \mathcal{X},\, y \in \mathcal{Y}} \exp\bigl[-E(x, y)\bigr]$$
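For concreteness, a minimal sketch of these quantities on a tiny, fully enumerable model (the chain graph, the potentials and all names below are illustrative assumptions; the observation $x$ is folded into the unary potentials):

import itertools, math

# Tiny illustrative model: a chain of 3 nodes with 2 labels; the observation x
# is assumed fixed and folded into the unary potentials psi_i.
V, edges = [0, 1, 2], [(0, 1), (1, 2)]
K = [0, 1]
psi_unary = {i: [0.0, 0.5] for i in V}                    # psi_i(y_i)
psi_pair = {e: [[0.0, 1.0], [1.0, 0.0]] for e in edges}   # psi_ij(y_i, y_j)

def mrf_energy(y):
    """E(y) = sum_{ij in E} psi_ij(y_i, y_j) + sum_{i in V} psi_i(y_i)."""
    return (sum(psi_pair[(i, j)][y[i]][y[j]] for i, j in edges)
            + sum(psi_unary[i][y[i]] for i in V))

Z = sum(math.exp(-mrf_energy(y)) for y in itertools.product(K, repeat=len(V)))
print(math.exp(-mrf_energy((0, 0, 0))) / Z)               # p(y) for one labeling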

(4)

Remember the inference with an additive loss

1. Compute marginal probability distributions for the values
$$p(k'_i = l \mid x) = \sum_{k':\, k'_i = l} p(k' \mid x)$$
for each variable $i$ and each value $l$.

2. Decide for each variable "independently" according to its marginal probability distribution and the local loss $c_i$:
$$\sum_{l \in K} c_i(k_i, l) \cdot p(k'_i = l \mid x) \to \min_{k_i}$$

This is again a Bayesian Decision Problem – minimize the average loss
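Expressed in code, this decision is just an expected-loss minimization over the marginals; a minimal sketch (the function name decide and the array layout are assumptions):

import numpy as np

def decide(marginals, loss):
    """marginals[i, l] = p(k_i = l | x); loss[k, l] = c(k, l).
    Returns, per variable, the label minimizing the expected loss."""
    expected = marginals @ loss.T     # expected[i, k] = sum_l c(k, l) * p(k_i = l | x)
    return expected.argmin(axis=1)

marginals = np.array([[0.1, 0.7, 0.2],
                      [0.5, 0.3, 0.2]])
print(decide(marginals, 1.0 - np.eye(3)))   # 0/1 loss -> max-marginal decision: [1 0]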

(5)

Remember the "question"

How to compute the marginal probability distributions
$$p(y_i = l \mid x) = \sum_{y:\, y_i = l} p(y \mid x)\,?$$

It is not necessary to eat the whole kettle in order to taste the soup – it is often enough to stir it carefully and take just one spoonful.

The idea: instead of summing over all labelings, sample a few of them according to the target probability distribution and average → the probabilities are replaced by relative frequencies.
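A minimal sketch of that substitution (function name and array layout are assumptions): given samples drawn from the target distribution, the marginals are estimated as relative label frequencies per variable.

import numpy as np

def empirical_marginals(samples, n_labels):
    """samples: (T, n_vars) integer array, one sampled labeling per row.
    Returns relative frequencies freq[i, l], an estimate of p(y_i = l | x)."""
    T, n_vars = samples.shape
    freq = np.zeros((n_vars, n_labels))
    for y in samples:
        freq[np.arange(n_vars), y] += 1.0
    return freq / T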

(6)

Sampling

Example: the values of a discrete variable $x \in \{1, 2, \ldots, 6\}$ have to be drawn from $p(x) = (0.1, 0.2, 0.4, 0.05, 0.15, 0.1)$.

The algorithm (input – $p(x)$, output – a sample from $p(x)$):

import random

def sample_discrete(p):
    """Draw one value (1..n) from the discrete distribution p = (p[0], ..., p[n-1])."""
    a, total = [], 0.0
    for pi in p:              # cumulative sums: a[i] = p[0] + ... + p[i]
        total += pi
        a.append(total)
    r = random.random()       # uniform random number in [0, 1)
    for i, ai in enumerate(a):
        if ai > r:
            return i + 1
    return len(p)             # guard against floating-point rounding
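A quick check of the sampler against the example distribution (illustrative usage):

p = [0.1, 0.2, 0.4, 0.05, 0.15, 0.1]
samples = [sample_discrete(p) for _ in range(100000)]
print([samples.count(v) / len(samples) for v in range(1, 7)])   # close to p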

(7)

Gibbs Sampling

Task – draw an $x = (x_1, x_2, \ldots, x_m)$ (a vector) from $p(x)$
Problem: $p(x)$ is not given explicitly

The way out:

– start with an arbitrary $x^0$

– sample the new one $x^{t+1}$ "component-wise" from the conditional probability distributions
$$p(x_i \mid x_1^t, \ldots, x_{i-1}^t, x_{i+1}^t, \ldots, x_m^t)$$

– repeat this for all components $i$ many times

After such a sampling procedure (under some mild conditions), for $n$ large enough:

$x^n$ does not depend on $x^0$

$x^n$ follows the target probability distribution $p(x)$

(8)

Gibbs Sampling

In MRFs the conditional probability distributions can be computed easily!

The Markovian property
$$p(y_i \mid y_{V \setminus i}) = p(y_i \mid y_{N(i)})$$
(i.e. under the condition that the labels in the neighbouring nodes are fixed; $N(i)$ – neighbourhood structure) leads to
$$p(y_i = k \mid y_{N(i)}) \propto \exp\Bigl[-\psi_i(k) - \sum_{j \in N(i)} \psi_{ij}(k, y_j)\Bigr]$$
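A minimal sketch of one Gibbs sweep that uses exactly these local conditionals (the data layout and names are assumptions for illustration):

import math, random

def gibbs_sweep(y, neighbors, psi_unary, psi_pair):
    """One Gibbs sweep: resample every node i from p(y_i = k | y_N(i)).
    y: dict node -> current label (0..K-1); neighbors: dict node -> list of nodes;
    psi_unary[i][k] and psi_pair[(i, j)][k][l] are the potentials, psi_pair assumed
    to be given for both orderings (i, j) and (j, i)."""
    n_labels = len(next(iter(psi_unary.values())))
    for i in y:
        # conditional energies E_i(k) = psi_i(k) + sum_{j in N(i)} psi_ij(k, y_j)
        energies = [psi_unary[i][k]
                    + sum(psi_pair[(i, j)][k][y[j]] for j in neighbors[i])
                    for k in range(n_labels)]
        weights = [math.exp(-e) for e in energies]   # proportional to p(y_i = k | y_N(i))
        y[i] = random.choices(range(n_labels), weights=weights)[0]
    return y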

(9)

Gibbs Sampling

A relation to Iterated Conditional Modes:

– ICM considers the "conditional energies"
$$E_i(k) = \psi_i(k) + \sum_{j \in N(i)} \psi_{ij}(k, y_j)$$
and decides for the best label,

– Gibbs Sampling draws new labels according to the conditional probabilities
$$p(y_i = k \mid y_{N(i)}) \propto \exp\bigl[-E_i(k)\bigr]$$
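For contrast, the corresponding ICM step replaces the sampling above by a deterministic minimization (same assumed data layout as in the Gibbs sketch):

def icm_update(i, y, neighbors, psi_unary, psi_pair):
    """ICM step at node i: set y_i to the label with the smallest conditional energy E_i(k)."""
    energies = [psi_unary[i][k]
                + sum(psi_pair[(i, j)][k][y[j]] for j in neighbors[i])
                for k in range(len(psi_unary[i]))]
    y[i] = min(range(len(energies)), key=lambda k: energies[k])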

(10)

Maximum Likelihood for MRFs (supervised)

The model – no hidden variables, the energy is parameterized by a parameter $\theta$ to be learned:
$$p(y) = \frac{1}{Z(\theta)} \exp\bigl[-E(y; \theta)\bigr] \qquad \text{with} \qquad Z(\theta) = \sum_{y} \exp\bigl[-E(y; \theta)\bigr]$$
Let a training set $L = (y^1, y^2, \ldots, y^{|L|})$ be given.

The Maximum Likelihood problem reads:
$$p(L; \theta) = \prod_{l} p(y^l; \theta) = \prod_{l} \frac{1}{Z(\theta)} \exp\bigl[-E(y^l; \theta)\bigr] \to \max_\theta$$

Take the logarithm:
$$F(\theta) = \ln p(L; \theta) = \sum_{l} \Bigl[-E(y^l; \theta) - \ln Z(\theta)\Bigr] = -\sum_{l} E(y^l; \theta) - |L| \cdot \ln Z(\theta) \to \max_\theta$$
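On a toy model both $E$ and $Z(\theta)$ can be evaluated by brute force, which makes the objective easy to inspect; a minimal sketch (the tiny chain, the shared pairwise table $\theta$ and all names are illustrative assumptions):

import itertools, math

edges, n_nodes, n_labels = [(0, 1), (1, 2)], 3, 2    # tiny chain, 2 labels

def energy_theta(y, theta):
    """E(y; theta) with one pairwise table theta[k][l] shared by all edges."""
    return sum(theta[y[i]][y[j]] for i, j in edges)

def log_likelihood(train, theta):
    """F(theta) = -sum_l E(y^l; theta) - |L| * ln Z(theta), Z(theta) by enumeration."""
    Z = sum(math.exp(-energy_theta(y, theta))
            for y in itertools.product(range(n_labels), repeat=n_nodes))
    return -sum(energy_theta(y, theta) for y in train) - len(train) * math.log(Z)

theta = [[0.0, 1.0], [1.0, 0.0]]                     # Potts-like table
print(log_likelihood([(0, 0, 0), (1, 1, 1)], theta))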

(11)

Maximum Likelihood for MRFs (supervised)

Consider the derivative with respect to θ (the gradient)

$$\frac{\partial F(\theta)}{\partial \theta} = -\sum_{l} \frac{\partial E(y^l; \theta)}{\partial \theta} - |L| \cdot \frac{\partial \ln Z(\theta)}{\partial \theta}$$

Apply the chain rule to the second addend:

$$\frac{\partial \ln Z(\theta)}{\partial \theta} = \frac{1}{Z(\theta)} \sum_{y} \exp\bigl[-E(y; \theta)\bigr] \cdot \Bigl(-\frac{\partial E(y; \theta)}{\partial \theta}\Bigr) = -\sum_{y} \frac{1}{Z(\theta)} \exp\bigl[-E(y; \theta)\bigr] \cdot \frac{\partial E(y; \theta)}{\partial \theta} = -\sum_{y} p(y; \theta) \cdot \frac{\partial E(y; \theta)}{\partial \theta}$$

(12)

Maximum Likelihood for MRFs (supervised)

All together (the complete, normalized gradient):

$$\frac{\partial F(\theta)}{\partial \theta} = -\frac{1}{|L|} \sum_{l} \frac{\partial E(y^l; \theta)}{\partial \theta} + \sum_{y} p(y; \theta) \cdot \frac{\partial E(y; \theta)}{\partial \theta}$$

The gradient is the difference of two expectations:

$$\frac{\partial F(\theta)}{\partial \theta} = -\mathbb{E}_{\mathrm{data}}\Bigl[\frac{\partial E(y; \theta)}{\partial \theta}\Bigr] + \mathbb{E}_{\mathrm{model}}\Bigl[\frac{\partial E(y; \theta)}{\partial \theta}\Bigr]$$

one over the training set and the other over all elementary events.

The first one is called the data statistics, the second one the model statistics.
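For the toy chain above both expectations can be computed by enumeration, which gives an exact reference gradient for checking sampling-based estimates; a sketch reusing the assumed names (edges, n_nodes, n_labels, energy_theta and the imports) from the previous sketch:

def exact_gradient(train, theta):
    """Normalized gradient dF/dtheta[k][l] = -E_data[dE/dtheta] + E_model[dE/dtheta]
    for the shared pairwise table; dE/dtheta[k][l] counts edges labeled (k, l)."""
    grad = [[0.0] * n_labels for _ in range(n_labels)]
    for y in train:                                   # data statistics (with minus sign)
        for i, j in edges:
            grad[y[i]][y[j]] -= 1.0 / len(train)
    states = list(itertools.product(range(n_labels), repeat=n_nodes))
    weights = [math.exp(-energy_theta(y, theta)) for y in states]
    Z = sum(weights)
    for y, w in zip(states, weights):                 # model statistics (expected counts)
        for i, j in edges:
            grad[y[i]][y[j]] += w / Z
    return grad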

(13)

Maximum Likelihood for MRFs (supervised)

What is $\partial E(y; \theta) / \partial \theta$?

Example: let the unknown parameter $\theta$ be composed of the unknown pairwise potentials $\psi_{ij}(k, k')$ (tables for all edges). Consider a particular edge $(i, j)$ and a label pair $(k, k')$:

$$\frac{\partial E(y; \psi)}{\partial \psi_{ij}(k, k')} = \begin{cases} 1 & \text{if } y_i = k,\ y_j = k' \\ 0 & \text{otherwise} \end{cases}$$

It follows:

$$\frac{1}{|L|} \sum_{l} \frac{\partial E(y^l; \psi)}{\partial \psi_{ij}(k, k')} = n_{ij}(k, k') \qquad\qquad \sum_{y} p(y; \psi) \cdot \frac{\partial E(y; \psi)}{\partial \psi_{ij}(k, k')} = p(y_i = k, y_j = k'; \psi)$$

The first addend is the relative frequency of the label pair in the training set, the second one is the corresponding marginal probability of the model.
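The data statistics are plain counts; a minimal sketch for a single edge (the array layout is an assumption):

import numpy as np

def edge_counts(train, i, j, n_labels):
    """n_ij(k, l): relative frequency of the label pair (k, l) at edge (i, j)
    over the training set (train: iterable of integer labelings)."""
    n = np.zeros((n_labels, n_labels))
    for y in train:
        n[y[i], y[j]] += 1.0
    return n / len(train)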

(14)

Maximum Likelihood for MRFs (supervised)

To summarize (for the example where the pairwise potentials $\psi$ are learned) – Algorithm (see the sketch below):

1. Compute $n_{ij}(k, k')$ from the training set.

2. Repeat until convergence:

a) Estimate the current marginal probabilities $p(y_i = k, y_j = k'; \psi)$ (e.g. by Gibbs Sampling).

b) Compute the gradient as $p(y_i = k, y_j = k'; \psi) - n_{ij}(k, k')$ and apply it with a small step size.
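A compact sketch of this loop for per-edge pairwise tables (all names are assumptions; edge_counts is the counting helper above, and estimate_marginals stands for any estimator of the model's current edge marginals, e.g. relative frequencies from Gibbs samples):

import numpy as np

def learn_pairwise(train, edges, n_labels, estimate_marginals, steps=100, lr=0.1):
    """Gradient ascent on F(psi) for per-edge tables psi[(i, j)][k, l].
    estimate_marginals(psi) must return a dict edge -> (n_labels, n_labels) array
    of the current model marginals p(y_i = k, y_j = l; psi)."""
    n = {(i, j): edge_counts(train, i, j, n_labels) for i, j in edges}   # step 1
    psi = {e: np.zeros((n_labels, n_labels)) for e in edges}
    for _ in range(steps):                                               # step 2
        p = estimate_marginals(psi)                                      # 2a) e.g. Gibbs
        for e in edges:
            psi[e] += lr * (p[e] - n[e])                                 # 2b) small step
    return psi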

Further topics: supervised learning for hidden MRFs, unsupervised learning (by gradient ascent, Expectation Maximization), conditional likelihood (the next lecture), etc.
