(1)

Intelligent Systems:

Undirected Graphical models (Factor Graphs) (2 lectures)

Carsten Rother

15/01/2015 Intelligent Systems: Probabilistic Inference in DGM and UGM

(2)

Roadmap for next two lectures

• Definition and Visualization of Factor Graphs

• Converting Directed Graphical Models to Factor Graphs

• Probabilistic Programming

• Queries and making decisions

• Binary-valued Factor Graphs: Models and Optimization (ICM, Graph Cut)

• Multi-valued Factor Graphs: Models and Optimization (Alpha Expansion)

(3)

Reminder: Structured Models - when to use what representation?


Directed graphical model: the unknown variables have different "meanings".
Example: MaryCalls (M), JohnCalls (J), AlarmOn (A), BurglarInHouse (B)

Factor graphs: the unknown variables all have the same "meaning".
Examples: pixels in an image, nuclei in C. elegans (a worm)

Undirected graphical models are used, instead of factor graphs, when we are interested in studying "conditional independence" (not relevant for our context).

(4)

Reminder: Machine Learning: Structured versus Unstructured Models

Structured Output Prediction:

• 𝑓/𝑝: Z^m → X^n (for example: X = R^n or X = N^n)

Important: the elements in X^n do not make independent decisions

Definition (not formal)

The Output consists of several parts, and not only the parts themselves contain information, but also the way in which the parts belong together.

Example: Image Labelling (Computer Vision)

Input: Image (Z^m)   Output: Labeling (K^m)

K has a fixed vocabulary, e.g. K = {Wall, Picture, Person, Clutter, …}

Important: the labeling of neighboring pixels is highly correlated

(5)

Reminder: Machine Learning: Structured versus Unstructured Models


Structured Output Prediction:

• 𝑓/𝑝: Z^m → X^n (for example: X = R^n or X = N^n)

Important: the elements in X^n do not make independent decisions

Example: Text Processing

Input: Text (Z^m)   Output: X^m (parse tree of the sentence)

“The boy went home”

Definition (not formal)

The Output consists of several parts, and not only the parts themselves contain information, but also the way in which the parts belong together.

(6)

Factor Graph model - Example

• A Factor Graph defines a distribution as:

p(𝒙) = (1/f) ∏_{F∈𝔽} ψ_F(𝒙_N(F))   where   f = Σ_𝒙 ∏_{F∈𝔽} ψ_F(𝒙_N(F))

f: partition function, so that the distribution is normalized
F: a factor
𝔽: set of all factors
N(F): neighbourhood of a factor, i.e. the variables the factor depends on
ψ_F: a function (not a distribution) depending on 𝒙_N(F)  (ψ_F: K^|N(F)| → R, where x_i ∈ K)

• Example

p(x1, x2) = (1/f) ψ1(x1, x2) ψ2(x2)

𝒙_N(1) = {x1, x2},  𝒙_N(2) = {x2},  x_i ∈ {0,1};  K = 2
ψ1(0,0) = 1;  ψ1(0,1) = 0;  ψ1(1,0) = 1;  ψ1(1,1) = 2
ψ2(0) = 1;  ψ2(1) = 0

f = 1·1 + 0·0 + 1·1 + 2·0 = 2        Check yourself that: Σ_𝒙 p(𝒙) = 1
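The normalization in this example is easy to verify by brute force. Below is a minimal Python sketch (not part of the slides) that enumerates all assignments, computes the partition function f and checks that the resulting p(𝒙) sums to 1; the variable names are chosen freely.

```python
# Brute-force check of the factor-graph example above.
from itertools import product

K = [0, 1]  # label set, K = 2

# Factor tables psi_1(x1, x2) and psi_2(x2) exactly as on the slide.
psi1 = {(0, 0): 1, (0, 1): 0, (1, 0): 1, (1, 1): 2}
psi2 = {0: 1, 1: 0}

# Partition function: sum over all assignments of the product of all factors.
f = sum(psi1[(x1, x2)] * psi2[x2] for x1, x2 in product(K, K))
print("f =", f)  # -> 2, as on the slide

# p(x) = (1/f) * product of factors; the probabilities sum to 1.
p = {(x1, x2): psi1[(x1, x2)] * psi2[x2] / f for x1, x2 in product(K, K)}
print(p)                # e.g. p[(1, 1)] = 0 because psi2(1) = 0
print(sum(p.values()))  # -> 1.0
```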

(7)

Factor Graph model - Visualization


p(x1, x2) = (1/f) ψ1(x1, x2) ψ2(x2)

• Example (as before):
𝒙_N(1) = {x1, x2},  𝒙_N(2) = {x2},  x_i ∈ {0,1};  K = 2
ψ1(0,0) = 1;  ψ1(0,1) = 0;  ψ1(1,0) = 1;  ψ1(1,1) = 2
ψ2(0) = 1;  ψ2(1) = 0
f = 1·1 + 0·0 + 1·1 + 2·0 = 2        Check yourself that: Σ_𝒙 p(𝒙) = 1

In the visualization, a square node visualizes a factor, a circle node visualizes a variable, and an edge between them means that the variable occurs in that factor.

Visualization of p(x1, x2) = (1/f) ψ1(x1, x2) ψ2(x2): [figure with variable nodes x1 and x2, factor node ψ1 connected to x1 and x2, and factor node ψ2 connected to x2]

For the visualization we use an undirected graph G = (V, F, E), where V and F are the sets of variable and factor nodes and E is the set of edges.

(8)

Factor Graph model - Visualization

• Example:

As before, a square node visualizes a factor, a circle node a variable, and an edge means that the variable occurs in that factor.

p(x1, x2, x3, x4, x5) = (1/f) ψ(x1, x2, x4) ψ(x2, x3) ψ(x3, x4) ψ(x4, x5) ψ(x4), where the ψ's are specified in some way.

[Figure: the corresponding factor graph over the variable nodes x1, …, x5.]

(9)

Probabilities and Energies


p(𝒙) = (1/f) ∏_{F∈𝔽} ψ_F(𝒙_N(F)) = (1/f) ∏_{F∈𝔽} exp{−θ_F(𝒙_N(F))} = (1/f) exp{−Σ_{F∈𝔽} θ_F(𝒙_N(F))} = (1/f) exp{−E(𝒙)}

The energy E(𝒙) is just a sum of factors:

E(𝒙) = Σ_{F∈𝔽} θ_F(𝒙_N(F))

The most likely solution 𝒙* is reached by minimizing the energy:

𝒙* = argmax_𝒙 p(𝒙) = argmin_𝒙 E(𝒙)

Note:
1) If 𝒙 is a minimizer of f(𝒙) then it is also a minimizer of log f(𝒙) (note that x1 ≤ x2 implies log x1 ≤ log x2)
2) It is: log p(𝒙) = −log f − E(𝒙) = constant − E(𝒙)

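To see the probability/energy correspondence concretely, here is a small sketch (not part of the slides) that rewrites the earlier two-variable example with energies θ = −log ψ and confirms that maximizing p(𝒙) and minimizing E(𝒙) pick the same assignment; factors with ψ = 0 simply get infinite energy.

```python
# The toy factor graph from above, expressed via energies theta = -log(psi).
import math
from itertools import product

K = [0, 1]
psi1 = {(0, 0): 1, (0, 1): 0, (1, 0): 1, (1, 1): 2}
psi2 = {0: 1, 1: 0}

def theta(psi_value):
    # psi = exp(-theta)  <=>  theta = -log(psi); psi = 0 means infinite energy.
    return math.inf if psi_value == 0 else -math.log(psi_value)

def energy(x1, x2):
    return theta(psi1[(x1, x2)]) + theta(psi2[x2])

# Partition function f = sum_x exp(-E(x));  exp(-inf) is simply 0.0.
f = sum(math.exp(-energy(*x)) for x in product(K, K))
p = {x: math.exp(-energy(*x)) / f for x in product(K, K)}

print(max(p, key=p.get))                              # argmax_x p(x)
print(min(product(K, K), key=lambda x: energy(*x)))   # argmin_x E(x)
# Both print (0, 0) here (tied with (1, 0), which has the same probability 0.5).
```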

(10)

Names

• The probability distribution

p(𝒙) = (1/f) exp{−E(𝒙)},  with energy  E(𝒙) = Σ_{F∈𝔽} θ_F(𝒙_N(F))  and  f = Σ_𝒙 exp{−E(𝒙)},

is a so-called Gibbs distribution.

• We define the order of a Factor Graph as the arity (number of variables) of the largest factor. Example of an order-3 model:

E(𝒙) = θ(x1, x2, x4) + θ(x2, x3) + θ(x3, x4) + θ(x5, x4) + θ(x4)
       (arity 3)        (arity 2)   (arity 2)   (arity 2)   (arity 1)

• A different name for factor graph / undirected graphical model is Markov Random Field. This is an extension of Markov chains to fields. The name Markov stands for the "Markov property", which means essentially that the order of the factors is small.

(11)

Examples: Order


4-connected, pairwise MRF (order 2): "pairwise energy"
E(𝒙) = Σ_{i,j∈N4} θ_ij(x_i, x_j)

Higher (8-)connected, pairwise MRF (order 2):
E(𝒙) = Σ_{i,j∈N8} θ_ij(x_i, x_j)

Higher-order RF (order n): "higher-order energy"
E(𝒙) = Σ_{i,j∈N4} θ_ij(x_i, x_j) + θ(x1, …, xn)


(12)

Roadmap for next two lectures

• Definition and Visualization of Factor Graphs

• Converting Directed Graphical Models to Factor Graphs

• Probabilistic Programming

• Queries and making decisions

• Binary-valued Factor Graphs: Models and Optimization (ICM, Graph Cut)

• Multi-valued Factor Graphs: Models and Optimization (Alpha Expansion)

(13)

Converting Directed Graphical Model to Factor Graph


A simple case (chain x1 → x2 → x3):

Directed GM: p(x1, x2, x3) = p(x3|x2) p(x2|x1) p(x1)

Factor graph: p(x1, x2, x3) = (1/f) ψ(x3, x2) ψ(x2, x1) ψ(x1)

where: f = 1,  ψ(x3, x2) = p(x3|x2),  ψ(x2, x1) = p(x2|x1),  ψ(x1) = p(x1)

(14)

Converting Directed Graphical Model to Factor Graph

A more complex case:

Directed GM: p(x1, x2, x3, x4) = p(x1|x2, x3) p(x2) p(x3|x4) p(x4)

Factor graph: p(x1, x2, x3, x4) = (1/f) ψ(x1, x2, x3) ψ(x2) ψ(x3, x4) ψ(x4)

where: f = 1,  ψ(x1, x2, x3) = p(x1|x2, x3),  ψ(x2) = p(x2),  ψ(x3, x4) = p(x3|x4),  ψ(x4) = p(x4)

(15)

Converting Directed Graphical Model to Factor Graph


Our example:

Directed GM: p(x1, x2, x3, x4) = p(x1|x2, x3) p(x2) p(x3|x4) p(x4)

Factor graph: p(x1, x2, x3, x4) = (1/f) ψ(x1, x2, x3) ψ(x2) ψ(x3, x4) ψ(x4)

where: f = 1,  ψ(x1, x2, x3) = p(x1|x2, x3),  ψ(x2) = p(x2),  ψ(x3, x4) = p(x3|x4),  ψ(x4) = p(x4)

General recipe (a small code sketch follows below):

• Take each conditional probability and convert it to a factor (without conditioning), i.e. replace the conditioning symbols "|" with commas.

• Set the normalization constant f = 1.

• Visualization: a node together with all its parents forms a new factor (this step is called moralization).

• Comment: the other direction is more complicated, since the factors ψ have to be converted correctly into individual probabilities such that the overall joint distribution stays the same.
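The recipe is mechanical, as the following Python sketch illustrates for the example above (the CPT numbers are made up purely for illustration; only the conversion step itself matters):

```python
# Converting the directed model p(x1|x2,x3) p(x2) p(x3|x4) p(x4) into factors.
from itertools import product

# Tabular CPTs with arbitrary but valid probabilities (binary variables).
p_x2 = {0: 0.6, 1: 0.4}
p_x4 = {0: 0.5, 1: 0.5}
p_x3_given_x4 = {(x3, x4): 0.7 if x3 == x4 else 0.3
                 for x3, x4 in product((0, 1), repeat=2)}
p_x1_given_x2_x3 = {(x1, x2, x3): 0.9 if x1 == (x2 and x3) else 0.1
                    for x1, x2, x3 in product((0, 1), repeat=3)}

# Conversion: every conditional p(node | parents) becomes a factor psi over
# {node} union parents with the *same* table; the normalization is f = 1.
factors = [
    (('x1', 'x2', 'x3'), lambda a: p_x1_given_x2_x3[(a['x1'], a['x2'], a['x3'])]),
    (('x2',),            lambda a: p_x2[a['x2']]),
    (('x3', 'x4'),       lambda a: p_x3_given_x4[(a['x3'], a['x4'])]),
    (('x4',),            lambda a: p_x4[a['x4']]),
]

def p_joint(assignment):
    prod = 1.0
    for _scope, psi in factors:
        prod *= psi(assignment)
    return prod  # f = 1, so no renormalization is needed

# Sanity check: the product of factors sums to 1 over all assignments.
total = sum(p_joint(dict(zip(('x1', 'x2', 'x3', 'x4'), v)))
            for v in product((0, 1), repeat=4))
print(total)  # -> 1.0 (up to floating point)
```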

(16)

Roadmap for next two lectures

• Definition and Visualization of Factor Graphs

• Converting Directed Graphical Models to Factor Graphs

• Probabilistic Programming

• Queries and making decisions

• Binary-valued Factor Graphs: Models and Optimization (ICM, Graph Cut)

• Multi-valued Factor Graphs: Models and Optimization (Alpha Expansion)

(17)

Probabilistic Programming Languages


See http://probabilistic-programming.org/wiki/Home

• A programming language for machine learning tasks, in particular for modelling, learning and making predictions in directed graphical models (DGM), undirected graphical models (UGM), and factor graphs (FG).

• Comment: DGMs and UGMs are converted to factor graphs. All operations are run on factor graphs.

• The basic idea is to associate a distribution with each variable:

Normally (C++):         bool coin = 0;
Probabilistic program:  bool coin = Bernoulli(0.5);   % Bernoulli is a distribution with 2 states

(18)

An Example: Two coins

• Random variables: coin1 (x1) and coin2 (x2), and an event z about the state of both variables

• We know: coin1 (x1) and coin2 (x2) are independent

• Each coin has equal probability of being heads (1) or tails (0)

• New random variable z which is true if and only if both coins are heads: z = x1 & x2

Example: You draw two fair coins. What is the chance that both are heads?

(19)

An Example: Two coins


x1, x2 with x_i ∈ {0,1} and p(x_i = 1) = p(x_i = 0) = 0.5

Conditional probability table for z:

P(z=1|x1,x2)   P(z=0|x1,x2)   Value x1   Value x2
0              1              0          0
0              1              0          1
0              1              1          0
1              0              1          1

Joint: p(x1, x2, z) = p(z|x1, x2) p(x1) p(x2)

Compute the marginal:

p(z) = Σ_{x1,x2} p(z, x1, x2) = Σ_{x1,x2} p(z|x1, x2) p(x1) p(x2)

p(z = 1) = 1 · 0.5 · 0.5 = 0.25
p(z = 0) = 1 · 0.5 · 0.5 + 1 · 0.5 · 0.5 + 1 · 0.5 · 0.5 = 0.75
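The same marginalization can be written in a few lines of Python (a sketch, not the Infer.NET program shown on the next slide):

```python
# Brute-force marginal p(z) for the two-coins model.
from itertools import product

p_coin = {0: 0.5, 1: 0.5}                                      # fair coins
p_z_given = lambda z, x1, x2: 1.0 if z == (x1 & x2) else 0.0   # z = x1 AND x2

p_z = {0: 0.0, 1: 0.0}
for x1, x2, z in product((0, 1), repeat=3):
    p_z[z] += p_z_given(z, x1, x2) * p_coin[x1] * p_coin[x2]   # sum out x1, x2

print(p_z)  # -> {0: 0.75, 1: 0.25}, matching the slide
```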

(20)

An Example: Two coins - Infer.net

[The slide shows an Infer.NET program for the two-coins example, how to run it, and how to add evidence to the program; the code screenshots are not reproduced here.]

(21)

Roadmap for next two lectures

• Definition and Visualization of Factor Graphs

• Converting Directed Graphical Models to Factor Graphs

• Probabilistic Programming

• Queries and making decisions

• Binary-valued Factor Graphs: Models and Optimization (ICM, Graph Cut)

• Multi-valued Factor Graphs: Models and Optimization (Alpha Expansion)


(22)

What to infer?

Same as in directed graphical models:

• MAP inference (Maximum a posterior state):

𝒙

= 𝑎𝑟𝑔𝑚𝑎𝑥

𝒙

𝑝 𝒙 = 𝑎𝑟𝑔𝑚𝑖𝑛

𝒙

𝐸 𝒙

• Probabilistic Inference, so-called marginal:

𝑝 𝑥

𝑖

= 𝑘 =

𝒙 | 𝑥𝑖=𝑘

𝑝(𝑥

1

, … 𝑥

𝑖

= 𝑘, … , 𝑥

𝑛

) This can be used to make a maximum marginal decision:

𝑥

𝑖

= 𝑎𝑟𝑔𝑚𝑎𝑥

𝑥𝑖

𝑝 𝑥

𝑖
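Both queries can be computed by brute-force enumeration on tiny models, which also shows that the two decisions need not agree. The joint below is made up solely for illustration:

```python
# MAP versus max-marginal decisions on a tiny two-variable model.
from itertools import product

p = {(0, 0): 0.4, (0, 1): 0.0, (1, 0): 0.3, (1, 1): 0.3}   # a valid joint distribution

# MAP: the single jointly most probable assignment.
x_map = max(p, key=p.get)

# Marginals p(x_i = k): sum over all assignments with x_i = k.
marginals = [{k: sum(pr for x, pr in p.items() if x[i] == k) for k in (0, 1)}
             for i in (0, 1)]
x_maxmarg = tuple(max(m, key=m.get) for m in marginals)

print("MAP:         ", x_map)       # -> (0, 0)
print("max-marginal:", x_maxmarg)   # -> (1, 0): a different decision
```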

(23)

MAP versus Marginal - visually


[Figure: input image, ground-truth labeling, the MAP solution 𝒙* (each pixel has a 0/1 label), and the marginals p(x_i) (each pixel has a probability between 0 and 1).]


(24)

MAP versus Marginal – Making Decisions

Which solution 𝒙* would you choose?

[Figure: p(𝒙|𝒛) plotted over the space of all solutions 𝒙, sorted by pixel difference.]

(25)

Reminder: How to make a decision


Question: Which solution 𝒙* should we give out?

Answer: Choose the 𝒙* which minimizes the Bayesian risk (assuming the model p(𝒙|𝒛) is known):

𝒙* = argmin_𝒙' Σ_𝒙 p(𝒙|𝒛) C(𝒙, 𝒙')

C(𝒙1, 𝒙2) is called the loss function (or cost function) for comparing two results 𝒙1, 𝒙2.


(26)

MAP versus Marginal – Making Decisions

[Figure: p(𝒙|𝒛) plotted over the space of all solutions 𝒙, sorted by pixel difference.]

Which one is the MAP solution? The MAP solution (red) is the globally optimal solution:

𝒙* = argmax_𝒙 p(𝒙|𝒛) = argmin_𝒙 E(𝒙, 𝒛)

(27)

Reminder: The Cost Function behind MAP


The cost function for MAP: C(𝒙, 𝒙') = 0 if 𝒙 = 𝒙', otherwise 1

𝒙* = argmin_𝒙' Σ_𝒙 p(𝒙|𝒛) C(𝒙, 𝒙')
   = argmin_𝒙' (1 − p(𝒙 = 𝒙' | 𝒛))
   = argmax_𝒙' p(𝒙'|𝒛)

The MAP estimate therefore optimizes a "global 0-1 loss".


(28)

The Cost Function behind Max Marginal

Probabilistic inference gives the marginals. We can take the max-marginal solution:

x_i* = argmax_{x_i} p(x_i)   where   p(x_i = k) = Σ_{𝒙 | x_i = k} p(x1, …, x_i = k, …, x_n)

This represents the decision with minimum Bayesian risk

𝒙* = argmin_𝒙' Σ_𝒙 p(𝒙|𝒛) C(𝒙, 𝒙')   for the cost function   C(𝒙, 𝒙') = Σ_i ||x_i − x'_i||²

For x_i ∈ {0,1} this counts the number of differently labeled pixels (proof not done).

Example: [figure with three labelings 𝒙1, 𝒙2, 𝒙3] C(𝒙1, 𝒙2) = 10, C(𝒙2, 𝒙3) = 10, C(𝒙1, 𝒙3) = 20.
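These two loss functions can also be checked numerically. The sketch below reuses the made-up two-variable joint from the earlier MAP/max-marginal example and verifies that the 0-1 loss is minimized by the MAP assignment while the pixel-wise (Hamming) loss is minimized by the max-marginal assignment:

```python
# Bayesian risks under the global 0-1 loss and the pixel-wise Hamming loss.
from itertools import product

p = {(0, 0): 0.4, (0, 1): 0.0, (1, 0): 0.3, (1, 1): 0.3}
candidates = list(product((0, 1), repeat=2))

def risk(decision, cost):
    # Bayesian risk: expected cost of 'decision' under the distribution p.
    return sum(pr * cost(x, decision) for x, pr in p.items())

zero_one = lambda x, y: 0 if x == y else 1
hamming = lambda x, y: sum(abs(a - b) for a, b in zip(x, y))

print(min(candidates, key=lambda d: risk(d, zero_one)))  # -> (0, 0), the MAP solution
print(min(candidates, key=lambda d: risk(d, hamming)))   # -> (1, 0), the max-marginal solution
```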

(29)

MAP versus Marginal – Making Decisions


[Figure: p(𝒙|𝒛) over the space of all solutions 𝒙 (sorted by pixel difference) for an input image 𝒛; four solutions are marked, with probabilities p = 0.2, p = 0.1, p = 0.11, p = 0.1 and costs C = 1, C = 1, C = 100.]

Which one is the max-marginal solution?

If 𝒙' is red, the risk is (summing only over the 4 solutions): 0.1 + 0.1 + 100 · 0.2 = 20.2
If 𝒙' is blue, the risk is (summing only over the 4 solutions): 11 + 10 + 10 = 31

Hence red is the max-marginal solution (all numbers are chosen arbitrarily).

(30)

This lecture: MAP Inference in order 2 models

Gibbs distribution: p(𝒙) = (1/f) exp{−E(𝒙)}

E(𝒙) = Σ_i θ_i(x_i) + Σ_{i,j} θ_ij(x_i, x_j) + Σ_{i,j,k} θ_ijk(x_i, x_j, x_k) + …
       (unary terms)   (pairwise terms)         (higher-order terms)

• MAP inference: 𝒙* = argmax_𝒙 p(𝒙) = argmin_𝒙 E(𝒙)

• Label space: binary x_i ∈ {0,1} or multi-label x_i ∈ {0, …, K}

• We only look at energies with unary and pairwise factors

(31)

Roadmap for next two lectures

• Definition and Visualization of Factor Graphs

• Converting Directed Graphical Models to Factor Graphs

• Probabilistic Programming

• Queries and making decisions

• Binary-valued Factor Graphs: Models and Optimization (ICM, Graph Cut)

• Multi-valued Factor Graphs: Models and Optimization (Alpha Expansion)


(32)

Image Segmentation

[Figure: input image with user brush strokes (blue = background, red = foreground), the desired binary output labeling, and a 4-connected grid factor graph with unary factors θ_i(x_i) and pairwise factors θ_ij(x_i, x_j).]

Binary label: x_i ∈ {0,1}

We will use the following energy:

E(𝒙) = Σ_i θ_i(x_i) + Σ_{i,j∈N4} θ_ij(x_i, x_j)
       (unary term)   (pairwise term)

N4 is the set of all pairs of neighboring pixels.

(33)

Image Segmentation: Energy


Goal: formulate E(𝒙) such that the MAP solution 𝒙* = argmin_𝒙 E(𝒙) is the desired labeling.

[Figure: four candidate labelings with energies E(𝒙) = 0.01, E(𝒙) = 0.05, E(𝒙) = 0.05 and E(𝒙) = 10; the numbers are illustrative and may not represent real values.]

(34)

Unary term

[Figure: red-green color scatter plots of the user-labelled pixels (cross = foreground, dot = background) and the Gaussian Mixture Model fit; the foreground model is shown in blue, the background model in red.]

(35)

Unary term


E(𝒙) = Σ_i θ_i(x_i)   with

θ_i(x_i = 0) = −log P_red(z_i | x_i = 0)
θ_i(x_i = 1) = −log P_blue(z_i | x_i = 1)

where z_i is the new query image.

[Figure: the query image z_i; the two unary cost maps θ_i(x_i = 0) and θ_i(x_i = 1) (dark means likely background in one map and likely foreground in the other); and the optimum 𝒙* = argmin_𝒙 E(𝒙) with unary terms only.]
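In practice the two likelihoods are obtained by fitting color models to the brushed pixels, for example Gaussian mixtures as on the previous slide. A hedged sketch (scikit-learn is an assumption here, and the pixel arrays are random stand-ins for brushed and query pixels):

```python
# Unary costs as negative log-likelihoods under two fitted color GMMs.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
fg_pixels = rng.normal(0.7, 0.1, size=(200, 3))   # stand-in for foreground-brushed RGB values
bg_pixels = rng.normal(0.3, 0.1, size=(200, 3))   # stand-in for background-brushed RGB values

gmm_fg = GaussianMixture(n_components=5, random_state=0).fit(fg_pixels)
gmm_bg = GaussianMixture(n_components=5, random_state=0).fit(bg_pixels)

z = rng.uniform(0, 1, size=(10, 3))               # stand-in for query-image pixels z_i
theta_1 = -gmm_fg.score_samples(z)                # theta_i(x_i = 1) = -log P(z_i | x_i = 1)
theta_0 = -gmm_bg.score_samples(z)                # theta_i(x_i = 0) = -log P(z_i | x_i = 0)
print(np.column_stack([theta_0, theta_1]))        # one row of unary costs per pixel
```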

(36)

Pairwise term

• We choose a so-called Ising prior:

θ_ij(x_i, x_j) = |x_i − x_j|

which gives the energy: E(𝒙) = Σ_{i,j∈N4} θ_ij(x_i, x_j)

This models the assumption that the object is spatially coherent.

[Figure: a 4-connected grid with pairwise factors θ_ij(x_i, x_j), and three example labelings with lowest, intermediate and very high pairwise energy.]

• Questions:
• What labelling has the lowest energy?
• What labelling has the highest energy?

(37)

Adding unary and Pairwise term


Energy: E(𝒙) = Σ_i θ_i(x_i) + ω Σ_{i,j∈N4} |x_i − x_j|

[Figure: segmentation results for ω = 0, ω = 10, ω = 40 and ω = 200.]

Question (done in the exercise): Can the global optimum be computed with graph cut? Please prove it.

Question: What happens when ω increases further?
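Evaluating this energy for a given labeling is straightforward; the sketch below (tiny 3×3 image, made-up unary costs) shows how the weight ω trades off the data term against spatial coherence:

```python
# E(x) = sum_i theta_i(x_i) + w * sum_{(i,j) in N4} |x_i - x_j| on a small grid.
import numpy as np

def segmentation_energy(labels, unary, w):
    """labels: HxW array of {0,1}; unary: HxWx2 array with theta_i(0), theta_i(1)."""
    h, wd = labels.shape
    e = unary[np.arange(h)[:, None], np.arange(wd)[None, :], labels].sum()  # unary part
    # Ising pairwise part over the 4-connected neighbourhood (right and down edges).
    e += w * np.abs(labels[:, 1:] - labels[:, :-1]).sum()
    e += w * np.abs(labels[1:, :] - labels[:-1, :]).sum()
    return e

rng = np.random.default_rng(0)
unary = rng.uniform(0, 1, size=(3, 3, 2))                    # hypothetical unary costs
x_coherent = np.array([[0, 0, 0], [0, 1, 1], [0, 1, 1]])
x_noisy = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])

for w in (0.0, 1.0, 10.0):
    print(w, segmentation_energy(x_coherent, unary, w), segmentation_energy(x_noisy, unary, w))
# As w grows, the spatially coherent labeling gets an ever larger energy advantage.
```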

(38)

Is it the best we can do?

[Figure: 4-connected segmentation result with zoom-ins on the object boundary.]

(39)

Is it the best we can do?


Given: E(𝒙) = Σ_i θ_i(x_i) + Σ_{i,j∈N4} |x_i − x_j|

Which segmentation has higher energy?

Segmentation A:
0 0 0 0 0 0
0 1 1 1 1 0
0 1 1 1 1 0
0 1 1 1 1 0
0 1 1 1 1 0
0 0 0 0 0 0

Segmentation B:
0 0 0 0 0 0
0 0 1 1 0 0
0 1 1 1 1 0
0 1 1 1 1 0
0 0 1 1 0 0
0 0 0 0 0 0

Answers:
1) It depends on the unary costs.
2) The pairwise cost is the same in both cases (16 N4 edges are cut).

(40)

From 4-connected to 8-connected Factor Graph

Larger connectivity can model the true Euclidean length (other metrics are also possible).

[Figure/table: lengths of example paths measured with the Euclidean metric, along the 4-connected grid and along the 8-connected grid (values shown on the slide: 5.65, 8, 1, 6.28, 6.28, 5.08, 6.75); illustration of 4-connected versus 8-connected neighborhoods.]

(41)

Going to 8-connectivity


[Figure: zoom-ins comparing the 4-connected (Euclidean) and 8-connected (Euclidean) MRF segmentation results.]


Is it the best we can do?

(42)

Adapt the pairwise term

Standard 4-connected energy:

E(𝒙) = Σ_i θ_i(x_i) + Σ_{i,j∈N4} θ_ij(x_i, x_j)

Edge-dependent pairwise term:

θ_ij(x_i, x_j) = |x_i − x_j| exp(−β ||z_i − z_j||²),   where β is a constant

Question: What is the term exp(−β ||z_i − z_j||²) doing?

Question (done in the exercise): Can the global optimum be computed with graph cut? Please prove it.
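One way to get a feeling for the edge-dependent term is to evaluate it numerically (β and the pixel colors below are made up):

```python
# Edge-dependent pairwise cost |x_i - x_j| * exp(-beta * ||z_i - z_j||^2).
import numpy as np

def pairwise_cost(x_i, x_j, z_i, z_j, beta=2.0):
    contrast = np.sum((np.asarray(z_i) - np.asarray(z_j)) ** 2)
    return abs(x_i - x_j) * np.exp(-beta * contrast)

# A label change between similarly colored pixels is expensive ...
print(pairwise_cost(0, 1, z_i=(0.20, 0.20, 0.20), z_j=(0.25, 0.20, 0.20)))  # ~0.995
# ... but cheap across a strong image edge, so the boundary snaps to edges.
print(pairwise_cost(0, 1, z_i=(0.10, 0.10, 0.10), z_j=(0.90, 0.90, 0.90)))  # ~0.02
```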

(43)

A probabilistic view


E(𝒙, 𝒛) = Σ_i θ_i(x_i, z_i) + Σ_{i,j∈N4} θ_ij(x_i, x_j)

with
θ_i(x_i = 1, z_i) = −log p_blue(z_i | x_i = 1)
θ_i(x_i = 0, z_i) = −log p_red(z_i | x_i = 0)
θ_ij(x_i, x_j) = |x_i − x_j|   (exp{−0} = 1, exp{−1} ≈ 0.37)

Gibbs distribution: p(𝒙|𝒛) = (1/f) exp{−E(𝒙, 𝒛)}   and   f = Σ_𝒙 exp{−E(𝒙, 𝒛)}

1. Just look at the conditional distribution.

2. Factorize the conditional distribution: p(𝒙|𝒛) = (1/p(𝒛)) p(𝒛|𝒙) p(𝒙), where p(𝒛) is a constant factor

p(𝒙) = (1/f1) ∏_{i,j∈N4} exp{−|x_i − x_j|}

p(𝒛|𝒙) = (1/f2) ∏_i p(z_i|x_i) = (1/f2) ∏_i ( p_blue(z_i | x_i = 1) · x_i + p_red(z_i | x_i = 0) · (1 − x_i) )

Check yourself: p(𝒙|𝒛) = (1/f) p(𝒛|𝒙) p(𝒙) = (1/f) exp{−E(𝒙, 𝒛)}

(44)

ICM - Iterated conditional mode

Energy: E(𝒙) = Σ_i θ_i(x_i) + Σ_{i,j∈N4} θ_ij(x_i, x_j)

Idea: fix all variables but one and optimize over that one.

Example (variable x1 with neighbors x2, x3, x4, x5):

E′(x1 | 𝒙\1) = Σ_i θ_i(x_i) + Σ_{i,j∈N4} θ_ij(x_i, x_j)
            = θ_1(x1) + θ_12(x1, x2) + θ_13(x1, x3) + θ_14(x1, x4) + θ_15(x1, x5) + constant

x1 | 𝒙\1 means that all labels but x1 are fixed.

Insight: the optimization has an implicit energy that depends only on a few factors; all remaining terms are constant with respect to x1.

(45)

ICM - Iterated conditional mode


Energy: E(𝒙) = Σ_i θ_i(x_i) + Σ_{i,j∈N4} θ_ij(x_i, x_j)

Algorithm:
1. Initialize 𝒙 = 0
2. For i = 1 … n:
3.   Update x_i = argmin_{x_i} E′(x_i | 𝒙\i)
4. Go to step 2 if E(𝒙) has changed with respect to the previous iteration (i.e. repeat until convergence)

Problems:
• Can get stuck in local minima
• Depends on the initialization

[Figure: the ICM result compared with the global optimum computed with graph cut.]
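A minimal implementation of this loop for the binary 4-connected segmentation energy might look as follows (a sketch; the unary costs are random stand-ins):

```python
# Iterated conditional modes for E(x) = sum_i theta_i(x_i) + w * sum_{N4} |x_i - x_j|.
import numpy as np

def icm(unary, w, max_sweeps=50):
    """unary: HxWx2 array of theta_i(0), theta_i(1); returns a local optimum."""
    h, wd = unary.shape[:2]
    x = np.zeros((h, wd), dtype=int)                       # step 1: initialize x = 0
    for _ in range(max_sweeps):
        changed = False
        for i in range(h):                                 # step 2: visit every variable
            for j in range(wd):
                neighbors = [x[a, b] for a, b in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
                             if 0 <= a < h and 0 <= b < wd]
                # step 3: E'(x_ij | rest) only involves this pixel's unary term
                # and the pairwise terms to its (currently fixed) neighbors.
                cost = [unary[i, j, k] + w * sum(abs(k - n) for n in neighbors) for k in (0, 1)]
                best = int(np.argmin(cost))
                if best != x[i, j]:
                    x[i, j] = best
                    changed = True
        if not changed:                                    # step 4: stop at convergence
            break
    return x

rng = np.random.default_rng(1)
unary = rng.uniform(0, 1, size=(6, 6, 2))                  # hypothetical unary costs
print(icm(unary, w=0.5))
```

Because each update only involves the factors touching one pixel, pixels that do not share a factor can be updated simultaneously, which is what the parallel schedule on the next slide exploits.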

(46)

ICM - parallelization

• Scheduling is a more complex task in graphs which are not 4-connected.

Normal procedure: [figure: the variables are updated one after the other, steps 1-4]

Parallel procedure: [figure: sets of non-neighboring variables are updated simultaneously, steps 1-4]

(47)

Roadmap for next two lectures

• Definition and Visualization of Factor Graphs

• Converting Directed Graphical Models to Factor Graphs

• Probabilistic Programming

• Queries and making decisions

• Binary-valued Factor Graphs: Models and Optimization (ICM, Graph Cut)

• Multi-valued Factor Graphs: Models and Optimization (Alpha Expansion)

