Intelligent Systems:
Undirected Graphical Models (Factor Graphs) (2 lectures)
Carsten Rother
15/01/2015
Roadmap for next two lectures
• Definition and Visualization of Factor Graphs
• Converting Directed Graphical Models to Factor Graphs
• Probabilistic Programming
• Queries and making decisions
• Binary-valued Factor Graphs: Models and Optimization (ICM, Graph Cut)
• Multi-valued Factor Graphs: Models and Optimization (Alpha Expansion)
Reminder: Structured Models - when to use what representation?
• Directed graphical models: the unknown variables have different “meanings”.
Example: MaryCalls (M), JohnCalls (J), AlarmOn (A), BurglarInHouse (B)
• Factor graphs: the unknown variables all have the same “meaning”.
Examples: pixels in an image, nuclei in C. elegans (a worm)
• Undirected graphical models are used instead of factor graphs when we are interested in studying “conditional independence” (not relevant for our context).
Reminder: Machine Learning: Structured versus Unstructured Models
Structured Output Prediction:
• f/p: Zᵐ → Xⁿ (for example X = ℝⁿ or X = ℕⁿ)
Important: the elements in X do not make independent decisions
Definition (not formal):
The output consists of several parts, and not only the parts themselves contain information, but also the way in which the parts belong together.
Example: Image Labelling (Computer Vision)
Input: image (Zᵐ). Output: labeling (Kᵐ), where K has a fixed vocabulary, e.g. K = {Wall, Picture, Person, Clutter, …}
Important: the labels of neighboring pixels are highly correlated
Reminder: Machine Learning: Structured versus Unstructured Models
Structured Output Prediction:
• f/p: Zᵐ → Xⁿ (for example X = ℝⁿ or X = ℕⁿ)
Important: the elements in X do not make independent decisions
Example: Text Processing
Input: text (Zᵐ), e.g. “The boy went home”. Output: the parse tree of the sentence (Xᵐ).
Definition (not formal)
The output consists of several parts, and not only the parts themselves contain information, but also the way in which the parts belong together.
Factor Graph model - Example
• A factor graph defines a distribution as:

p(x) = (1/f) ∏_{F∈𝔽} ψ_F(x_{N(F)})   where   f = ∑_x ∏_{F∈𝔽} ψ_F(x_{N(F)})

f: partition function, so that the distribution is normalized
F: a factor; 𝔽: set of all factors
N(F): neighbourhood of a factor, i.e. the variables the factor depends on
ψ_F: a function (not a distribution) depending on x_{N(F)}   (ψ_F: K^{|N(F)|} → ℝ, where x_i ∈ K)
• Example:

p(x1, x2) = (1/f) ψ1(x1, x2) ψ2(x2)

x_{N(1)} = {x1, x2}, x_{N(2)} = {x2}, x_i ∈ {0,1}, K = 2
ψ1(0,0) = 1; ψ1(0,1) = 0; ψ1(1,0) = 1; ψ1(1,1) = 2
ψ2(0) = 1; ψ2(1) = 0
f = 1·1 + 0·0 + 1·1 + 2·0 = 2. Check yourself that ∑_x p(x) = 1.
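As a minimal sketch (Python, not part of the original slides), this example can be checked by brute-force enumeration; the factor tables are exactly the ones given above:

```python
from itertools import product

# Factor tables from the slide: psi1(x1, x2) and psi2(x2), with x_i in {0, 1}
psi1 = {(0, 0): 1.0, (0, 1): 0.0, (1, 0): 1.0, (1, 1): 2.0}
psi2 = {0: 1.0, 1: 0.0}

# Partition function f = sum over all states of the product of factor values
f = sum(psi1[(x1, x2)] * psi2[x2] for x1, x2 in product((0, 1), repeat=2))
print(f)  # 2.0, as computed on the slide

# Normalized distribution p(x1, x2) = psi1(x1, x2) * psi2(x2) / f
p = {(x1, x2): psi1[(x1, x2)] * psi2[x2] / f
     for x1, x2 in product((0, 1), repeat=2)}
print(sum(p.values()))  # 1.0: p is indeed a distribution
```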
Factor Graph model - Visualization
p(x1, x2) = (1/f) ψ1(x1, x2) ψ2(x2)
• Example (as before):
x_{N(1)} = {x1, x2}, x_{N(2)} = {x2}, x_i ∈ {0,1}, K = 2
ψ1(0,0) = 1; ψ1(0,1) = 0; ψ1(1,0) = 1; ψ1(1,1) = 2; ψ2(0) = 1; ψ2(1) = 0
f = 1·1 + 0·0 + 1·1 + 2·0 = 2. Check yourself that ∑_x p(x) = 1.
A square (■) visualizes a factor node; a circle (○) visualizes a variable node; an edge means that the variable participates in that factor.
[Figure: visualization of p(x1, x2) = (1/f) ψ1(x1, x2) ψ2(x2): variable nodes x1 and x2, a factor node ψ1 connected to both, and a factor node ψ2 connected to x2 only.]
For the visualization we utilize an undirected graph G = (V, F, E), where V, F are the sets of variable and factor nodes and E is the set of edges.
Factor Graph model - Visualization
• Example:
p(x1, x2, x3, x4, x5) = (1/f) ψ(x1, x2, x4) ψ(x2, x3) ψ(x3, x4) ψ(x4, x5) ψ(x4), where the ψ's are specified in some way
[Figure: variable nodes x1, …, x5 (circles) with factor nodes (squares) for {x1, x2, x4}, {x2, x3}, {x3, x4}, {x4, x5}, and a unary factor on x4.]
Probabilities and Energies
p(x) = (1/f) ∏_{F∈𝔽} ψ_F(x_{N(F)}) = (1/f) ∏_{F∈𝔽} exp{−θ_F(x_{N(F)})} = (1/f) exp{−∑_{F∈𝔽} θ_F(x_{N(F)})} = (1/f) exp{−E(x)}

The energy E(x) is just a sum of factors:
E(x) = ∑_{F∈𝔽} θ_F(x_{N(F)})

The most likely solution x* is reached by minimizing the energy:
x* = argmax_x p(x) = argmin_x E(x)

Note:
1) A minimizer x* of a positive function is also a minimizer of its logarithm, since log is monotone (x1 ≤ x2 implies log x1 ≤ log x2); hence maximizing p(x) is the same as maximizing log p(x).
2) It is: log p(x) = −log f − E(x) = constant − E(x), so maximizing log p(x) is the same as minimizing E(x).
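Continuing the sketch from above (again an illustration, not lecture code): converting the factor tables to energies θ = −log ψ and enumerating all states confirms that argmax p(x) and argmin E(x) coincide:

```python
import math
from itertools import product

# A factor value psi = 0 corresponds to infinite energy theta = -log(psi)
def to_energy(v):
    return math.inf if v == 0 else -math.log(v)

# Factor tables from the earlier example
psi1 = {(0, 0): 1.0, (0, 1): 0.0, (1, 0): 1.0, (1, 1): 2.0}
psi2 = {0: 1.0, 1: 0.0}

def energy(x1, x2):
    # E(x) is the sum of the factor energies theta_F(x_N(F))
    return to_energy(psi1[(x1, x2)]) + to_energy(psi2[x2])

states = list(product((0, 1), repeat=2))
x_min_energy = min(states, key=lambda x: energy(*x))
x_max_prob = max(states, key=lambda x: psi1[x] * psi2[x[1]])
print(x_min_energy, x_max_prob)  # identical: (0, 0); note (1, 0) ties with equal energy
```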
Names
• The probability distribution p(x) = (1/f) exp{−E(x)}, with energy E(x) = ∑_{F∈𝔽} θ_F(x_{N(F)}) and f = ∑_x exp{−E(x)}, is a so-called Gibbs distribution.
• We define the order of a factor graph as the arity (number of variables) of its largest factor. Example of an order-3 model:
E(x) = θ(x1, x2, x4) + θ(x2, x3) + θ(x3, x4) + θ(x5, x4) + θ(x4)
(the first term has arity 3, the middle terms arity 2, the last term arity 1)
• A different name for a factor graph / undirected graphical model is Markov Random Field (MRF). This is an extension of Markov chains to fields. The name “Markov” stands for the Markov property, which essentially means that the order of the factors is small.
Examples: Order
• 4-connected, pairwise MRF (order 2), a “pairwise energy”:
E(x) = ∑_{i,j∈N4} θ_ij(x_i, x_j)
• Higher(8)-connected, pairwise MRF (order 2):
E(x) = ∑_{i,j∈N8} θ_ij(x_i, x_j)
• Higher-order RF (order n), a “higher-order energy”:
E(x) = ∑_{i,j∈N4} θ_ij(x_i, x_j) + θ(x1, …, xn)
Roadmap for next two lectures
• Definition and Visualization of Factor Graphs
• Converting Directed Graphical Models to Factor Graphs
• Probabilistic Programming
• Queries and making decisions
• Binary-valued Factor Graphs: Models and Optimization (ICM, Graph Cut)
• Multi-valued Factor Graphs: Models and Optimization (Alpha Expansion)
Converting Directed Graphical Model to Factor Graph
A simple case (a chain x1 → x2 → x3):

Directed GM: p(x1, x2, x3) = p(x3|x2) p(x2|x1) p(x1)

Factor graph: p(x1, x2, x3) = (1/f) ψ(x3, x2) ψ(x2, x1) ψ(x1)

where: f = 1, ψ(x3, x2) = p(x3|x2), ψ(x2, x1) = p(x2|x1), ψ(x1) = p(x1)

[Figure: the directed chain x1 → x2 → x3 and the corresponding factor graph with pairwise factors on {x1, x2} and {x2, x3} and a unary factor on x1.]
Converting Directed Graphical Model to Factor Graph
A more complex case:

Directed GM: p(x1, x2, x3, x4) = p(x1|x2, x3) p(x2) p(x3|x4) p(x4)

Factor graph: p(x1, x2, x3, x4) = (1/f) ψ(x1, x2, x3) ψ(x2) ψ(x3, x4) ψ(x4)

where: f = 1, ψ(x1, x2, x3) = p(x1|x2, x3), ψ(x2) = p(x2), ψ(x3, x4) = p(x3|x4), ψ(x4) = p(x4)

[Figure: the directed model (x4 → x3; x2, x3 → x1) and the corresponding factor graph with a factor on {x1, x2, x3}, a factor on {x3, x4}, and unary factors on x2 and x4.]
Converting Directed Graphical Model to Factor Graph
Our example:
Directed GM: p(x1, x2, x3, x4) = p(x1|x2, x3) p(x2) p(x3|x4) p(x4)
Factor graph: p(x1, x2, x3, x4) = (1/f) ψ(x1, x2, x3) ψ(x2) ψ(x3, x4) ψ(x4)
where: ψ(x1, x2, x3) = p(x1|x2, x3), ψ(x2) = p(x2), ψ(x3, x4) = p(x3|x4), ψ(x4) = p(x4)

General recipe (see the sketch below):
• Take each conditional probability and convert it to a factor (without conditioning), i.e. replace the conditioning bar “|” with a comma.
• Set the normalization constant f = 1.
• Visualization: all parents of a node, together with the node itself, form a new factor (this step is called moralization).
• Comment: the other direction is more complicated, since the factors ψ have to be converted correctly into individual (conditional) probabilities such that the overall joint distribution stays the same.
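A minimal sketch of this recipe (the CPT values below are made-up placeholders, not from the slides; only the conversion step itself is the point): each conditional probability table p(child | parents) simply becomes a factor over the child and its parents, and f = 1.

```python
import itertools

# Made-up CPTs for the example DGM over binary variables x1..x4:
# p(x1 | x2, x3), p(x2), p(x3 | x4), p(x4), each stored as {assignment: probability}.
p_x4 = {(0,): 0.6, (1,): 0.4}
p_x3_given_x4 = {(x3, x4): 0.5 for x3, x4 in itertools.product((0, 1), repeat=2)}
p_x2 = {(0,): 0.7, (1,): 0.3}
p_x1_given_x23 = {(x1, x2, x3): 0.5
                  for x1, x2, x3 in itertools.product((0, 1), repeat=3)}

# Conversion: every CPT becomes a factor over (child, *parents); the
# conditioning bar is simply dropped, and the normalization constant is f = 1.
factors = {
    ("x4",): p_x4,
    ("x3", "x4"): p_x3_given_x4,
    ("x2",): p_x2,
    ("x1", "x2", "x3"): p_x1_given_x23,
}

def joint(assign):
    """p(x) = product of all factors; equals the DGM joint since f = 1."""
    prob = 1.0
    for scope, table in factors.items():
        prob *= table[tuple(assign[v] for v in scope)]
    return prob

# Check: f = sum_x prod_F psi_F(x) = 1 for a factor graph converted from a DGM.
names = ("x1", "x2", "x3", "x4")
f = sum(joint(dict(zip(names, xs))) for xs in itertools.product((0, 1), repeat=4))
print(f)  # 1.0
```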
Roadmap for next two lectures
• Definition and Visualization of Factor Graphs
• Converting Directed Graphical Models to Factor Graphs
• Probabilistic Programming
• Queries and making decisions
• Binary-valued Factor Graphs: Models and Optimization (ICM, Graph Cut)
• Multi-valued Factor Graphs: Models and Optimization (Alpha Expansion)
Probabilistic Programming Languages
See http://probabilistic-programming.org/wiki/Home
• A programming language for machine learning tasks, in particular for modelling, learning and making predictions in directed graphical models (DGM), undirected graphical models (UGM), and factor graphs (FG)
• Comment: DGMs and UGMs are converted to factor graphs; all operations are run on factor graphs
• The basic idea is to associate with each variable a distribution:

Bool coin = 0;               // normal C++: a fixed value
Bool coin = Bernoulli(0.5);  // probabilistic program: Bernoulli is a distribution with 2 states
An Example: Two coins
Example: you draw two fair coins. What is the chance that both are heads?
• Random variables: coin1 (x1) and coin2 (x2), and an event z about the state of both variables
• We know: coin1 (x1) and coin2 (x2) are independent
• Each coin has equal probability of being heads (1) or tails (0)
• A new random variable z which is true if and only if both coins are heads: z = x1 & x2
An Example: Two coins
x1, x2 with x_i ∈ {0,1} and p(x_i = 1) = p(x_i = 0) = 0.5; z depends on both x1 and x2.

Value x1   Value x2   P(z=1|x1,x2)   P(z=0|x1,x2)
0          0          0              1
0          1          0              1
1          0          0              1
1          1          1              0

Joint: p(x1, x2, z) = p(z|x1, x2) p(x1) p(x2)
Compute the marginal: p(z) = ∑_{x1,x2} p(z, x1, x2) = ∑_{x1,x2} p(z|x1, x2) p(x1) p(x2)
p(z = 1) = 1 · 0.5 · 0.5 = 0.25
p(z = 0) = 1 · 0.5 · 0.5 + 1 · 0.5 · 0.5 + 1 · 0.5 · 0.5 = 0.75
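A small sketch of this marginalization by enumeration (illustrative Python, mirroring what a probabilistic-programming runtime would do on this tiny model):

```python
from itertools import product

p_x = {0: 0.5, 1: 0.5}          # p(x1) and p(x2): fair coins

def p_z_given(z, x1, x2):
    """Deterministic CPT: z = x1 AND x2."""
    return 1.0 if z == (x1 & x2) else 0.0

# Marginal p(z) = sum_{x1,x2} p(z | x1, x2) p(x1) p(x2)
p_z = {z: sum(p_z_given(z, x1, x2) * p_x[x1] * p_x[x2]
              for x1, x2 in product((0, 1), repeat=2))
       for z in (0, 1)}
print(p_z)  # {0: 0.75, 1: 0.25}, as on the slide
```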
An Example: Two coins - Infer.NET
[Screenshots of an Infer.NET program: the model code, running it, and adding evidence to the program.]
Roadmap for next two lectures
• Definition and Visualization of Factor Graphs
• Converting Directed Graphical Models to Factor Graphs
• Probabilistic Programming
• Queries and making decisions
• Binary-valued Factor Graphs: Models and Optimization (ICM, Graph Cut)
• Multi-valued Factor Graphs: Models and Optimization (Alpha Expansion)
What to infer?
Same as in directed graphical models:
• MAP inference (maximum a posteriori state):
x* = argmax_x p(x) = argmin_x E(x)
• Probabilistic inference, the so-called marginals:
p(x_i = k) = ∑_{x | x_i = k} p(x1, …, x_i = k, …, xn)
This can be used to make a maximum marginal decision:
x_i* = argmax_{x_i} p(x_i)
MAP versus Marginal - visually
[Figure: input image; ground-truth labeling; MAP solution x* (each pixel has a 0/1 label); marginals p(x_i) (each pixel has a probability between 0 and 1).]
MAP versus Marginal – Making Decisions
Which solution x* would you choose?
[Figure: plot of p(x|z) over the space of all solutions x (sorted by pixel difference).]
Reminder: How to make a decision
Assume the model p(x|z) is known.
Question: what solution x* should we give out?
Answer: choose the x* which minimizes the Bayesian risk:
x* = argmin_{x*} ∑_x p(x|z) C(x, x*)
C(x1, x2) is called the loss function (or cost function) comparing two results x1, x2.
MAP versus Marginal – Making Decisions
[Figure: plot of p(x|z) over the space of all solutions x (sorted by pixel difference). Which one is the MAP solution?]
The MAP solution (red) takes the globally optimal solution:
x* = argmax_x p(x|z) = argmin_x E(x, z)
Reminder: The Cost Function behind MAP
The cost function for MAP: C(x, x*) = 0 if x = x*, otherwise 1
x* = argmin_{x*} ∑_x p(x|z) C(x, x*) = argmin_{x*} (1 − p(x = x*|z)) = argmax_x p(x|z)
The MAP estimate optimizes a “global 0-1 loss”.
The Cost Function behind Max Marginal
Probabilistic inference gives marginals. We can take the max-marginal solution:
x_i* = argmax_{x_i} p(x_i)   (where p(x_i = k) = ∑_{x | x_i = k} p(x1, …, x_i = k, …, xn))

This represents the decision with minimum Bayesian risk:
x* = argmin_{x*} ∑_x p(x|z) C(x, x*)
where C(x, x*) = ∑_i ||x_i − x_i*||²
For x_i ∈ {0,1} this counts the number of differently labeled pixels (proof not done).

Example (three labelings x1, x2, x3, costs counted in differing pixels):
C(x1, x2) = 10, C(x2, x3) = 10, C(x1, x3) = 20
[Figure: the three example labelings x1, x2, x3.]
MAP versus Marginal – Making Decisions
[Figure: input image z and a plot of p(x|z) over the space of all solutions x (sorted by pixel difference); four solutions are marked, with p = 0.2, 0.11, 0.1, 0.1 and pairwise costs C = 1, C = 1, C = 100. Which one is the max-marginal solution?]
If x* is red, the risk is (summing only over the 4 solutions): 0.1 + 0.1 + 100 · 0.2 = 20.2
If x* is blue, the risk is: 11 + 10 + 10 = 31
Hence red is the max-marginal solution (all numbers are arbitrarily chosen).
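A small sketch of this risk computation (the probabilities are the slide's arbitrary numbers; the full cost matrix below is an assumption chosen to be consistent with the two sums shown):

```python
# Four candidate solutions with posterior mass p(x|z) (the slide's numbers)
p = {"red": 0.11, "blue": 0.2, "a": 0.1, "b": 0.1}

# Assumed symmetric costs C(x, x') consistent with the slide's sums:
# red is close (C=1) to a and b; blue is far (C=100) from everything.
# C(a, b) = 2 is arbitrary and unused in the two risks printed below.
C = {frozenset(k): v for k, v in {
    ("red", "a"): 1, ("red", "b"): 1, ("red", "blue"): 100,
    ("blue", "a"): 100, ("blue", "b"): 100, ("a", "b"): 2,
}.items()}

def risk(x_star):
    """Bayesian risk: sum_x p(x|z) * C(x, x_star), summed over the 4 solutions."""
    return sum(p[x] * C[frozenset((x, x_star))] for x in p if x != x_star)

print(risk("red"))   # 0.2*100 + 0.1*1 + 0.1*1 = 20.2
print(risk("blue"))  # 0.11*100 + 0.1*100 + 0.1*100 = 31.0
```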
This lecture: MAP Inference in order 2 models
Gibbs distribution: p(x) = (1/f) exp{−E(x)}

E(x) = ∑_i θ_i(x_i) + ∑_{i,j} θ_ij(x_i, x_j) + ∑_{i,j,k} θ_ijk(x_i, x_j, x_k) + …
(unary terms) (pairwise terms) (higher-order terms)

• MAP inference: x* = argmax_x p(x) = argmin_x E(x)
• Label space: binary x_i ∈ {0,1} or multi-label x_i ∈ {0, …, K}
• We only look at energies with unary and pairwise factors
Roadmap for next two lectures
• Definition and Visualization of Factor Graphs
• Converting Directed Graphical Models to Factor Graphs
• Probabilistic Programming
• Queries and making decisions
• Binary-valued Factor Graphs: Models and Optimization (ICM, Graph Cut)
• Multi-valued Factor Graphs: Models and Optimization (Alpha Expansion)
Image Segmentation
We will use the following energy, with binary labels x_i ∈ {0,1}:

E(x) = ∑_i θ_i(x_i) + ∑_{i,j∈N4} θ_ij(x_i, x_j)
(unary term) (pairwise term)

N4 is the set of all (4-connected) neighboring pixels.
[Figure: input image with user brush strokes (blue: background, red: foreground), the desired binary output labeling, and the grid factor graph with unary factors θ_i(x_i) and pairwise factors θ_ij(x_i, x_j).]
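As a minimal sketch (assuming arbitrary unary costs and the Ising pairwise cost |x_i − x_j| that is introduced on the following slides), this energy can be evaluated on a label grid as follows:

```python
import numpy as np

def segmentation_energy(x, theta_unary, w=1.0):
    """E(x) = sum_i theta_i(x_i) + w * sum_{(i,j) in N4} |x_i - x_j|.

    x: (H, W) array of 0/1 labels;
    theta_unary: (H, W, 2) array, theta_unary[i, j, k] = cost of label k at pixel (i, j).
    """
    H, W = x.shape
    unary = theta_unary[np.arange(H)[:, None], np.arange(W)[None, :], x].sum()
    # N4 pairwise (Ising) terms: disagreements between vertical and horizontal neighbors
    pairwise = np.abs(np.diff(x, axis=0)).sum() + np.abs(np.diff(x, axis=1)).sum()
    return unary + w * pairwise

# Tiny example: random unaries on a 4x4 grid; a checkerboard maximizes the pairwise cost
rng = np.random.default_rng(0)
theta = rng.random((4, 4, 2))
x_zero = np.zeros((4, 4), dtype=int)
x_checker = np.indices((4, 4)).sum(axis=0) % 2
print(segmentation_energy(x_zero, theta), segmentation_energy(x_checker, theta))
```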
Image Segmentation: Energy
Goal: formulate E(x) such that the MAP solution x* = argmin_x E(x) is the desired labeling.
[Figure: four candidate labelings with E(x) = 0.01, E(x) = 0.05, E(x) = 0.05, E(x) = 10 (numbers are illustrative, not real measurements).]
Unary term
[Figure: red-green color scatter plots of the user-labelled pixels (cross: foreground, dot: background) and the Gaussian Mixture Model fit; the foreground model is blue, the background model is red.]
Unary term
E(x) = ∑_i θ_i(x_i)   with   θ_i(x_i = 0) = −log P_red(z_i | x_i = 0),   θ_i(x_i = 1) = −log P_blue(z_i | x_i = 1)

x* = argmin_x E(x)

[Figure: new query image z_i; the cost images θ_i(x_i = 0) (dark means likely background) and θ_i(x_i = 1) (dark means likely foreground); the optimum with unary terms only.]
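A sketch of how such unary costs could be computed (assumption: simple per-class Gaussian color likelihoods stand in for the slides' GMM models; the means below are made up):

```python
import numpy as np

def gaussian_loglik(z, mean, var):
    """Log-likelihood of colors z (..., 3) under an isotropic Gaussian (stand-in for a GMM)."""
    d = z - mean
    return -0.5 * np.sum(d * d, axis=-1) / var - 1.5 * np.log(2 * np.pi * var)

# Made-up color models; the slides instead fit GMMs to the user brush strokes
fg_mean = np.array([0.2, 0.3, 0.8])   # "blue" foreground model
bg_mean = np.array([0.8, 0.3, 0.2])   # "red" background model

def unary_costs(z, var=0.1):
    """theta_i(x_i=1) = -log P_blue(z_i | x_i=1); theta_i(x_i=0) = -log P_red(z_i | x_i=0)."""
    theta1 = -gaussian_loglik(z, fg_mean, var)
    theta0 = -gaussian_loglik(z, bg_mean, var)
    return theta0, theta1

# With unary terms only, argmin_x E(x) decomposes into a per-pixel decision
z = np.random.default_rng(1).random((4, 4, 3))
theta0, theta1 = unary_costs(z)
x_star = (theta1 < theta0).astype(int)
print(x_star)
```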
Pairwise term
• We choose a so-called Ising prior:
θ_ij(x_i, x_j) = |x_i − x_j|
which gives the energy E(x) = ∑_{i,j∈N4} θ_ij(x_i, x_j)
This models the assumption that the object is spatially coherent.
• Questions:
• Which labeling has the lowest energy?
• Which labeling has the highest energy?
[Figure: three example labelings with lowest, intermediate, and very high pairwise energy; grid factor graph with θ_i(x_i) and θ_ij(x_i, x_j).]
Adding unary and Pairwise term
Energy: E(x) = ∑_i θ_i(x_i) + ω ∑_{i,j∈N4} |x_i − x_j|
[Figure: segmentation results for ω = 0, ω = 10, ω = 40, ω = 200.]
Question: what happens when ω increases further?
Question (done in exercise): can the global optimum be computed with graph cut? Please prove it.
Is it the best we can do?
[Figure: 4-connected segmentation result with zoom-ins on the object boundary.]
Given: E(x) = ∑_i θ_i(x_i) + ∑_{i,j∈N4} |x_i − x_j|
Which segmentation has higher energy?

0 0 0 0 0 0     0 0 0 0 0 0
0 1 1 1 1 0     0 0 1 1 0 0
0 1 1 1 1 0     0 1 1 1 1 0
0 1 1 1 1 0     0 1 1 1 1 0
0 1 1 1 1 0     0 0 1 1 0 0
0 0 0 0 0 0     0 0 0 0 0 0

Answers:
1) It depends on the unary costs.
2) The pairwise cost is the same in both cases (16 N4 edges are cut).
From 4-connected to 8-connected Factor Graph
Larger connectivity can model the true Euclidean length of the boundary (other metrics are also possible).
[Figure/table: example paths and their lengths under the Euclidean, 4-connected, and 8-connected metrics (values shown: 5.65, 8, 1, 6.28, 6.28, 5.08, 6.75); the 8-connected lengths approximate the Euclidean lengths more closely.]
Going to 8-connectivity
[Figure: zoom-in of segmentation results: 4-connected vs. 8-connected MRF with Euclidean edge weighting; the 8-connected result traces the boundary more smoothly.]
Is it the best we can do?
Adapt the pairwise term
Standard (4-connected): E(x) = ∑_i θ_i(x_i) + ∑_{i,j∈N4} θ_ij(x_i, x_j)
Edge-dependent: θ_ij(x_i, x_j) = |x_i − x_j| exp(−β ||z_i − z_j||²), where β is a constant
Question: what is the term exp(−β ||z_i − z_j||²) doing?
Question (done in exercise): can the global optimum be computed with graph cut? Please prove it.
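A short sketch of this edge-dependent (contrast-sensitive) weight, assuming z holds pixel colors: the weight is near 1 where neighboring colors are similar (label changes stay expensive) and near 0 across strong image edges (label changes become cheap there):

```python
import numpy as np

def contrast_weights(z, beta=10.0):
    """w_ij = exp(-beta * ||z_i - z_j||^2) for vertical and horizontal N4 pairs."""
    dz_v = np.sum(np.diff(z, axis=0) ** 2, axis=-1)   # squared color differences, vertical pairs
    dz_h = np.sum(np.diff(z, axis=1) ** 2, axis=-1)   # squared color differences, horizontal pairs
    return np.exp(-beta * dz_v), np.exp(-beta * dz_h)

def pairwise_energy(x, z, beta=10.0):
    """Edge-dependent pairwise cost: sum over N4 of w_ij * |x_i - x_j|."""
    w_v, w_h = contrast_weights(z, beta)
    return (w_v * np.abs(np.diff(x, axis=0))).sum() + (w_h * np.abs(np.diff(x, axis=1))).sum()
```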
A probabilistic view
Gibbs distribution: p(x|z) = (1/f) exp{−E(x, z)}   with   f = ∑_x exp{−E(x, z)}

E(x, z) = ∑_i θ_i(x_i, z_i) + ∑_{i,j∈N4} θ_ij(x_i, x_j)

1. Just look at the conditional distribution:
θ_i(x_i = 1, z_i) = −log p_blue(z_i | x_i = 1)
θ_i(x_i = 0, z_i) = −log p_red(z_i | x_i = 0)
θ_ij(x_i, x_j) = |x_i − x_j|   (so exp{−0} = 1 for equal labels, exp{−1} ≈ 0.36 for different labels)

2. Factorize the conditional distribution: p(x|z) = (1/p(z)) p(z|x) p(x), where p(z) is a constant factor

p(x) = (1/f1) ∏_{i,j∈N4} exp{−|x_i − x_j|}
p(z|x) = (1/f2) ∏_i p(z_i|x_i) = (1/f2) ∏_i (p_blue(z_i | x_i = 1) x_i + p_red(z_i | x_i = 0)(1 − x_i))

Check yourself: p(x|z) = (1/f) p(z|x) p(x) = (1/f) exp{−E(x, z)}
ICM - Iterated Conditional Modes
Energy: E(x) = ∑_i θ_i(x_i) + ∑_{i,j∈N4} θ_ij(x_i, x_j)

Idea: fix all variables but one and optimize over that one.

Example: [Figure: x1 in the center of a 4-connected grid, with neighbors x2, x3, x4, x5.]

Insight: the optimization has an implicit energy that depends only on a few factors:
E′(x1|x\1) = ∑_i θ_i(x_i) + ∑_{i,j∈N4} θ_ij(x_i, x_j)
           = θ1(x1) + θ12(x1, x2) + θ13(x1, x3) + θ14(x1, x4) + θ15(x1, x5) + terms not depending on x1
(x1|x\1 means that all labels but x1 are fixed)
ICM - Iterated Conditional Modes
Energy: E(x) = ∑_i θ_i(x_i) + ∑_{i,j∈N4} θ_ij(x_i, x_j)

Algorithm (see the sketch below):
1. Initialize x = 0
2. For i = 1 … n:
3.   Update x_i = argmin_{x_i} E′(x_i|x\i)
4. Go to step 2; stop when E(x) has not changed w.r.t. the previous iteration

Problems:
• Can get stuck in local minima
• Depends on the initialization

[Figure: the ICM result compared with the global optimum (computed with graph cut).]
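A compact sketch of ICM for this unary-plus-Ising N4 energy (a minimal illustration with the same array conventions as the earlier energy sketch, not the lecture's reference code):

```python
import numpy as np

def icm(theta_unary, w=1.0, max_iters=100):
    """Iterated Conditional Modes for E(x) = sum_i theta_i(x_i) + w * sum_{N4} |x_i - x_j|."""
    H, W = theta_unary.shape[:2]
    x = np.zeros((H, W), dtype=int)           # step 1: initialize x = 0
    for _ in range(max_iters):
        changed = False
        for i in range(H):                    # step 2: sweep over all variables
            for j in range(W):
                # Implicit energy E'(x_ij | rest): unary term plus Ising terms to
                # the (at most four) N4 neighbors; all other terms are constant.
                costs = []
                for k in (0, 1):
                    c = theta_unary[i, j, k]
                    for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                        ni, nj = i + di, j + dj
                        if 0 <= ni < H and 0 <= nj < W:
                            c += w * abs(k - x[ni, nj])
                    costs.append(c)
                best = int(np.argmin(costs))  # step 3: x_ij = argmin E'(x_ij | rest)
                if best != x[i, j]:
                    x[i, j] = best
                    changed = True
        if not changed:                       # step 4: stop when E(x) no longer changes
            return x
    return x
```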
ICM - parallelization
Normal procedure: update one variable after the other (steps 1-4 in the figure).
Parallel procedure: variables that do not share a factor can be updated simultaneously, e.g. in a checkerboard schedule on a 4-connected grid.
• The schedule is a more complex task in graphs which are not 4-connected.
[Figure: sequential update schedule vs. parallel checkerboard schedule, steps 1-4.]
Roadmap for next two lectures
• Definition and Visualization of Factor Graphs
• Converting Directed Graphical Models to Factor Graphs
• Probabilistic Programming
• Queries and making decisions
• Binary-valued Factor Graphs: Models and Optimization (ICM, Graph Cut)
• Multi-valued Factor Graphs: Models and Optimization (Alpha Expansion)