Intelligent Systems:
Undirected Graphical Models (Factor Graphs) (2 lectures)
Carsten Rother
15/01/2015
Roadmap for next two lectures
• Definition and Visualization of Factor Graphs
• Converting Directed Graphical Models to Factor Graphs
• Probabilistic Programming
• Queries and making decisions
• Binary-valued Factor Graphs: Models and Optimization (ICM, Graph Cut)
• Multi-valued Factor Graphs: Models and Optimization (Alpha Expansion)
Reminder: Structured Models - when to use what representation?
• Directed graphical models: the unknown variables have different “meanings”.
Example: MaryCalls (M), JohnCalls (J), AlarmOn (A), BurglarInHouse (B)
• Factor graphs: the unknown variables all have the same “meaning”.
Examples: pixels in an image, nuclei in C. elegans (a worm)
• Undirected graphical models are used instead of factor graphs when we are interested in studying “conditional independence” (not relevant for our context).
Reminder: Machine Learning: Structured versus Unstructured Models
Structured Output Prediction:
• f/p: Zᵐ → Xⁿ (for example X = ℝⁿ or X = ℕⁿ)
Important: the elements in X do not make independent decisions
Definition (not formal):
The output consists of several parts, and not only the parts themselves contain information, but also the way in which the parts belong together.
Example: Image Labelling (Computer Vision)
Input: image (Zᵐ). Output: labeling (Kᵐ), where K has a fixed vocabulary, e.g. K = {Wall, Picture, Person, Clutter, …}
Important: the labels of neighboring pixels are highly correlated
Reminder: Machine Learning: Structured versus Unstructured Models
Structured Output Prediction:
• f/p: Zᵐ → Xⁿ (for example X = ℝⁿ or X = ℕⁿ)
Important: the elements in X do not make independent decisions
Example: Text Processing
Input: text (Zᵐ), e.g. “The boy went home”. Output: the parse tree of the sentence (Xᵐ).
Definition (not formal)
The output consists of several parts, and not only the parts themselves contain information, but also the way in which the parts belong together.
Factor Graph model - Example
• A factor graph defines a distribution as:

p(x) = (1/f) ∏_{F∈𝔽} ψ_F(x_{N(F)})   where   f = ∑_x ∏_{F∈𝔽} ψ_F(x_{N(F)})

f: partition function, so that the distribution is normalized
F: a factor; 𝔽: set of all factors
N(F): neighbourhood of a factor, i.e. the variables the factor depends on
ψ_F: a function (not a distribution) depending on x_{N(F)}   (ψ_F: K^{|N(F)|} → ℝ, where x_i ∈ K)
• Example:

p(x1, x2) = (1/f) ψ1(x1, x2) ψ2(x2)

x_{N(1)} = {x1, x2}, x_{N(2)} = {x2}, x_i ∈ {0,1}, K = 2
ψ1(0,0) = 1; ψ1(0,1) = 0; ψ1(1,0) = 1; ψ1(1,1) = 2
ψ2(0) = 1; ψ2(1) = 0
f = 1·1 + 0·0 + 1·1 + 2·0 = 2. Check yourself that ∑_x p(x) = 1.
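As a minimal sketch (Python, not part of the original slides), this example can be checked by brute-force enumeration; the factor tables are exactly the ones given above:

```python
from itertools import product

# Factor tables from the slide: psi1(x1, x2) and psi2(x2), with x_i in {0, 1}
psi1 = {(0, 0): 1.0, (0, 1): 0.0, (1, 0): 1.0, (1, 1): 2.0}
psi2 = {0: 1.0, 1: 0.0}

# Partition function f = sum over all states of the product of factor values
f = sum(psi1[(x1, x2)] * psi2[x2] for x1, x2 in product((0, 1), repeat=2))
print(f)  # 2.0, as computed on the slide

# Normalized distribution p(x1, x2) = psi1(x1, x2) * psi2(x2) / f
p = {(x1, x2): psi1[(x1, x2)] * psi2[x2] / f
     for x1, x2 in product((0, 1), repeat=2)}
print(sum(p.values()))  # 1.0: p is indeed a distribution
```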
Factor Graph model - Visualization
p(x1, x2) = (1/f) ψ1(x1, x2) ψ2(x2)
• Example (as before):
x_{N(1)} = {x1, x2}, x_{N(2)} = {x2}, x_i ∈ {0,1}, K = 2
ψ1(0,0) = 1; ψ1(0,1) = 0; ψ1(1,0) = 1; ψ1(1,1) = 2; ψ2(0) = 1; ψ2(1) = 0
f = 1·1 + 0·0 + 1·1 + 2·0 = 2. Check yourself that ∑_x p(x) = 1.
A square (■) visualizes a factor node; a circle (○) visualizes a variable node; an edge means that the variable participates in that factor.
[Figure: visualization of p(x1, x2) = (1/f) ψ1(x1, x2) ψ2(x2): variable nodes x1 and x2, a factor node ψ1 connected to both, and a factor node ψ2 connected to x2 only.]
For the visualization we utilize an undirected graph G = (V, F, E), where V, F are the sets of variable and factor nodes and E is the set of edges.
Factor Graph model - Visualization
• Example:
p(x1, x2, x3, x4, x5) = (1/f) ψ(x1, x2, x4) ψ(x2, x3) ψ(x3, x4) ψ(x4, x5) ψ(x4), where the ψ's are specified in some way
[Figure: variable nodes x1, …, x5 (circles) with factor nodes (squares) for {x1, x2, x4}, {x2, x3}, {x3, x4}, {x4, x5}, and a unary factor on x4.]
Probabilities and Energies
p(x) = (1/f) ∏_{F∈𝔽} ψ_F(x_{N(F)}) = (1/f) ∏_{F∈𝔽} exp{−θ_F(x_{N(F)})} = (1/f) exp{−∑_{F∈𝔽} θ_F(x_{N(F)})} = (1/f) exp{−E(x)}

The energy E(x) is just a sum of factors:
E(x) = ∑_{F∈𝔽} θ_F(x_{N(F)})

The most likely solution x* is reached by minimizing the energy:
x* = argmax_x p(x) = argmin_x E(x)

Note:
1) A minimizer x* of a positive function is also a minimizer of its logarithm, since log is monotone (x1 ≤ x2 implies log x1 ≤ log x2); hence maximizing p(x) is the same as maximizing log p(x).
2) It is: log p(x) = −log f − E(x) = constant − E(x), so maximizing log p(x) is the same as minimizing E(x).
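Continuing the sketch from above (again an illustration, not lecture code): converting the factor tables to energies θ = −log ψ and enumerating all states confirms that argmax p(x) and argmin E(x) coincide:

```python
import math
from itertools import product

# A factor value psi = 0 corresponds to infinite energy theta = -log(psi)
def to_energy(v):
    return math.inf if v == 0 else -math.log(v)

# Factor tables from the earlier example
psi1 = {(0, 0): 1.0, (0, 1): 0.0, (1, 0): 1.0, (1, 1): 2.0}
psi2 = {0: 1.0, 1: 0.0}

def energy(x1, x2):
    # E(x) is the sum of the factor energies theta_F(x_N(F))
    return to_energy(psi1[(x1, x2)]) + to_energy(psi2[x2])

states = list(product((0, 1), repeat=2))
x_min_energy = min(states, key=lambda x: energy(*x))
x_max_prob = max(states, key=lambda x: psi1[x] * psi2[x[1]])
print(x_min_energy, x_max_prob)  # identical: (0, 0); note (1, 0) ties with equal energy
```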
Names
• The probability distribution p(x) = (1/f) exp{−E(x)}, with energy E(x) = ∑_{F∈𝔽} θ_F(x_{N(F)}) and f = ∑_x exp{−E(x)}, is a so-called Gibbs distribution.
• We define the order of a factor graph as the arity (number of variables) of its largest factor. Example of an order-3 model:
E(x) = θ(x1, x2, x4) + θ(x2, x3) + θ(x3, x4) + θ(x5, x4) + θ(x4)
(the first term has arity 3, the middle terms arity 2, the last term arity 1)
• A different name for a factor graph / undirected graphical model is Markov Random Field (MRF). This is an extension of Markov chains to fields. The name “Markov” stands for the Markov property, which essentially means that the order of the factors is small.
Examples: Order
• 4-connected, pairwise MRF (order 2), a “pairwise energy”:
E(x) = ∑_{i,j∈N4} θ_ij(x_i, x_j)
• Higher(8)-connected, pairwise MRF (order 2):
E(x) = ∑_{i,j∈N8} θ_ij(x_i, x_j)
• Higher-order RF (order n), a “higher-order energy”:
E(x) = ∑_{i,j∈N4} θ_ij(x_i, x_j) + θ(x1, …, xn)
Roadmap for next two lectures
• Definition and Visualization of Factor Graphs
• Converting Directed Graphical Models to Factor Graphs
• Probabilistic Programming
• Queries and making decisions
• Binary-valued Factor Graphs: Models and Optimization (ICM, Graph Cut)
• Multi-valued Factor Graphs: Models and Optimization (Alpha Expansion)
Converting Directed Graphical Model to Factor Graph
A simple case (a chain x1 → x2 → x3):

Directed GM: p(x1, x2, x3) = p(x3|x2) p(x2|x1) p(x1)

Factor graph: p(x1, x2, x3) = (1/f) ψ(x3, x2) ψ(x2, x1) ψ(x1)

where: f = 1, ψ(x3, x2) = p(x3|x2), ψ(x2, x1) = p(x2|x1), ψ(x1) = p(x1)

[Figure: the directed chain x1 → x2 → x3 and the corresponding factor graph with pairwise factors on {x1, x2} and {x2, x3} and a unary factor on x1.]
Converting Directed Graphical Model to Factor Graph
A more complex case:

Directed GM: p(x1, x2, x3, x4) = p(x1|x2, x3) p(x2) p(x3|x4) p(x4)

Factor graph: p(x1, x2, x3, x4) = (1/f) ψ(x1, x2, x3) ψ(x2) ψ(x3, x4) ψ(x4)

where: f = 1, ψ(x1, x2, x3) = p(x1|x2, x3), ψ(x2) = p(x2), ψ(x3, x4) = p(x3|x4), ψ(x4) = p(x4)

[Figure: the directed model (x4 → x3; x2, x3 → x1) and the corresponding factor graph with a factor on {x1, x2, x3}, a factor on {x3, x4}, and unary factors on x2 and x4.]
Converting Directed Graphical Model to Factor Graph
Our example:
Directed GM: p(x1, x2, x3, x4) = p(x1|x2, x3) p(x2) p(x3|x4) p(x4)
Factor graph: p(x1, x2, x3, x4) = (1/f) ψ(x1, x2, x3) ψ(x2) ψ(x3, x4) ψ(x4)
where: ψ(x1, x2, x3) = p(x1|x2, x3), ψ(x2) = p(x2), ψ(x3, x4) = p(x3|x4), ψ(x4) = p(x4)

General recipe (see the sketch below):
• Take each conditional probability and convert it to a factor (without conditioning), i.e. replace the conditioning bar “|” with a comma.
• Set the normalization constant f = 1.
• Visualization: all parents of a node, together with the node itself, form a new factor (this step is called moralization).
• Comment: the other direction is more complicated, since the factors ψ have to be converted correctly into individual (conditional) probabilities such that the overall joint distribution stays the same.
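A minimal sketch of this recipe (the CPT values below are made-up placeholders, not from the slides; only the conversion step itself is the point): each conditional probability table p(child | parents) simply becomes a factor over the child and its parents, and f = 1.

```python
import itertools

# Made-up CPTs for the example DGM over binary variables x1..x4:
# p(x1 | x2, x3), p(x2), p(x3 | x4), p(x4), each stored as {assignment: probability}.
p_x4 = {(0,): 0.6, (1,): 0.4}
p_x3_given_x4 = {(x3, x4): 0.5 for x3, x4 in itertools.product((0, 1), repeat=2)}
p_x2 = {(0,): 0.7, (1,): 0.3}
p_x1_given_x23 = {(x1, x2, x3): 0.5
                  for x1, x2, x3 in itertools.product((0, 1), repeat=3)}

# Conversion: every CPT becomes a factor over (child, *parents); the
# conditioning bar is simply dropped, and the normalization constant is f = 1.
factors = {
    ("x4",): p_x4,
    ("x3", "x4"): p_x3_given_x4,
    ("x2",): p_x2,
    ("x1", "x2", "x3"): p_x1_given_x23,
}

def joint(assign):
    """p(x) = product of all factors; equals the DGM joint since f = 1."""
    prob = 1.0
    for scope, table in factors.items():
        prob *= table[tuple(assign[v] for v in scope)]
    return prob

# Check: f = sum_x prod_F psi_F(x) = 1 for a factor graph converted from a DGM.
names = ("x1", "x2", "x3", "x4")
f = sum(joint(dict(zip(names, xs))) for xs in itertools.product((0, 1), repeat=4))
print(f)  # 1.0
```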
Roadmap for next two lectures
• Definition and Visualization of Factor Graphs
• Converting Directed Graphical Models to Factor Graphs
• Probabilistic Programming
• Queries and making decisions
• Binary-valued Factor Graphs: Models and Optimization (ICM, Graph Cut)
• Multi-valued Factor Graphs: Models and Optimization (Alpha Expansion)
Probabilistic Programming Languages
See http://probabilistic-programming.org/wiki/Home
• A programming language for machine learning tasks, in particular for modelling, learning and making predictions in directed graphical models (DGM), undirected graphical models (UGM), and factor graphs (FG)
• Comment: DGMs and UGMs are converted to factor graphs; all operations are run on factor graphs
• The basic idea is to associate with each variable a distribution:

Bool coin = 0;               // normal C++: a fixed value
Bool coin = Bernoulli(0.5);  // probabilistic program: Bernoulli is a distribution with 2 states
An Example: Two coins
Example: you draw two fair coins. What is the chance that both are heads?
• Random variables: coin1 (x1) and coin2 (x2), and an event z about the state of both variables
• We know: coin1 (x1) and coin2 (x2) are independent
• Each coin has equal probability of being heads (1) or tails (0)
• A new random variable z which is true if and only if both coins are heads: z = x1 & x2
An Example: Two coins
x1, x2 with x_i ∈ {0,1} and p(x_i = 1) = p(x_i = 0) = 0.5; z depends on both x1 and x2.

Value x1   Value x2   P(z=1|x1,x2)   P(z=0|x1,x2)
0          0          0              1
0          1          0              1
1          0          0              1
1          1          1              0

Joint: p(x1, x2, z) = p(z|x1, x2) p(x1) p(x2)
Compute the marginal: p(z) = ∑_{x1,x2} p(z, x1, x2) = ∑_{x1,x2} p(z|x1, x2) p(x1) p(x2)
p(z = 1) = 1 · 0.5 · 0.5 = 0.25
p(z = 0) = 1 · 0.5 · 0.5 + 1 · 0.5 · 0.5 + 1 · 0.5 · 0.5 = 0.75
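A small sketch of this marginalization by enumeration (illustrative Python, mirroring what a probabilistic-programming runtime would do on this tiny model):

```python
from itertools import product

p_x = {0: 0.5, 1: 0.5}          # p(x1) and p(x2): fair coins

def p_z_given(z, x1, x2):
    """Deterministic CPT: z = x1 AND x2."""
    return 1.0 if z == (x1 & x2) else 0.0

# Marginal p(z) = sum_{x1,x2} p(z | x1, x2) p(x1) p(x2)
p_z = {z: sum(p_z_given(z, x1, x2) * p_x[x1] * p_x[x2]
              for x1, x2 in product((0, 1), repeat=2))
       for z in (0, 1)}
print(p_z)  # {0: 0.75, 1: 0.25}, as on the slide
```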
An Example: Two coins - Infer.NET
[Screenshots of an Infer.NET program: the model code, running it, and adding evidence to the program.]
Roadmap for next two lectures
• Definition and Visualization of Factor Graphs
• Converting Directed Graphical Models to Factor Graphs
• Probabilistic Programming
• Queries and making decisions
• Binary-valued Factor Graphs: Models and Optimization (ICM, Graph Cut)
• Multi-valued Factor Graphs: Models and Optimization (Alpha Expansion)
What to infer?
Same as in directed graphical models:
• MAP inference (maximum a posteriori state):
x* = argmax_x p(x) = argmin_x E(x)
• Probabilistic inference, the so-called marginals:
p(x_i = k) = ∑_{x | x_i = k} p(x1, …, x_i = k, …, xn)
This can be used to make a maximum marginal decision:
x_i* = argmax_{x_i} p(x_i)
MAP versus Marginal - visually
[Figure: input image; ground-truth labeling; MAP solution x* (each pixel has a 0/1 label); marginals p(x_i) (each pixel has a probability between 0 and 1).]
MAP versus Marginal – Making Decisions
Which solution x* would you choose?
[Figure: plot of p(x|z) over the space of all solutions x (sorted by pixel difference).]
Reminder: How to make a decision
Assume the model p(x|z) is known.
Question: what solution x* should we give out?
Answer: choose the x* which minimizes the Bayesian risk:
x* = argmin_{x*} ∑_x p(x|z) C(x, x*)
C(x1, x2) is called the loss function (or cost function) comparing two results x1, x2.
MAP versus Marginal – Making Decisions
[Figure: plot of p(x|z) over the space of all solutions x (sorted by pixel difference). Which one is the MAP solution?]
The MAP solution (red) takes the globally optimal solution:
x* = argmax_x p(x|z) = argmin_x E(x, z)
Reminder: The Cost Function behind MAP
The cost function for MAP: C(x, x*) = 0 if x = x*, otherwise 1
x* = argmin_{x*} ∑_x p(x|z) C(x, x*) = argmin_{x*} (1 − p(x = x*|z)) = argmax_x p(x|z)
The MAP estimate optimizes a “global 0-1 loss”.
The Cost Function behind Max Marginal
Probabilistic inference gives marginals. We can take the max-marginal solution:
x_i* = argmax_{x_i} p(x_i)   (where p(x_i = k) = ∑_{x | x_i = k} p(x1, …, x_i = k, …, xn))

This represents the decision with minimum Bayesian risk:
x* = argmin_{x*} ∑_x p(x|z) C(x, x*)
where C(x, x*) = ∑_i ||x_i − x_i*||²
For x_i ∈ {0,1} this counts the number of differently labeled pixels (proof not done).

Example (three labelings x1, x2, x3, costs counted in differing pixels):
C(x1, x2) = 10, C(x2, x3) = 10, C(x1, x3) = 20
[Figure: the three example labelings x1, x2, x3.]
MAP versus Marginal – Making Decisions
[Figure: input image z and a plot of p(x|z) over the space of all solutions x (sorted by pixel difference); four solutions are marked, with p = 0.2, 0.11, 0.1, 0.1 and pairwise costs C = 1, C = 1, C = 100. Which one is the max-marginal solution?]
If x* is red, the risk is (summing only over the 4 solutions): 0.1 + 0.1 + 100 · 0.2 = 20.2
If x* is blue, the risk is: 11 + 10 + 10 = 31
Hence red is the max-marginal solution (all numbers are arbitrarily chosen).
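A small sketch of this risk computation (the probabilities are the slide's arbitrary numbers; the full cost matrix below is an assumption chosen to be consistent with the two sums shown):

```python
# Four candidate solutions with posterior mass p(x|z) (the slide's numbers)
p = {"red": 0.11, "blue": 0.2, "a": 0.1, "b": 0.1}

# Assumed symmetric costs C(x, x') consistent with the slide's sums:
# red is close (C=1) to a and b; blue is far (C=100) from everything.
# C(a, b) = 2 is arbitrary and unused in the two risks printed below.
C = {frozenset(k): v for k, v in {
    ("red", "a"): 1, ("red", "b"): 1, ("red", "blue"): 100,
    ("blue", "a"): 100, ("blue", "b"): 100, ("a", "b"): 2,
}.items()}

def risk(x_star):
    """Bayesian risk: sum_x p(x|z) * C(x, x_star), summed over the 4 solutions."""
    return sum(p[x] * C[frozenset((x, x_star))] for x in p if x != x_star)

print(risk("red"))   # 0.2*100 + 0.1*1 + 0.1*1 = 20.2
print(risk("blue"))  # 0.11*100 + 0.1*100 + 0.1*100 = 31.0
```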
This lecture: MAP Inference in order 2 models
Gibbs distribution: p(x) = (1/f) exp{−E(x)}

E(x) = ∑_i θ_i(x_i) + ∑_{i,j} θ_ij(x_i, x_j) + ∑_{i,j,k} θ_ijk(x_i, x_j, x_k) + …
(unary terms) (pairwise terms) (higher-order terms)

• MAP inference: x* = argmax_x p(x) = argmin_x E(x)
• Label space: binary x_i ∈ {0,1} or multi-label x_i ∈ {0, …, K}
• We only look at energies with unary and pairwise factors
Roadmap for next two lectures
• Definition and Visualization of Factor Graphs
• Converting Directed Graphical Models to Factor Graphs
• Probabilistic Programming
• Queries and making decisions
• Binary-valued Factor Graphs: Models and Optimization (ICM, Graph Cut)
• Multi-valued Factor Graphs: Models and Optimization (Alpha Expansion)
Image Segmentation
We will use the following energy, with binary labels x_i ∈ {0,1}:

E(x) = ∑_i θ_i(x_i) + ∑_{i,j∈N4} θ_ij(x_i, x_j)
(unary term) (pairwise term)

N4 is the set of all (4-connected) neighboring pixels.
[Figure: input image with user brush strokes (blue: background, red: foreground), the desired binary output labeling, and the grid factor graph with unary factors θ_i(x_i) and pairwise factors θ_ij(x_i, x_j).]
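As a minimal sketch (assuming arbitrary unary costs and the Ising pairwise cost |x_i − x_j| that is introduced on the following slides), this energy can be evaluated on a label grid as follows:

```python
import numpy as np

def segmentation_energy(x, theta_unary, w=1.0):
    """E(x) = sum_i theta_i(x_i) + w * sum_{(i,j) in N4} |x_i - x_j|.

    x: (H, W) array of 0/1 labels;
    theta_unary: (H, W, 2) array, theta_unary[i, j, k] = cost of label k at pixel (i, j).
    """
    H, W = x.shape
    unary = theta_unary[np.arange(H)[:, None], np.arange(W)[None, :], x].sum()
    # N4 pairwise (Ising) terms: disagreements between vertical and horizontal neighbors
    pairwise = np.abs(np.diff(x, axis=0)).sum() + np.abs(np.diff(x, axis=1)).sum()
    return unary + w * pairwise

# Tiny example: random unaries on a 4x4 grid; a checkerboard maximizes the pairwise cost
rng = np.random.default_rng(0)
theta = rng.random((4, 4, 2))
x_zero = np.zeros((4, 4), dtype=int)
x_checker = np.indices((4, 4)).sum(axis=0) % 2
print(segmentation_energy(x_zero, theta), segmentation_energy(x_checker, theta))
```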
Image Segmentation: Energy
Goal: formulate E(x) such that the MAP solution x* = argmin_x E(x) is the desired labeling.
[Figure: four candidate labelings with E(x) = 0.01, E(x) = 0.05, E(x) = 0.05, E(x) = 10 (numbers are illustrative, not real measurements).]
Unary term
[Figure: red-green color scatter plots of the user-labelled pixels (cross: foreground, dot: background) and the Gaussian Mixture Model fit; the foreground model is blue, the background model is red.]
Unary term
E(x) = ∑_i θ_i(x_i)   with   θ_i(x_i = 0) = −log P_red(z_i | x_i = 0),   θ_i(x_i = 1) = −log P_blue(z_i | x_i = 1)

x* = argmin_x E(x)

[Figure: new query image z_i; the cost images θ_i(x_i = 0) (dark means likely background) and θ_i(x_i = 1) (dark means likely foreground); the optimum with unary terms only.]
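A sketch of how such unary costs could be computed (assumption: simple per-class Gaussian color likelihoods stand in for the slides' GMM models; the means below are made up):

```python
import numpy as np

def gaussian_loglik(z, mean, var):
    """Log-likelihood of colors z (..., 3) under an isotropic Gaussian (stand-in for a GMM)."""
    d = z - mean
    return -0.5 * np.sum(d * d, axis=-1) / var - 1.5 * np.log(2 * np.pi * var)

# Made-up color models; the slides instead fit GMMs to the user brush strokes
fg_mean = np.array([0.2, 0.3, 0.8])   # "blue" foreground model
bg_mean = np.array([0.8, 0.3, 0.2])   # "red" background model

def unary_costs(z, var=0.1):
    """theta_i(x_i=1) = -log P_blue(z_i | x_i=1); theta_i(x_i=0) = -log P_red(z_i | x_i=0)."""
    theta1 = -gaussian_loglik(z, fg_mean, var)
    theta0 = -gaussian_loglik(z, bg_mean, var)
    return theta0, theta1

# With unary terms only, argmin_x E(x) decomposes into a per-pixel decision
z = np.random.default_rng(1).random((4, 4, 3))
theta0, theta1 = unary_costs(z)
x_star = (theta1 < theta0).astype(int)
print(x_star)
```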
Pairwise term
• We choose a so-called Ising prior:
θ_ij(x_i, x_j) = |x_i − x_j|
which gives the energy E(x) = ∑_{i,j∈N4} θ_ij(x_i, x_j)
This models the assumption that the object is spatially coherent.
• Questions:
• Which labeling has the lowest energy?
• Which labeling has the highest energy?
[Figure: three example labelings with lowest, intermediate, and very high pairwise energy; grid factor graph with θ_i(x_i) and θ_ij(x_i, x_j).]
Adding unary and Pairwise term
Energy: E(x) = ∑_i θ_i(x_i) + ω ∑_{i,j∈N4} |x_i − x_j|
[Figure: segmentation results for ω = 0, ω = 10, ω = 40, ω = 200.]
Question: what happens when ω increases further?
Question (done in exercise): can the global optimum be computed with graph cut? Please prove it.
Is it the best we can do?
[Figure: 4-connected segmentation result with zoom-ins on the object boundary.]
Given: E(x) = ∑_i θ_i(x_i) + ∑_{i,j∈N4} |x_i − x_j|
Which segmentation has higher energy?

0 0 0 0 0 0     0 0 0 0 0 0
0 1 1 1 1 0     0 0 1 1 0 0
0 1 1 1 1 0     0 1 1 1 1 0
0 1 1 1 1 0     0 1 1 1 1 0
0 1 1 1 1 0     0 0 1 1 0 0
0 0 0 0 0 0     0 0 0 0 0 0

Answers:
1) It depends on the unary costs.
2) The pairwise cost is the same in both cases (16 N4 edges are cut).
From 4-connected to 8-connected Factor Graph
Larger connectivity can model the true Euclidean length of the boundary (other metrics are also possible).
[Figure/table: example paths and their lengths under the Euclidean, 4-connected, and 8-connected metrics (values shown: 5.65, 8, 1, 6.28, 6.28, 5.08, 6.75); the 8-connected lengths approximate the Euclidean lengths more closely.]
Going to 8-connectivity
[Figure: zoom-in of segmentation results: 4-connected vs. 8-connected MRF with Euclidean edge weighting; the 8-connected result traces the boundary more smoothly.]
Is it the best we can do?
Adapt the pairwise term
Standard (4-connected): E(x) = ∑_i θ_i(x_i) + ∑_{i,j∈N4} θ_ij(x_i, x_j)
Edge-dependent: θ_ij(x_i, x_j) = |x_i − x_j| exp(−β ||z_i − z_j||²), where β is a constant
Question: what is the term exp(−β ||z_i − z_j||²) doing?
Question (done in exercise): can the global optimum be computed with graph cut? Please prove it.
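A short sketch of this edge-dependent (contrast-sensitive) weight, assuming z holds pixel colors: the weight is near 1 where neighboring colors are similar (label changes stay expensive) and near 0 across strong image edges (label changes become cheap there):

```python
import numpy as np

def contrast_weights(z, beta=10.0):
    """w_ij = exp(-beta * ||z_i - z_j||^2) for vertical and horizontal N4 pairs."""
    dz_v = np.sum(np.diff(z, axis=0) ** 2, axis=-1)   # squared color differences, vertical pairs
    dz_h = np.sum(np.diff(z, axis=1) ** 2, axis=-1)   # squared color differences, horizontal pairs
    return np.exp(-beta * dz_v), np.exp(-beta * dz_h)

def pairwise_energy(x, z, beta=10.0):
    """Edge-dependent pairwise cost: sum over N4 of w_ij * |x_i - x_j|."""
    w_v, w_h = contrast_weights(z, beta)
    return (w_v * np.abs(np.diff(x, axis=0))).sum() + (w_h * np.abs(np.diff(x, axis=1))).sum()
```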
A probabilistic view
Gibbs distribution: p(x|z) = (1/f) exp{−E(x, z)}   with   f = ∑_x exp{−E(x, z)}

E(x, z) = ∑_i θ_i(x_i, z_i) + ∑_{i,j∈N4} θ_ij(x_i, x_j)

1. Just look at the conditional distribution:
θ_i(x_i = 1, z_i) = −log p_blue(z_i | x_i = 1)
θ_i(x_i = 0, z_i) = −log p_red(z_i | x_i = 0)
θ_ij(x_i, x_j) = |x_i − x_j|   (so exp{−0} = 1 for equal labels, exp{−1} ≈ 0.36 for different labels)

2. Factorize the conditional distribution: p(x|z) = (1/p(z)) p(z|x) p(x), where p(z) is a constant factor

p(x) = (1/f1) ∏_{i,j∈N4} exp{−|x_i − x_j|}
p(z|x) = (1/f2) ∏_i p(z_i|x_i) = (1/f2) ∏_i (p_blue(z_i | x_i = 1) x_i + p_red(z_i | x_i = 0)(1 − x_i))

Check yourself: p(x|z) = (1/f) p(z|x) p(x) = (1/f) exp{−E(x, z)}
ICM - Iterated Conditional Modes
Energy: E(x) = ∑_i θ_i(x_i) + ∑_{i,j∈N4} θ_ij(x_i, x_j)

Idea: fix all variables but one and optimize over that one.

Example: [Figure: x1 in the center of a 4-connected grid, with neighbors x2, x3, x4, x5.]

Insight: the optimization has an implicit energy that depends only on a few factors:
E′(x1|x\1) = ∑_i θ_i(x_i) + ∑_{i,j∈N4} θ_ij(x_i, x_j)
           = θ1(x1) + θ12(x1, x2) + θ13(x1, x3) + θ14(x1, x4) + θ15(x1, x5) + terms not depending on x1
(x1|x\1 means that all labels but x1 are fixed)
ICM - Iterated Conditional Modes
Energy: E(x) = ∑_i θ_i(x_i) + ∑_{i,j∈N4} θ_ij(x_i, x_j)

Algorithm (see the sketch below):
1. Initialize x = 0
2. For i = 1 … n:
3.   Update x_i = argmin_{x_i} E′(x_i|x\i)
4. Go to step 2; stop when E(x) has not changed w.r.t. the previous iteration

Problems:
• Can get stuck in local minima
• Depends on the initialization

[Figure: the ICM result compared with the global optimum (computed with graph cut).]
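A compact sketch of ICM for this unary-plus-Ising N4 energy (a minimal illustration with the same array conventions as the earlier energy sketch, not the lecture's reference code):

```python
import numpy as np

def icm(theta_unary, w=1.0, max_iters=100):
    """Iterated Conditional Modes for E(x) = sum_i theta_i(x_i) + w * sum_{N4} |x_i - x_j|."""
    H, W = theta_unary.shape[:2]
    x = np.zeros((H, W), dtype=int)           # step 1: initialize x = 0
    for _ in range(max_iters):
        changed = False
        for i in range(H):                    # step 2: sweep over all variables
            for j in range(W):
                # Implicit energy E'(x_ij | rest): unary term plus Ising terms to
                # the (at most four) N4 neighbors; all other terms are constant.
                costs = []
                for k in (0, 1):
                    c = theta_unary[i, j, k]
                    for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                        ni, nj = i + di, j + dj
                        if 0 <= ni < H and 0 <= nj < W:
                            c += w * abs(k - x[ni, nj])
                    costs.append(c)
                best = int(np.argmin(costs))  # step 3: x_ij = argmin E'(x_ij | rest)
                if best != x[i, j]:
                    x[i, j] = best
                    changed = True
        if not changed:                       # step 4: stop when E(x) no longer changes
            return x
    return x
```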
ICM - parallelization
Normal procedure: update one variable after the other (steps 1-4 in the figure).
Parallel procedure: variables that do not share a factor can be updated simultaneously, e.g. in a checkerboard schedule on a 4-connected grid.
• The schedule is a more complex task in graphs which are not 4-connected.
[Figure: sequential update schedule vs. parallel checkerboard schedule, steps 1-4.]
Roadmap for next two lectures
• Definition and Visualization of Factor Graphs
• Converting Directed Graphical Models to Factor Graphs
• Probabilistic Programming
• Queries and making decisions
• Binary-valued Factor Graphs: Models and Optimization (ICM, Graph Cut)
• Multi-valued Factor Graphs: Models and Optimization (Alpha Expansion)