Intelligent Systems:
Undirected Graphical models – Inference and Applications
Carsten Rother
Roadmap for remaining lectures
• 11.12 (1): Computer Vision – a hard case for AI
• 11.12 (2): Probability Theory
• 18.12 (1): Exercise: probability theory
• 18.12 (2): Decision Making (Unstructured models)
• 8.1 (1): Maximum Likelihood Principle (Unstructured models)
• 8.1 (2): Discriminative Learning (Unstructured models)
• 15.1 (1): Exercise: Learning
• 15.1 (2): Discriminative (unstructured) Models
Lecturers: Carsten Rother and Dimitri Schlesinger
Roadmap for remaining lectures
• 22.1 (1): Undirected Graphical models: Inference and Applications
• 22.1 (2): Undirected Graphical models: Inference and Applications
• 29.1 (1): Exercise: Undirected Graphical models
• 29.1 (2): Recognition in Practice
• 5.2 (1): Probabilistic Inference in Undirected and Directed Graphical models
• 5.2 (2): Wrap up; Robot localization and Learning Interactive Systems
Lecturers: Carsten Rother and Dimitri Schlesinger
Roadmap next two lectures
• Define: Structured Models
• Formulate applications as discrete labeling problems
• Discrete Inference:
• Pixel-based: Iterative Conditional Mode (ICM)
• Line-based: Dynamic Programming (DP)
• Field-based: Graph Cut and Alpha-Expansion
• Interactive Image Segmentation
• From Generative models to
• Discriminative models to
• Discriminative function
Machine Learning: Big Picture
"Normal" Machine Learning:
f: Z → N (classification),   f: Z → R (regression)
Input: image, text
Output: real number(s)
Structured Output Prediction:
f: Z → X
Input: image, text
Output: complex structured object (labelling, parse tree)
Examples: parse tree of a sentence, image labelling, chemical structure
Structured Output Prediction
Ad hoc definition (from [Nowozin et al. 2011]): data that consists of several parts, where not only the parts themselves contain information, but also the way in which the parts belong together.
Graphical models to capture structured problems
Basic idea: write the probability distribution as a graphical model:
• Directed graphical model (also called Bayesian Network)
• Undirected graphical model (also called Markov Random Field)
• Factor graph (which we will use predominantly)
A graphical model is:
• A visualization that represents a family of distributions
• Key concept: conditional independence
• You can also convert between the representations
References:
- Pattern Recognition and Machine Learning [Bishop ‘08, chapter 8]
- several lectures at the Machine Learning Summer School 2009 (see video lectures)
Notation
• Dimitri Schlesinger has used the following notation:
  • Input data (discrete, continuous): x
  • Output class (discrete): k ∈ K
  • Parameters (derived during learning): θ
  • Posterior for recognition: p(k | x, θ)
• I will (consistently) use a different notation:
  • Input data (discrete, continuous): z
  • Output class (discrete): x ∈ K (or L)
  • Parameters (derived during learning): θ
  • Posterior for recognition: p(x | z, θ)
Note: for images we have many variables, e.g. 1 million. The random variables are then x = (x_1, …, x_i, x_j, …, x_n).
Probabilities - Reminder
• A random variable is denoted by x ∈ {0, …, L}
• Discrete probability distribution: P(x) satisfies Σ_x P(x) = 1, where the random variable x ∈ {0, …, L}
• Joint distribution of two random variables: P(x, z)
• Conditional distribution: P(z | x)
• Sum rule (marginal distribution): P(z) = Σ_x P(x, z)
• Independence: P(x, z) = P(z) P(x)
• Product rule: P(x, z) = P(z | x) P(x)
• Bayes' rule: P(x | z) = P(z | x) P(x) / P(z)
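To make the reminder concrete, here is a minimal sketch (my own illustration, not from the slides) that checks the sum rule, product rule and Bayes' rule on a small hand-made joint table P(x, z); the numbers are made up:

# Minimal sketch: sum rule, product rule and Bayes' rule on a hand-made joint
# distribution P(x, z), with x in {0, 1, 2} and z in {0, 1}.
import numpy as np

P_xz = np.array([[0.10, 0.20],   # P(x=0, z=0), P(x=0, z=1)
                 [0.30, 0.05],   # P(x=1, z=0), P(x=1, z=1)
                 [0.25, 0.10]])  # P(x=2, z=0), P(x=2, z=1)
assert np.isclose(P_xz.sum(), 1.0)          # a valid joint distribution

P_z = P_xz.sum(axis=0)                       # sum rule: P(z) = sum_x P(x, z)
P_x = P_xz.sum(axis=1)                       # sum rule: P(x) = sum_z P(x, z)
P_z_given_x = P_xz / P_x[:, None]            # product rule: P(z|x) = P(x, z) / P(x)

# Bayes' rule: P(x|z) = P(z|x) P(x) / P(z)
P_x_given_z = (P_z_given_x * P_x[:, None]) / P_z[None, :]
print(P_x_given_z[:, 1])                     # posterior P(x | z = 1)
assert np.allclose(P_x_given_z.sum(axis=0), 1.0)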
Defining families of distributions
• P(x_1, x_2): general distribution
• P_2(x_1, x_2) = P(x_1 | x_2) P(x_2) = P(x_1) P(x_2): restricted family
• P_3(x_1, x_2) = N(x_1 | μ_1, σ_1) N(x_2 | μ_2, σ_2): concrete realization
Undirected Graphical models - example
[Figure: undirected graph with unobserved variables x_1, …, x_5 and an observed variable z; a circle denotes an unobserved variable, a shaded circle an observed variable]
Clique: a set of nodes where ALL nodes are connected, for example {x_1, x_2, x_4}
Undirected Graphical models - example
[Figure: the same undirected graph with unobserved variables x_1, …, x_5 and observed variable z]
Clique: a set of nodes where ALL nodes are connected, for example {x_1, x_2, x_4}
Maximum cliques (no node can be added): {x_1, x_2, x_4}, {x_4, x_2, x_3}, {x_4, x_5}
P(x_1, x_2, x_3, x_4, x_5) = 1/f · ψ(x_1, x_2, x_4) ψ(x_2, x_3, x_4) ψ(x_1, x_2) ψ(x_1, x_4) ψ(x_4, x_2) ψ(x_3, x_2) ψ(x_3, x_4) ψ(x_4, x_5) ψ(x_1) ψ(x_2) ψ(x_3) ψ(x_4) ψ(x_5)
Definition: Undirected Graphical models
• Given an undirected graph G = (V, E), where V is the set of nodes and E the set of edges
• An undirected graphical model defines a family of distributions:
P(x) = 1/f ∏_{C ∈ C(G)} ψ_C(x_C)   where   f = Σ_x ∏_{C ∈ C(G)} ψ_C(x_C)
f: partition function
C(G): set of all cliques
C: a clique, i.e. a subset of variable indices
ψ_C: factor (not a distribution) depending on x_C (ψ_C: K^|C| → R, where x_i ∈ K)
Definition: a clique is a set of nodes where all nodes are linked with an edge
Comment on definition
In some books the set C(G) is defined as the set of all maximum cliques only. The set of families of distributions is equivalent. For instance, a factor ψ(x_1, x_2) = (x_1 + x_2) x_1 can also be written as two factors ψ'(x_1, x_2) ψ'(x_1), where ψ'(x_1, x_2) = x_1 + x_2 and ψ'(x_1) = x_1.
Using all cliques:
P(x_1, x_2, x_3, x_4, x_5) = 1/f · ψ(x_1, x_2, x_4) ψ(x_2, x_3, x_4) ψ(x_1, x_2) ψ(x_1, x_4) ψ(x_4, x_2) ψ(x_3, x_2) ψ(x_3, x_4) ψ(x_4, x_5) ψ(x_1) ψ(x_2) ψ(x_3) ψ(x_4) ψ(x_5)
Using maximum cliques only:
P(x_1, x_2, x_3, x_4, x_5) = 1/f · ψ(x_1, x_2, x_4) ψ(x_2, x_3, x_4) ψ(x_4, x_5)
Filter View of Undirected Graphical Models
[Figure: the undirected graph over x_1, …, x_5 acts as a filter]
P(x_1, x_2, x_3, x_4, x_5): an arbitrary probability distribution. It passes the filter only if it can be written as
P(x_1, x_2, x_3, x_4, x_5) = 1/f · ψ(x_1, x_2, x_4) ψ(x_2, x_3, x_4) ψ(x_1, x_2) ψ(x_1, x_4) ψ(x_4, x_2) ψ(x_3, x_2) ψ(x_3, x_4) ψ(x_4, x_5) ψ(x_1) ψ(x_2) ψ(x_3) ψ(x_4) ψ(x_5)
The distributions that pass form a smaller family.
When would the filter let through all distributions?
Conditional Independence
[Figure: the undirected graph over x_1, …, x_5, with the nodes partitioned into sets A, B and C]
Does it hold that P(A, C | B) = P(A | B) P(C | B)?
- Yes, if all paths from A to C go through B
- Otherwise no
This is also written A ⊥ C | B.
Hammersley-Clifford Theorem
• Let UF be the family of distributions defined by P(x) = 1/f ∏_{C ∈ C(G)} ψ_C(x_C)
• Let UI be the set of distributions that are consistent with the set of conditional independence statements that can be read from the graph
• The theorem states that UI and UF are identical
Definition: Factor Graph models
• Given an undirected graph G = (V, F, E), where V and F are the sets of variable and factor nodes and E is the set of edges
• A factor graph defines a family of distributions:
P(x) = 1/f ∏_{F ∈ 𝔽} ψ_F(x_{N(F)})   where   f = Σ_x ∏_{F ∈ 𝔽} ψ_F(x_{N(F)})
f: partition function
F: a factor
𝔽: set of all factors
N(F): neighbourhood of a factor, i.e. the variables connected to it
ψ_F: function (not a distribution) depending on x_{N(F)} (ψ_F: K^|N(F)| → R, where x_i ∈ K)
Note: the definition of a factor is not linked to a property of the graph (as it is with cliques).
Factor Graphs - example
[Figure: factor graph with variable nodes x_1, …, x_5 and square factor nodes; a circle denotes an unobserved variable, a shaded circle an observed variable, a square a factor node, and an edge means that the variable takes part in that factor]
P(x_1, x_2, x_3, x_4, x_5) = 1/f · ψ(x_1, x_2, x_4) ψ(x_2, x_3) ψ(x_3, x_4) ψ(x_4, x_5) ψ(x_4)
Introducing energies
P(x) = 1/f ∏_{F ∈ 𝔽} ψ_F(x_{N(F)}) = 1/f ∏_{F ∈ 𝔽} exp{−θ_F(x_{N(F)})} = 1/f exp{−Σ_{F ∈ 𝔽} θ_F(x_{N(F)})} = 1/f exp{−E(x)}
The energy E(x) is just a sum of factors: E(x) = Σ_{F ∈ 𝔽} θ_F(x_{N(F)})
The most likely solution x* is reached by minimizing the energy:
x* = argmax_x P(x)   ⇔   x* = argmin_x E(x)
(since −log P(x) = log f + E(x) = constant + E(x))
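As a sanity check of this relation, here is a small brute-force sketch (my own illustration, not part of the lecture): it builds a tiny factor graph over three binary variables, turns the factors ψ_F into energies θ_F = −log ψ_F, and verifies that maximizing P and minimizing E pick the same labeling. The factor tables are made up.

# Brute-force sketch: P(x) = (1/f) * prod_F psi_F(x_N(F)) = (1/f) * exp(-E(x)),
# with E(x) = sum_F theta_F(x_N(F)) and theta_F = -log psi_F.
import itertools, math

# three binary variables x1, x2, x3; factors given as (variable indices, table)
psi = {
    (0, 1): {(0, 0): 2.0, (0, 1): 0.5, (1, 0): 0.5, (1, 1): 2.0},  # likes x1 == x2
    (1, 2): {(0, 0): 1.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 3.0},  # likes x2 = x3 = 1
    (0,):   {(0,): 1.5, (1,): 1.0},                                # slight preference for x1 = 0
}
theta = {scope: {cfg: -math.log(v) for cfg, v in tab.items()} for scope, tab in psi.items()}

def prob_unnorm(x):   # product of factors psi_F
    return math.prod(tab[tuple(x[i] for i in scope)] for scope, tab in psi.items())

def energy(x):        # sum of energies theta_F
    return sum(tab[tuple(x[i] for i in scope)] for scope, tab in theta.items())

states = list(itertools.product([0, 1], repeat=3))
f = sum(prob_unnorm(x) for x in states)                  # partition function
map_by_prob = max(states, key=lambda x: prob_unnorm(x) / f)
map_by_energy = min(states, key=energy)
print("MAP:", map_by_prob, "P =", prob_unnorm(map_by_prob) / f)
assert map_by_prob == map_by_energy                      # argmax P == argmin E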
Gibbs Distribution
P(x) = 1/f exp{−E(x)}   with   E(x) = Σ_{F ∈ 𝔽} θ_F(x_{N(F)})
is a so-called Gibbs distribution (or Boltzmann distribution) with energy E.
Definition: Order
Definitions:
• Order: the arity (number of variables) of the largest factor
• Markov Random Field: random field with low-order factors
[Figure: factor graph over x_1, …, x_5 with factors of arity 3, 2 and 1]
Factor graph with order 3:
E(x) = θ(x_1, x_2, x_4) + θ(x_2, x_3) + θ(x_3, x_4) + θ(x_5, x_4) + θ(x_4)
Two examples
P_1(x_1, x_2, x_3) = 1/f exp{x_1 x_2 + x_2 x_3 + x_1 x_3}
P_2(x_1, x_2, x_3) = 1/f exp{x_1 x_2 x_3}
We always try to write distributions in as "factorized" a form as possible.
Note: P_2 cannot be written as a sum of pairwise energy terms.
The family view of distributions
• Family of all distributions
• Family of all distributions with the same undirected graphical model:
  P_12^U(x_1, x_2, x_3) = 1/f ψ(x_1, x_2, x_3)   ("written in maximum clique form")
• Families of all distributions with the same factor graph:
  P_1^F(x_1, x_2, x_3) = 1/f ψ(x_1, x_2) ψ(x_2, x_3) ψ(x_1, x_3)
  P_2^F(x_1, x_2, x_3) = 1/f ψ(x_1, x_2, x_3)
• Realizations of a distribution:
  P_1(x_1, x_2, x_3) = 1/f exp{x_1 x_2 + x_2 x_3 + x_1 x_3}
  P_2(x_1, x_2, x_3) = 1/f exp{x_1 x_2 x_3}
[Diagram: P_1 lies inside the factor-graph family P_1^F, P_2 inside P_2^F, and both factor-graph families lie inside the undirected-graphical-model family P_12^U]
Undirected Graphical Models are less precise
P_12^U(x_1, x_2, x_3) = 1/f ψ(x_1, x_2, x_3)   (undirected graphical model, maximum clique form)
P_1^F(x_1, x_2, x_3) = 1/f ψ(x_1, x_2) ψ(x_2, x_3) ψ(x_1, x_3)   (factor graph)
P_2^F(x_1, x_2, x_3) = 1/f ψ(x_1, x_2, x_3)   (factor graph)
The same undirected graphical model corresponds to both factor graphs, so it cannot distinguish between them.
Easy to convert between the two representations
[Figure: a factor graph over x_1, …, x_5 and the corresponding undirected graphical model]
Convert a factor graph in such a way that the family of distributions of the undirected graphical model covers all possible distributions of this factor graph: make sure that every factor is represented by a clique.
The family of distributions with this factor graph is contained in the family of distributions with this undirected graphical model.
Easy to convert between the two representations
[Figure: an undirected graphical model over x_1, …, x_5 and the corresponding factor graph]
Convert an undirected graphical model in such a way that the family of distributions of the factor graph covers all possible distributions of this undirected graphical model: make sure that every clique has an associated factor.
The family of distributions of this undirected graphical model and of this factor graph is the same.
Easy to convert between the two representations
[Figure: the same undirected graphical model over x_1, …, x_5 and a factor graph with additional factors]
Convert an undirected graphical model in such a way that the family of distributions of the factor graph covers all possible distributions of this undirected graphical model: make sure that every clique has an associated factor.
The family of distributions of this undirected graphical model and of this factor graph is the same.
Comment: this conversion is also correct, but it is not a minimal representation.
Easy to convert between the two representations
[Figure: the same undirected graphical model over x_1, …, x_5 and another candidate factor graph]
Convert an undirected graphical model in such a way that the family of distributions of the factor graph covers all possible distributions of this undirected graphical model: make sure that every clique has an associated factor.
The family of distributions of this undirected graphical model and of this factor graph is the same.
Comment: this conversion is not correct.

What to infer?
• MAP inference (maximum a posteriori state):
  x* = argmax_x P(x) = argmin_x E(x)
• Probabilistic inference, so-called marginals:
  P(x_i = k) = Σ_{x | x_i = k} P(x_1, …, x_i = k, …, x_n)
  This can be used to make a maximum marginal decision:
  x_i* = argmax_{x_i} P(x_i)
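Both inference tasks can be written as brute-force maxima/sums over all labelings, which is feasible only for tiny models but makes the definitions concrete. The following sketch (my own, with a made-up energy) computes the MAP state and the marginals P(x_i = k) for a small binary chain; note that the max-marginal decision need not coincide with the MAP state.

# Brute-force MAP and marginals for a tiny binary pairwise model (illustration only).
import itertools, math

n = 4                                                           # chain x1 - x2 - x3 - x4
unary = [[0.0, 0.4], [0.3, 0.0], [0.0, 0.3], [0.4, 0.0]]        # theta_i(x_i), made up
pairwise = lambda a, b: 0.6 * (a != b)                          # Potts term theta_ij

def E(x):
    return (sum(unary[i][x[i]] for i in range(n))
            + sum(pairwise(x[i], x[i + 1]) for i in range(n - 1)))

states = list(itertools.product([0, 1], repeat=n))
f = sum(math.exp(-E(x)) for x in states)                        # partition function
P = {x: math.exp(-E(x)) / f for x in states}                    # P(x) = exp(-E(x)) / f

x_map = min(states, key=E)                                      # MAP: argmin_x E(x)

# marginals P(x_i = k) = sum over all labelings x with x_i = k
marg = [[sum(p for x, p in P.items() if x[i] == k) for k in (0, 1)] for i in range(n)]
x_maxmarg = tuple(max((0, 1), key=lambda k: marg[i][k]) for i in range(n))

print("MAP state          :", x_map)
print("max-marginal state :", x_maxmarg)
print("marginals P(x_i=1) :", [round(marg[i][1], 3) for i in range(n)])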
MAP versus Marginals - visually
[Figure: input image; ground truth labeling; MAP solution x* (each pixel has a 0/1 label); marginals P(x_i) (each pixel has a probability between 0 and 1)]
MAP versus Marginals – Making Decisions
[Figure: plot of P(x|z) over the space of all solutions x (sorted by pixel difference) for an input image z]
Which solution x* would you choose?
Reminder: How to make a decision
Question: what solution x* should we give out?
Answer: choose the x* which minimizes the Bayesian risk (assuming the model P(x|z) is known):
x* = argmin_x̂ Σ_x P(x|z) C(x, x̂)
C(x_1, x_2) is called the loss function (or cost function) for comparing two results x_1, x_2.
Maximum A-Posteriori Solution (MAP)
[Figure: plot of P(x|z) over the space of all solutions x (sorted by pixel difference)]
The MAP solution takes the globally optimal solution.
The Cost Function behind MAP
Choose C(x, x̂) = 0 if x = x̂, and 1 otherwise. Then
x* = argmin_x̂ Σ_x P(x|z) C(x, x̂) = argmin_x̂ (1 − P(x = x̂ | z)) = argmax_x̂ P(x̂ | z)
The MAP estimate optimizes a "global 0-1 loss".
The Cost Function behind Max Marginals
Probabilistic inference gives marginals. We can take the max-marginal solution:
x_i* = argmax_{x_i} P(x_i)
(where P(x_i = k) = Σ_{x | x_i = k} P(x_1, …, x_i = k, …, x_n))
This represents the decision with minimum Bayesian risk
x* = argmin_x̂ Σ_x P(x|z) C(x, x̂)   where   C(x, x̂) = Σ_i |x_i − x̂_i|
This is a pixel-wise error, called the "Hamming loss".
Maximum A-Posteriori Solution (MAP)
[Figure: plot of P(x|z) over the space of all solutions x (sorted by pixel difference); the maximum marginal solution x_i* = argmax_{x_i} P(x_i) is sketched ("guessed") on the plot]
This lecture: Discrete Inference in Order-two Models
Gibbs distribution: P(x) = 1/f exp{−E(x)}
E(x) = Σ_i θ_i(x_i) + Σ_{i,j} θ_ij(x_i, x_j) + Σ_{i,j,k} θ_ijk(x_i, x_j, x_k) + …
(unary terms, pairwise terms, higher-order terms)
MAP inference: x* = argmax_x P(x) = argmin_x E(x)
Label space: binary x_i ∈ {0, 1} or multi-label x_i ∈ {0, …, K}
We only look at energies with unary and pairwise factors.
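All applications below use exactly this kind of unary-plus-pairwise energy on a 4-connected pixel grid. As a reference point, here is a small sketch (my own, with hypothetical cost arrays) that evaluates E(x) = Σ_i θ_i(x_i) + Σ_{i,j∈N4} θ_ij(x_i, x_j) for a given labeling; the ICM and DP snippets later minimize energies of this form.

# Sketch (illustration only): evaluate a unary + pairwise energy on a 4-connected grid.
import numpy as np

def grid_energy(labels, unary, pairwise):
    """labels: (H, W) int array; unary: (H, W, K) costs theta_i(x_i);
    pairwise(a, b): elementwise cost theta_ij for neighbouring labels a, b."""
    H, W = labels.shape
    e = unary[np.arange(H)[:, None], np.arange(W)[None, :], labels].sum()  # unary terms
    for (di, dj) in [(0, 1), (1, 0)]:                                      # right and down neighbours
        a = labels[:H - di, :W - dj]
        b = labels[di:, dj:]
        e += pairwise(a, b).sum()
    return e

# tiny example: 3x3 image, binary labels, Potts pairwise term
rng = np.random.default_rng(0)
unary = rng.random((3, 3, 2))
potts = lambda a, b: 0.5 * (a != b)
x = np.zeros((3, 3), dtype=int)
print("E(all zeros) =", grid_energy(x, unary, potts))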
Roadmap next two lectures
• Define: Structured Models
• Formulate applications as discrete labeling problems
• Discrete Inference:
• Pixel-based: Iterative Conditional Mode (ICM)
• Line-based: Dynamic Programming (DP)
• Field-based: Graph Cut and Alpha-Expansion
• Interactive Image Segmentation
• From Generative models to
• Discriminative models to
• Discriminative function
Example in Biology
Conditional graphical models for protein structural motif recognition.
Liu Y, Carbonell J, Gopalakrishnan V, Weigele P.
Example in Biology
A Discrete Chain Graph Model for 3d+t Cell Tracking with High
Misdetection Robustness. Bernhard X. Kausler, Martin Schiegg, Bjoern Andres,
Examples: Order
4-connected, pairwise MRF (order 2), "pairwise energy":
E(x) = Σ_{i,j ∈ N_4} θ_ij(x_i, x_j)
Higher (8)-connected, pairwise MRF (order 2):
E(x) = Σ_{i,j ∈ N_8} θ_ij(x_i, x_j)
Higher-order RF (order n), "higher-order energy":
E(x) = Σ_{i,j ∈ N_4} θ_ij(x_i, x_j) + θ(x_1, …, x_n)
Stereo Vision (all details will come in CV 1)
• Gray images are given
• Rectification: transform to the yellow images; corresponding points are now on the same scanline
• Correspondence search is now 1D: what is the right match?
Stereo Camera - Geometry
[Figure: two-camera geometry and the resulting disparity]
• Disparity zero means that the 3D point is at infinity
• Large disparity means that the 3D point is close to the camera
• From the disparities you can compute a depth map
Stereo Matching – Formulate as MRF
[Figure: left image (a), right image (b), ground-truth depth; example disparities d = 4 and d = 0]
• Images are rectified
• Ignore occlusion
Labels: d_i is the disparity (shift) of pixel i; only the left image is labelled
Energy: E(d): {0, …, D}^n → R
Stereo Matching - Energy
E(d) = Σ_i θ_i(d_i) + Σ_{i,j} θ_ij(d_i, d_j)
• Unary terms (many options). The patch cost for a pixel i with disparity d_i is
θ_i(d_i) = Σ_{j ∈ N_i} ( I^l_j − I^r_{j − d_i} )²
i.e. the sum of squared differences in a window (SSD cost) between the left image I^l and the shifted right image I^r.
[Figure: left and right images; factor graph with unary terms θ_i(d_i) and pairwise terms θ_ij(d_i, d_j)]
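A minimal sketch of this SSD patch cost (my own illustration; it assumes grayscale numpy images and handles the image border only crudely, ignoring occlusion): for every pixel and every candidate disparity it sums squared differences over a small window between the left image and the shifted right image.

# Sketch (illustration only): SSD unary costs theta_i(d_i) for stereo matching.
import numpy as np

def ssd_unary_costs(left, right, max_disp, radius=2):
    """left, right: (H, W) float grayscale images; returns costs of shape (H, W, max_disp + 1)."""
    H, W = left.shape
    big = 1e6                                  # cost where the shifted window leaves the image
    costs = np.empty((H, W, max_disp + 1))
    k = 2 * radius + 1
    for d in range(max_disp + 1):
        diff2 = np.full((H, W), big)
        diff2[:, d:] = (left[:, d:] - right[:, :W - d]) ** 2    # (I_l(j) - I_r(j - d))^2
        padded = np.pad(diff2, radius, mode='edge')
        window_sum = np.zeros((H, W))
        for dy in range(k):                                     # box filter: sum over the window N_i
            for dx in range(k):
                window_sum += padded[dy:dy + H, dx:dx + W]
        costs[:, :, d] = window_sum
    return costs

# usage with random images standing in for a rectified stereo pair
rng = np.random.default_rng(1)
L_img, R_img = rng.random((40, 60)), rng.random((40, 60))
theta = ssd_unary_costs(L_img, R_img, max_disp=8)
disparity_wta = theta.argmin(axis=2)           # winner-takes-all: unary terms only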
Stereo Matching Energy - Smoothness
[Olga Veksler PhD thesis, Daniel Cremers et al.]
Pairwise term without truncation (global minimum computable):
θ_ij(d_i, d_j) = |d_i − d_j|
[Figure: cost |d_i − d_j| as a function of the disparity difference]
Stereo Matching Energy - Smoothness
Discontinuity-preserving potentials [Blake & Zisserman '83, '87]
With truncation (NP-hard optimization):
θ_ij(d_i, d_j) = min(|d_i − d_j|, τ)
[Figure: the untruncated cost |d_i − d_j| compared with the truncated cost]
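For discrete inference it is convenient to tabulate the pairwise term once as a (D+1)×(D+1) matrix over all label pairs. A small sketch (my own) for the untruncated and truncated linear potentials above:

# Sketch: pairwise cost tables theta_ij(d_i, d_j) over all disparity pairs.
import numpy as np

def linear_potential(num_labels, trunc=None):
    d = np.arange(num_labels)
    cost = np.abs(d[:, None] - d[None, :]).astype(float)   # |d_i - d_j|
    if trunc is not None:
        cost = np.minimum(cost, trunc)                      # min(|d_i - d_j|, tau)
    return cost

theta_plain = linear_potential(9)              # convex; global optimum computable
theta_truncated = linear_potential(9, trunc=2) # discontinuity preserving; NP-hard in general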
Stereo Matching: Simplified Random Fields
[Figure: results for different models, compared to the ground truth]
• No MRF; block matching: each pixel independent (winner-takes-all, WTA)
• No horizontal links: efficient, since the chains are independent
• Pairwise MRF [Boykov et al. '01]
Image Segmentation
E(x) = Σ_i θ_i(x_i) + Σ_{i,j} θ_ij(x_i, x_j), binary label x_i ∈ {0, 1}
[Figure: 4-connected factor graph with unary terms θ_i(x_i) and pairwise terms θ_ij(x_i, x_j)]
Unary term: θ_i(x_i = 0) and θ_i(x_i = 1) encode that red/purple colors are more likely foreground and yellow/dark colors are more likely background (derivation next lecture).
[Figure: the two unary cost images θ_i(x_i = 0) and θ_i(x_i = 1), where dark means likely background / likely foreground respectively, and the optimum with unary terms only]
Pairwise term - Reminder
"Ising prior": θ_ij(x_i, x_j) = |x_i − x_j|
This models the assumption that the object is spatially coherent.
When is θ_ij(x_i, x_j) small, i.e. which configurations are likely?
[Figure: neighbouring label configurations, from most likely (equal labels) to most unlikely (different labels)]
Texture Synthesis
[Kwatra et al., SIGGRAPH '03]
[Figure: input texture and output; two shifted copies a and b of the input overlap on the output canvas O]
E(x) = Σ_i θ_i(x_i) + Σ_{i,j} θ_ij(x_i, x_j), binary label x_i ∈ {0, 1}
θ_i(x_i) = ∞ if the chosen image does not exist at pixel i, otherwise 0
θ_ij(x_i, x_j) = 0 if x_i = x_j
Texture Synthesis
[Figure: input and output; overlap of images a and b on the canvas O; neighbouring pixels i, j illustrate a good case and a bad case for the seam]
E(x) = Σ_i θ_i(x_i) + Σ_{i,j} θ_ij(x_i, x_j), binary label x_i ∈ {0, 1}
Pairwise term: θ_ij(x_i, x_j) = |x_i − x_j| ( |a_i − b_i| + |a_j − b_j| ), so the seam cost is
E(x) = Σ_{i,j} |x_i − x_j| ( |a_i − b_i| + |a_j − b_j| )
Good case: the two images agree where the label changes (low cost). Bad case: they disagree (high cost).
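A minimal sketch of this seam cost (my own illustration): given the two overlapping source images a and b and a binary labeling x on the overlap region, it sums |x_i − x_j|(|a_i − b_i| + |a_j − b_j|) over 4-connected neighbour pairs, so switching the source is cheap exactly where a and b agree.

# Sketch (illustration only): texture-synthesis seam cost on the overlap region.
import numpy as np

def seam_cost(x, a, b):
    """x: (H, W) binary labeling (0 = take image a, 1 = take image b);
    a, b: (H, W) grayscale images defined on the overlap region."""
    agree = np.abs(a - b)                                      # |a_i - b_i| per pixel
    H, W = x.shape
    cost = 0.0
    for (di, dj) in [(0, 1), (1, 0)]:                          # 4-connected neighbours
        xi, xj = x[:H - di, :W - dj], x[di:, dj:]
        ai, aj = agree[:H - di, :W - dj], agree[di:, dj:]
        cost += (np.abs(xi - xj) * (ai + aj)).sum()            # |x_i - x_j| (|a_i-b_i| + |a_j-b_j|)
    return cost

rng = np.random.default_rng(2)
a_img, b_img = rng.random((8, 8)), rng.random((8, 8))
x_lbl = (np.arange(8)[None, :] >= 4).astype(int) * np.ones((8, 8), dtype=int)  # left half a, right half b
print(seam_cost(x_lbl, a_img, b_img))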
Panoramic Stitching
Use identical energy
Panoramic Stitching
Use identical energy
Interactive Digital Photomontage
[Agarwala et al., Siggraph 2004]
Interactive Digital Photomontage
[Agarwala et al., Siggraph 2004]
Image Quilting
[A. Efros and W. T. Freeman, Image quilting for texture synthesis and transfer, SIGGRAPH 2001]
[Figure: source image (rice texture) and output canvas showing a face]
• The unary term matches "dark" rice pixels to dark face pixels
• Place the source image at random positions on the output canvas
• You can also use dynamic programming (see the article)
Video Synthesis
[Figure: input video and output video (duplicated in time)]
A 3D labeling problem: the same pairwise terms, but now in the x-, y-, and t (time)-direction.
Image Retargeting
http://swieskowski.net/carve/
Image Retargeting
Image Retargeting
E(x) = Σ_i θ_i(x_i) + Σ_{i,j} θ_ij(x_i, x_j), binary label x_i ∈ {0, 1}
[Figure: unary terms θ_i(x_i) force label 0 at the left border and label 1 at the right border; the cut between the label-0 and label-1 regions is sketched]
Goal: from each scan-line take out exactly one pixel. This gives the new image (one pixel less in x-direction).
The cut should go through places with low image gradient.
First Idea
[Figure: unary terms force label 0 at the left border and label 1 at the right border; a labeling with a 1,0 transition inside a scanline is shown between pixels i and j]
All pairwise terms θ_ij(x_i, x_j) may look like this:
x_i  x_j  value
0    0    0
0    1    |I_i − I_j|
1    0    |I_i − I_j|
1    1    0
This violates our constraint to take exactly one pixel per scanline (a 0→1 transition may be followed by a 1→0 transition).
The correct graph
[Figure: unary terms force label 0 at the left border and label 1 at the right border; no 1,0 transition is possible within a scanline]
This does not violate our constraint.
All horizontal pairwise terms look like this:
x_i  x_j  value
0    0    0
0    1    |I_i − I_j|
1    0    ∞
1    1    0
All vertical pairwise terms look like this:
x_i  x_j  value
0    0    0
0    1    |I_i − I_j|
1    0    |I_i − I_j|
1    1    0
(∞ means a very large number.)
[Improved Seam Carving for Video Retargeting, Rubinstein et al., SIGGRAPH '08]
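A tiny sketch (my own) of these two pairwise tables: the horizontal terms forbid the 1→0 transition with an (effectively) infinite cost, so each scanline can switch from label 0 to label 1 at most once, while the vertical terms stay symmetric.

# Sketch: pairwise tables for seam-style image retargeting (binary labels 0/1).
import numpy as np

INF = 1e9  # "infinity": a very large number

def horizontal_table(grad):
    """theta_ij for horizontally neighbouring pixels i (left) and j (right);
    grad = |I_i - I_j| is the local image difference."""
    return np.array([[0.0, grad],    # (0,0) -> 0,   (0,1) -> |I_i - I_j|
                     [INF, 0.0]])    # (1,0) -> inf (forbidden),  (1,1) -> 0

def vertical_table(grad):
    return np.array([[0.0, grad],
                     [grad, 0.0]])

print(horizontal_table(grad=0.3))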
Extension – Scene Carving
[Figure: image, hand-drawn depth-ordering, normal seam carving result, scene carving result]
Examples: Order
4-connected, pairwise MRF (order 2), "pairwise energy":
E(x) = Σ_{i,j ∈ N_4} θ_ij(x_i, x_j)
Higher (8)-connected, pairwise MRF (order 2):
E(x) = Σ_{i,j ∈ N_8} θ_ij(x_i, x_j)
Higher-order RF (order n), "higher-order energy":
E(x) = Σ_{i,j ∈ N_4} θ_ij(x_i, x_j) + θ(x_1, …, x_n)
Avoid Discretization Artefacts
Larger connectivity can model the true Euclidean length (other metrics are also possible).
[Figure: lengths of two example paths measured with the Euclidean metric and with 4-connected and 8-connected graph metrics; the 8-connected lengths approximate the Euclidean lengths much better than the 4-connected ones]
Question: can you choose edge weights in such a way that the dark yellow and the blue segmentation have different lengths?
Avoid Discretization Artefacts
[Figure: segmentation results with a 4-connected Euclidean model, an 8-connected Euclidean model (MRF), and an 8-connected geodesic model (CRF)]
Examples: Order
4-connected, pairwise MRF (order 2), "pairwise energy":
E(x) = Σ_{i,j ∈ N_4} θ_ij(x_i, x_j)
Higher (8)-connected, pairwise MRF (order 2):
E(x) = Σ_{i,j ∈ N_8} θ_ij(x_i, x_j)
Higher-order RF (order n), "higher-order energy":
E(x) = Σ_{i,j ∈ N_4} θ_ij(x_i, x_j) + θ(x_1, …, x_n)
Advanced Object Recognition
• Many other examples: ObjCut [Kumar et al. '05]; Deformable Part Model [Felzenszwalb et al., CVPR '08]; PoseCut [Bray et al. '06]; LayoutCRF [Winn et al. '06]
• Maximizing / marginalizing over hidden variables
[Figure: "instance", "instance label" and "parts" layers, from LayoutCRF, Winn et al. '06]
Roadmap next two lectures
• Define: Structured Models
• Formulate applications as discrete labeling problems
• Discrete Inference:
• Pixel-based: Iterative Conditional Mode (ICM)
• Line-based: Dynamic Programming (DP)
• Field-based: Graph Cut and Alpha-Expansion
• Interactive Image Segmentation
• From Generative models to
• Discriminative models to
• Discriminative function
Inference – Big Picture (this will be done in Computer Vision 2)
• Combinatorial Optimization
• Binary, pairwise MRF: Graph cut, BHS (QPBO)
• Multiple label, pairwise: move-making; transformation
• Binary, higher-order factors: transformation
• Multi-label, higher-order factors:
move-making + transformation
• Dual/Problem Decomposition
• Decompose the (NP-)hard problem into tractable ones; solve with e.g. a sub-gradient technique
• Local search / Genetic algorithms
• ICM, simulated annealing
Inference – Big Picture (this will be done in Computer Vision 2)
• Message Passing Techniques
• Methods can be applied to any model in theory (higher order, multi-label, etc.)
• DP, BP, TRW, TRW-S
• LP-relaxation
• Relax original problem (e.g. {0,1} to [0,1])
and solve with existing techniques (e.g. sub-gradient)
• Can be applied to any model (depending on the solver used)
• Connections to message passing (TRW) and combinatorial
optimization (QPBO)
Function Minimization: The Problems
• Which functions are exactly solvable?
• Approximate solutions of NP-hard problems
Function Minimization: The Problems
• Which functions are exactly solvable?
Boros, Hammer [1965]; Kolmogorov, Zabih [ECCV 2002, PAMI 2004]; Ishikawa [PAMI 2003]; Schlesinger [EMMCVPR 2007]; Kohli, Kumar, Torr [CVPR 2007, PAMI 2008]; Ramalingam, Kohli, Alahari, Torr [CVPR 2008]; Kohli, Ladicky, Torr [CVPR 2008, IJCV 2009]; Zivny, Jeavons [CP 2008]
• Approximate solutions of NP-hard problems
Schlesinger [1976]; Kleinberg, Tardos [FOCS 99]; Chekuri et al. [2001]; Boykov et al. [PAMI 2001]; Wainwright et al. [NIPS 2001]; Werner [PAMI 2007]; Komodakis [PAMI 2005]; Lempitsky et al. [ICCV 2007]; Kumar et al. [NIPS 2007]; Kumar et al. [ICML 2008]; Sontag, Jaakkola [NIPS 2007]; Kohli et al. [ICML 2008]; Kohli et al. [CVPR 2008, IJCV 2009]; Rother et al. [2009]
ICM - Iterated Conditional Mode
[Figure: graph with x_1 connected to x_2, x_3, x_4 and x_5]
Gibbs energy:
E(x) = θ_12(x_1, x_2) + θ_13(x_1, x_3) + θ_14(x_1, x_4) + θ_15(x_1, x_5) + …
ICM - Iterated Conditional Mode
Idea: fix all variables but one and optimize over that one.
Gibbs energy:
E(x) = θ_12(x_1, x_2) + θ_13(x_1, x_3) + θ_14(x_1, x_4) + θ_15(x_1, x_5) + …
Select x_1 and optimize:
E'(x) = θ_12(x_1, x_2) + θ_13(x_1, x_3) + θ_14(x_1, x_4) + θ_15(x_1, x_5)
• Can get stuck in local minima
• Depends on the initialization
[Figure: an ICM result compared to the global minimum]
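A compact sketch of ICM on a 4-connected grid (my own illustration, assuming a unary cost array and a Potts-style pairwise function; not the lecture's reference code): each pixel in turn is set to the label that minimizes the energy given its fixed neighbours, and sweeps are repeated until nothing changes.

# Sketch (illustration only): ICM for a unary + pairwise energy on a 4-connected grid.
import numpy as np

def icm(unary, pairwise, labels, max_sweeps=20):
    """unary: (H, W, K) costs theta_i(x_i); pairwise(a, b): scalar cost theta_ij(a, b);
    labels: (H, W) initial labeling (e.g. argmin of the unaries). Returns a local minimum."""
    H, W, K = unary.shape
    labels = labels.copy()
    for _ in range(max_sweeps):
        changed = False
        for i in range(H):
            for j in range(W):
                best_k, best_cost = labels[i, j], np.inf
                for k in range(K):
                    cost = unary[i, j, k]
                    for ni, nj in [(i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)]:
                        if 0 <= ni < H and 0 <= nj < W:
                            cost += pairwise(k, labels[ni, nj])   # neighbours are kept fixed
                    if cost < best_cost:
                        best_k, best_cost = k, cost
                if best_k != labels[i, j]:
                    labels[i, j] = best_k
                    changed = True
        if not changed:          # converged to a local minimum (depends on the initialization)
            break
    return labels

# usage: binary segmentation with random unaries and a Potts smoothness term
rng = np.random.default_rng(3)
unary = rng.random((10, 10, 2))
potts = lambda a, b: 0.8 * (a != b)
x0 = unary.argmin(axis=2)
x_icm = icm(unary, potts, x0)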
5ICM - parallelization
• The schedule is a more complex task in graphs which are not 4-connected
[Figure: normal procedure (steps 1-4) and parallel procedure (steps 1-4)]
ICM - Iterated Conditional Mode
Extensions / related techniques:
• Simulated annealing
• Block ICM (see exercise)
• Gibbs sampling (see later)
• Lazy Flipper: MAP inference in higher-order graphical models by depth-limited exhaustive search [Bjoern Andres, Joerg H. Kappes, Ullrich Koethe, Fred A. Hamprecht]

Roadmap next two lectures
• Define: Structured Models
• Formulate applications as discrete labeling problems
• Discrete Inference:
• Pixel-based: Iterative Conditional Mode (ICM)
• Line-based: Dynamic Programming (DP)
• Field-based: Graph Cut and Alpha-Expansion
• Interactive Image Segmentation
• From Generative models to
• Discriminative models to
• Discriminative function
What is dynamic programming?
Dynamic programming is a method for solving complex problems by breaking them down into simpler subproblems. It examines all possible ways to solve the problem and finds the optimal solution.
Dynamic Programming on chains
[Figure: stereo results for different models, compared to the ground truth]
• No MRF; block matching: each pixel independent (winner-takes-all, WTA)
• No horizontal links: efficient, since the chains are independent
• Pairwise MRF [Boykov et al. '01]
Dynamic Programming on chains
E(x) = Σ_i θ_i(x_i) + Σ_{i,j ∈ N} θ_ij(x_i, x_j)   (unary terms plus pairwise terms along a row)
[Figure: chain of nodes …, o, p, q, r, s, …; messages M_{o→p}(x_p), M_{p→q}(x_q), M_{q→r}(x_r), M_{r→s}(x_s) are passed from left to right]
• Pass messages from left to right
• A message is a vector with K entries (K is the number of labels)
• Read out the solution from the final message and the final unary term
• Gives the globally exact solution
• Other name: min-sum algorithm
Comment: Dmitri Schlesinger called the messages Bellman functions.
Dynamic Programming on chains
Define the message
M_{q→r}(x_r) = min_{x_q} { M_{p→q}(x_q) + θ_q(x_q) + θ_{q,r}(x_q, x_r) }
(information from the previous nodes + local information + connection to the next node)
The message stores the minimal energy up to this point for x_r = k:
M_{q→r}(x_r = k) = min_{x_1, …, x_q} E(x_1, …, x_q, x_r = k)
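A minimal sketch of the min-sum recursion above (my own illustration, not the lecture's code): messages are passed left to right along a chain with K labels per node, and backpointers recover the globally optimal labeling. Here M[i, k] also includes the unary term of node i, a slight variant of the message above that gives the same optimum.

# Sketch (illustration only): dynamic programming (min-sum) on a chain.
import numpy as np

def dp_chain(unary, pairwise):
    """unary: (n, K) costs theta_i(x_i); pairwise: (K, K) costs theta_{i,i+1}(x_i, x_{i+1}).
    Returns the MAP labeling and its energy (globally optimal on a chain)."""
    n, K = unary.shape
    M = np.zeros((n, K))                 # M[i, k] = min energy of x_1..x_i with x_i = k
    back = np.zeros((n, K), dtype=int)
    M[0] = unary[0]
    for i in range(1, n):
        # cand[k_prev, k] = M[i-1, k_prev] + theta_{i-1,i}(k_prev, k)
        cand = M[i - 1][:, None] + pairwise
        back[i] = cand.argmin(axis=0)
        M[i] = cand.min(axis=0) + unary[i]        # message plus the local unary term
    x = np.zeros(n, dtype=int)                    # backtrack from the best final state
    x[-1] = M[-1].argmin()
    for i in range(n - 1, 0, -1):
        x[i - 1] = back[i, x[i]]
    return x, M[-1].min()

# usage: a short "scanline" with 4 labels and a truncated linear smoothness term
rng = np.random.default_rng(4)
theta_i = rng.random((6, 4))
d = np.arange(4)
theta_ij = np.minimum(np.abs(d[:, None] - d[None, :]), 2.0)
x_star, e_star = dp_chain(theta_i, theta_ij)
print(x_star, e_star)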
Dynamic Programming on chains - example
Dynamic Programming on trees
[Figure: tree-structured graph with a root; messages are passed from the leaves towards the root]
Example: part-based object recognition [Felzenszwalb, Huttenlocher '01]
Extensions
• Can be done with marginals: sum-product algorithm (see later lecture)
• Can be done in fields: Belief Propagation
• Can be done in higher-order factor graphs
• Speed up trick with distance transforms
• Shortest path in a graph can also be done with dynamic programming
[Figure: general shortest-path problem drawn on a grid with the variables along one axis and the labels along the other]
Dynamic Programming in vision – Two scenarios
• The two dimensions are pixels (on a chain) and labels:
  • Stereo
  • Many other applications
• The two dimensions are pixels (x-direction) and pixels (y-direction):
  • Segmentation with Intelligent Scissors [Mortenson et al., SIGGRAPH 95], in GIMP, Adobe Photoshop, etc.
  • Image retargeting
  • Image stitching (also possible, but rarely done)
  • Border matting [Rother et al., SIGGRAPH '04]

Image Retargeting
E(x) = Σ_i θ_i(x_i) + Σ_{i,j} θ_ij(x_i, x_j), binary label x_i ∈ {0, 1}
[Figure: unary terms θ_i(x_i) force label 0 at the left border and label 1 at the right border; the sketched labeling is split into a label-0 and a label-1 region by a path from top to bottom]
In this case the problem can be represented in two ways: as a labeling problem or as path finding.
You can do this as an exercise; please see the details in:
http://www.merl.com/reports/docs/TR2008-064.pdf
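The path-finding view can be implemented directly with the same dynamic programming idea: a minimal-cost 8-connected path from top to bottom through a per-pixel cost map (for instance a gradient magnitude) gives the seam of pixels to remove. A small sketch (my own illustration with a hypothetical gradient-based cost, not the exact formulation of the report linked above):

# Sketch (illustration only): vertical seam carving via dynamic programming.
import numpy as np

def find_vertical_seam(cost):
    """cost: (H, W) per-pixel cost (e.g. gradient magnitude).
    Returns, for every row, the column index of the minimal-cost top-to-bottom path."""
    H, W = cost.shape
    M = cost.copy()                                  # M[i, j] = min path cost ending at (i, j)
    back = np.zeros((H, W), dtype=int)
    for i in range(1, H):
        padded = np.pad(M[i - 1], 1, constant_values=np.inf)
        prev = np.stack([padded[0:W], padded[1:W + 1], padded[2:W + 2]])  # upper-left, up, upper-right
        back[i] = prev.argmin(axis=0) - 1            # offset in {-1, 0, +1}
        M[i] += prev.min(axis=0)
    seam = np.zeros(H, dtype=int)
    seam[-1] = M[-1].argmin()
    for i in range(H - 1, 0, -1):                    # backtrack the optimal path
        seam[i - 1] = seam[i] + back[i, seam[i]]
    return seam

def remove_seam(img, seam):
    H, W = img.shape
    keep = np.ones((H, W), dtype=bool)
    keep[np.arange(H), seam] = False
    return img[keep].reshape(H, W - 1)               # new image, one pixel less per scanline

rng = np.random.default_rng(5)
image = rng.random((20, 30))
gy, gx = np.gradient(image)
seam = find_vertical_seam(np.abs(gx) + np.abs(gy))   # cut through low-gradient regions
smaller = remove_seam(image, seam)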