How to deal with…
non-submodular and higher-order energies (Part 1)
Carsten Rother
27/06/2014 Machine Learning 2
Advertisement
Main research theme:
• Combining physics-based vision with machine learning:
  generative models meet discriminative models

Theoretical side:
• Optimization and learning in discrete-domain models
  (CRFs, higher-order models, continuous label spaces, loss-based learning, etc.)

Application side:
• Scene recovery from multiple images
• 3D scene understanding
• Bio imaging
State-of-the art CRF models
Energy:  E(y, x, w) = Σ_F E_F(y_F, x, w_F)

Gibbs distribution:  p(y|x, w) = 1/Z(x, w) · exp(−E(y, x, w))

[Figure: factor graph and compact factor graph notation]
Deconvolution
Input x = K*y
Output
Combine physics and machine learning:
1) Using physics:
Add a Gaussian “likelihood” term on (x − K*y)
2) Put it into a deep-learning approach:
x → RTF1 → y1 → RTF2 → y2 → … (stacked RTFs)
[Schmidt, Rother, Nowozin, Jancsary, Roth, CVPR 2013]
Scene recovery from multiple images
[Figure: 2 RGBD inputs]
Scene recovery from single images
BioImaging
Joint work with Myers group (Dagmar, Florian, and others)
[Figure: atlas and instance]
3D Scene Understanding
• Training time: 3D objects
• Test time:
Advertisement
• If you are excited about any of these topics … come to us for a research internship (“Forschungspraktikum”), master thesis, diploma thesis, etc.
• If you want to collaborate with top industry labs or universities … come to us. Examples:
  • BMW, Adobe, Microsoft Research, Daimler, etc.
  • Top universities: in Israel, Oxford, Heidelberg, etc.
Advertisement
Smart 3D point-cloud processing:
- 3D fine-grained recognition: type of aircraft, vehicle, objects, …
- Tracking: 3D models with varying degrees of information
- Structured data: how to define a CRF/RTF?
- Combine physics-based vision (generative models) with machine learning
There is an opening for a master project / PhD student – if you are interested, talk to me after the lecture!
Joint project with “Institut für Luftfahrt und Logistik“
Lidar scanner
Reminder: Pairwise energies
E(x) = Σ_{i∈V} θi(xi) + Σ_{(i,j)∈E} θij(xi, xj) + θconst
𝐺 = (𝑉, 𝐸) undirected graph
For now, 𝑥 ∈ {0,1}
Visualization of the full energy:
Submodular condition:  θij(0,0) + θij(1,1) ≤ θij(1,0) + θij(0,1)

Pairwise table (with unaries θi(0), θi(1)):

            xj = 0        xj = 1
xi = 0     θij(0,0)      θij(0,1)
xi = 1     θij(1,0)      θij(1,1)

• If all pairwise terms are submodular then the global optimum can be computed in polynomial time with graph cut
• If not … this lecture
𝜃𝑖𝑗 (0,0) also sometimes written as: 𝜃𝑖𝑗;00
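As a quick check in code – a minimal Python sketch (function name and data layout are mine, not from the slides) of the submodularity test for a single pairwise term:

```python
def is_submodular(theta):
    """theta[a][b] = theta_ij(a, b) for binary labels a, b in {0, 1}."""
    return theta[0][0] + theta[1][1] <= theta[1][0] + theta[0][1]

# Potts-style smoothness: equal labels are free, different labels cost 1 -> submodular
print(is_submodular([[0, 1], [1, 0]]))    # True
# A term preferring *different* labels is non-submodular
print(is_submodular([[1, 0], [0, 1]]))    # False
```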
How often do we have submodular terms?
Label smoothness is often the natural condition:
In alpha expansion (reminder later) energy is often “naturally” submodular:
Neighboring pixels more often than not have the same label. We may choose:
θij(0,0) = θij(1,1) = 0;  θij(1,0) = θij(0,1) ≥ 0
⟹ θij(0,0) + θij(1,1) ≤ θij(1,0) + θij(0,1)  (submodular)
[Figure: left image (a), right image (b), labelling; smoothness cost as a function of |xi − xj|]
Importance of good optimization
[Data courtesy of Oliver Woodford]
Problem: Minimize a binary 4-connected energy (non-submodular) (choose a colour-mode at each pixel)
Input: Image sequence
Output: New view
Importance of good optimization
[Figure panels: Ground truth; Belief Propagation; ICM; Simulated Annealing; Graph Cut with truncation [Rother et al. '05]; QPBO [Hammer '84] (black = unknown); QPBOP [Boros '06, see Rother '07] – global minimum]
Simplest idea to deal with non-submodular terms
• Truncate all non-submodular terms:
If θij(0,0) + θij(1,1) > θij(1,0) + θij(0,1), shift by δ so that
θij(0,0) − δ + θij(1,1) − δ = θij(1,0) + δ + θij(0,1) + δ,
i.e. δ = ¼ [θij(0,0) + θij(1,1) − θij(1,0) − θij(0,1)]

Better techniques to come…
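A minimal sketch of this truncation (function name and data layout are mine): it shifts δ exactly as above, so a non-submodular term becomes submodular with equality.

```python
def truncate(theta):
    """Truncate a non-submodular 2x2 pairwise term theta[(a, b)] so it becomes submodular."""
    gap = theta[0, 0] + theta[1, 1] - theta[1, 0] - theta[0, 1]
    if gap <= 0:                      # already submodular, nothing to do
        return dict(theta)
    delta = gap / 4.0
    return {(0, 0): theta[0, 0] - delta, (1, 1): theta[1, 1] - delta,
            (1, 0): theta[1, 0] + delta, (0, 1): theta[0, 1] + delta}

repulsive = {(0, 0): 1.0, (1, 1): 1.0, (1, 0): 0.0, (0, 1): 0.0}
print(truncate(repulsive))            # every entry becomes 0.5: submodular (with equality)
```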
How often do we have non-submodular terms?
• Learning (unconstrained parameters)
Graph connectivity: 64
[Figure: MRF vs. DTF connectivity on training and test data; red: non-submodular, blue: submodular]
Texture Denoising
[Figure panels: training images; test image; test image (60% noise); result MRF 4-connected; result MRF 4-connected (neighbours); result MRF 9-connected (7 attractive, 2 repulsive)]
How often do we have non-submodular terms?
Deconvolution:
Hand-crafted scenarios:
Many more examples later: Diagram recognition, fusion move, etc.
[Figure panels: input image; user input; global optimum]
Reparametrization
Two reparametrizations we need:
• Unary transform: shift a constant δ between the two unaries θp(0), θp(1) and θconst
• Pairwise transform: shift a constant δ between one row/column of a pairwise term θpq and the corresponding unary
Both leave E(x) unchanged for every labelling x.
[Minimizing non-submodular energies with graph cut, Kolmogorov, Rother, PAMI 2007]
Put energies into “normal form”
1) Apply pairwise transformations until min(θpq(0,j), θpq(1,j)) = 0 for all directed edges p→q and all j ∈ {0,1}
2) Apply unary transformations until min(θp(0), θp(1)) = 0 for all p
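A minimal sketch of the two reparametrization passes (data layout and names are mine). Every shift moves a constant between terms, so the energy of every labelling stays exactly the same:

```python
def normal_form(unary, pairwise, const=0.0):
    """Reparametrize a binary pairwise energy into 'normal form'.

    unary:    {p: [theta_p(0), theta_p(1)]}
    pairwise: {(p, q): 2x2 nested list, pairwise[p, q][i][j] = theta_pq(i, j)}
    """
    unary = {p: list(u) for p, u in unary.items()}
    pairwise = {e: [list(r) for r in t] for e, t in pairwise.items()}

    for (p, q), t in pairwise.items():
        for j in (0, 1):                      # direction p -> q: zero each column minimum
            d = min(t[0][j], t[1][j])
            t[0][j] -= d; t[1][j] -= d
            unary[q][j] += d                  # pushed onto theta_q(j)
        for i in (0, 1):                      # direction q -> p: zero each row minimum
            d = min(t[i][0], t[i][1])
            t[i][0] -= d; t[i][1] -= d
            unary[p][i] += d                  # pushed onto theta_p(i)

    for p, u in unary.items():                # unary transform: push min onto the constant
        d = min(u)
        u[0] -= d; u[1] -= d
        const += d

    return unary, pairwise, const

u, pw, c = normal_form({'p': [2.0, 5.0], 'q': [0.0, 1.0]},
                       {('p', 'q'): [[1.0, 4.0], [6.0, 0.0]]})
print(u, pw, c)   # every unary and every row/column of the pairwise table now contains a 0
```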
Construct the graph
Minimum cut through the graph gives the solution x* = argmin_x E(x)
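The slides build the graph pictorially. As a hedged illustration, here is one standard s–t construction for a submodular binary pairwise energy (function name and data layout are mine; networkx is used for the min-cut, and variable names must differ from 's' and 't'):

```python
import networkx as nx

def minimize_binary_submodular(unary, pairwise):
    """Exact minimization of a submodular binary pairwise energy via min-cut.

    unary:    {p: (theta_p(0), theta_p(1))}  -- must contain every variable
    pairwise: {(p, q): ((A, B), (C, D))} with A = theta(0,0), B = theta(0,1),
              C = theta(1,0), D = theta(1,1), and A + D <= B + C (submodular).
    """
    G = nx.DiGraph()
    lam = {p: u1 - u0 for p, (u0, u1) in unary.items()}   # net cost of choosing x_p = 1

    for (p, q), ((A, B), (C, D)) in pairwise.items():
        # theta(x_p, x_q) = A + (C-A) x_p + (D-C) x_q + (B+C-A-D)(1-x_p) x_q
        lam[p] += C - A
        lam[q] += D - C
        w = B + C - A - D                                  # >= 0 iff the term is submodular
        if w > 0:
            G.add_edge(p, q, capacity=w)                   # cut when x_p = 0 and x_q = 1

    for p, l in lam.items():
        if l >= 0:
            G.add_edge('s', p, capacity=l)                 # pay l if x_p = 1
        else:
            G.add_edge(p, 't', capacity=-l)                # pay -l if x_p = 0
    G.add_edge('s', 't', capacity=0)                       # keep both terminals in the graph

    _, (source_side, _) = nx.minimum_cut(G, 's', 't')
    x = {p: 0 if p in source_side else 1 for p in unary}
    energy = sum(unary[p][x[p]] for p in unary) + \
             sum(t[x[p]][x[q]] for (p, q), t in pairwise.items())
    return x, energy

# toy example: two variables with a Potts-like coupling
print(minimize_binary_submodular({'a': (0.0, 4.0), 'b': (3.0, 0.0)},
                                 {('a', 'b'): ((0.0, 2.0), (2.0, 0.0))}))   # ({'a': 0, 'b': 1}, 2.0)
```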
QPBO method
[Hammer et al. ’84, Boros et al ’91; see Kolmogorov, Rother ‘07]
• Double the number of variables: x_p → (x_p, x̄_p), where x̄_p is intended to represent 1 − x_p

• Original energy (unary, pairwise submodular, pairwise non-submodular terms):
  E({x_p}) = Σ_p E_p(x_p) + Σ_{p,q} E_pq(x_p, x_q) + Σ_{p,q} E_pq(x_p, x_q)

• Doubled energy:
  E'({x_p}, {x̄_p}) = ½ Σ_p [ E_p(x_p) + E_p(1 − x̄_p) ]
                    + ½ Σ_{p,q} [ E_pq(x_p, x_q) + E_pq(1 − x̄_p, 1 − x̄_q) ]    (from the submodular terms)
                    + ½ Σ_{p,q} [ E_pq(x_p, 1 − x̄_q) + E_pq(1 − x̄_p, x_q) ]    (from the non-submodular terms)

• E' is submodular!
• Construct the graph and solve with graph cut: less than double the runtime of a single graph cut
• Method is called QPBO: Quadratic Pseudo-Boolean Optimization (not a good name)
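A sketch of the doubling construction above (helper names and the ('bar', p) naming of the copied variables are mine); it produces terms in the same ((A, B), (C, D)) format as the min-cut sketch earlier:

```python
def qpbo_energy(unary, pairwise):
    """Build the doubled, submodular energy E'(x, xbar) used by QPBO.

    Each variable p gets a copy named ('bar', p), meant to represent 1 - x_p.
    Input/output use the same layout as minimize_binary_submodular() above.
    """
    flip_rows = lambda t: (t[1], t[0])                     # t'(a, b) = t(1-a, b)
    flip_cols = lambda t: (t[0][::-1], t[1][::-1])         # t'(a, b) = t(a, 1-b)
    half = lambda t: tuple(tuple(v / 2.0 for v in row) for row in t)

    u2, p2 = {}, {}
    for p, (u0, u1) in unary.items():
        u2[p] = (u0 / 2.0, u1 / 2.0)                       # (1/2) E_p(x_p)
        u2[('bar', p)] = (u1 / 2.0, u0 / 2.0)              # (1/2) E_p(1 - xbar_p)

    for (p, q), t in pairwise.items():
        if t[0][0] + t[1][1] <= t[0][1] + t[1][0]:         # submodular original term
            p2[(p, q)] = half(t)
            p2[(('bar', p), ('bar', q))] = half(flip_rows(flip_cols(t)))
        else:                                              # non-submodular original term
            p2[(p, ('bar', q))] = half(flip_cols(t))       # E_pq(x_p, 1 - xbar_q)
            p2[(('bar', p), q)] = half(flip_rows(t))       # E_pq(1 - xbar_p, x_q)
    return u2, p2
```

All four kinds of generated pairwise terms are submodular, so the doubled energy can be handed to the min-cut sketch above. Note that the exact set of labeled nodes read out afterwards (next slide) can depend on which minimum cut the solver returns.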
Read out the solution
• Assign labels based on minimum cut in auxiliary graph:
• x_p = 1, x̄_p = 0  ⟹  label x_p = 1
• x_p = 0, x̄_p = 1  ⟹  label x_p = 0
• x_p = 0, x̄_p = 0  ⟹  unlabeled (?)
• x_p = 1, x̄_p = 1  ⟹  unlabeled (?)
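A matching sketch of the read-out (names are mine): nodes whose two copies disagree, i.e. x_p ≠ x̄_p, are labeled with x_p; the rest stay unlabeled.

```python
def read_out(solution):
    """Map a 0/1 solution over the doubled variables back to the original ones.

    solution[p] and solution[('bar', p)] come e.g. from the min-cut sketch above;
    returns 0, 1 or None (None = unlabeled, shown as '?' on the slides).
    """
    labels = {}
    for p, value in solution.items():
        if isinstance(p, tuple) and p and p[0] == 'bar':
            continue
        labels[p] = value if value != solution[('bar', p)] else None
    return labels
```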
Properties
• Autarky (persistency) property: fusing the QPBO partial labelling with any complete labelling never increases the energy (see figure below)
• Partial optimality: the labeled pixels belong to a global minimum
• Labeled nodes have the same result as the LP relaxation of the problem E (but QPBO is a very fast solver)
[Hammer et al ’84, Schlesinger ‘76, Werner ’07, Kolmogorov, Wainright ’05; Kolmogorov ’06]
[Figure: x (partial labelling), y (any complete labelling), z = FUSE(x, y) takes the labeled pixels from x and the rest from y; E(z) ≤ E(y), and the labeled part of x agrees with a global optimum]
When do we get all nodes labeled?
• The energy function is submodular
• If there exists a flipping (of a subset of variables) that makes the energy fully submodular, QPBO will find it
• We can simply be “lucky”
• What to do with unlabelled nodes: run some other method (e.g. BP)
Extension: QPBOP (“P” stands for “Probing”)
QPBO on the example graph (nodes p, q, r, s, t): only node p is labeled (0), the rest stay unlabeled.

Probe node p: fix x_p = 0 and run QPBO, then fix x_p = 1 and run QPBO.

• A node that takes the same label in both runs (e.g. x_r = 0) has this label in a global minimum → remove the node from the energy
• A node whose label always equals (or always differs from) x_p (e.g. x_q = x_p) → remove the node from the energy (merge it with p)
• A dependency that only holds for one of the two values of x_p → add a directed link
• Why did QPBO not find this solution? Probing enforces the integer constraint on x_p (a tighter relaxation)
Two extensions: QPBOP, QPBOI
1. Run QPBO – gives the set of unlabeled nodes U
2. Probe a node p ∈ U
3. Simplify the energy: remove nodes and add links
4. Run QPBO, update U
5. Stop if the energy stays the same for all p ∈ U, otherwise go to 2.
Properties:
- The new energy preserves global optimality and (sometimes) gives the global minimum
- The probing order may affect the result
QPBO versus QPBOP
QPBO: 73% unlabeled (0.08 sec)
QPBOP: global minimum (0.4 sec)
Extension: QPBOI (“I” stands for “Improve”)
• Persistency (autarky) property: take the partial labelling x from QPBO and any complete labelling y (e.g. from BP); then y' = FUSE(x, y) satisfies E(y') ≤ E(y)

[Figure: x (partial), y (e.g. from BP), y' = FUSE(x, y)]
Extension: QPBOI (“I” stands for “Improve”)
• Autarky property (as above): fusing with the QPBO partial labelling never increases the energy
• QPBOI algorithm: choose a sequence of nested sets of variables; fix each set to the current solution, run QPBO on the rest and fuse
• QPBO-stable: no set changes the labelling any more – sometimes this is a global minimum

[Figure: x (partial), y', y'' = FUSE(x, y')]
Results
Three important factors:
• Degree of non-submodularity (NS)
• Unary strength
• Connectivity (av. degree of a node)
Results – Diagram Recognition
Ground truth
GraphCut: E = 119 (0 sec)
ICM: E = 999 (0 sec)
BP: E = 25 (0 sec)
QPBO: 56.3% unlabeled (0 sec)
QPBOP: global minimum (0 sec)
P+BP+I, BP+I: E = 0 (0 sec)
Sim. Ann.: E = 0 (0.28 sec)
• 2700 test cases: QPBOP solved all of them
Results - Deconvolution
Ground truth; input
QPBO: 45% unlabeled (red) (0.01 sec)
QPBO-C: 43% unlabeled (red) (0.4 sec)
ICM: E = 14 (0 sec)
GC: E = 999 (0 sec)
BP: E = 5 (0.5 sec)
BP+I: E = 3.6 (1 sec)
C+BP+I, Sim. Ann.: E = 0 (0.4 sec)
Move on to multi-label
• Let’s apply QPBO(P/I) methods to multi-label problems
• In particular alpha expansion
Reminder: Alpha expansion
[Figure: labels Sky, House, Tree, Ground; initialize with Tree, then expand Ground, expand House, expand Sky]
• Variables either take the label α or retain their current label
Conditions on the pairwise terms:
θij(xa, xb) = 0 iff xa = xb
θij(xa, xb) = θij(xb, xa) ≥ 0
θij(xa, xb) + θij(xb, xc) ≥ θij(xa, xc)
Examples: Potts model, truncated linear (but not truncated quadratic)
[Boykov, Veksler and Zabih 2001]
Other move strategies: alpha-beta swap, range moves, etc.
• Given the original energy 𝐸(𝑥)
• At each step we have two solutions: 𝒙𝟎, 𝒙𝟏
• Define the (variable-wise) combination: 𝑥𝑖01 = (1 − 𝑥𝑖′) 𝑥𝑖0 + 𝑥𝑖′ 𝑥𝑖1 (where 𝒙′ ∈ {0,1} is selection variable)
• Construct a new energy 𝐸′ such that 𝐸’(𝒙’) = 𝐸(𝒙𝟎𝟏)
• The move energy 𝐸’(𝒙’) is submodular if the conditions above hold (θij is a metric)
Reminder: Alpha Expansion
• What to do if the move energy is non-submodular?
• Run QPBO
• For unlabeled pixels: choose the solution (x0 or x1) with the lower energy E and replace the unlabeled nodes with it
• Guarantees that the new solution has equal or better energy than both E(x0) and E(x1) (see the persistency property)
Fusion Move
• Given the original energy E(x)
• At each step we have two arbitrary solutions: x0, x1
• Define the (variable-wise) combination: x01_i = (1 − x'_i) · x0_i + x'_i · x1_i (where x' ∈ {0,1} is the selection variable)
• Construct a new energy E' such that E'(x') = E(x01)
• Run QPBO and fix unlabeled nodes as above
• Comment: in practice the move energy is often submodular if both solutions are good (since the energy prefers neighbouring nodes to be similar)
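A minimal sketch of the move-energy construction (names and data layout are mine): the binary move energy is obtained by plugging the two proposals into the original unary and pairwise terms; for alpha expansion, x1 is simply the constant labelling α.

```python
def fusion_move_energy(unary, pairwise, x0, x1):
    """Binary move energy E'(x') with E'(x') = E(x01), x01_i = x0_i if x'_i = 0 else x1_i.

    unary[i][label] and pairwise[(i, j)][(label_i, label_j)] are the original
    (multi-label) terms; the result uses the binary ((A, B), (C, D)) layout from above.
    """
    u = {i: (unary[i][x0[i]], unary[i][x1[i]]) for i in unary}
    pw = {(i, j): ((theta[x0[i], x0[j]], theta[x0[i], x1[j]]),
                   (theta[x1[i], x0[j]], theta[x1[i], x1[j]]))
          for (i, j), theta in pairwise.items()}
    return u, pw

def fuse(x0, x1, move):
    """x01_i = (1 - x'_i) * x0_i + x'_i * x1_i"""
    return {i: x1[i] if move[i] else x0[i] for i in x0}

# toy example with 3 labels; for alpha expansion, x1 would be the constant labelling alpha
unary = {0: {0: 1.0, 1: 0.0, 2: 2.0}, 1: {0: 0.0, 1: 3.0, 2: 1.0}}
potts = {(a, b): 0.0 if a == b else 1.0 for a in range(3) for b in range(3)}
u, pw = fusion_move_energy(unary, {(0, 1): potts}, x0={0: 1, 1: 0}, x1={0: 2, 1: 2})
```

If the resulting move energy is non-submodular, it is handled with QPBO exactly as described above.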
Fusion move to make alpha expansion parallel
• One processor needs 7 sequential alpha expansions for 8 labels:
1,2,3,4,5,6,7,8
• Four processors need only 3 sequential steps (still 7 alpha expansions):
∎(1-2) ∎(3-4) ∎(5-6) ∎(7-8)
p1 p2 p3 p4
∎(1-4) ∎(5-8)
∎(1-8)
∎ means fusion
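A small sketch (names are mine) of this fusion schedule: pairs of partial solutions are fused in parallel rounds, giving log2(K) sequential rounds instead of K − 1 sequential expansions.

```python
def parallel_fusion_rounds(labels):
    """Schedule for fusing single-label expansions pairwise: log2(K) sequential
    rounds (each round's fusions can run on different processors)."""
    current = [[l] for l in labels]
    rounds = []
    while len(current) > 1:
        nxt, this_round = [], []
        for a, b in zip(current[0::2], current[1::2]):
            this_round.append((a, b))     # these fusions are independent -> parallel
            nxt.append(a + b)
        if len(current) % 2:              # odd one out is carried to the next round
            nxt.append(current[-1])
        rounds.append(this_round)
        current = nxt
    return rounds

for r, fusions in enumerate(parallel_fusion_rounds(list(range(1, 9))), 1):
    print("round", r, ":", fusions)       # 3 rounds, 7 fusions in total for 8 labels
```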
Fusion move for continuous label-spaces
Local gradient cost: 𝑥𝑖 − 𝑥𝑖+1
Victor Lempitsky, Stefan Roth, and Carsten Rother, FusionFlow: Discrete-Continuous Optimization for Optical Flow Estimation, CVPR 2008
FusionFlow - comparisons
LogCut – Dealing efficiently with large label spaces
Victor Lempitsky, Carsten Rother, and Andrew Blake, LogCut – Efficient Graph Cut Optimization for Markov Random Fields, ICCV 2007
[Figure: optical flow with 1024 discrete labels; ground truth]
Log Cut – basic idea
E(x) = Σ_p E_p(x_p) + Σ_{p,q} E_pq(x_p, x_q),  with x_p ∈ [0, K]
• Encode the label space of size K (e.g. K = 64) with log K bits (e.g. 6 bits):
  Example: 44 = 101100
• Alpha expansion needs K − 1 binary decisions to produce a labelling;
  here we only need log K (here 6) binary decisions
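A tiny sketch of the bit encoding (names are mine): reading the log K binary decisions as bits reproduces the example 44 = 101100.

```python
import math

K = 64
num_bits = int(math.log2(K))              # 6 binary decisions instead of K - 1 expansions

def label_from_decisions(decisions):
    """decisions[0] is the most significant bit, e.g. [1, 0, 1, 1, 0, 0] -> 44."""
    label = 0
    for b in decisions:
        label = (label << 1) | b
    return label

print(num_bits, label_from_decisions([1, 0, 1, 1, 0, 0]))   # 6 44
```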
Example stereo matching
Stereo (Tsukuba), 16 labels:
Bit 4: 0xxx → 0–7 versus 8–15
Bit 3: 00xx → 0–3 versus 4–7
Bit 2: 001x → 0–1 versus 2–3
Bit 1: 0010 → 2 versus 3
How to choose the energy?
The move energy for one bit decision is again a binary pairwise energy:
E'(x') = Σ_p E'_p(x'_p) + Σ_{p,q} E'_pq(x'_p, x'_q),  with x'_p ∈ {0, 1}

Unary, e.g. for bit 3 (deciding between x_p ∈ [0,3] and x_p ∈ [4,7]):
E'_p(0) = min over x_p ∈ [0,3] of E_p(x_p)   (and E'_p(1) analogously over [4,7])

E' is a lower bound of E (tight if there are no pairwise terms)
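A minimal sketch of this unary construction (names and the 8-label toy example are mine): the binary unary is the minimum of the original unary over the half of the label block selected by the bit.

```python
def logcut_unary(E_p, low, high):
    """Binary unary term for one bit decision: minimum of the original unary
    over each half of the label block [low, high) that is still consistent
    with the bits fixed so far (a lower bound on the true cost)."""
    mid = (low + high) // 2
    return (min(E_p[low:mid]), min(E_p[mid:high]))

# toy 8-label unary; the first bit decision splits [0, 8) into [0, 3] vs. [4, 7]
E_p = [5, 2, 7, 1, 9, 3, 8, 6]
print(logcut_unary(E_p, 0, 8))   # (1, 3)
```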
How to choose the energy?
Pairwise: original term, e.g. truncated linear E_pq(x_p, x_q) = min[a · |x_p − x_q|, b]

Approximations for the binary pairwise term E'_pq(x'_p, x'_q), derived from the block of original values selected by the two bits (e.g. E'_pq(0,0) from the E_pq values over the two “0” label blocks):
1. Choose one (e.g. E'_pq(0,0) = E_pq(0,0))
2. Min
3. Mean
4. Weighted mean
5. Training

[Figure: truncated-linear cost as a function of |x_p − x_q| and the block structure of the pairwise table for one bit decision]
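A hedged sketch (names, data layout and the truncated-linear toy term are mine) of three of the approximations: each binary pairwise entry is derived from the original pairwise costs over the two label blocks selected by the bits.

```python
import statistics

def logcut_pairwise_entry(E_pq, block_p, block_q, mode="mean"):
    """One entry E'_pq(b_p, b_q), approximated from the original pairwise costs
    over the two label blocks selected by the bit values b_p, b_q.
    ('Weighted mean' and 'training' from the slide are not sketched here.)"""
    values = [E_pq(lp, lq) for lp in block_p for lq in block_q]
    if mode == "one":
        return E_pq(block_p[0], block_q[0])   # 'choose one'
    if mode == "min":
        return min(values)
    if mode == "mean":
        return statistics.mean(values)
    raise ValueError(mode)

# truncated-linear original term, as on the slide: min(a * |xp - xq|, b)
E_pq = lambda lp, lq: min(1.0 * abs(lp - lq), 3.0)
print(logcut_pairwise_entry(E_pq, range(0, 4), range(4, 8), mode="min"))   # 1.0
```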
Comparison
Image Restoration (2 different models):
[Figure: resulting energies for One, Min, Mean, Weighted Mean, Training and aExp, for each of the two models]
LogCut
Iterative LogCut:
1. One sweep – log(K) optimizations
2. Shift the labels
3. One sweep – log(K) optimizations
4. Fuse with the current solution
5. Go to 2.
[Figure: energy as a function of iteration for no shift, ½ shift, and full shift]

Labels: 1,2,3,4,5,6,7,8
Shifted by 3: 6,7,8,1,2,3,4,5
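A tiny sketch (function name is mine) of the cyclic label shift used between sweeps, matching the example above:

```python
def shift_label_space(labels, K, shift):
    """Cyclic shift of a 1-based label space, as in the example above."""
    return [((l - 1 - shift) % K) + 1 for l in labels]

print(shift_label_space(list(range(1, 9)), K=8, shift=3))   # [6, 7, 8, 1, 2, 3, 4, 5]
```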
Results
Speed-up factor: 20.7
LogCut (2 iterations): 8 sec, E = 8767
LogCut (64 iterations): 150 sec, E = 8469
AExp (6 iterations): 390 sec, E = 8773
Ground truth
Train (out of 10) / Test (out of 10):
LogCut       1.5 sec
Effic. BP    2.1 sec
AExp         4.7 sec
TRW          90 sec