How to deal with…
non-submodular and higher-order energies (Part 1)
Carsten Rother
27/06/2014 Machine Learning 2
Advertisement
Main research theme:
• Combining physics-based vision with machine learning:
  generative models meet discriminative models

Theoretical side:
• Optimization and learning in discrete-domain models
  (CRFs, higher-order models, continuous label spaces, loss-based learning, etc.)

Application side:
• Scene recovery from multiple images
• 3D scene understanding
• Bio imaging
State-of-the art CRF models
Energy:  E(y, x, w) = Σ_F E_F(y_F, x, w_F)

Gibbs distribution:  p(y|x, w) = 1/Z(x, w) · exp(−E(y, x, w))

[Figure: factor graph and compact factor graph notation]
Deconvolution
Input x = K*y
Output
Combine physics and machine learning:
1) Using physics:
Add a Gaussian “likelihood” term on (x − K*y)
2) Put it into a deep-learning approach:
x → RTF1 → y1 → RTF2 → y2 → … (stacked RTFs)
[Schmidt, Rother, Nowozin, Jancsary, Roth, CVPR 2013]
Scene recovery from multiple images
[Figure: 2 RGBD inputs]
Scene recovery from single images
BioImaging
Joint work with Myers group (Dagmar, Florian, and others)
[Figure: atlas and instance]
3D Scene Understanding
• Training time: 3D objects
• Test time:
Advertisement
• If you are excited about any of these topics … come to us for a research internship (“Forschungspraktikum”), master thesis, diploma thesis, etc.
• If you want to collaborate with top industry labs or universities … come to us. Examples:
  • BMW, Adobe, Microsoft Research, Daimler, etc.
  • Top universities: in Israel, Oxford, Heidelberg, etc.
Advertisement
Smart 3D point-cloud processing:
- 3D fine-grained recognition: type of aircraft, vehicle, objects, …
- Tracking: 3D models with varying degrees of information
- Structured data: how to define a CRF/RTF?
- Combine physics-based vision (generative models) with machine learning
There is an opening for a master project / PhD student – if you are interested, talk to me after the lecture!
Joint project with “Institut für Luftfahrt und Logistik“
Lidar scanner
Reminder: Pairwise energies
E(x) = Σ_{i∈V} θi(xi) + Σ_{(i,j)∈E} θij(xi, xj) + θconst
𝐺 = (𝑉, 𝐸) undirected graph
For now, 𝑥 ∈ {0,1}
Visualization of the full energy:
Submodular condition:  θij(0,0) + θij(1,1) ≤ θij(1,0) + θij(0,1)

Pairwise table (with unaries θi(0), θi(1)):

            xj = 0        xj = 1
xi = 0     θij(0,0)      θij(0,1)
xi = 1     θij(1,0)      θij(1,1)

• If all pairwise terms are submodular then the global optimum can be computed in polynomial time with graph cut
• If not … this lecture
𝜃𝑖𝑗 (0,0) also sometimes written as: 𝜃𝑖𝑗;00
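As a quick check in code – a minimal Python sketch (function name and data layout are mine, not from the slides) of the submodularity test for a single pairwise term:

```python
def is_submodular(theta):
    """theta[a][b] = theta_ij(a, b) for binary labels a, b in {0, 1}."""
    return theta[0][0] + theta[1][1] <= theta[1][0] + theta[0][1]

# Potts-style smoothness: equal labels are free, different labels cost 1 -> submodular
print(is_submodular([[0, 1], [1, 0]]))    # True
# A term preferring *different* labels is non-submodular
print(is_submodular([[1, 0], [0, 1]]))    # False
```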
How often do we have submodular terms?
Label smoothness is often the natural condition:
In alpha expansion (reminder later) energy is often “naturally” submodular:
Neighboring pixels more often than not have the same label. We may choose:
θij(0,0) = θij(1,1) = 0;  θij(1,0) = θij(0,1) ≥ 0
⟹ θij(0,0) + θij(1,1) ≤ θij(1,0) + θij(0,1)  (submodular)
[Figure: left image (a), right image (b), labelling; smoothness cost as a function of |xi − xj|]
Importance of good optimization
[Data courtesy of Oliver Woodford]
Problem: Minimize a binary 4-connected energy (non-submodular) (choose a colour-mode at each pixel)
Input: Image sequence
Output: New view
Importance of good optimization
[Figure panels: Ground truth; Belief Propagation; ICM; Simulated Annealing; Graph Cut with truncation [Rother et al. '05]; QPBO [Hammer '84] (black = unknown); QPBOP [Boros '06, see Rother '07] – global minimum]
Simplest idea to deal with non-submodular terms
• Truncate all non-submodular terms:
If θij(0,0) + θij(1,1) > θij(1,0) + θij(0,1), shift by δ so that
θij(0,0) − δ + θij(1,1) − δ = θij(1,0) + δ + θij(0,1) + δ,
i.e. δ = ¼ [θij(0,0) + θij(1,1) − θij(1,0) − θij(0,1)]

Better techniques to come…
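A minimal sketch of this truncation (function name and data layout are mine): it shifts δ exactly as above, so a non-submodular term becomes submodular with equality.

```python
def truncate(theta):
    """Truncate a non-submodular 2x2 pairwise term theta[(a, b)] so it becomes submodular."""
    gap = theta[0, 0] + theta[1, 1] - theta[1, 0] - theta[0, 1]
    if gap <= 0:                      # already submodular, nothing to do
        return dict(theta)
    delta = gap / 4.0
    return {(0, 0): theta[0, 0] - delta, (1, 1): theta[1, 1] - delta,
            (1, 0): theta[1, 0] + delta, (0, 1): theta[0, 1] + delta}

repulsive = {(0, 0): 1.0, (1, 1): 1.0, (1, 0): 0.0, (0, 1): 0.0}
print(truncate(repulsive))            # every entry becomes 0.5: submodular (with equality)
```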
How often do we have non-submodular terms?
• Learning (unconstrained parameters)
Graph connectivity: 64
[Figure: MRF vs. DTF connectivity on training and test data; red: non-submodular, blue: submodular]
Texture Denoising
[Figure panels: training images; test image; test image (60% noise); result MRF 4-connected; result MRF 4-connected (neighbours); result MRF 9-connected (7 attractive, 2 repulsive)]
How often do we have non-submodular terms?
Deconvolution:
Hand-crafted scenarios:
Many more examples later: Diagram recognition, fusion move, etc.
[Figure panels: input image; user input; global optimum]
Reparametrization
Two reparametrizations we need:
• Unary transform: shift a constant δ between the two unaries θp(0), θp(1) and θconst
• Pairwise transform: shift a constant δ between one row/column of a pairwise term θpq and the corresponding unary
Both leave E(x) unchanged for every labelling x.
[Minimizing non-submodular energies with graph cut, Kolmogorov, Rother, PAMI 2007]
Put energies into “normal form”
1) Apply pairwise transformations until min(θpq(0,j), θpq(1,j)) = 0 for all directed edges p→q and all j ∈ {0,1}
2) Apply unary transformations until min(θp(0), θp(1)) = 0 for all p
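A minimal sketch of the two reparametrization passes (data layout and names are mine). Every shift moves a constant between terms, so the energy of every labelling stays exactly the same:

```python
def normal_form(unary, pairwise, const=0.0):
    """Reparametrize a binary pairwise energy into 'normal form'.

    unary:    {p: [theta_p(0), theta_p(1)]}
    pairwise: {(p, q): 2x2 nested list, pairwise[p, q][i][j] = theta_pq(i, j)}
    """
    unary = {p: list(u) for p, u in unary.items()}
    pairwise = {e: [list(r) for r in t] for e, t in pairwise.items()}

    for (p, q), t in pairwise.items():
        for j in (0, 1):                      # direction p -> q: zero each column minimum
            d = min(t[0][j], t[1][j])
            t[0][j] -= d; t[1][j] -= d
            unary[q][j] += d                  # pushed onto theta_q(j)
        for i in (0, 1):                      # direction q -> p: zero each row minimum
            d = min(t[i][0], t[i][1])
            t[i][0] -= d; t[i][1] -= d
            unary[p][i] += d                  # pushed onto theta_p(i)

    for p, u in unary.items():                # unary transform: push min onto the constant
        d = min(u)
        u[0] -= d; u[1] -= d
        const += d

    return unary, pairwise, const

u, pw, c = normal_form({'p': [2.0, 5.0], 'q': [0.0, 1.0]},
                       {('p', 'q'): [[1.0, 4.0], [6.0, 0.0]]})
print(u, pw, c)   # every unary and every row/column of the pairwise table now contains a 0
```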
Construct the graph
Minimum cut through the graph gives the solution x* = argmin_x E(x)
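The slides build the graph pictorially. As a hedged illustration, here is one standard s–t construction for a submodular binary pairwise energy (function name and data layout are mine; networkx is used for the min-cut, and variable names must differ from 's' and 't'):

```python
import networkx as nx

def minimize_binary_submodular(unary, pairwise):
    """Exact minimization of a submodular binary pairwise energy via min-cut.

    unary:    {p: (theta_p(0), theta_p(1))}  -- must contain every variable
    pairwise: {(p, q): ((A, B), (C, D))} with A = theta(0,0), B = theta(0,1),
              C = theta(1,0), D = theta(1,1), and A + D <= B + C (submodular).
    """
    G = nx.DiGraph()
    lam = {p: u1 - u0 for p, (u0, u1) in unary.items()}   # net cost of choosing x_p = 1

    for (p, q), ((A, B), (C, D)) in pairwise.items():
        # theta(x_p, x_q) = A + (C-A) x_p + (D-C) x_q + (B+C-A-D)(1-x_p) x_q
        lam[p] += C - A
        lam[q] += D - C
        w = B + C - A - D                                  # >= 0 iff the term is submodular
        if w > 0:
            G.add_edge(p, q, capacity=w)                   # cut when x_p = 0 and x_q = 1

    for p, l in lam.items():
        if l >= 0:
            G.add_edge('s', p, capacity=l)                 # pay l if x_p = 1
        else:
            G.add_edge(p, 't', capacity=-l)                # pay -l if x_p = 0
    G.add_edge('s', 't', capacity=0)                       # keep both terminals in the graph

    _, (source_side, _) = nx.minimum_cut(G, 's', 't')
    x = {p: 0 if p in source_side else 1 for p in unary}
    energy = sum(unary[p][x[p]] for p in unary) + \
             sum(t[x[p]][x[q]] for (p, q), t in pairwise.items())
    return x, energy

# toy example: two variables with a Potts-like coupling
print(minimize_binary_submodular({'a': (0.0, 4.0), 'b': (3.0, 0.0)},
                                 {('a', 'b'): ((0.0, 2.0), (2.0, 0.0))}))   # ({'a': 0, 'b': 1}, 2.0)
```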
QPBO method
[Hammer et al. ’84, Boros et al ’91; see Kolmogorov, Rother ‘07]
• Double the number of variables: x_p → (x_p, x̄_p), where x̄_p is intended to represent 1 − x_p

• Original energy (unary, pairwise submodular, pairwise non-submodular terms):
  E({x_p}) = Σ_p E_p(x_p) + Σ_{p,q} E_pq(x_p, x_q) + Σ_{p,q} E_pq(x_p, x_q)

• Doubled energy:
  E'({x_p}, {x̄_p}) = ½ Σ_p [ E_p(x_p) + E_p(1 − x̄_p) ]
                    + ½ Σ_{p,q} [ E_pq(x_p, x_q) + E_pq(1 − x̄_p, 1 − x̄_q) ]    (from the submodular terms)
                    + ½ Σ_{p,q} [ E_pq(x_p, 1 − x̄_q) + E_pq(1 − x̄_p, x_q) ]    (from the non-submodular terms)

• E' is submodular!
• Construct the graph and solve with graph cut: less than double the runtime of a single graph cut
• Method is called QPBO: Quadratic Pseudo-Boolean Optimization (not a good name)
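A sketch of the doubling construction above (helper names and the ('bar', p) naming of the copied variables are mine); it produces terms in the same ((A, B), (C, D)) format as the min-cut sketch earlier:

```python
def qpbo_energy(unary, pairwise):
    """Build the doubled, submodular energy E'(x, xbar) used by QPBO.

    Each variable p gets a copy named ('bar', p), meant to represent 1 - x_p.
    Input/output use the same layout as minimize_binary_submodular() above.
    """
    flip_rows = lambda t: (t[1], t[0])                     # t'(a, b) = t(1-a, b)
    flip_cols = lambda t: (t[0][::-1], t[1][::-1])         # t'(a, b) = t(a, 1-b)
    half = lambda t: tuple(tuple(v / 2.0 for v in row) for row in t)

    u2, p2 = {}, {}
    for p, (u0, u1) in unary.items():
        u2[p] = (u0 / 2.0, u1 / 2.0)                       # (1/2) E_p(x_p)
        u2[('bar', p)] = (u1 / 2.0, u0 / 2.0)              # (1/2) E_p(1 - xbar_p)

    for (p, q), t in pairwise.items():
        if t[0][0] + t[1][1] <= t[0][1] + t[1][0]:         # submodular original term
            p2[(p, q)] = half(t)
            p2[(('bar', p), ('bar', q))] = half(flip_rows(flip_cols(t)))
        else:                                              # non-submodular original term
            p2[(p, ('bar', q))] = half(flip_cols(t))       # E_pq(x_p, 1 - xbar_q)
            p2[(('bar', p), q)] = half(flip_rows(t))       # E_pq(1 - xbar_p, x_q)
    return u2, p2
```

All four kinds of generated pairwise terms are submodular, so the doubled energy can be handed to the min-cut sketch above. Note that the exact set of labeled nodes read out afterwards (next slide) can depend on which minimum cut the solver returns.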
Read out the solution
• Assign labels based on minimum cut in auxiliary graph:
• x_p = 1, x̄_p = 0  ⟹  label x_p = 1
• x_p = 0, x̄_p = 1  ⟹  label x_p = 0
• x_p = 0, x̄_p = 0  ⟹  unlabeled (?)
• x_p = 1, x̄_p = 1  ⟹  unlabeled (?)
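A matching sketch of the read-out (names are mine): nodes whose two copies disagree, i.e. x_p ≠ x̄_p, are labeled with x_p; the rest stay unlabeled.

```python
def read_out(solution):
    """Map a 0/1 solution over the doubled variables back to the original ones.

    solution[p] and solution[('bar', p)] come e.g. from the min-cut sketch above;
    returns 0, 1 or None (None = unlabeled, shown as '?' on the slides).
    """
    labels = {}
    for p, value in solution.items():
        if isinstance(p, tuple) and p and p[0] == 'bar':
            continue
        labels[p] = value if value != solution[('bar', p)] else None
    return labels
```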
Properties
• Autarky (persistency) property: fusing the QPBO partial labelling with any complete labelling never increases the energy (see figure below)
• Partial optimality: the labeled pixels belong to a global minimum
• Labeled nodes have the same result as the LP relaxation of the problem E (but QPBO is a very fast solver)
[Hammer et al ’84, Schlesinger ‘76, Werner ’07, Kolmogorov, Wainright ’05; Kolmogorov ’06]
[Figure: x (partial labelling), y (any complete labelling), z = FUSE(x, y) takes the labeled pixels from x and the rest from y; E(z) ≤ E(y), and the labeled part of x agrees with a global optimum]
When do we get all nodes labeled?
• The energy function is submodular
• If there exists a flipping (of a subset of variables) that makes the energy fully submodular, QPBO will find it
• We can simply be “lucky”
• What to do with unlabelled nodes: run some other method (e.g. BP)
Extension: QPBOP (“P” stands for “Probing”)
QPBO on the example graph (nodes p, q, r, s, t): only node p is labeled (0), the rest stay unlabeled.

Probe node p: fix x_p = 0 and run QPBO, then fix x_p = 1 and run QPBO.

• A node that takes the same label in both runs (e.g. x_r = 0) has this label in a global minimum → remove the node from the energy
• A node whose label always equals (or always differs from) x_p (e.g. x_q = x_p) → remove the node from the energy (merge it with p)
• A dependency that only holds for one of the two values of x_p → add a directed link
• Why did QPBO not find this solution? Probing enforces the integer constraint on x_p (a tighter relaxation)
Two extensions: QPBOP, QPBOI
1. Run QPBO – gives the set of unlabeled nodes U
2. Probe a node p ∈ U
3. Simplify the energy: remove nodes and add links
4. Run QPBO, update U
5. Stop if the energy stays the same for all p ∈ U, otherwise go to 2.
Properties:
- The new energy preserves global optimality and (sometimes) gives the global minimum
- The probing order may affect the result
QPBO versus QPBOP
QPBO: 73% unlabeled (0.08 sec)
QPBOP: global minimum (0.4 sec)
Extension: QPBOI (“I” stands for “Improve”)
• Persistency (autarky) property: take the partial labelling x from QPBO and any complete labelling y (e.g. from BP); then y' = FUSE(x, y) satisfies E(y') ≤ E(y)

[Figure: x (partial), y (e.g. from BP), y' = FUSE(x, y)]
Extension: QPBOI (“I” stands for “Improve”)
• Autarky property (as above): fusing with the QPBO partial labelling never increases the energy
• QPBOI algorithm: choose a sequence of nested sets of variables; fix each set to the current solution, run QPBO on the rest and fuse
• QPBO-stable: no set changes the labelling any more – sometimes this is a global minimum

[Figure: x (partial), y', y'' = FUSE(x, y')]
Results
Three important factors:
• Degree of non-submodularity (NS)
• Unary strength
• Connectivity (av. degree of a node)
Results – Diagram Recognition
Ground truth
GraphCut: E = 119 (0 sec)
ICM: E = 999 (0 sec)
BP: E = 25 (0 sec)
QPBO: 56.3% unlabeled (0 sec)
QPBOP: global minimum (0 sec)
P+BP+I, BP+I: E = 0 (0 sec)
Sim. Ann.: E = 0 (0.28 sec)
• 2700 test cases: QPBOP solved all of them
Results - Deconvolution
Ground truth; input
QPBO: 45% unlabeled (red) (0.01 sec)
QPBO-C: 43% unlabeled (red) (0.4 sec)
ICM: E = 14 (0 sec)
GC: E = 999 (0 sec)
BP: E = 5 (0.5 sec)
BP+I: E = 3.6 (1 sec)
C+BP+I, Sim. Ann.: E = 0 (0.4 sec)
Move on to multi-label
• Let’s apply QPBO(P/I) methods to multi-label problems
• In particular alpha expansion
Reminder: Alpha expansion
[Figure: labels Sky, House, Tree, Ground; initialize with Tree, then expand Ground, expand House, expand Sky]
• Variables either take the label α or retain their current label
Conditions on the pairwise terms:
θij(xa, xb) = 0 iff xa = xb
θij(xa, xb) = θij(xb, xa) ≥ 0
θij(xa, xb) + θij(xb, xc) ≥ θij(xa, xc)
Examples: Potts model, truncated linear (but not truncated quadratic)
[Boykov, Veksler and Zabih 2001]
Other move strategies: alpha-beta swap, range moves, etc.
• Given the original energy 𝐸(𝑥)
• At each step we have two solutions: 𝒙𝟎, 𝒙𝟏
• Define the (variable-wise) combination: 𝑥𝑖01 = (1 − 𝑥𝑖′) 𝑥𝑖0 + 𝑥𝑖′ 𝑥𝑖1 (where 𝒙′ ∈ {0,1} is selection variable)
• Construct a new energy 𝐸′ such that 𝐸’(𝒙’) = 𝐸(𝒙𝟎𝟏)
• The move energy 𝐸’(𝒙’) is submodular if the conditions above hold (θij is a metric)
Reminder: Alpha Expansion
• What to do if the move energy is non-submodular?
• Run QPBO
• For unlabeled pixels: choose the solution (x0 or x1) with the lower energy E and replace the unlabeled nodes with it
• Guarantees that the new solution has equal or better energy than both E(x0) and E(x1) (see the persistency property)
Fusion Move
• Given the original energy E(x)
• At each step we have two arbitrary solutions: x0, x1
• Define the (variable-wise) combination: x01_i = (1 − x'_i) · x0_i + x'_i · x1_i (where x' ∈ {0,1} is the selection variable)
• Construct a new energy E' such that E'(x') = E(x01)
• Run QPBO and fix unlabeled nodes as above
• Comment: in practice the move energy is often submodular if both solutions are good (since the energy prefers neighbouring nodes to be similar)
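A minimal sketch of the move-energy construction (names and data layout are mine): the binary move energy is obtained by plugging the two proposals into the original unary and pairwise terms; for alpha expansion, x1 is simply the constant labelling α.

```python
def fusion_move_energy(unary, pairwise, x0, x1):
    """Binary move energy E'(x') with E'(x') = E(x01), x01_i = x0_i if x'_i = 0 else x1_i.

    unary[i][label] and pairwise[(i, j)][(label_i, label_j)] are the original
    (multi-label) terms; the result uses the binary ((A, B), (C, D)) layout from above.
    """
    u = {i: (unary[i][x0[i]], unary[i][x1[i]]) for i in unary}
    pw = {(i, j): ((theta[x0[i], x0[j]], theta[x0[i], x1[j]]),
                   (theta[x1[i], x0[j]], theta[x1[i], x1[j]]))
          for (i, j), theta in pairwise.items()}
    return u, pw

def fuse(x0, x1, move):
    """x01_i = (1 - x'_i) * x0_i + x'_i * x1_i"""
    return {i: x1[i] if move[i] else x0[i] for i in x0}

# toy example with 3 labels; for alpha expansion, x1 would be the constant labelling alpha
unary = {0: {0: 1.0, 1: 0.0, 2: 2.0}, 1: {0: 0.0, 1: 3.0, 2: 1.0}}
potts = {(a, b): 0.0 if a == b else 1.0 for a in range(3) for b in range(3)}
u, pw = fusion_move_energy(unary, {(0, 1): potts}, x0={0: 1, 1: 0}, x1={0: 2, 1: 2})
```

If the resulting move energy is non-submodular, it is handled with QPBO exactly as described above.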
Fusion move to make alpha expansion parallel
• One processor needs 7 sequential alpha expansions for 8 labels:
1,2,3,4,5,6,7,8
• Four processors need only 3 sequential steps (still 7 alpha expansions):
∎(1-2) ∎(3-4) ∎(5-6) ∎(7-8)
p1 p2 p3 p4
∎(1-4) ∎(5-8)
∎(1-8)
∎ means fusion
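A small sketch (names are mine) of this fusion schedule: pairs of partial solutions are fused in parallel rounds, giving log2(K) sequential rounds instead of K − 1 sequential expansions.

```python
def parallel_fusion_rounds(labels):
    """Schedule for fusing single-label expansions pairwise: log2(K) sequential
    rounds (each round's fusions can run on different processors)."""
    current = [[l] for l in labels]
    rounds = []
    while len(current) > 1:
        nxt, this_round = [], []
        for a, b in zip(current[0::2], current[1::2]):
            this_round.append((a, b))     # these fusions are independent -> parallel
            nxt.append(a + b)
        if len(current) % 2:              # odd one out is carried to the next round
            nxt.append(current[-1])
        rounds.append(this_round)
        current = nxt
    return rounds

for r, fusions in enumerate(parallel_fusion_rounds(list(range(1, 9))), 1):
    print("round", r, ":", fusions)       # 3 rounds, 7 fusions in total for 8 labels
```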
Fusion move for continuous label-spaces
Local gradient cost: 𝑥𝑖 − 𝑥𝑖+1
Victor Lempitsky, Stefan Roth, and Carsten Rother, FusionFlow: Discrete-Continuous Optimization for Optical Flow Estimation, CVPR 2008
FusionFlow - comparisons
LogCut – Dealing efficiently with large label spaces
Victor Lempitsky, Carsten Rother, and Andrew Blake, LogCut – Efficient Graph Cut Optimization for Markov Random Fields, ICCV 2007
[Figure: optical flow with 1024 discrete labels; ground truth]
Log Cut – basic idea
E(x) = Σ_p E_p(x_p) + Σ_{p,q} E_pq(x_p, x_q),  with x_p ∈ [0, K]
• Encode the label space of size K (e.g. K = 64) with log K bits (e.g. 6 bits):
  Example: 44 = 101100
• Alpha expansion needs K − 1 binary decisions to produce a labelling;
  here we only need log K (here 6) binary decisions
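A tiny sketch of the bit encoding (names are mine): reading the log K binary decisions as bits reproduces the example 44 = 101100.

```python
import math

K = 64
num_bits = int(math.log2(K))              # 6 binary decisions instead of K - 1 expansions

def label_from_decisions(decisions):
    """decisions[0] is the most significant bit, e.g. [1, 0, 1, 1, 0, 0] -> 44."""
    label = 0
    for b in decisions:
        label = (label << 1) | b
    return label

print(num_bits, label_from_decisions([1, 0, 1, 1, 0, 0]))   # 6 44
```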
Example stereo matching
Stereo (Tsukuba), 16 labels:
Bit 4: 0xxx → 0–7 versus 8–15
Bit 3: 00xx → 0–3 versus 4–7
Bit 2: 001x → 0–1 versus 2–3
Bit 1: 0010 → 2 versus 3
How to choose the energy?
The move energy for one bit decision is again a binary pairwise energy:
E'(x') = Σ_p E'_p(x'_p) + Σ_{p,q} E'_pq(x'_p, x'_q),  with x'_p ∈ {0, 1}

Unary, e.g. for bit 3 (deciding between x_p ∈ [0,3] and x_p ∈ [4,7]):
E'_p(0) = min over x_p ∈ [0,3] of E_p(x_p)   (and E'_p(1) analogously over [4,7])

E' is a lower bound of E (tight if there are no pairwise terms)
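A minimal sketch of this unary construction (names and the 8-label toy example are mine): the binary unary is the minimum of the original unary over the half of the label block selected by the bit.

```python
def logcut_unary(E_p, low, high):
    """Binary unary term for one bit decision: minimum of the original unary
    over each half of the label block [low, high) that is still consistent
    with the bits fixed so far (a lower bound on the true cost)."""
    mid = (low + high) // 2
    return (min(E_p[low:mid]), min(E_p[mid:high]))

# toy 8-label unary; the first bit decision splits [0, 8) into [0, 3] vs. [4, 7]
E_p = [5, 2, 7, 1, 9, 3, 8, 6]
print(logcut_unary(E_p, 0, 8))   # (1, 3)
```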
How to choose the energy?
Pairwise: original term, e.g. truncated linear E_pq(x_p, x_q) = min[a · |x_p − x_q|, b]

Approximations for the binary pairwise term E'_pq(x'_p, x'_q), derived from the block of original values selected by the two bits (e.g. E'_pq(0,0) from the E_pq values over the two “0” label blocks):
1. Choose one (e.g. E'_pq(0,0) = E_pq(0,0))
2. Min
3. Mean
4. Weighted mean
5. Training

[Figure: truncated-linear cost as a function of |x_p − x_q| and the block structure of the pairwise table for one bit decision]
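A hedged sketch (names, data layout and the truncated-linear toy term are mine) of three of the approximations: each binary pairwise entry is derived from the original pairwise costs over the two label blocks selected by the bits.

```python
import statistics

def logcut_pairwise_entry(E_pq, block_p, block_q, mode="mean"):
    """One entry E'_pq(b_p, b_q), approximated from the original pairwise costs
    over the two label blocks selected by the bit values b_p, b_q.
    ('Weighted mean' and 'training' from the slide are not sketched here.)"""
    values = [E_pq(lp, lq) for lp in block_p for lq in block_q]
    if mode == "one":
        return E_pq(block_p[0], block_q[0])   # 'choose one'
    if mode == "min":
        return min(values)
    if mode == "mean":
        return statistics.mean(values)
    raise ValueError(mode)

# truncated-linear original term, as on the slide: min(a * |xp - xq|, b)
E_pq = lambda lp, lq: min(1.0 * abs(lp - lq), 3.0)
print(logcut_pairwise_entry(E_pq, range(0, 4), range(4, 8), mode="min"))   # 1.0
```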
Comparison
Image Restoration (2 different models):
[Figure: resulting energies for One, Min, Mean, Weighted Mean, Training and aExp, for each of the two models]
LogCut
Iterative LogCut:
1. One sweep – log(K) optimizations
2. Shift the labels
3. One sweep – log(K) optimizations
4. Fuse with the current solution
5. Go to 2.
[Figure: energy as a function of iteration for no shift, ½ shift, and full shift]

Labels: 1,2,3,4,5,6,7,8
Shifted by 3: 6,7,8,1,2,3,4,5
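A tiny sketch (function name is mine) of the cyclic label shift used between sweeps, matching the example above:

```python
def shift_label_space(labels, K, shift):
    """Cyclic shift of a 1-based label space, as in the example above."""
    return [((l - 1 - shift) % K) + 1 for l in labels]

print(shift_label_space(list(range(1, 9)), K=8, shift=3))   # [6, 7, 8, 1, 2, 3, 4, 5]
```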
Results
Speed-up factor: 20.7
LogCut (2 iterations): 8 sec, E = 8767
LogCut (64 iterations): 150 sec, E = 8469
AExp (6 iterations): 390 sec, E = 8773
Ground truth
Train (out of 10) / Test (out of 10):
LogCut       1.5 sec
Effic. BP    2.1 sec
AExp         4.7 sec
TRW          90 sec