Machine Learning II Search Techniques
Dmitrij Schlesinger, Carsten Rother, Dagmar Kainmueller, Florian Jug
SS2014, 30.05.2014
Iterated Conditional Modes
y∗ = arg min_y [ Σ_i ψ_i(y_i) + Σ_{ij} ψ_ij(y_i, y_j) ]
Idea: choose (locally) the label that leads to the best energy, with the rest kept fixed [Besag, 1986]
Repeat until convergence for all i:
y_i = arg min_k [ ψ_i(k) + Σ_{j: ij∈E} ψ_ij(k, y_j) ]
+ Extremely simple, easy to parallelise
− "Coordinate-wise" optimisation → does not converge to the global minimum even for very simple energies. Example: the strong Ising model (Potts with K=2)
Iterated Conditional Modes
Extension: instead of fixing all variables but one, fix a subset of variables so that the rest is easy to optimise (e.g. a chain or a tree). For images – e.g. row-wise/column-wise optimisation
→ can be solved exactly and efficiently by Dynamic Programming
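The chain subproblem can be solved exactly by a min-sum dynamic programming recursion with backtracking; a minimal sketch (not from the slides), assuming one pairwise table for all chain edges:

```python
import numpy as np

def chain_minsum(unary, pairwise):
    """Exact MinSum on a chain by dynamic programming (min-sum recursion).

    unary    : (n, K) array of node costs along the chain
    pairwise : (K, K) array of edge costs between consecutive nodes
    Returns the optimal labelling and its energy.
    """
    n, K = unary.shape
    F = np.zeros((n, K))          # F[i, k]: best energy of the prefix 0..i ending in label k
    back = np.zeros((n, K), int)  # argmin pointers for backtracking
    F[0] = unary[0]
    for i in range(1, n):
        # cost[l, k] = F[i-1, l] + psi(l, k)
        cost = F[i - 1][:, None] + pairwise
        back[i] = np.argmin(cost, axis=0)
        F[i] = unary[i] + np.min(cost, axis=0)
    y = np.zeros(n, int)
    y[-1] = int(np.argmin(F[-1]))
    for i in range(n - 1, 0, -1):
        y[i - 1] = back[i, y[i]]
    return y, float(F[-1, y[-1]])
```

The forward pass costs O(n·K²) time and O(n·K) memory, which is what makes the row-wise moves cheap.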
Search techniques – general idea
There is a "neighbourhood" for each labelling – a subset of labellings such that
a) it can be described "constructively",
b) the current labelling belongs to this subset,
c) the optimal labelling in the subset is easy to find.
The algorithm iteratively searches for the best labelling in the neighbourhood of the current one
– it converges to a local optimum
Some examples
ICM: the neighbourhood of a labelling consists of all labellings that differ only by the label in one node. Row-wise ICM: the labels for the nodes of one chain (row) can vary, the rest is fixed.
Application – stereo: y0 is the result of independent row-wise Dynamic Programming, followed by row-wise ICM
α-expansion
The neighbourhood of a labelling – restrict the label set for all nodes [Boykov et al., 2001]
α-expansion: consider a label α; for each node consider (at most) two labels – the current one and α
The auxiliary task is a binary MinSum problem – it can be solved by MinCut under certain conditions
This is repeated for all α-s until convergence
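The outer loop over α can be sketched as follows. This is an illustrative sketch only (not from the slides): the binary move problem is solved here by brute force over all keep/switch assignments, which is feasible only for tiny graphs — a real implementation would solve it by MinCut:

```python
import itertools
import numpy as np

def alpha_expansion(unary, pairwise, edges, y0, sweeps=5):
    """alpha-expansion outer loop; the binary move is solved by brute force
    (tiny graphs only), standing in for the MinCut solver."""
    N, K = unary.shape

    def energy(y):
        e = sum(unary[i, y[i]] for i in range(N))
        e += sum(pairwise[y[i], y[j]] for i, j in edges)
        return e

    y = y0.copy()
    for _ in range(sweeps):
        improved = False
        for alpha in range(K):
            best, best_e = y, energy(y)
            # each node either keeps its current label (bit 0) or switches to alpha (bit 1)
            for bits in itertools.product([0, 1], repeat=N):
                cand = np.array([alpha if b else y[i] for i, b in enumerate(bits)])
                e = energy(cand)
                if e < best_e:
                    best, best_e = cand, e
            if not np.array_equal(best, y):
                y, improved = best, True
        if not improved:   # no alpha-move improves the energy: local optimum
            break
    return y
```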
α-expansion
In which cases can the auxiliary tasks be solved exactly?
Sufficient: if the pairwise functions ψ_ij are metrics, i.e.
a) ψ(k, k) = 0
b) ψ(k, k′) = ψ(k′, k) ≥ 0
c) ψ(k, k′) ≤ ψ(k, k″) + ψ(k″, k′)
Then the auxiliary tasks are submodular:
ψ(α, α) + ψ(β′, β″) = 0 + ψ(β′, β″) ≤ ψ(β′, α) + ψ(α, β″)
Examples:
– the Potts model ψ(k, k′) = δ(k ≠ k′) – segmentation
– the linear metric ψ(k, k′) = |k − k′| – stereo
– "truncated" metrics, e.g. ψ(k, k′) = min(|k − k′|, C)
αβ-swap
Consider a label pair α, β; in each node
– if the current label is α or β, only α and β are allowed,
– otherwise, only the current label is allowed.
→ each node can swap from α to β and back
The auxiliary task is a binary MinSum problem – solvable by MinCut if e.g. ψ(k, k) = 0 and ψ(k, k′ ≠ k) ≥ 0 (semimetric). This is repeated for all pairs α, β until convergence
A comparison
For an n×n grid as the graph, K labels, a random labelling:
#l – the number of labellings in the neighbourhood
#n – the number of neighbourhoods

              ICM        ICM+        α-exp.            αβ-swap
#l            K          K^n         2^(n²·(K−1)/K)    2^(n²·2/K)
#n            n²         2·n         K                 K(K−1)/2
applicable ψ  arbitrary  arbitrary   metric            semimetric
exact for     never      chain       K=2 (?)           K=2
– very easy to parallelise
– can be freely combined with each other
Range-moves (idea) [Kumar, Veksler, Torr, 2009]
Consider a "truncated convex"ψ(k, k0) = min(f(k−k0), C) with a convexf(·), e.g. the truncated linear metric
ψ(k, k0) = min(|k−k0|, C)
Consider an "interval" of labels [kmini , kimax] in each node so, that there is no ψ-values greater thanC (jumps):
f(k−k0)≤C ∀k ∈[kimin, kmaxi ], k0 ∈[kminj , kjmax], ij ∈ E
Inside such a "corridor" the task is submodular !!!
A possible strategy – construct such a corridor somehow (it is not unique) and solve the auxiliary task; iterate this
Note: if the initial labelling does not contain jumps, they never occur → once in a while perform a usual α-expansion or αβ-swap in order to introduce jumps; if there is a jump for an edge ij in the current labelling, fix both y_i and y_j
Search techniques for submodular problems
In principle, submodular problems can be solved exactly without any search. However ...
Consider stereo reconstruction for an 8 MPixel stereo pair with about 100 disparity levels → 8,000,000 × 100² = 8·10¹⁰ edges in the resulting MinCut graph. In order to use standard flow algorithms, two "float" numbers per edge are needed → out of memory.
Let a submodular problem be given. Consider an arbitrary subset of labels in each node that are "allowed" for moves. The resulting multi-label MinSum problem is submodular !!!
A possible strategy – do not consider the whole label set from the very beginning; take e.g. only each 4-th label, find the solution, refine, etc.
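This coarse-to-fine label strategy can be sketched on a chain, where the restricted subproblem is solvable exactly by dynamic programming. A minimal sketch (not from the slides); `step` and `window` are illustrative parameters, and per-node allowed label sets stand in for the general submodular solver:

```python
def chain_minsum_restricted(unary, psi, allowed):
    """Exact MinSum on a chain where node i may only use the labels in allowed[i].

    unary   : unary[i][k] = psi_i(k)
    psi     : pairwise function psi(k, l) shared by all chain edges
    allowed : list of allowed-label lists, one per node
    """
    n = len(allowed)
    F = [dict() for _ in range(n)]     # F[i][k]: best prefix energy ending in label k
    back = [dict() for _ in range(n)]  # argmin pointers for backtracking
    for k in allowed[0]:
        F[0][k] = unary[0][k]
    for i in range(1, n):
        for k in allowed[i]:
            c, l = min((F[i - 1][l] + psi(l, k), l) for l in allowed[i - 1])
            F[i][k] = unary[i][k] + c
            back[i][k] = l
    y = [0] * n
    y[-1] = min(F[-1], key=F[-1].get)
    for i in range(n - 1, 0, -1):
        y[i - 1] = back[i][y[i]]
    return y

def coarse_to_fine(unary, psi, K, step=4, window=2):
    """Solve on a subsampled label set first, then refine around the coarse result."""
    n = len(unary)
    coarse = list(range(0, K, step))               # e.g. every 4-th label
    y = chain_minsum_restricted(unary, psi, [coarse] * n)
    # refinement: allow a small window of fine labels around each coarse label
    allowed = [[k for k in range(K) if abs(k - y[i]) <= window] for i in range(n)]
    return chain_minsum_restricted(unary, psi, allowed)
```

The refinement step could be iterated with shrinking windows; note that, unlike the full solve, the two-stage scheme is a heuristic and is only guaranteed optimal within the final restricted label sets.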