(1)

Machine Learning II LP-relaxation

Dmitrij Schlesinger, Carsten Rother, Dagmar Kainmueller, Florian Jug

SS2014, 20.06.2014

(2)

Seeming quality

Consider the following "stupid" algorithm.

Instead of optimizing the original MinSum problem

$$E(A) = \min_y \Big[ \sum_i \psi_i(y_i) + \sum_{ij} \psi_{ij}(y_i, y_j) \Big]$$

proceed as follows:

1. Choose for each node and for each edge the best label(s) and the best label pair(s) respectively, according to ψ
2. Sum up all these minima

(3)

Seeming quality

The value obtained in this manner is the seeming quality

$$SQ(A) = \sum_i \min_k \psi_i(k) + \sum_{ij} \min_{k,k'} \psi_{ij}(k, k')$$

Compare it to the original energy

$$E(A) = \min_y \Big[ \sum_i \psi_i(y_i) + \sum_{ij} \psi_{ij}(y_i, y_j) \Big]$$

Obviously, SQ(A) ≤ E(A), i.e. SQ(A) is a lower bound.
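To make the bound tangible, here is a minimal Python sketch (not from the slides; the chain, the random potentials and all variable names are made up for illustration) that computes E(A) by exhaustive search and SQ(A) from the independent minima, so SQ(A) ≤ E(A) can be checked numerically.

import itertools
import numpy as np

K = 2                                   # number of labels
nodes = [0, 1, 2]
edges = [(0, 1), (1, 2)]                # a tiny chain
rng = np.random.default_rng(0)
psi_n = {i: rng.uniform(0, 1, K) for i in nodes}        # unary potentials psi_i(k)
psi_e = {e: rng.uniform(0, 1, (K, K)) for e in edges}   # pairwise potentials psi_ij(k, k')

def energy(y):
    # E(y) = sum_i psi_i(y_i) + sum_ij psi_ij(y_i, y_j)
    return (sum(psi_n[i][y[i]] for i in nodes)
            + sum(psi_e[(i, j)][y[i], y[j]] for (i, j) in edges))

# E(A): minimum over all labelings (brute force, fine for a toy problem)
E = min(energy(y) for y in itertools.product(range(K), repeat=len(nodes)))

# SQ(A): best label per node and best pair per edge, chosen independently
SQ = (sum(psi_n[i].min() for i in nodes)
      + sum(psi_e[e].min() for e in edges))

print(f"E(A)  = {E:.4f}")
print(f"SQ(A) = {SQ:.4f}   (never larger than E(A))")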

(4)

Seeming quality

Suppose there exists a labeling y that is "composed" of only the "best" labels and label pairs.

Obviously, this labeling is globally optimal and E(A) = SQ(A) holds.

Such tasks are called trivial.

(5)

Equivalent transformations (recap)

Two tasks A = (ψ) and A' = (ψ') are called equivalent iff

$$\sum_i \psi_i(y_i) + \sum_{ij} \psi_{ij}(y_i, y_j) = \sum_i \psi'_i(y_i) + \sum_{ij} \psi'_{ij}(y_i, y_j)$$

holds for all labelings y.

A(A) – the equivalence class (all tasks that are equivalent to A).

Equivalent transformations (re-parameterization):

(6)

Back to the seeming quality

Equivalent transformations do not change energies, but they do change the seeming quality SQ(A)

After an equivalent transformation

$$\Phi = \big\{\varphi_i(k)\ \forall i,k;\ \ \varphi_{ij}(k)\ \forall ij,k\big\} \quad\text{with}\quad \varphi_i(k) + \sum_{j:\,ij\in E} \varphi_{ij}(k) = 0 \ \ \forall i,k$$

is applied, the new ψ become

$$\psi'_i(k) = \psi_i(k) + \varphi_i(k)$$
$$\psi'_{ij}(k, k') = \psi_{ij}(k, k') + \varphi_{ij}(k) + \varphi_{ji}(k')$$
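A minimal self-contained Python sketch (my own toy numbers; the ϕ values are chosen arbitrarily, only subject to the zero-sum constraint) that applies one such transformation on a two-node, one-edge problem and verifies that every labeling keeps its energy while the seeming quality changes.

import itertools
import numpy as np

K = 2
psi_0 = np.array([0.0, 1.0])             # psi_i(k) for node 0
psi_1 = np.array([0.5, 0.2])             # psi_i(k) for node 1
psi_01 = np.array([[0.0, 1.0],           # psi_ij(k, k') for the edge (0, 1)
                   [1.0, 0.0]])

phi_01 = np.array([0.3, -0.4])           # phi_ij(k), chosen freely
phi_10 = np.array([-0.2, 0.1])           # phi_ji(k'), chosen freely
phi_0 = -phi_01                          # constraint: phi_i(k) + sum_j phi_ij(k) = 0
phi_1 = -phi_10

psi_0n, psi_1n = psi_0 + phi_0, psi_1 + phi_1
psi_01n = psi_01 + phi_01[:, None] + phi_10[None, :]

for y0, y1 in itertools.product(range(K), repeat=2):
    e_old = psi_0[y0] + psi_1[y1] + psi_01[y0, y1]
    e_new = psi_0n[y0] + psi_1n[y1] + psi_01n[y0, y1]
    assert np.isclose(e_old, e_new)      # energies identical for all labelings

sq_old = psi_0.min() + psi_1.min() + psi_01.min()
sq_new = psi_0n.min() + psi_1n.min() + psi_01n.min()
print(f"SQ before: {sq_old:.3f}, SQ after: {sq_new:.3f}  (energies unchanged)")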

(7)

Back to the seeming quality

The idea – search for the task with the highest seeming quality in the equivalence class A(A), i.e. maximize the lower bound:

$$\sum_i \min_k \big[\psi_i(k) + \varphi_i(k)\big] + \sum_{ij} \min_{k,k'} \big[\psi_{ij}(k, k') + \varphi_{ij}(k) + \varphi_{ji}(k')\big] \to \max_\Phi$$

$$\text{s.t.}\quad \varphi_i(k) + \sum_{j:\,ij\in E} \varphi_{ij}(k) = 0 \ \ \forall i,k$$

a concave, non-differentiable optimization problem.

– How to (efficiently) maximize SQ(A)?

– The triviality check is NP in general

– For which A does a trivial equivalent exist?

(8)

Diffusion Algorithm

Repeat for all i, k until convergence:

1) Accumulate – transfer as much as possible to ψ_i(k):

$$\Delta_{ij}(k) = \min_{k'} \psi_{ij}(k, k')$$
$$\psi_i(k) = \psi_i(k) + \sum_{j:\,ij\in E} \Delta_{ij}(k)$$
$$\psi_{ij}(k, k') = \psi_{ij}(k, k') - \Delta_{ij}(k)$$

2) Distribute ψ_i(k) equally over the ψ_{ij}(k, k'):

$$\Delta_i(k) = \psi_i(k)/4 \quad\text{(for a 4-neighbourhood)}$$
$$\psi_{ij}(k, k') = \psi_{ij}(k, k') + \Delta_i(k)$$
$$\psi_i(k) = 0$$
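A minimal Python sketch of these two steps (the data layout, the helper functions and the fixed number of sweeps are my own choices; the "distribute" share uses the node degree instead of the 4 of a 4-neighbourhood, since the toy graph is a chain). It prints the seeming quality, which in practice should not decrease.

import numpy as np

K = 3
nodes = [0, 1, 2]
edges = [(0, 1), (1, 2)]                  # a chain; node 1 has degree 2
rng = np.random.default_rng(1)
psi_n = {i: rng.uniform(0, 1, K) for i in nodes}
psi_e = {e: rng.uniform(0, 1, (K, K)) for e in edges}   # psi_ij(k, k'), first axis = node i

nbrs = {i: [] for i in nodes}
for (i, j) in edges:
    nbrs[i].append(j)
    nbrs[j].append(i)

def pair_view(i, j):
    # psi_ij as a (K, K) array whose first axis belongs to node i
    return psi_e[(i, j)] if (i, j) in psi_e else psi_e[(j, i)].T

def set_pair(i, j, m):
    if (i, j) in psi_e:
        psi_e[(i, j)] = m
    else:
        psi_e[(j, i)] = m.T

def seeming_quality():
    return sum(v.min() for v in psi_n.values()) + sum(m.min() for m in psi_e.values())

for sweep in range(20):
    for i in nodes:
        # 1) accumulate: pull the row minima of every incident edge into psi_i
        for j in nbrs[i]:
            m = pair_view(i, j)
            delta = m.min(axis=1)                   # Delta_ij(k) = min_k' psi_ij(k, k')
            psi_n[i] = psi_n[i] + delta
            set_pair(i, j, m - delta[:, None])
        # 2) distribute psi_i equally back over the incident edges
        share = psi_n[i] / len(nbrs[i])             # Delta_i(k); the slide uses /4 on a grid
        for j in nbrs[i]:
            set_pair(i, j, pair_view(i, j) + share[:, None])
        psi_n[i] = np.zeros(K)
    if sweep % 5 == 0:
        print(f"sweep {sweep:2d}: SQ = {seeming_quality():.4f}")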

(9)

Diffusion Algorithm

– It is unclear which task the diffusion algorithm actually solves, i.e. it does not follow from the original task (maximizing the lower bound). In general, the seeming quality is not optimized globally

– Works well in practice, easy to parallelize

– Diffusion is the Relaxation Labelling Algorithm (see OrAnd) in the (min,+)-semiring!!!

SQ – [M. Schlesinger, 1976?], Diffusion – [+Flach, 1998?], further elaboration – [+Werner, ?]

(10)

LP-Relaxation

Rewrite the Energy Minimization problem

$$\sum_i \psi_i(y_i) + \sum_{ij} \psi_{ij}(y_i, y_j) \to \min_y$$

as follows:

First, introduce weights w_i(k) and w_{ij}(k, k') for all labels and label pairs respectively. The problem reads:

$$\sum_i \sum_k w_i(k)\cdot\psi_i(k) + \sum_{ij} \sum_{k,k'} w_{ij}(k, k')\cdot\psi_{ij}(k, k') \to \min_w$$

$$\text{s.t.}\quad \sum_k w_i(k) = 1 \ \ \forall i, \qquad \sum_{k'} w_{ij}(k, k') = w_i(k) \ \ \forall i,k,j, \qquad w_i(k)\in\{0,1\},\ \ w_{ij}(k, k')\in\{0,1\}$$

(11)

LP-Relaxation

$$\sum_i \sum_k w_i(k)\cdot\psi_i(k) + \sum_{ij} \sum_{k,k'} w_{ij}(k, k')\cdot\psi_{ij}(k, k') \to \min_w$$

$$\text{s.t.}\quad \sum_k w_i(k) = 1 \ \ \forall i, \qquad \sum_{k'} w_{ij}(k, k') = w_i(k) \ \ \forall i,k,j, \qquad w_i(k)\in\{0,1\},\ \ w_{ij}(k, k')\in\{0,1\}$$

w_i(k) ∈ {0,1} together with Σ_k w_i(k) = 1 (analogously for edges) ensures that only "allowed" configurations correspond to real labelings; Σ_{k'} w_{ij}(k, k') = w_i(k) takes care of "consistency".

Second, the weights are relaxed, i.e.

$$w_i(k) \in [0,1], \qquad w_{ij}(k, k') \in [0,1]$$

The task becomes a linear optimization problem.
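To make the construction concrete, here is a Python sketch (my own; it assumes SciPy is available, and the chain, potentials and index bookkeeping are illustrative only) that builds exactly these weights and constraints for a tiny problem and solves the relaxed LP with scipy.optimize.linprog. Since the example graph is a chain, the relaxed solution is expected to come out integral (chains are LP-tight, cf. the concluding remarks).

import numpy as np
from scipy.optimize import linprog

K = 2
nodes = [0, 1, 2]
edges = [(0, 1), (1, 2)]
rng = np.random.default_rng(2)
psi_n = {i: rng.uniform(0, 1, K) for i in nodes}
psi_e = {e: rng.uniform(0, 1, (K, K)) for e in edges}

# variable layout: first all w_i(k), then all w_ij(k, k')
node_var = {(i, k): i * K + k for i in nodes for k in range(K)}
off = len(nodes) * K
edge_var = {(e, k, kk): off + ei * K * K + k * K + kk
            for ei, e in enumerate(edges) for k in range(K) for kk in range(K)}
n_var = off + len(edges) * K * K

c = np.zeros(n_var)                      # objective: sum of w * psi
for (i, k), v in node_var.items():
    c[v] = psi_n[i][k]
for ((i, j), k, kk), v in edge_var.items():
    c[v] = psi_e[(i, j)][k, kk]

A_eq, b_eq = [], []
for i in nodes:                          # sum_k w_i(k) = 1
    row = np.zeros(n_var)
    for k in range(K):
        row[node_var[(i, k)]] = 1.0
    A_eq.append(row)
    b_eq.append(1.0)
for (i, j) in edges:                     # coupling: edge marginals match node weights
    for k in range(K):                   # sum_k' w_ij(k, k') = w_i(k)
        row = np.zeros(n_var)
        for kk in range(K):
            row[edge_var[((i, j), k, kk)]] = 1.0
        row[node_var[(i, k)]] = -1.0
        A_eq.append(row)
        b_eq.append(0.0)
    for kk in range(K):                  # sum_k w_ij(k, k') = w_j(k')
        row = np.zeros(n_var)
        for k in range(K):
            row[edge_var[((i, j), k, kk)]] = 1.0
        row[node_var[(j, kk)]] = -1.0
        A_eq.append(row)
        b_eq.append(0.0)

res = linprog(c, A_eq=np.array(A_eq), b_eq=np.array(b_eq), bounds=(0, 1), method="highs")
print("LP lower bound:", res.fun)        # never larger than the optimal discrete energy
print("relaxed node weights:")
print(res.x[:off].reshape(len(nodes), K))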

(12)

LP-Relaxation

The solution of the relaxed problem is (in general) non-integer.

⇒ the value of the LP solution is not greater (a lower bound), because the solution space is larger

⇒ it is not possible to estimate the original discrete solution from the continuous one

If there exists an optimal integer (integral, discrete) solution among all continuous ones, the LP is tight.

(13)

LP-Relaxation

$$\sum_i \sum_k w_i(k)\cdot\psi_i(k) + \sum_{ij} \sum_{k,k'} w_{ij}(k, k')\cdot\psi_{ij}(k, k') \to \min_w$$

s.t.

$$1)\quad \sum_k w_i(k) = 1 \ \ \forall i, \qquad \sum_{k,k'} w_{ij}(k, k') = 1 \ \ \forall i,j$$
$$2)\quad \sum_{k'} w_{ij}(k, k') = w_i(k) \ \ \forall i,j,k$$
$$3)\quad w_i(k)\in[0,1], \ \ w_{ij}(k, k')\in[0,1]$$

In fact, the second part of 1) is redundant, because it follows from the first part of 1) and the coupling constraints 2).

Introduce Lagrange multipliers λ_{ij}(k) for the constraints of type 2) and keep the constraints 1) and 3).

(14)

LP-Relaxation

The Lagrangian is:

$$\sum_i \sum_k w_i(k)\cdot\psi_i(k) + \sum_{ij} \sum_{k,k'} w_{ij}(k, k')\cdot\psi_{ij}(k, k') + \sum_i \sum_j \sum_k \lambda_{ij}(k)\cdot\Big[\sum_{k'} w_{ij}(k, k') - w_i(k)\Big] \to \max_\lambda \min_w$$

$$\text{s.t.}\quad \sum_k w_i(k) = 1 \ \ \forall i, \qquad \sum_{k,k'} w_{ij}(k, k') = 1 \ \ \forall i,j, \qquad w_i(k)\in[0,1], \ \ w_{ij}(k, k')\in[0,1]$$

Group the summands around the weights w:

$$\sum_i \sum_k w_i(k)\cdot\Big[\psi_i(k) - \sum_j \lambda_{ij}(k)\Big] + \sum_{ij} \sum_{k,k'} w_{ij}(k, k')\cdot\Big[\psi_{ij}(k, k') + \lambda_{ij}(k) + \lambda_{ji}(k')\Big] \to \max_\lambda \min_w$$

s.t. ...

(15)

LP-Relaxation

$$\sum_i \sum_k w_i(k)\cdot\Big[\psi_i(k) - \sum_j \lambda_{ij}(k)\Big] + \sum_{ij} \sum_{k,k'} w_{ij}(k, k')\cdot\Big[\psi_{ij}(k, k') + \lambda_{ij}(k) + \lambda_{ji}(k')\Big] \to \max_\lambda \min_w$$

$$\text{s.t.}\quad \sum_k w_i(k) = 1 \ \ \forall i, \qquad \sum_{k,k'} w_{ij}(k, k') = 1 \ \ \forall i,j, \qquad w_i(k)\in[0,1], \ \ w_{ij}(k, k')\in[0,1]$$

There are no coupling constraints that link w_i(k) and w_{ij}(k, k')

⇒ the summands (under Σ_i and Σ_{ij}) can be optimized with respect to their own w independently!!!

(16)

LP-Relaxation

Denote λ_i(k) = −Σ_j λ_{ij}(k) and consider just one summand:

$$\sum_k w_i(k)\cdot\big[\psi_i(k) + \lambda_i(k)\big] \to \min_w \qquad\text{s.t.}\quad \sum_k w_i(k) = 1, \ \ w_i(k)\in[0,1]$$

Since a linear function over the simplex attains its minimum at a vertex, the value of the solution is min_k [ψ_i(k) + λ_i(k)].

Analogously for the edges – the value of the solution for an edge i, j is min_{k,k'} [ψ_{ij}(k, k') + λ_{ij}(k) + λ_{ji}(k')].

Substitute all of this into the Lagrangian ...

(17)

LP-Relaxation

... which should be maximized with respect to the λ's:

$$\sum_i \min_k \big[\psi_i(k) + \lambda_i(k)\big] + \sum_{ij} \min_{k,k'} \big[\psi_{ij}(k, k') + \lambda_{ij}(k) + \lambda_{ji}(k')\big] \to \max_\lambda$$

$$\text{s.t.}\quad \lambda_i(k) + \sum_j \lambda_{ij}(k) = 0 \ \ \forall i,k$$

⇒ this is exactly the task of maximizing the seeming quality over the equivalent transformations λ!!!

⇒ the LP-relaxation and the maximization of the seeming quality form a duality pair

LP: O(EK²) variables, O(EK) constraints

SQ: O(EK) variables, O(VK) constraints ⇒ simpler
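One simple way to attack this concave, non-smooth maximization is plain subgradient ascent in the free variables λ_{ij}(k), with λ_i(k) = −Σ_j λ_{ij}(k) kept implicit (a subgradient algorithm is listed under the further readings; the sketch below is my own illustration with ad-hoc step sizes, not the method from this lecture).

import numpy as np

K = 3
nodes = [0, 1, 2]
edges = [(0, 1), (1, 2)]
rng = np.random.default_rng(3)
psi_n = {i: rng.uniform(0, 1, K) for i in nodes}
psi_e = {e: rng.uniform(0, 1, (K, K)) for e in edges}

nbrs = {i: [] for i in nodes}
for (i, j) in edges:
    nbrs[i].append(j)
    nbrs[j].append(i)
lam = {(i, j): np.zeros(K) for i in nodes for j in nbrs[i]}   # lambda_ij(k)

def dual_value():
    # seeming quality of the re-parameterized task = value of the dual
    val = 0.0
    for i in nodes:
        val += (psi_n[i] - sum(lam[(i, j)] for j in nbrs[i])).min()
    for (i, j) in edges:
        val += (psi_e[(i, j)] + lam[(i, j)][:, None] + lam[(j, i)][None, :]).min()
    return val

for t in range(1, 201):
    step = 0.5 / np.sqrt(t)                     # diminishing step size (ad hoc)
    grad = {key: np.zeros(K) for key in lam}
    # node terms contribute -1 at the node's current argmin label
    for i in nodes:
        k_star = int(np.argmin(psi_n[i] - sum(lam[(i, j)] for j in nbrs[i])))
        for j in nbrs[i]:
            grad[(i, j)][k_star] -= 1.0
    # edge terms contribute +1 at the argmin pair, once for each endpoint
    for (i, j) in edges:
        m = psi_e[(i, j)] + lam[(i, j)][:, None] + lam[(j, i)][None, :]
        ki, kj = np.unravel_index(np.argmin(m), m.shape)
        grad[(i, j)][ki] += 1.0
        grad[(j, i)][kj] += 1.0
    for key in lam:
        lam[key] += step * grad[key]
    if t % 50 == 0:
        print(f"iteration {t:3d}: dual value (SQ) = {dual_value():.4f}")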

(18)

LP-Relaxation for binary submodular problems

Remember: binary Energy Minimization problems can be written in the form

$$E(y) = \sum_i \psi_i \cdot y_i + \sum_{ij} \beta_{ij} \cdot \delta(y_i \neq y_j)$$

with y_i ∈ {0,1}. Rewrite it as

$$E(y) = \sum_i \psi_i \cdot y_i + \sum_{ij} \beta_{ij} \cdot |y_i - y_j|$$

Now relax it "directly", i.e. let y_i ∈ [0,1]

– no coupling constraints :-)

If all β's are non-negative (the task is submodular, the corresponding MinCut has only non-negative edge costs etc.), the problem is convex!!!

(19)

LP-Relaxation for binary submodular problems

"Linearize" it by introducing additional variables y_{ij}:

$$E(y) = \sum_i \psi_i \cdot y_i + \sum_{ij} \beta_{ij} \cdot y_{ij} \to \min_y \qquad\text{s.t.}\quad y_{ij} \geq y_i - y_j, \ \ y_{ij} \geq y_j - y_i$$

At the optimum, y_{ij} = max(y_i − y_j, y_j − y_i) = |y_i − y_j| holds.

⇒ linear optimization again.

If you now build the dual, you obtain MaxFlow (try it at home ...)
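A minimal Python sketch of this linearized LP (my own toy numbers, assuming SciPy) on a three-node chain with non-negative β. In this submodular case the relaxed y is expected to come out integral; in the general binary case the relaxed values would be 0, 1 or 0.5, as noted on the next slides.

import numpy as np
from scipy.optimize import linprog

nodes = [0, 1, 2]
edges = [(0, 1), (1, 2)]
psi = np.array([0.2, -1.5, 0.3])           # unary terms psi_i (made-up values)
beta = {(0, 1): 0.7, (1, 2): 0.4}          # non-negative => submodular

n, m = len(nodes), len(edges)              # variables: y_0 .. y_{n-1}, then one y_ij per edge
c = np.concatenate([psi, [beta[e] for e in edges]])

A_ub, b_ub = [], []
for ei, (i, j) in enumerate(edges):
    # y_ij >= y_i - y_j  and  y_ij >= y_j - y_i, written as <= 0 constraints
    for sign in (+1, -1):
        row = np.zeros(n + m)
        row[i], row[j], row[n + ei] = sign, -sign, -1.0
        A_ub.append(row)
        b_ub.append(0.0)

res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub), bounds=(0, 1), method="highs")
print("relaxed y :", np.round(res.x[:n], 3))   # integral here (submodular case)
print("LP value  :", round(res.fun, 3))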

(20)

Concluding remarks

E (discrete) ≥ LP ≥ SQ

LP = SQ if the relaxed linear optimization is solved exactly

There is no strongly polynomial algorithm for linear optimization in general

For binary (even non-submodular!!!) problems (MinCut) the LP can be solved exactly and efficiently by MaxFlow

E = LP – the LP-relaxation is tight

Submodular (multi-label) problems are LP-tight

Problems on chains are LP-tight

Checking LP-tightness is NP-complete in general

There are algorithms that allow one to "detect" whether the optimal discrete solution has been obtained

(21)

Concluding remarks

Consider the following two schemes:

1. Relax the original multi-label problem and solve the LP.

2. Transform the original multi-label problem into a binary one; then relax the binary problem and solve it.

The first variant is "tighter" than the second one.

In the second variant it is possible to solve the LP exactly.

Interesting – in this case the optimal relaxed weights are always 0, 1 or 0.5.

(22)

Further readings

* Boros, Hammer: Quadratic Pseudo-Boolean Optimization

– Belief Propagation for MinSum – diffusion in fact

– Kolmogorov: Message Passing, TRWS ("special" equivalent transformations)

– Komodakis, M. Schlesinger: Subgradient algorithm

* Rother: Fusion Moves algorithm

– Werner: Cutting Plane algorithm

– Shechovtsov ... : Sub-/Supermodular Decomposition

– Savchynskyy, Kappes, Schmidt, Schnörr: Nesterov's Scheme

– Shechovtsov, Savchynskyy ... : partial optimality

– ...

* the next lecture by Carsten Rother
