Machine Learning II LP-relaxation
Dmitrij Schlesinger, Carsten Rother, Dagmar Kainmueller, Florian Jug
SS2014, 20.06.2014
Seeming quality
Consider the following "stupid" algorithm.
Instead of optimizing the original MinSum problem

    E(A) = min_y [ Σ_i ψ_i(y_i) + Σ_ij ψ_ij(y_i, y_j) ]

proceed as follows:
1. For each node and each edge, choose the best label(s) and the best label pair(s) respectively, according to ψ.
2. Sum up all the minima.
Seeming quality
The value obtained in this manner is the seeming quality

    SQ(A) = Σ_i min_k ψ_i(k) + Σ_ij min_{k,k'} ψ_ij(k, k')

Compare it to the original energy

    E(A) = min_y [ Σ_i ψ_i(y_i) + Σ_ij ψ_ij(y_i, y_j) ]

Obviously SQ(A) ≤ E(A), i.e. SQ(A) is a lower bound.
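Both quantities are straightforward to compute. The following sketch (with hypothetical container conventions: unaries maps node → cost vector, pairwise maps edge → cost matrix) evaluates the energy of a labeling and the seeming quality on a toy instance:

```python
import numpy as np

def energy(unaries, pairwise, edges, y):
    """Energy E of a concrete labeling y (dict: node -> label)."""
    return (sum(unaries[i][y[i]] for i in unaries)
            + sum(pairwise[(i, j)][y[i], y[j]] for (i, j) in edges))

def seeming_quality(unaries, pairwise, edges):
    """SQ(A): sum of per-node and per-edge minima (a lower bound on E)."""
    return (sum(u.min() for u in unaries.values())
            + sum(pairwise[e].min() for e in edges))

# Tiny example: two nodes, two labels, one edge.
unaries = {0: np.array([1.0, 3.0]), 1: np.array([2.0, 0.0])}
pairwise = {(0, 1): np.array([[0.0, 4.0], [4.0, 0.0]])}
edges = [(0, 1)]

sq = seeming_quality(unaries, pairwise, edges)           # 1 + 0 + 0 = 1
e_best = min(energy(unaries, pairwise, edges, {0: a, 1: b})
             for a in (0, 1) for b in (0, 1))            # brute force: 3
```

Here SQ(A) = 1 while the true optimum is E = 3, so for this instance the bound is not tight.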
Seeming quality
Suppose there exists a labeling y that is "composed" only of the "best" labels and label pairs.
Obviously, this labeling is globally optimal and E(A) = SQ(A) holds.
Such tasks are called trivial.
Equivalent transformations (recap)
Two tasks A = (ψ) and A' = (ψ') are called equivalent iff

    Σ_i ψ_i(y_i) + Σ_ij ψ_ij(y_i, y_j) = Σ_i ψ'_i(y_i) + Σ_ij ψ'_ij(y_i, y_j)

holds for all labelings y.
A(A) – equivalence class (all tasks that are equivalent to A).
Equivalent transformations are also called re-parameterizations.
Back to the seeming quality
Equivalent transformations do not change energies, but they do change the seeming quality SQ(A).
After an equivalent transformation

    Φ = (ϕ_i(k) ∀i,k; ϕ_ij(k) ∀ij,k)   with   ϕ_i(k) + Σ_{j: ij∈E} ϕ_ij(k) = 0 ∀i,k

is applied, the new ψ become

    ψ'_i(k) = ψ_i(k) + ϕ_i(k)
    ψ'_ij(k, k') = ψ_ij(k, k') + ϕ_ij(k) + ϕ_ji(k')
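The invariance is easy to verify numerically. A sketch (ϕ is stored as a dict of per-direction message vectors ϕ_ij(k), and ϕ_i(k) = −Σ_j ϕ_ij(k) is chosen so that the constraint holds automatically):

```python
import numpy as np

def reparameterize(unaries, pairwise, edges, phi):
    """Apply psi'_i(k) = psi_i(k) + phi_i(k) with phi_i(k) = -sum_j phi_ij(k),
    and psi'_ij(k, k') = psi_ij(k, k') + phi_ij(k) + phi_ji(k')."""
    new_un = {i: u.copy() for i, u in unaries.items()}
    new_pw = {e: m.copy() for e, m in pairwise.items()}
    for (i, j) in edges:
        new_pw[(i, j)] += phi[(i, j)][:, None] + phi[(j, i)][None, :]
        new_un[i] -= phi[(i, j)]   # phi_i(k) = -sum_j phi_ij(k)
        new_un[j] -= phi[(j, i)]
    return new_un, new_pw

unaries = {0: np.array([1.0, 3.0]), 1: np.array([2.0, 0.0])}
pairwise = {(0, 1): np.array([[0.0, 4.0], [4.0, 0.0]])}
edges = [(0, 1)]
phi = {(0, 1): np.array([0.5, -1.0]), (1, 0): np.array([2.0, 0.0])}

un2, pw2 = reparameterize(unaries, pairwise, edges, phi)
# The energy of every labeling is unchanged:
for a in (0, 1):
    for b in (0, 1):
        before = unaries[0][a] + unaries[1][b] + pairwise[(0, 1)][a, b]
        after = un2[0][a] + un2[1][b] + pw2[(0, 1)][a, b]
        assert np.isclose(before, after)
```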
Back to the seeming quality
The idea: search for the task with the highest seeming quality in the equivalence class A(A), i.e. maximize the lower bound:

    Σ_i min_k [ ψ_i(k) + ϕ_i(k) ] + Σ_ij min_{k,k'} [ ψ_ij(k, k') + ϕ_ij(k) + ϕ_ji(k') ] → max_Φ

    s.t. ϕ_i(k) + Σ_{j: ij∈E} ϕ_ij(k) = 0 ∀i,k

This is a concave, non-differentiable optimization problem.
– How to (efficiently) maximize SQ(A)?
– The triviality check is NP-hard in general.
– For which A does a trivial equivalent exist?
Diffusion Algorithm
Repeat for all i, k until convergence:
1) Accumulate – transfer as much as possible to ψ_i(k):

    Δ_ij(k) = min_{k'} ψ_ij(k, k')
    ψ_i(k) = ψ_i(k) + Σ_{j: ij∈E} Δ_ij(k)
    ψ_ij(k, k') = ψ_ij(k, k') − Δ_ij(k)

2) Distribute ψ_i(k) equally over the incident ψ_ij(k, k'):

    Δ_i(k) = ψ_i(k)/4   (for a 4-neighbourhood, i.e. divide by the node degree)
    ψ_ij(k, k') = ψ_ij(k, k') + Δ_i(k)
    ψ_i(k) = 0
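One sweep of the two steps above can be sketched as follows (a sketch, not the canonical implementation: pairwise stores each edge in both directions with pairwise[(j, i)] kept equal to pairwise[(i, j)].T, and neighbors[i] lists the nodes adjacent to i):

```python
import numpy as np

def diffusion_sweep(unaries, pairwise, neighbors):
    """One in-place sweep of the diffusion algorithm."""
    for i in unaries:
        # 1) Accumulate: pull the row minima of every incident edge into psi_i.
        for j in neighbors[i]:
            delta = pairwise[(i, j)].min(axis=1)      # Delta_ij(k)
            unaries[i] += delta
            pairwise[(i, j)] -= delta[:, None]
            pairwise[(j, i)] -= delta[None, :]        # keep both copies consistent
        # 2) Distribute psi_i equally back over the incident edges.
        share = unaries[i] / len(neighbors[i])        # Delta_i(k) = psi_i(k)/deg(i)
        for j in neighbors[i]:
            pairwise[(i, j)] += share[:, None]
            pairwise[(j, i)] += share[None, :]
        unaries[i][:] = 0.0

# Example: diffusion is an equivalent transformation, so the energy of
# every labeling is preserved (only the representation changes).
unaries = {0: np.array([1.0, 3.0]), 1: np.array([2.0, 0.0])}
pw = np.array([[0.0, 4.0], [4.0, 0.0]])
pairwise = {(0, 1): pw.copy(), (1, 0): pw.T.copy()}
neighbors = {0: [1], 1: [0]}

def all_energies():
    return {(a, b): unaries[0][a] + unaries[1][b] + pairwise[(0, 1)][a, b]
            for a in (0, 1) for b in (0, 1)}

before = all_energies()
diffusion_sweep(unaries, pairwise, neighbors)
after = all_energies()
```

On this toy instance a single sweep already lifts the bound to the optimum (SQ = E = 3); in general the bound only increases monotonically and need not reach E.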
Diffusion Algorithm
– It is unclear which task is solved by the diffusion algorithm, i.e. it does not follow from the original task (maximize the lower bound). In general, the seeming quality is not optimized globally.
– Works well in practice, easy to parallelize.
– Diffusion is the Relaxation Labelling algorithm (see OrAnd) in the (min,+)-semiring!
– SQ – [M. Schlesinger, 1976?], Diffusion – [+Flach, 1998?], further elaboration – [+Werner, ?]
LP-Relaxation
Rewrite the Energy Minimization problem

    Σ_i ψ_i(y_i) + Σ_ij ψ_ij(y_i, y_j) → min_y

as follows:
First, introduce weights w_i(k) and w_ij(k, k') for all labels and label pairs respectively. The problem reads:

    Σ_i Σ_k w_i(k)·ψ_i(k) + Σ_ij Σ_{k,k'} w_ij(k, k')·ψ_ij(k, k') → min_w

    s.t. Σ_k w_i(k) = 1 ∀i
         Σ_{k'} w_ij(k, k') = w_i(k) ∀i, k, j
         w_i(k) ∈ {0,1}, w_ij(k, k') ∈ {0,1}
LP-Relaxation
    Σ_i Σ_k w_i(k)·ψ_i(k) + Σ_ij Σ_{k,k'} w_ij(k, k')·ψ_ij(k, k') → min_w

    s.t. Σ_k w_i(k) = 1 ∀i
         Σ_{k'} w_ij(k, k') = w_i(k) ∀i, k, j
         w_i(k) ∈ {0,1}, w_ij(k, k') ∈ {0,1}

The constraints w_i(k) ∈ {0,1} together with Σ_k w_i(k) = 1 (analogously for edges) ensure that only "allowed" configurations correspond to real labelings; Σ_{k'} w_ij(k, k') = w_i(k) takes care of "consistency".
Second, the weights are relaxed, i.e.

    w_i(k) ∈ [0,1], w_ij(k, k') ∈ [0,1]

The task becomes a linear optimization problem.
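For a toy instance the relaxed LP can be handed directly to an off-the-shelf solver. A sketch using scipy.optimize.linprog (the variable ordering and the concrete numbers are illustrative choices, not part of the lecture):

```python
import numpy as np
from scipy.optimize import linprog

# One edge (nodes 0, 1), two labels each.
# Variable order: w0(0), w0(1), w1(0), w1(1), w01(0,0), w01(0,1), w01(1,0), w01(1,1)
psi0 = np.array([1.0, 3.0])
psi1 = np.array([2.0, 0.0])
psi01 = np.array([[0.0, 4.0], [4.0, 0.0]])
c = np.concatenate([psi0, psi1, psi01.ravel()])

A_eq = np.array([
    [1, 1, 0, 0,  0,  0,  0,  0],   # sum_k w0(k) = 1
    [0, 0, 1, 1,  0,  0,  0,  0],   # sum_k w1(k) = 1
    [1, 0, 0, 0, -1, -1,  0,  0],   # w0(0) = sum_k' w01(0,k')
    [0, 1, 0, 0,  0,  0, -1, -1],   # w0(1) = sum_k' w01(1,k')
    [0, 0, 1, 0, -1,  0, -1,  0],   # w1(0) = sum_k  w01(k,0)
    [0, 0, 0, 1,  0, -1,  0, -1],   # w1(1) = sum_k  w01(k,1)
])
b_eq = np.array([1.0, 1.0, 0.0, 0.0, 0.0, 0.0])

res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=[(0, 1)] * 8)
# On a single edge (a chain) the LP is tight: res.fun equals the
# discrete optimum, here 3.0.
```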
LP-Relaxation
The solution of the relaxed problem is in general non-integer.
⇒ The value of the LP solution is not greater than the discrete optimum (a lower bound), because the solution space is larger.
⇒ In general, it is not possible to recover the original discrete solution from the continuous one.
If there exists an optimal integer (integral, discrete) solution among all continuous ones, the LP is tight.
LP-Relaxation
    Σ_i Σ_k w_i(k)·ψ_i(k) + Σ_ij Σ_{k,k'} w_ij(k, k')·ψ_ij(k, k') → min_w

    s.t. 1) Σ_k w_i(k) = 1 ∀i,  Σ_{k,k'} w_ij(k, k') = 1 ∀i,j
         2) Σ_{k'} w_ij(k, k') = w_i(k) ∀i, j, k
         3) w_i(k) ∈ [0,1], w_ij(k, k') ∈ [0,1]

The second part of 1) is in fact redundant, because it follows from the first part of 1) and the coupling constraints 2).
Introduce Lagrange multipliers λ_ij(k) for the constraints of type 2); keep constraints 1) and 3).
LP-Relaxation
The Lagrangian is:

    Σ_i Σ_k w_i(k)·ψ_i(k) + Σ_ij Σ_{k,k'} w_ij(k, k')·ψ_ij(k, k')
      + Σ_i Σ_j Σ_k λ_ij(k)·[ Σ_{k'} w_ij(k, k') − w_i(k) ] → max_λ min_w

    s.t. Σ_k w_i(k) = 1 ∀i,  Σ_{k,k'} w_ij(k, k') = 1 ∀i,j
         w_i(k) ∈ [0,1], w_ij(k, k') ∈ [0,1]

Group the summands around the weights w:

    Σ_i Σ_k w_i(k)·[ ψ_i(k) − Σ_j λ_ij(k) ]
      + Σ_ij Σ_{k,k'} w_ij(k, k')·[ ψ_ij(k, k') + λ_ij(k) + λ_ji(k') ] → max_λ min_w

    s.t. ...
LP-Relaxation
    Σ_i Σ_k w_i(k)·[ ψ_i(k) − Σ_j λ_ij(k) ]
      + Σ_ij Σ_{k,k'} w_ij(k, k')·[ ψ_ij(k, k') + λ_ij(k) + λ_ji(k') ] → max_λ min_w

    s.t. Σ_k w_i(k) = 1 ∀i,  Σ_{k,k'} w_ij(k, k') = 1 ∀i,j
         w_i(k) ∈ [0,1], w_ij(k, k') ∈ [0,1]

There are no coupling constraints that link w_i(k) and w_ij(k, k')
⇒ the summands (under Σ_i and Σ_ij) can be optimized with respect to their own w independently!
LP-Relaxation
Denote λ_i(k) = −Σ_j λ_ij(k) and consider just one summand:

    Σ_k w_i(k)·[ ψ_i(k) + λ_i(k) ] → min_w

    s.t. Σ_k w_i(k) = 1, w_i(k) ∈ [0,1]

The value of the solution is min_k [ ψ_i(k) + λ_i(k) ].
Analogously for edges – the value of the solution for an edge ij is

    min_{k,k'} [ ψ_ij(k, k') + λ_ij(k) + λ_ji(k') ]

Substitute all of this into the Lagrangian ...
LP-Relaxation
... which should be maximized with respect to the λ's:

    Σ_i min_k [ ψ_i(k) + λ_i(k) ] + Σ_ij min_{k,k'} [ ψ_ij(k, k') + λ_ij(k) + λ_ji(k') ] → max_λ

    s.t. λ_i(k) + Σ_j λ_ij(k) = 0 ∀i, k

⇒ This is exactly the task of maximizing the seeming quality over the equivalent transformations λ!
⇒ The LP-relaxation and the maximization of the seeming quality form a duality pair.
LP: O(|E|·K^2) variables, O(|E|·K) constraints
SQ: O(|E|·K) variables, O(|V|·K) constraints ⇒ simpler
LP-Relaxation for binary submodular problems
Remember: binary Energy Minimization problems can be written in the form

    E(y) = Σ_i ψ_i·y_i + Σ_ij β_ij·δ(y_i ≠ y_j)   with y_i ∈ {0,1}.

Rewrite it as

    E(y) = Σ_i ψ_i·y_i + Σ_ij β_ij·|y_i − y_j|

Now relax it "directly", i.e. allow y_i ∈ [0,1]
– no coupling constraints :-)
If all β's are non-negative (the task is submodular, the corresponding MinCut has only non-negative edge costs, etc.), the problem is convex!
LP-Relaxation for binary submodular problems
"Linearize" it by introducing additional variables y_ij:

    E(y) = Σ_i ψ_i·y_i + Σ_ij β_ij·y_ij → min_y

    s.t. y_ij ≥ y_i − y_j, y_ij ≥ y_j − y_i

At the optimum, y_ij = max(y_i − y_j, y_j − y_i) = |y_i − y_j| holds.
⇒ Linear optimization again.
If you now build the dual, you obtain MaxFlow (try it at home ...).
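The linearized relaxation above fits directly into an LP solver. A sketch with scipy.optimize.linprog on a hypothetical two-node instance (the concrete costs are illustrative):

```python
import numpy as np
from scipy.optimize import linprog

# Two nodes, one submodular edge (beta >= 0).
# minimize psi0*y0 + psi1*y1 + beta*y01
#   s.t. y01 >= y0 - y1,  y01 >= y1 - y0,  y0, y1 in [0, 1]
psi = np.array([-1.0, 2.0])   # unary cost of taking label 1
beta = 0.5
c = np.concatenate([psi, [beta]])

A_ub = np.array([
    [ 1, -1, -1],   # y0 - y1 - y01 <= 0
    [-1,  1, -1],   # y1 - y0 - y01 <= 0
])
res = linprog(c, A_ub=A_ub, b_ub=np.zeros(2),
              bounds=[(0, 1), (0, 1), (0, None)])
# Because beta >= 0, the relaxation is exact here: the optimum is the
# integral labeling y = (1, 0) with y01 = |y0 - y1| = 1, value -0.5.
```

Since beta > 0 pushes y_ij down onto |y_i − y_j|, the inequality pair replaces the absolute value without changing the optimum.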
Concluding remarks
E (discrete) ≥ LP ≥ SQ
LP = SQ if the relaxed linear optimization is solved exactly.
There is no strongly polynomial algorithm for linear optimization in general.
For binary (even non-submodular!) problems (MinCut), the LP can be solved exactly and efficiently by MaxFlow.
E = LP – the LP-relaxation is tight.
Submodular (multi-label) problems are LP-tight. Problems on chains are LP-tight.
Checking LP-tightness is NP-complete in general.
There are algorithms that allow one to "detect" whether the optimal discrete solution has been obtained.
Concluding remarks
Consider the following two schemes:
1. Relax the original multi-label problem and solve the LP.
2. Transform the original multi-label problem into a binary one; then relax the binary problem and solve it.
The first variant is "tighter" than the second one.
In the second variant it is possible to solve the LP exactly.
Interesting: in this case the optimal relaxed weights are always 0, 1, or 0.5.
Further readings
* Boros, Hammer: Quadratic Pseudo-Boolean Optimization
– Belief Propagation for MinSum – in fact diffusion
– Kolmogorov: Message Passing, TRWS ("special" equivalent transformations)
– Komodakis, M. Schlesinger: Subgradient algorithm
* Rother: Fusion Moves algorithm
– Werner: Cutting Plane algorithm
– Shechovtsov et al.: Sub-/Supermodular Decomposition
– Savchynskyy, Kappes, Schmidt, Schnörr: Nesterov's Scheme
– Shechovtsov, Savchynskyy et al.: partial optimality
– ...
* the next lecture by Carsten Rother