Machine Learning II LP-relaxation
Dmitrij Schlesinger, Carsten Rother, Dagmar Kainmueller, Florian Jug
SS2014, 20.06.2014
Seeming quality
Consider the following "stupid" algorithm.
Instead of optimizing the original MinSum problem

    E(A) = min_y [ Σ_i ψ_i(y_i) + Σ_ij ψ_ij(y_i, y_j) ]

proceed as follows:
1. For each node and each edge, choose the best label(s) and the best label pair(s) respectively, according to ψ.
2. Sum up all the minima.
Seeming quality
The value obtained in this manner is the seeming quality

    SQ(A) = Σ_i min_k ψ_i(k) + Σ_ij min_{k,k'} ψ_ij(k, k')

Compare it to the original energy

    E(A) = min_y [ Σ_i ψ_i(y_i) + Σ_ij ψ_ij(y_i, y_j) ]

Obviously SQ(A) ≤ E(A), i.e. SQ(A) is a lower bound.
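Both quantities are straightforward to compute. The following sketch (with hypothetical container conventions: unaries maps node → cost vector, pairwise maps edge → cost matrix) evaluates the energy of a labeling and the seeming quality on a toy instance:

```python
import numpy as np

def energy(unaries, pairwise, edges, y):
    """Energy E of a concrete labeling y (dict: node -> label)."""
    return (sum(unaries[i][y[i]] for i in unaries)
            + sum(pairwise[(i, j)][y[i], y[j]] for (i, j) in edges))

def seeming_quality(unaries, pairwise, edges):
    """SQ(A): sum of per-node and per-edge minima (a lower bound on E)."""
    return (sum(u.min() for u in unaries.values())
            + sum(pairwise[e].min() for e in edges))

# Tiny example: two nodes, two labels, one edge.
unaries = {0: np.array([1.0, 3.0]), 1: np.array([2.0, 0.0])}
pairwise = {(0, 1): np.array([[0.0, 4.0], [4.0, 0.0]])}
edges = [(0, 1)]

sq = seeming_quality(unaries, pairwise, edges)           # 1 + 0 + 0 = 1
e_best = min(energy(unaries, pairwise, edges, {0: a, 1: b})
             for a in (0, 1) for b in (0, 1))            # brute force: 3
```

Here SQ(A) = 1 while the true optimum is E = 3, so for this instance the bound is not tight.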
Seeming quality
Suppose there exists a labeling y that is "composed" only of the "best" labels and label pairs.
Obviously, this labeling is globally optimal and E(A) = SQ(A) holds.
Such tasks are called trivial.
Equivalent transformations (recap)
Two tasks A = (ψ) and A' = (ψ') are called equivalent iff

    Σ_i ψ_i(y_i) + Σ_ij ψ_ij(y_i, y_j) = Σ_i ψ'_i(y_i) + Σ_ij ψ'_ij(y_i, y_j)

holds for all labelings y.
A(A) – equivalence class (all tasks that are equivalent to A).
Equivalent transformations are also called re-parameterizations.
Back to the seeming quality
Equivalent transformations do not change energies, but they do change the seeming quality SQ(A).
After an equivalent transformation

    Φ = (ϕ_i(k) ∀i,k; ϕ_ij(k) ∀ij,k)   with   ϕ_i(k) + Σ_{j: ij∈E} ϕ_ij(k) = 0 ∀i,k

is applied, the new ψ become

    ψ'_i(k) = ψ_i(k) + ϕ_i(k)
    ψ'_ij(k, k') = ψ_ij(k, k') + ϕ_ij(k) + ϕ_ji(k')
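The invariance is easy to verify numerically. A sketch (ϕ is stored as a dict of per-direction message vectors ϕ_ij(k), and ϕ_i(k) = −Σ_j ϕ_ij(k) is chosen so that the constraint holds automatically):

```python
import numpy as np

def reparameterize(unaries, pairwise, edges, phi):
    """Apply psi'_i(k) = psi_i(k) + phi_i(k) with phi_i(k) = -sum_j phi_ij(k),
    and psi'_ij(k, k') = psi_ij(k, k') + phi_ij(k) + phi_ji(k')."""
    new_un = {i: u.copy() for i, u in unaries.items()}
    new_pw = {e: m.copy() for e, m in pairwise.items()}
    for (i, j) in edges:
        new_pw[(i, j)] += phi[(i, j)][:, None] + phi[(j, i)][None, :]
        new_un[i] -= phi[(i, j)]   # phi_i(k) = -sum_j phi_ij(k)
        new_un[j] -= phi[(j, i)]
    return new_un, new_pw

unaries = {0: np.array([1.0, 3.0]), 1: np.array([2.0, 0.0])}
pairwise = {(0, 1): np.array([[0.0, 4.0], [4.0, 0.0]])}
edges = [(0, 1)]
phi = {(0, 1): np.array([0.5, -1.0]), (1, 0): np.array([2.0, 0.0])}

un2, pw2 = reparameterize(unaries, pairwise, edges, phi)
# The energy of every labeling is unchanged:
for a in (0, 1):
    for b in (0, 1):
        before = unaries[0][a] + unaries[1][b] + pairwise[(0, 1)][a, b]
        after = un2[0][a] + un2[1][b] + pw2[(0, 1)][a, b]
        assert np.isclose(before, after)
```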
Back to the seeming quality
The idea: search for the task with the highest seeming quality in the equivalence class A(A), i.e. maximize the lower bound:

    Σ_i min_k [ ψ_i(k) + ϕ_i(k) ] + Σ_ij min_{k,k'} [ ψ_ij(k, k') + ϕ_ij(k) + ϕ_ji(k') ] → max_Φ

    s.t. ϕ_i(k) + Σ_{j: ij∈E} ϕ_ij(k) = 0 ∀i,k

This is a concave, non-differentiable optimization problem.
– How to (efficiently) maximize SQ(A)?
– The triviality check is NP-hard in general.
– For which A does a trivial equivalent exist?
Diffusion Algorithm
Repeat for all i, k until convergence:
1) Accumulate – transfer as much as possible to ψ_i(k):

    Δ_ij(k) = min_{k'} ψ_ij(k, k')
    ψ_i(k) = ψ_i(k) + Σ_{j: ij∈E} Δ_ij(k)
    ψ_ij(k, k') = ψ_ij(k, k') − Δ_ij(k)

2) Distribute ψ_i(k) equally over the incident ψ_ij(k, k'):

    Δ_i(k) = ψ_i(k)/4   (for a 4-neighbourhood, i.e. divide by the node degree)
    ψ_ij(k, k') = ψ_ij(k, k') + Δ_i(k)
    ψ_i(k) = 0
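One sweep of the two steps above can be sketched as follows (a sketch, not the canonical implementation: pairwise stores each edge in both directions with pairwise[(j, i)] kept equal to pairwise[(i, j)].T, and neighbors[i] lists the nodes adjacent to i):

```python
import numpy as np

def diffusion_sweep(unaries, pairwise, neighbors):
    """One in-place sweep of the diffusion algorithm."""
    for i in unaries:
        # 1) Accumulate: pull the row minima of every incident edge into psi_i.
        for j in neighbors[i]:
            delta = pairwise[(i, j)].min(axis=1)      # Delta_ij(k)
            unaries[i] += delta
            pairwise[(i, j)] -= delta[:, None]
            pairwise[(j, i)] -= delta[None, :]        # keep both copies consistent
        # 2) Distribute psi_i equally back over the incident edges.
        share = unaries[i] / len(neighbors[i])        # Delta_i(k) = psi_i(k)/deg(i)
        for j in neighbors[i]:
            pairwise[(i, j)] += share[:, None]
            pairwise[(j, i)] += share[None, :]
        unaries[i][:] = 0.0

# Example: diffusion is an equivalent transformation, so the energy of
# every labeling is preserved (only the representation changes).
unaries = {0: np.array([1.0, 3.0]), 1: np.array([2.0, 0.0])}
pw = np.array([[0.0, 4.0], [4.0, 0.0]])
pairwise = {(0, 1): pw.copy(), (1, 0): pw.T.copy()}
neighbors = {0: [1], 1: [0]}

def all_energies():
    return {(a, b): unaries[0][a] + unaries[1][b] + pairwise[(0, 1)][a, b]
            for a in (0, 1) for b in (0, 1)}

before = all_energies()
diffusion_sweep(unaries, pairwise, neighbors)
after = all_energies()
```

On this toy instance a single sweep already lifts the bound to the optimum (SQ = E = 3); in general the bound only increases monotonically and need not reach E.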
Diffusion Algorithm
– It is unclear which task is solved by the diffusion algorithm, i.e. it does not follow from the original task (maximize the lower bound). In general, the seeming quality is not optimized globally.
– Works well in practice, easy to parallelize.
– Diffusion is the Relaxation Labelling algorithm (see OrAnd) in the (min,+)-semiring!
– SQ – [M. Schlesinger, 1976?], Diffusion – [+Flach, 1998?], further elaboration – [+Werner, ?]
LP-Relaxation
Rewrite the Energy Minimization problem

    Σ_i ψ_i(y_i) + Σ_ij ψ_ij(y_i, y_j) → min_y

as follows:
First, introduce weights w_i(k) and w_ij(k, k') for all labels and label pairs respectively. The problem reads:

    Σ_i Σ_k w_i(k)·ψ_i(k) + Σ_ij Σ_{k,k'} w_ij(k, k')·ψ_ij(k, k') → min_w

    s.t. Σ_k w_i(k) = 1 ∀i
         Σ_{k'} w_ij(k, k') = w_i(k) ∀i, k, j
         w_i(k) ∈ {0,1}, w_ij(k, k') ∈ {0,1}
LP-Relaxation
    Σ_i Σ_k w_i(k)·ψ_i(k) + Σ_ij Σ_{k,k'} w_ij(k, k')·ψ_ij(k, k') → min_w

    s.t. Σ_k w_i(k) = 1 ∀i
         Σ_{k'} w_ij(k, k') = w_i(k) ∀i, k, j
         w_i(k) ∈ {0,1}, w_ij(k, k') ∈ {0,1}

The constraints w_i(k) ∈ {0,1} together with Σ_k w_i(k) = 1 (analogously for edges) ensure that only "allowed" configurations correspond to real labelings; Σ_{k'} w_ij(k, k') = w_i(k) takes care of "consistency".
Second, the weights are relaxed, i.e.

    w_i(k) ∈ [0,1], w_ij(k, k') ∈ [0,1]

The task becomes a linear optimization problem.
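For a toy instance the relaxed LP can be handed directly to an off-the-shelf solver. A sketch using scipy.optimize.linprog (the variable ordering and the concrete numbers are illustrative choices, not part of the lecture):

```python
import numpy as np
from scipy.optimize import linprog

# One edge (nodes 0, 1), two labels each.
# Variable order: w0(0), w0(1), w1(0), w1(1), w01(0,0), w01(0,1), w01(1,0), w01(1,1)
psi0 = np.array([1.0, 3.0])
psi1 = np.array([2.0, 0.0])
psi01 = np.array([[0.0, 4.0], [4.0, 0.0]])
c = np.concatenate([psi0, psi1, psi01.ravel()])

A_eq = np.array([
    [1, 1, 0, 0,  0,  0,  0,  0],   # sum_k w0(k) = 1
    [0, 0, 1, 1,  0,  0,  0,  0],   # sum_k w1(k) = 1
    [1, 0, 0, 0, -1, -1,  0,  0],   # w0(0) = sum_k' w01(0,k')
    [0, 1, 0, 0,  0,  0, -1, -1],   # w0(1) = sum_k' w01(1,k')
    [0, 0, 1, 0, -1,  0, -1,  0],   # w1(0) = sum_k  w01(k,0)
    [0, 0, 0, 1,  0, -1,  0, -1],   # w1(1) = sum_k  w01(k,1)
])
b_eq = np.array([1.0, 1.0, 0.0, 0.0, 0.0, 0.0])

res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=[(0, 1)] * 8)
# On a single edge (a chain) the LP is tight: res.fun equals the
# discrete optimum, here 3.0.
```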
LP-Relaxation
The solution of the relaxed problem is in general non-integer.
⇒ The value of the LP solution is not greater than the discrete optimum (a lower bound), because the solution space is larger.
⇒ In general, it is not possible to recover the original discrete solution from the continuous one.
If there exists an optimal integer (integral, discrete) solution among all continuous ones, the LP is tight.
LP-Relaxation
    Σ_i Σ_k w_i(k)·ψ_i(k) + Σ_ij Σ_{k,k'} w_ij(k, k')·ψ_ij(k, k') → min_w

    s.t. 1) Σ_k w_i(k) = 1 ∀i,  Σ_{k,k'} w_ij(k, k') = 1 ∀i,j
         2) Σ_{k'} w_ij(k, k') = w_i(k) ∀i, j, k
         3) w_i(k) ∈ [0,1], w_ij(k, k') ∈ [0,1]

The second part of 1) is in fact redundant, because it follows from the first part of 1) and the coupling constraints 2).
Introduce Lagrange multipliers λ_ij(k) for the constraints of type 2); keep constraints 1) and 3).
LP-Relaxation
The Lagrangian is:

    Σ_i Σ_k w_i(k)·ψ_i(k) + Σ_ij Σ_{k,k'} w_ij(k, k')·ψ_ij(k, k')
      + Σ_i Σ_j Σ_k λ_ij(k)·[ Σ_{k'} w_ij(k, k') − w_i(k) ] → max_λ min_w

    s.t. Σ_k w_i(k) = 1 ∀i,  Σ_{k,k'} w_ij(k, k') = 1 ∀i,j
         w_i(k) ∈ [0,1], w_ij(k, k') ∈ [0,1]

Group the summands around the weights w:

    Σ_i Σ_k w_i(k)·[ ψ_i(k) − Σ_j λ_ij(k) ]
      + Σ_ij Σ_{k,k'} w_ij(k, k')·[ ψ_ij(k, k') + λ_ij(k) + λ_ji(k') ] → max_λ min_w

    s.t. ...
LP-Relaxation
    Σ_i Σ_k w_i(k)·[ ψ_i(k) − Σ_j λ_ij(k) ]
      + Σ_ij Σ_{k,k'} w_ij(k, k')·[ ψ_ij(k, k') + λ_ij(k) + λ_ji(k') ] → max_λ min_w

    s.t. Σ_k w_i(k) = 1 ∀i,  Σ_{k,k'} w_ij(k, k') = 1 ∀i,j
         w_i(k) ∈ [0,1], w_ij(k, k') ∈ [0,1]

There are no coupling constraints that link w_i(k) and w_ij(k, k')
⇒ the summands (under Σ_i and Σ_ij) can be optimized with respect to their own w independently!
LP-Relaxation
Denote λ_i(k) = −Σ_j λ_ij(k) and consider just one summand:

    Σ_k w_i(k)·[ ψ_i(k) + λ_i(k) ] → min_w

    s.t. Σ_k w_i(k) = 1, w_i(k) ∈ [0,1]

The value of the solution is min_k [ ψ_i(k) + λ_i(k) ].
Analogously for edges – the value of the solution for an edge ij is

    min_{k,k'} [ ψ_ij(k, k') + λ_ij(k) + λ_ji(k') ]

Substitute all of this into the Lagrangian ...
LP-Relaxation
... which should be maximized with respect to the λ's:

    Σ_i min_k [ ψ_i(k) + λ_i(k) ] + Σ_ij min_{k,k'} [ ψ_ij(k, k') + λ_ij(k) + λ_ji(k') ] → max_λ

    s.t. λ_i(k) + Σ_j λ_ij(k) = 0 ∀i, k

⇒ This is exactly the task of maximizing the seeming quality over the equivalent transformations λ!
⇒ The LP-relaxation and the maximization of the seeming quality form a duality pair.
LP: O(|E|·K^2) variables, O(|E|·K) constraints
SQ: O(|E|·K) variables, O(|V|·K) constraints ⇒ simpler
LP-Relaxation for binary submodular problems
Remember: binary Energy Minimization problems can be written in the form

    E(y) = Σ_i ψ_i·y_i + Σ_ij β_ij·δ(y_i ≠ y_j)   with y_i ∈ {0,1}.

Rewrite it as

    E(y) = Σ_i ψ_i·y_i + Σ_ij β_ij·|y_i − y_j|

Now relax it "directly", i.e. allow y_i ∈ [0,1]
– no coupling constraints :-)
If all β's are non-negative (the task is submodular, the corresponding MinCut has only non-negative edge costs, etc.), the problem is convex!
LP-Relaxation for binary submodular problems
"Linearize" it by introducing additional variables y_ij:

    E(y) = Σ_i ψ_i·y_i + Σ_ij β_ij·y_ij → min_y

    s.t. y_ij ≥ y_i − y_j, y_ij ≥ y_j − y_i

At the optimum, y_ij = max(y_i − y_j, y_j − y_i) = |y_i − y_j| holds.
⇒ Linear optimization again.
If you now build the dual, you obtain MaxFlow (try it at home ...).
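The linearized relaxation above fits directly into an LP solver. A sketch with scipy.optimize.linprog on a hypothetical two-node instance (the concrete costs are illustrative):

```python
import numpy as np
from scipy.optimize import linprog

# Two nodes, one submodular edge (beta >= 0).
# minimize psi0*y0 + psi1*y1 + beta*y01
#   s.t. y01 >= y0 - y1,  y01 >= y1 - y0,  y0, y1 in [0, 1]
psi = np.array([-1.0, 2.0])   # unary cost of taking label 1
beta = 0.5
c = np.concatenate([psi, [beta]])

A_ub = np.array([
    [ 1, -1, -1],   # y0 - y1 - y01 <= 0
    [-1,  1, -1],   # y1 - y0 - y01 <= 0
])
res = linprog(c, A_ub=A_ub, b_ub=np.zeros(2),
              bounds=[(0, 1), (0, 1), (0, None)])
# Because beta >= 0, the relaxation is exact here: the optimum is the
# integral labeling y = (1, 0) with y01 = |y0 - y1| = 1, value -0.5.
```

Since beta > 0 pushes y_ij down onto |y_i − y_j|, the inequality pair replaces the absolute value without changing the optimum.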
Concluding remarks
E (discrete) ≥ LP ≥ SQ
LP = SQ if the relaxed linear optimization is solved exactly.
There is no strongly polynomial algorithm for linear optimization in general.
For binary (even non-submodular!) problems (MinCut), the LP can be solved exactly and efficiently by MaxFlow.
E = LP – the LP-relaxation is tight.
Submodular (multi-label) problems are LP-tight. Problems on chains are LP-tight.
Checking LP-tightness is NP-complete in general.
There are algorithms that allow one to "detect" whether the optimal discrete solution has been obtained.
Concluding remarks
Consider the following two schemes:
1. Relax the original multi-label problem and solve the LP.
2. Transform the original multi-label problem into a binary one; then relax the binary problem and solve it.
The first variant is "tighter" than the second one.
In the second variant it is possible to solve the LP exactly.
Interesting: in this case the optimal relaxed weights are always 0, 1, or 0.5.
Further readings
* Boros, Hammer: Quadratic Pseudo-Boolean Optimization
– Belief Propagation for MinSum – in fact diffusion
– Kolmogorov: Message Passing, TRWS ("special" equivalent transformations)
– Komodakis, M. Schlesinger: Subgradient algorithm
* Rother: Fusion Moves algorithm
– Werner: Cutting Plane algorithm
– Shechovtsov et al.: Sub-/Supermodular Decomposition
– Savchynskyy, Kappes, Schmidt, Schnörr: Nesterov's Scheme
– Shechovtsov, Savchynskyy et al.: partial optimality
– ...
* the next lecture by Carsten Rother