
5.1.1 An alternative characterization of the central path

The central path is also the set of points z ∈ L that satisfy the nonlinear equation s + µg(x) = 0. This alternative representation of the central path is the subject of Theorem 5.1.4, for which we first prove the following lemma.
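For a concrete special case, take K = R^n_+ with the barrier f(x) = −∑ log x_i (so ν = n). Then g(x)_i = −1/x_i, and the equation s + µg(x) = 0 reduces to the familiar centering condition of linear programming:

\[
s_i = \frac{\mu}{x_i}, \quad\text{equivalently}\quad x_i s_i = \mu, \qquad i = 1, \dots, n.
\]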

Lemma 5.1.3. Let z⋆ be a minimizer of (PDµ). Then (1/µ)z⋆ is the minimizer of (5.5).

Furthermore, if ψ denotes the objective function of (PDµ) and ψ̂ the objective function of (5.5), then ψ(z) = ψ̂((1/µ)z).

Theorem 5.1.4. The minimizer of (PDµ) is uniquely defined by the equations

\[
\mu g(x) + s = 0, \tag{5.6a}
\]

together with the linear feasibility condition z ∈ L (5.6b).

Proof. Since the problem (PDµ) is bounded and feasible, it is solvable, and therefore there exist Lagrange multipliers λ_y, λ_x, λ_θ for which the first-order optimality conditions (5.7) hold. Using the skew symmetry of G and the properties of the gradients of conjugate pairs of functions, and defining λ_s = −g(x), we can write (5.7) in the form (5.8).

Equations (5.8) are the optimality conditions for the minimization problem (5.5) of Lemma 5.1.3, and therefore λ_y, λ_x, λ_s, λ_θ solve (5.5). Using Lemma 5.1.3 we conclude that if z⋆ minimizes (PDµ), then (1/µ)z⋆ minimizes (5.5) and the Lagrange multipliers of (5.7) are equal to (1/µ)z⋆. Using (5.7c) we conclude that x⋆ = −µ g⋆(s⋆) and s⋆ = −µ g(x⋆), and hence that (5.6) holds at the minimizer.

To show the converse, assume that equations (5.6) hold at z. Then (5.6a) can be rewritten, using the same conjugate-gradient identities as above, as the first-order optimality conditions (5.7) of (PDµ), and therefore z is the minimizer.
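The conjugate-pair gradient identities used in this proof can be checked concretely for the log barrier of the nonnegative orthant, where f(x) = −∑ log x_i, f⋆(s) = −n − ∑ log s_i, g(x) = −1/x, and g⋆(s) = −1/s. The following minimal numpy sketch (the data is arbitrary and purely illustrative) verifies that x = −µ g⋆(s) whenever s = −µ g(x), and the complementarity identity x^T s = µν:

```python
import numpy as np

x, mu = np.array([0.5, 2.0, 1.0]), 0.7   # arbitrary interior point and mu > 0
g  = lambda x: -1.0 / x                  # gradient of f(x)  = -sum(log x_i)
gc = lambda s: -1.0 / s                  # gradient of f*(s) = -n - sum(log s_i)

s = -mu * g(x)                           # impose (5.6a): s = -mu g(x), i.e. s_i = mu / x_i
print(np.allclose(x, -mu * gc(s)))       # True: x = -mu g*(s), as used in the proof
print(np.isclose(x @ s, mu * len(x)))    # True: x^T s = mu * nu, with nu = n here
```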

5.2 Potential reduction algorithms for conic programming problems

Since the point x(µ), y(µ), s(µ) on the central path is the minimizer of (PDµ),

\[
\underset{z \in L}{\text{minimize}} \;\; \frac{\mu_0}{\mu}\,\nu\theta + f(x) + f^\star(s),
\]

and this problem is convex with a self-concordant objective, Newton's method is efficient at finding points close to the central path.

This suggests the following strategy: for a fixed µ_k, use Newton's method to approximately minimize (PDµ) in order to find a point z_k close to the central path. Then reduce µ_k to µ_{k+1} and use Newton's method starting from z_k to compute a new iterate z_{k+1} that approximately minimizes (PDµ), and so on.

This scheme forms a sequence that tracks the central path to the solution of the conic programming problem.

The question of how to choose µ at each iteration remains. Potential reduction methods set µ = x^T s/ρ at every iteration, where ρ > ν is an appropriately chosen constant. At iteration k they solve for the Newton direction of the barrier problem with µ = x_k^T s_k/ρ and choose a step size by a linesearch that reduces a merit function (the potential function). Before we introduce the potential function, we argue why ρ > ν is necessary.

Lemma 5.2.1. For any µ > 0 the point on the central path satisfies µ = x^T s/ν.

Proof. Since on the central path s + µg(x) = 0 and f is a ν-logarithmically homogeneous barrier, claim (4.3.1) gives x^T s = −µ x^T g(x) = µν.

If the iterate is on the central path and µ = x_k^T s_k/ν is chosen, then the barrier problem is already at its minimizer and the Newton direction has length zero. This would cause the method to stall. A choice of ρ > ν implies that the barrier problem is never fully solved and that the value of x^T s is reduced at each iteration. This in turn implies that θ → 0. A choice of ρ < ν is contradictory, since it would produce the Newton direction for a barrier problem with a larger µ instead of a smaller one.
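The following sketch makes the path-following scheme concrete on the simplest possible instance: the ordinary primal log-barrier of a small linear program, minimize c^T x subject to Ax = b, x ≥ 0, rather than the homogeneous problem (PDµ), and with a fixed reduction µ_{k+1} = µ_k/5 rather than the potential reduction choice µ = x^T s/ρ. All data and names are illustrative; each centering step solves a dense KKT system of the form (5.9) below.

```python
import numpy as np

def center(A, c, x, mu, iters=50, tol=1e-8):
    """Damped Newton for  minimize c^T x / mu - sum(log x)  s.t.  A x = b,
    from a strictly feasible x (each step keeps A x unchanged)."""
    m, n = A.shape
    for _ in range(iters):
        grad = c / mu - 1.0 / x                           # gradient of the objective
        H = np.diag(1.0 / x**2)                           # Hessian of -sum(log x)
        K = np.block([[H, A.T], [A, np.zeros((m, m))]])   # KKT system, cf. (5.9)
        dx = np.linalg.solve(K, np.concatenate([-grad, np.zeros(m)]))[:n]
        lam = np.linalg.norm(dx / x)                      # Newton decrement sqrt(dx' H dx)
        if lam < tol:
            break
        x = x + dx / (1.0 + lam)                          # damped step keeps x > 0
    return x

# Tiny LP:  minimize c^T x  s.t.  x1 + x2 + x3 = 3,  x >= 0  (optimum 3 at (3, 0, 0))
A, b, c = np.array([[1.0, 1.0, 1.0]]), np.array([3.0]), np.array([1.0, 2.0, 3.0])
x, mu = np.array([1.0, 1.0, 1.0]), 1.0                    # strictly feasible start, A x = b

for k in range(8):
    x = center(A, c, x, mu)                               # re-center for the current mu
    print(f"mu = {mu:.1e}   c^T x = {c @ x:.6f}   gap ~ n*mu = {len(x)*mu:.1e}")
    mu /= 5.0                                             # reduce mu and repeat
```

The printout shows c^T x approaching the optimal value as µ shrinks, with the duality gap on the central path equal to nµ, consistent with Lemma 5.2.1.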

5.2.1 Newton direction for the barrier problem

The Newton direction for the barrier problem is the solution of the system

\[
\begin{bmatrix} \nabla^2 F(z) & B^T \\ B & 0 \end{bmatrix}
\begin{bmatrix} \Delta z \\ \lambda \end{bmatrix}
=
\begin{bmatrix} -\nabla F(z) \\ 0 \end{bmatrix},
\tag{5.9}
\]

where B is the matrix that encodes the linear equality constraints in (PDµ), and F(z) is the objective of the barrier problem, namely the self-concordant convex function

\[
F(z) = \frac{\mu_0}{\mu}\,\nu\theta + f(x) + f^\star(s).
\]
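In code, (5.9) is a standard equality-constrained Newton step. A generic numpy sketch follows; the callables hess_F and grad_F and the other names are illustrative, and the zero constraint block on the right-hand side assumes the current iterate is feasible.

```python
import numpy as np

def newton_direction(hess_F, grad_F, B, z):
    """Solve the saddle-point system (5.9):
         [ H   B^T ] [ dz ]   [ -grad F(z) ]
         [ B    0  ] [ w  ] = [      0     ]
    The zero block on the right assumes B z already satisfies the constraints,
    so the step dz stays in the null space of B."""
    H, g = hess_F(z), grad_F(z)
    n, m = H.shape[0], B.shape[0]
    K = np.block([[H, B.T], [B, np.zeros((m, m))]])
    sol = np.linalg.solve(K, np.concatenate([-g, np.zeros(m)]))
    return sol[:n], sol[n:]        # Newton step dz and multiplier estimates w
```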

For future reference it is useful to expand (5.9) into the following systems of equations. The first,

\[
H(x)\,\Delta x - \bar{\lambda}_s = -g(x), \tag{5.11}
\]

corresponds to the primal barrier; the second,

\[
H^\star(s)\,\Delta s - \bar{\lambda}_x = -g^\star(s), \tag{5.12}
\]

corresponds to the dual barrier; and the third, (5.13), involving G, corresponds to the linear equality constraints.

5.2.2 The potential function

Potential functions are a useful tool for analyzing conic programming algorithms. With them it can be shown that a potential reduction primal-dual conic programming algorithm achieves a precision of ε in O(√ν log(1/ε)) iterations. This is the state-of-the-art complexity bound for general conic programming.

The usefulness of potential functions is not limited to theoretical aspects; potential reduction algorithms have proven to be robust and computationally efficient. Their merit lies in the fact that potential functions define a principled way to choose a step length, so that the next iterate achieves a sufficient reduction in the complementarity while staying sufficiently centered.

The potential function we use in this work was first presented for linear programming by Ye [54] and then generalized to conic programming by Nesterov [43]. For a more detailed explanation of potential reduction in the context of general conic programming see [40].

We now introduce the potential function Ψ and the functional proximity measure Ω, and we review some of their properties and those of a modified Newton method applied to the reduction of Ψ. This lays the foundation for the presentation of the standard computational complexity results on potential reduction methods.

Define

\[
\Psi(x, s) = \rho \log(x^T s) + f(x) + f^\star(s) - \nu \log(\nu) + \nu, \tag{5.14}
\]

where ρ > ν is a scalar, f(x) is the barrier for the primal cone, and f⋆(s) is the conjugate barrier for the dual cone.

Observe that if {z_k} ⊂ L is a sequence that approaches a sub-optimal limit on the boundary of the cones, the barrier term tends to infinity while the complementarity term ρ log(x^T s) remains bounded below (otherwise the complementarity would tend to zero, contradicting the sub-optimality of the limit), and therefore Ψ tends to infinity. On the other hand, if the iterates approach an optimal point, the term ρ log(x^T s) tends to −∞, dominating the effect of the barriers, and Ψ tends to −∞. Potential reduction algorithms work by driving Ψ to −∞ to find a solution to the problem.

The functional proximity measure Ω : K × K⋆ → R defined by

\[
\Omega(x, s) = \nu \log(x^T s) + f(x) + f^\star(s) - \nu \log(\nu) + \nu \tag{5.15}
\]

is a useful way to evaluate the distance from a point to the central path. The function Ω is nonnegative on the feasible set, and Ω(x, s) = 0 iff the argument is on the central path.
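As a sanity check on these definitions, the following sketch evaluates Ω and Ψ for the orthant barrier f(x) = −∑ log x_i and its conjugate f⋆(s) = −n − ∑ log s_i; the data and the choice ρ = 4 (> ν = 3) are arbitrary and purely illustrative. It confirms that Ω vanishes on the central path and is positive off it, and uses the decomposition Ψ = (ρ−ν) log(x^T s) + Ω from the proof of Lemma 5.2.3 below.

```python
import numpy as np

def f(x):  return -np.sum(np.log(x))              # primal log barrier, nu = n
def fc(s): return -np.sum(np.log(s)) - len(s)     # conjugate barrier f*(s)

def Omega(x, s):                                  # proximity measure (5.15)
    nu = len(x)
    return nu * np.log(x @ s) + f(x) + fc(s) - nu * np.log(nu) + nu

def Psi(x, s, rho):                               # potential (5.14), via Psi = (rho-nu) log(x's) + Omega
    return (rho - len(x)) * np.log(x @ s) + Omega(x, s)

x, mu = np.array([0.5, 2.0, 1.0]), 0.3
s_central = mu / x                                # s = -mu g(x): a point on the central path
print(Omega(x, s_central))                        # ~ 0.0 on the central path
print(Omega(x, np.ones(3)))                       # > 0: off the central path
print(Psi(x, s_central, rho=4.0))                 # = (rho - nu) log(nu * mu) here
```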

Lemma 5.2.2. The function Ω satisfies Ω(x, s) ≥ 0, and Ω(x, s) = 0 iff s + µg(x) = 0 with µ = x^T s/ν.

Proof. Let µ = x^T s/ν. From the definition of the conjugate function we have

\[
-f^\star\!\left(\frac{s}{\mu}\right) = \inf_{x} \left\{ \frac{x^T s}{\mu} + f(x) \right\} \le \frac{x^T s}{\mu} + f(x),
\]

and then

\[
\begin{aligned}
0 &\le f^\star\!\left(\frac{s}{\mu}\right) + \frac{x^T s}{\mu} + f(x) \\
0 &\le \nu \log(\mu) + f^\star(s) + f(x) + \nu \\
0 &\le \nu \log(x^T s) + f^\star(s) + f(x) - \nu \log(\nu) + \nu \\
0 &\le \Omega(x, s).
\end{aligned}
\]

On the other hand, if s + µg(x) = 0 then

\[
\begin{aligned}
\Omega(x, s) &= \Omega(x, -\mu g(x)) \\
&= \nu \log(\mu\nu) + f^\star(-\mu g(x)) + f(x) - \nu \log(\nu) + \nu \\
&= \nu \log(\mu) + \nu \log(\nu) - \nu \log(\mu) - f(x) - \nu + f(x) - \nu \log(\nu) + \nu \\
&= 0.
\end{aligned}
\]

For the converse, if Ω(x, s) = ν log(x^T s) + f⋆(s) + f(x) − ν log(ν) + ν = 0, then the properties of ν-logarithmically homogeneous barriers imply

\[
-f^\star\!\left(\frac{s}{\mu}\right) = f(x) + \frac{x^T s}{\mu} \ge \inf_{\hat{x}} \left\{ f(\hat{x}) + \frac{\hat{x}^T s}{\mu} \right\} = -f^\star\!\left(\frac{s}{\mu}\right),
\]

so x̂ = x minimizes f(x̂) + x̂^T s/µ, which in turn implies that g(x) + s/µ = 0, that is, s + µg(x) = 0.

Now we can establish some results about the function Ψ in (5.14).

Lemma 5.2.3. The function Ψ is unbounded below on the feasible set.

Proof. Using (5.14) and (5.15), write Ψ(x, s) = (ρ−ν) log(x^T s) + Ω(x, s). On the central path we therefore have Ψ(x(µ), s(µ)) = (ρ−ν) log(µν), and hence Ψ → −∞ as µ → 0.

A converse result also holds because Ψ induces an upper bound on the complementarity. Thus, reducing Ψ to −∞ implies that x^T s tends to zero.

Lemma 5.2.4. If Ψ(x, s) ≤ (ρ−ν) log(ε) for some feasible x, s, then x^T s ≤ ε.

Proof. Since Ψ(x, s) = (ρ−ν) log(x^T s) + Ω(x, s) and Ω(x, s) ≥ 0, the bound (ρ−ν) log(x^T s) ≤ Ψ(x, s) holds. Therefore Ψ(x, s) ≤ (ρ−ν) log(ε) implies (ρ−ν) log(x^T s) ≤ (ρ−ν) log(ε), which in turn implies x^T s ≤ ε.

The following is a rephrasing of the previous result that is useful for analyzing the computational complexity of potential reduction algorithms.

Lemma 5.2.5. Any algorithm that produces a sequence of feasible iterates {x_k, s_k} such that Ψ(x_{k+1}, s_{k+1}) < Ψ(x_k, s_k) − δ for some δ > 0 will reach an ε-accurate iterate in O((ρ−ν) log(1/ε)) iterations.

Proof. Let x_0, s_0 be a starting iterate and denote Ψ_0 = Ψ(x_0, s_0). Using the bound (ρ−ν) log(x^T s) ≤ (ρ−ν) log(x^T s) + Ω(x, s) = Ψ(x, s), we get

\[
(\rho-\nu)\log(x_k^T s_k) \le \Psi_0 - \delta k,
\]

and therefore x_k^T s_k ≤ ε whenever

\[
\frac{\Psi(x_0, s_0)}{\delta} + \frac{\rho-\nu}{\delta} \log(1/\varepsilon) \le k.
\]
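To connect this with the O(√ν log(1/ε)) bound quoted at the start of this section: with the commonly used choice ρ = ν + √ν we have ρ − ν = √ν, so for a constant per-iteration decrease δ the bound reads k = O(√ν log(1/ε)). As a purely illustrative calculation, ν = 100, δ = 1/4, and ε = 10⁻⁸ give (ρ−ν)/δ · log(1/ε) = 40 · 18.4 ≈ 740 centering iterations beyond the Ψ_0/δ start-up term.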