

6.2 The Lagrange Dual Problem

6.2.2 Solving the Dual Problem

The properties of the dual objective function established in the previous section allow us to analyze the convergence rate of the projected gradient method and alternative algorithms from convex optimization. However, the size of the instances occurring in gate sizing certainly restricts the options from a practical point of view.

Usually, variants of the projected subgradient method are used in practice. Wang et al. [WDZ07] used the conditional gradient method. Szegedy [Sze05] used the projected gradient method, but without knowing that the subgradient he used was in fact the gradient. Nonetheless, the best choice for the multiplier update regarding convergence is not evident, and a seemingly heuristic multiplicative multiplier update is growing in popularity.

Application of the Projected Gradient Method

Let x̄ be the unique minimizer of the Lagrange function. By Theorem 6.3, d(x̄) is the gradient of the dual objective function. As we aim to maximize the dual function, the projected gradient method (cf. Algorithm 3.1) iteratively advances in the gradient direction, and

λ^(k+1) := π_F(λ^(k) + α^(k)·d(x̄^(k))).

Figure 6.1: Extended timing graph G′ := (V′, E′). The dummy node t is connected to the primary inputs (PI) via edges with delay_e := at_v, and reached from the primary outputs (PO) via edges with delay_e := −rat_v.

Paradoxically, it does not make sense from a practical point of view to update the multipliers with d(x̄), because its entries are always positive and the multipliers would never decrease. We briefly analyze this paradox:

Chen et al. [CCW99] suggested to propagate the arrival times in each iteration, and to add the negative of the local edge slacks slack_e(x) := a_w − (a_v + delay_e(x)) to the multipliers. This idea might be based on the fact that arrival times have to be primal feasible in the end, although they are ignored during optimization.

Szegedy [Sze05] observed that arrival time propagation is not necessary, because the subsequent multiplier projection cancels the effect of adding the arrival times if the following extension G′ of the timing graph G is used:

Add a dummy node t and connect it with all inputs via edges E′_in := {(t, v) : v ∈ V_in}, and with all outputs via edges E′_out := {(v, t) : v ∈ V_out}. We set E′ := E ∪ E′_in ∪ E′_out, V′ := V ∪ {t} and define G′ := (V′, E′).

The signal delay delay_e over an edge e = (t, v) ∈ E′_in is defined as the arrival time at_v at v. The signal delay delay_e over an edge e = (v, t) ∈ E′_out is defined as the negative required arrival time (−rat_v) at v. Figure 6.1 shows the resulting graph.

Note that these delays are independent of gate sizes and yield the additional delay constraints

delay_e ≤ a_v  ∀ e = (t, v) ∈ E′_in,
a_v + delay_e ≤ 0  ∀ e = (v, t) ∈ E′_out.
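Under the assumption of a simple edge-list representation of the timing graph (the function and parameter names below are illustrative, not from the source), the construction of G′ can be sketched as:

```python
def extend_timing_graph(edges, v_in, v_out, arrival, rat):
    """Build the extended timing graph G' = (V', E') by adding a dummy node t.

    edges:   list of (v, w) pairs of the original timing graph G
    v_in:    primary input nodes; v_out: primary output nodes
    arrival: fixed arrival times at_v at the primary inputs
    rat:     required arrival times rat_v at the primary outputs
    Returns the edge list of G' and the fixed delays of the dummy edges.
    """
    t = "t"  # dummy node
    e_in = [(t, v) for v in v_in]    # E'_in: edges from t into all inputs
    e_out = [(v, t) for v in v_out]  # E'_out: edges from all outputs into t
    fixed_delay = {}
    for (_, v) in e_in:
        fixed_delay[(t, v)] = arrival[v]  # delay_e := at_v
    for (v, _) in e_out:
        fixed_delay[(v, t)] = -rat[v]     # delay_e := -rat_v
    return edges + e_in + e_out, fixed_delay
```

These dummy-edge delays are constants of the graph, which is exactly why they are independent of the gate sizes.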

Wang et al. [WDZ07], who proved differentiability of the dual objective, used a similar construction with two dummy nodes which models the same effect, but without giving any theoretical justification.

Note that differentiability of the dual function still holds with the additional timing constraints. Furthermore, the problem is more homogeneous, as the flow constraints now need to hold for all vertices in the model graph. With the extended timing graph, it makes sense to update the multipliers with the delay vector, as not all edge delays are positive. The arrival time constraints are now encoded in G′, and propagation of the timing information in the design is in some sense performed by the multiplier projection.

On the other hand, the vector of local edge slacks, which was deployed for example in Chen et al. [CCW99], seemingly worked well in practice. The explanation is the following:

Theorem 6.8 The vector of local edge slacks in G is a subgradient of the non-separated dual function D̂(λ) := inf_{x ∈ X_cont, a ∈ ℝ^n} L̂(λ, a, x) defined in equation (6.1).

Proof. Recall that

L̂(λ, a, x) = cost(x) + Σ_{e∈E} λ_e·delay_e(x) + Σ_{v∈V} a_v·( Σ_{e∈δ⁺(v)} λ_e − Σ_{e∈δ⁻(v)} λ_e ) + const,

where const ∈ ℝ is a constant that accounts for the fixed arrival times and fixed required arrival times at primary input and output pins, respectively.

When λ fulfills the flow constraints, the last non-constant term equals zero and the arrival time variables can be chosen arbitrarily. By Theorem 6.3, there exists a unique x̄ that minimizes the first two terms.

Consequently, there exists in general no unique minimizer of L̂(λ, a, x), and the Lagrange function has infinitely many subgradients with entries (a_v + delay_e(x̄) − a_w), e ∈ E. Theorem 3.24 proves that these are indeed subgradients.

We conclude that updating the Lagrange multipliers with the delay vector in G′ is equivalent to updating the multipliers in G with the negative local edge slacks, where the arrival times are computed by static timing analysis.

The projected gradient method is summarized in Algorithm 6.1. π_F denotes the projection onto the non-negative network flow space F. Note that we proceed in the positive gradient direction, as we aim to maximize the dual objective.

Algorithm 6.1 Projected Gradient Method for Gate Sizing

Input: Dual objective function D(λ)
Output: λ ∈ F, x ∈ X_cont

1: k ← 0
2: Choose starting point λ^(0) ∈ F
3: repeat
4:   x^(k) ← arg min_{x ∈ X_cont} L(λ^(k), x)
5:   g^(k) ← (delay_e(x^(k)))_{e ∈ E′}
6:   λ^(k+1) ← π_F(λ^(k) + α^(k)·g^(k))
7:   k ← k + 1
8: until stopping criterion is satisfied
9: Return λ^(k), x^(k)
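A minimal sketch of this loop is given below. It is illustrative, not the thesis implementation: the Lagrange subproblem solver and the delay oracle are assumed to be given, and the exact projection π_F onto the flow space, which is problem-specific, is approximated here by clipping to the non-negative orthant.

```python
import numpy as np

def projected_gradient(solve_lagrange_primal, delays, lam0, step, iters=100):
    """Sketch of a projected gradient loop in the spirit of Algorithm 6.1.

    solve_lagrange_primal(lam) -> x minimizing L(lam, x) over X_cont
    delays(x) -> vector (delay_e(x))_{e in E'}, the gradient of D at lam
    lam0:  start multipliers lambda^(0)
    step:  step-size rule, step(k) = alpha^(k)
    """
    lam = np.asarray(lam0, dtype=float)
    x = None
    for k in range(iters):
        x = solve_lagrange_primal(lam)            # line 4: Lagrange subproblem
        g = delays(x)                             # line 5: gradient direction
        lam = np.maximum(lam + step(k) * g, 0.0)  # line 6: simplified pi_F
    return lam, x
```

As a toy check, minimizing L(λ, x) = x² + λ(1 − x) over x ∈ ℝ gives x̄ = λ/2 and dual gradient 1 − x̄, so the loop drives λ toward the dual maximizer λ = 2.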

Convergence and Convergence Rate Given a zero duality gap, the projected gradient method solves the dual problem up to any desired accuracy. To the best of our knowledge, the convergence rate for gate sizing has not been considered before.

Among the drawbacks of the projected gradient method are its sensitivity to the choice of step size and start multipliers, and the slow convergence rate. If the step size decreases too fast, false decisions in early iterations due to inaccurate multipliers cannot be undone in later iterations. Line search repeatedly solves the Lagrange primal problem to determine the step size that maximizes the dual function. However, this is costly and hardly ever used for gate sizing in practice.

Because the non-negative network flow space F is convex and closed, Algorithm 6.1 converges to a stationary point for certain step size rules. The fact that the gradient is Lipschitz continuous ensures convergence even if the step size is constant. Additionally, the convergence rate depends linearly on ‖λ^(0) − λ*‖ and the Lipschitz constant of ∇D(λ) for certain step sizes, where λ* is an optimal solution to the dual problem. The set F is unbounded and, to the best of our knowledge, no bounds on the Lagrange multipliers, and hence on ‖λ^(0) − λ*‖, exist.

Linear convergence rates of the projected gradient method have also been established under the assumption that the objective function is strongly convex or twice differentiable. Both assumptions are in general not fulfilled by the dual objective function (Lemma 6.4 and Lemma 6.5). For general convex functions, the convergence rate is unknown (cf. Section 3.3).

Finding a Good Start Solution Tennakoon and Sechen [TS02] propose the following heuristic to find good start multipliers: Firstly, a steepest descent method aims to find gate sizes that satisfy a desired delay target. Secondly, a heuristic aims to find gate sizes inducing the same delays but a better objective function value.

Finally, the Lagrange multipliers that imply this sizing solution are estimated and constitute the start multipliers.

Other Methods

Well-known methods like bundle methods or the space dilation method seem to be impractical for large-scale applications because of the additional computational and storage requirements.

Heuristics In recent years, heuristic multiplier updates that are more sensitive to local timing information and independent of a global step size have become more popular. Tennakoon and Sechen [TS02] were the first to use a multiplicative multiplier update of the form

λ_e^(k+1) = λ_e^(k) · crit_e^(k),

where crit_e^(k) := a_v / (a_w − delay_e(x)) encodes the violation of the timing constraint corresponding to edge e = (v, w) in iteration k. Since then, several variants have been developed, but without any convergence guarantees. We refer to Chapter 7 for a theoretical justification of this and other modifications to the projected subgradient method.
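The multiplicative update can be sketched as follows (an illustrative sketch in the style of Tennakoon and Sechen [TS02]; the names and dictionary-based data layout are assumptions, not from the source). Note that crit_e exceeds 1 exactly when the constraint a_v + delay_e(x) ≤ a_w is violated, so multipliers of violated edges grow and those of uncritical edges shrink:

```python
def multiplicative_update(lam, arrival, delay, edges):
    """One multiplicative multiplier update lambda_e <- lambda_e * crit_e.

    lam:     current multipliers, keyed by edge (v, w)
    arrival: arrival times a_v from static timing analysis
    delay:   edge delays delay_e(x) for the current sizing x
    For e = (v, w): crit_e = a_v / (a_w - delay_e(x)).
    """
    new_lam = {}
    for e in edges:
        v, w = e
        crit = arrival[v] / (arrival[w] - delay[e])  # > 1 iff e is violated
        new_lam[e] = lam[e] * crit
    return new_lam
```

A sketch like this sidesteps the global step size α^(k) entirely: each multiplier is scaled only by local timing information.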

6.3 The Lagrange Primal Problem