
7.2.4 Comparison with Lagrangian Relaxation

Lagrangian Relaxation vs. Multiplicative Weights for the Continuous Relaxation

We highlight the differences between the multiplicative weights algorithm (Algorithm 7.4) and the Lagrangian relaxation approach with the projected gradient method (Algorithm 6.1) for continuous gate sizing from a theoretical point of view.

Objective Function Both approaches are based on the convex program for gate sizing (4.10). While the projected gradient method (Algorithm 6.1) optimizes the power consumption objective directly, Algorithm 7.4 requires an upper bound on the power consumption to transform the objective function into a constraint. The upper bound can be specified by a designer or determined by binary search, which can be time-consuming. In return, Algorithm 7.4 assigns a weight to the new constraint and updates it based on the constraint's criticality. Intuitively, this makes it easier to find a better tradeoff between power consumption and delays.
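To illustrate the binary search mentioned above, here is a minimal Python sketch; the function solve_feasibility, the bracketing bounds, and the tolerance are hypothetical placeholders, not part of the thesis.

def binary_search_power_budget(solve_feasibility, lo, hi, tol=1e-3):
    # solve_feasibility(budget) is a hypothetical stand-in for running
    # Algorithm 7.4 with the given power budget and reporting whether a
    # timing-feasible sizing within that budget was found.
    assert solve_feasibility(hi), "upper bound must be feasible"
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if solve_feasibility(mid):
            hi = mid   # a smaller budget still suffices: tighten from above
        else:
            lo = mid   # budget too small: timing cannot be met
    return hi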

Sizing Oracle and Arrival Times Both algorithms call similar oracle algorithms to get feasible gate sizes in each iteration, but with different objectives. In Algorithm 6.1 the gradient is the vector of delays, and it is therefore important to find gate sizes that are close to the optimal sizes. In Algorithm 7.4, one is interested in gate sizes that minimize the power-delay tradeoff function.

Arrival time variables are disregarded in the Lagrangian relaxation approach; instead, the Lagrange multipliers are restricted to the non-negative flow space. Algorithm 7.4 iteratively computes arrival times for all v ∈ V with an arrival time oracle (Algorithm 7.2). The oracle runs in linear time, whereas an exact multiplier projection involves minimizing a quadratic function and is time-consuming in practice. These are merely two equivalent ways of tackling the same problem:

The function that is (approximately) minimized in each iteration of Algorithm 7.4 is equivalent to the Lagrange primal function before eliminating the arrival time variables (cf. Section 7.2.1). We conclude that Algorithm 7.4 approximately minimizes the non-simplified Lagrange function, and Algorithm 6.1 approximately minimizes the simplified Lagrange function in each iteration (cf. Section 6.1).
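To make the equivalence concrete, here is a sketch of the standard simplification (in the spirit of Chen et al. [CCW99]); the notation assumes timing constraints $a_v + \mathrm{delay}_e(x) \le a_w$ for $e = (v,w) \in E$:
\[
L(x, a, \lambda) = \mathrm{cost}(x) + \sum_{e=(v,w) \in E} \lambda_e \bigl(a_v + \mathrm{delay}_e(x) - a_w\bigr).
\]
Collecting the arrival time terms vertex by vertex yields
\[
\sum_{e=(v,w) \in E} \lambda_e (a_v - a_w) = \sum_{v \in V} a_v \Bigl( \sum_{e \in \delta^+(v)} \lambda_e - \sum_{e \in \delta^-(v)} \lambda_e \Bigr),
\]
so if $\lambda$ satisfies flow conservation, the arrival time variables vanish and the simplified Lagrange function $\mathrm{cost}(x) + \sum_{e \in E} \lambda_e \, \mathrm{delay}_e(x)$ remains.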

In the Lagrangian relaxation framework, the non-simplified Lagrange function and an arrival time oracle have been used for example by Langkau [Lan00], but this approach is not common.

Multiplier Update vs. Weight Update The Lagrange multiplier update rule in Algorithm 6.1 is additive: the new multiplier vector is generated by proceeding in the gradient direction. The same step size applies to each multiplier, and the convergence of the algorithm is very sensitive to this choice. Algorithm 7.4 uses a multiplicative update rule, where each weight is updated based on the criticality of the corresponding constraint, which is more sensitive to local information.
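The contrast can be summarized in a short Python sketch; the parameter names step, nu, and rho are illustrative, not values from the thesis.

def additive_update(lams, violations, step):
    # Projected gradient step: one global step size for every multiplier,
    # clipped at zero to stay in the non-negative orthant.
    return [max(0.0, lam + step * v) for lam, v in zip(lams, violations)]

def multiplicative_update(weights, violations, nu, rho):
    # Multiplicative weights step: each weight reacts to the criticality
    # of its own constraint, scaled by the width parameter rho.
    return [w * (1.0 + nu * v / rho) for w, v in zip(weights, violations)]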

Starting Solution In the Lagrangian relaxation framework, the duality gap is zero, and convergence of Algorithm 6.1 can be guaranteed only if a strongly feasible solution exists. In the multiplicative weights framework, a feasible solution is sufficient (cf. Section 7.2.1).

In theory, both algorithms converge independently of the start multipliers/weights, but it is well-known that convergence of the projected gradient method is highly sensitive to the choice of the starting solution. In the gate sizing context, this was for example observed by Tennakoon and Sechen [TS02], who proposed a preprocessing step to find a good starting solution.

Convergence and Running Time An advantage of Algorithm 7.4 is that the number of iterations can be determined depending on the desired accuracy: more iterations yield more accurate solutions. The number of iterations of Algorithm 6.1 that are necessary to achieve a certain accuracy is unknown, and hence it is not clear whether the algorithm even runs in polynomial time (cf. Section 6.2).

Algorithm 7.4 is derived from the basic variant of the multiplicative weights algorithm, where the number of iterations depends on the problem width and is thus not polynomial. Other variants, for example the scale-free multiplicative weights algorithm (see Hähnle [Häh15]), exhibit better running times. We used the basic variant to enable a better comparison between two different models for gate sizing (see Section 8.10). Additionally, we employ the discretized version of Algorithm 7.4 to justify the discretized and heuristically modified version of Algorithm 6.1 in the next section.

Discretized Lagrangian Relaxation in Practice

Since Chen et al. [CCW99] published their groundwork on Lagrangian relaxation for gate sizing, this approach has been widely adopted both for the continuous and the discrete problem. We start with an overview of previous works and the most prominent modifications that are usually applied to the discretized Lagrangian relaxation approach in practice in order to improve convergence of the projected gradient method (Algorithm 6.1). We then observe that the most prominent modifications can be theoretically justified by the discretized multiplicative weights algorithm.

Multiplicative Multiplier Update Tennakoon and Sechen [TS02] were the first to propose a multiplicative Lagrange multiplier update rule of the form
\[
\lambda_e^{(t+1)} := \lambda_e^{(t)} \cdot \mathrm{crit}_e^{(t)} \qquad \forall e \in E,
\]


where $\mathrm{crit}_e^{(t)}$ encodes the violation of the timing constraint corresponding to edge $e$ in iteration $t$. Arrival times are computed by static timing analysis. Their motivation was to find a multiplier update that is more sensitive to local information and independent of the global step size.
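One possible instantiation as a Python sketch; the exact definition of the criticality is left open above, so the ratio used here (arrival time plus delay over required arrival time) is only an illustrative assumption.

def update_multipliers(lams, arr, rat, delay, edges):
    # crit > 1 exactly when the timing constraint of edge (v, w) is
    # violated, so violated constraints gain weight and satisfied
    # constraints lose weight.
    new_lams = {}
    for (v, w) in edges:
        crit = (arr[v] + delay[(v, w)]) / rat[w]
        new_lams[(v, w)] = lams[(v, w)] * crit
    return new_lams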

Although this proposal was made for the continuous relaxation, it was widely adopted by subsequent works on the discretized approach, and variants can for example be found in Ozdal et al. [Ozd+12], Li et al. [Li+12a] and Livramento et al. [Liv+13; Liv+14]. A variant proposed by Flach et al. [Fla+14] incorporates the clock cycle time $D \in \mathbb{R}_{>0}$ in the multiplier update:

\[
\lambda_e^{(t+1)} :=
\begin{cases}
\lambda_e^{(t)} \cdot \Bigl(1 + \dfrac{a_v + \mathrm{delay}_e(x) - \mathrm{rat}_w}{D}\Bigr)^{1/l}, & \text{if } a_v + \mathrm{delay}_e(x) \ge \mathrm{rat}_w,\\
\lambda_e^{(t)} \cdot \Bigl(1 + \dfrac{\mathrm{rat}_w - a_v - \mathrm{delay}_e(x)}{D}\Bigr)^{-l}, & \text{if } a_v + \mathrm{delay}_e(x) < \mathrm{rat}_w,
\end{cases}
\]
for $e = (v, w) \in E$, where $\mathrm{rat}_w$ is the required arrival time at $w$ and $a_v$ the arrival time at $v$ as computed by static timing analysis based on the sizing solution $x$ in the current iteration. The value of $l \in \mathbb{N}$ changes in the course of the algorithm.
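The case distinction translates directly into a Python sketch; the parameters D and l are taken as given, and the function illustrates the displayed rule rather than an implementation from [Fla+14].

def flach_update(lam_e, a_v, delay_e, rat_w, D, l):
    # Late edges (constraint violated) are amplified with exponent 1/l,
    # early edges (constraint satisfied) are damped with exponent -l.
    violation = a_v + delay_e - rat_w
    if violation >= 0:
        return lam_e * (1.0 + violation / D) ** (1.0 / l)
    return lam_e * (1.0 - violation / D) ** (-l)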

Objective Weight Tennakoon and Sechen [TS08] introduced a weight $\omega_{\mathrm{power}}$ for the objective function $\mathrm{cost}(x)$ to find a better tradeoff between delay optimization and power optimization. The authors optimize a modified Lagrange function of the form $\omega_{\mathrm{power}} \cdot \mathrm{cost}(x) + \sum_{e \in E} \lambda_e \, \mathrm{delay}_e(x)$. The factor $\omega_{\mathrm{power}}$ is initialized with a value smaller than 1, and updated in each iteration based on the timing criticality of the design as follows:
\[
\omega_{\mathrm{power}}^{(t+1)} := \omega_{\mathrm{power}}^{(t)} \cdot \min_{v \in V_{\mathrm{end}}} \frac{\mathrm{rat}_v}{\mathrm{at}_v}.
\]
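A one-line Python sketch of this rule (the dictionary-based containers are assumptions for illustration):

def update_power_weight(w_power, rat, at, end_vertices):
    # The weight shrinks while some endpoint violates its required arrival
    # time (rat[v]/at[v] < 1) and grows once every endpoint has slack, so
    # power optimization is emphasized only when timing is under control.
    return w_power * min(rat[v] / at[v] for v in end_vertices)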

Livramento et al. [Liv+13] experimentally observed that adding a power weight not only leads to less power consumption on the ISPD 2012 Gate Sizing Benchmarks (Ozdal et al. [Ozd+12]), but also to fewer timing constraint violations compared to running their algorithm without the power weight.

Multiplier Projection In practice, the multiplier projection is estimated by heuristics with linear running time but without any approximation guarantees, as computing an exact projection is time-consuming; see for example Tennakoon and Sechen [TS02] and Szegedy [Sze05].

Comparison with Algorithm 7.6 We now compare the discretized projected gradient method including the modifications presented above with the discretized multiplicative weights algorithm (Algorithm 7.6) for the feasibility problem. Both algorithms are heuristics and do not necessarily terminate.

FeasibilityProblem(budget_power)

    Set ω^(1) := 1 ∈ R^(m+1), choose 0 < ν ≤ 0.5, t := 0
    while improvement do
        t := t + 1
        x^(t) := DiscreteLocalRefine(ω^(t))
        a^(t) := ATOracle(x^(t), ω^(t))
        for e = (v, w) ∈ E do
            ω_e^(t+1) := ω_e^(t) · (1 − ν · (a_w^(t) − a_v^(t) − delay_e(x^(t))) / ρ)
        end for
        ω_{m+1}^(t+1) := ω_{m+1}^(t) · (1 − ν · (budget_power − cost(x^(t))) / ρ)
    end while
    Return x̄ = (1/t) · Σ_{t′ ≤ t} x^(t′)

Modified projected gradient

    Initialize λ^(1) ∈ R^m, ω_power^(1) ∈ R, t := 0
    while improvement do
        t := t + 1
        x^(t) := DiscreteLocalRefine(λ^(t), ω_power^(t))
        Compute a^(t), rat^(t) ∈ R^|V| by STA
        for e = (v, w) ∈ E do
            λ_e^(t+1) := λ_e^(t) · (1 + (a_v^(t) + delay_e(x^(t)) − rat_w^(t)) / D)^(θ_e^(t))
        end for
        ω_power^(t+1) := ω_power^(t) · max_{v ∈ V_end} rat_v^(t) / a_v^(t)
        Project λ^(t+1) to the flow space F
    end while
    return x^(t)

Here θ_e^(t) = 1/l if a_v^(t) + delay_e(x^(t)) ≥ rat_w^(t) and −l otherwise. STA abbreviates static timing analysis.
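For concreteness, the weight-update loop of FeasibilityProblem above can be rendered as a short runnable Python sketch; local_refine and at_oracle stand in for DiscreteLocalRefine and ATOracle, and the parameters nu, rho, and the fixed iteration count are illustrative assumptions.

def mw_feasibility(edges, local_refine, at_oracle, delay, cost,
                   budget_power, nu=0.25, rho=1.0, iterations=100):
    omega = {e: 1.0 for e in edges}   # one weight per timing constraint
    omega_p = 1.0                     # weight of the power constraint
    xs = []
    for _ in range(iterations):
        x = local_refine(omega, omega_p)
        a = at_oracle(x, omega)       # arrival times for the current sizes
        for (v, w) in edges:
            slack = a[w] - a[v] - delay((v, w), x)
            omega[(v, w)] *= 1.0 - nu * slack / rho   # grow on violation
        omega_p *= 1.0 - nu * (budget_power - cost(x)) / rho
        xs.append(x)
    return xs   # the algorithm returns the average of these iterates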

The ideas behind the modifications of the discretized projected gradient algorithm essentially yield the discretized multiplicative weights algorithm.

We already established in this section that the multiplicative weights algorithm (Algorithm 7.6) tackles the non-simplified Lagrange function, which contains the arrival time variables, in each iteration. Together with the arrival time oracle, this is equivalent to optimizing the simplified Lagrange function combined with the multiplier projection.

Although the multiplier update in the modified projected gradient algorithm is heuristic and based on arrival times and required arrival times computed by static timing analysis, the ideas behind the multiplicative update rule and the power weight ωpower for the objective function can now be theoretically justified by Algorithm 7.6.

Recall that even under the assumption that an approximation algorithm is known for the discrete power-delay tradeoff problem, it is unknown whether the projected gradient method converges, because no bound on the Lagrange multipliers exists (cf. Section 6.5). In the multiplicative weights algorithm, the oracle error would directly translate into the final approximation. The number of iterations required to reach a desired accuracy can still be determined and depends on the oracle error (cf. Section 7.2.2).

A drawback of Algorithm 7.6 is that the vector x̄ returned is not necessarily a feasible discrete solution. However, the modified projected gradient method has been successfully used in practice, which indicates that the solution vector from the last iteration can be a good choice. Additionally, a binary search for budget_power is necessary.

8 The Resource Sharing Framework for Gate Sizing

The min-max resource sharing problem is a fundamental problem in mathematical optimization. It consists of distributing a limited set of resources among a limited set of customers who compete for the resources. An optimal solution distributes the resources in such a way that the maximum resource usage is minimized. This model has been successfully applied to (timing-driven) global routing in VLSI design, and the fastest approximation algorithm for this problem is a variant of the multiplicative weights algorithm. Having established the effectiveness of the multiplicative weights method for gate sizing, this chapter is dedicated to gate sizing as a min-max resource sharing problem.

We begin with a formal problem definition in Section 8.1 and demonstrate in Sections 8.2 and 8.3 how the continuous relaxation of the gate sizing problem fits into this framework using a single gate customer. With the algorithm of Müller, Radke and Vygen [MRV11] for the min-max resource sharing problem we get a fast approximation for the continuous relaxation of the gate sizing problem in Section 8.4. Under certain assumptions the running time is polynomial.

Subsequently, we compare its running time with existing algorithms for the continuous relaxation in Section 8.5. Section 8.6 models gate sizing as a min-max resource sharing problem with path delay resources. This model was proposed by Hähnle [Häh15]. The approximation can easily be discretized, although with unknown performance guarantee because the discrete power-delay tradeoff problem occurs as a subproblem (cf. Section 5.2). Additionally, the convex combination returned by the resource sharing algorithm needs to be rounded. We consider rounding in Section 8.7 and give a bound on the approximation guarantee of randomized rounding for a special case. Section 8.8 describes how constraints on load capacitance, slew and placement density, which need to be taken into account in practice, fit into the resource sharing framework. Integration with timing-driven global routing and repeater insertion is the subject of Section 8.9. The chapter concludes with an evaluation of the resource sharing model in Section 8.10.

The interpretation of gate sizing as a resource sharing problem has been used before in game-theoretic approaches for the discrete problem, but without any performance guarantee (cf. Section 4.7). However, our interpretation as a min-max resource sharing problem with one gate customer is novel.

The results in this chapter are joint work with Nicolai Hähnle and Stephan Held.

8.1 The Min-Max Resource Sharing Problem

In the min-max resource sharing problem, we are given a finite set $R$ of resources, a finite set $C$ of customers, and for each customer $c \in C$ an implicitly given convex set $B_c$ of feasible solutions and a convex resource usage function $g_c : B_c \to \mathbb{R}^{R}_{\ge 0}$. We assume that these functions can be computed efficiently.

We are further given an oracle function $f_c : \mathbb{R}^{R}_{\ge 0} \to B_c$ for each customer, also called block solver, which computes for given resource weights $\omega \in \mathbb{R}^{R}_{> 0}$ a feasible solution $b_c \in B_c$ with $\omega^{\top} g_c(b_c) \le \eta \cdot \mathrm{opt}_c(\omega)$, where $\mathrm{opt}_c(\omega) := \inf_{b \in B_c} \omega^{\top} g_c(b)$ and $\eta \ge 1$ is the approximation factor of the oracle.

The task is to find a feasible solution $b_c \in B_c$ for each customer $c \in C$ such that the maximum resource usage $\max_{r \in R} \sum_{c \in C} (g_c(b_c))_r$ is approximately minimized.

Note that the block solver for customer $c$ requires input resource weights $\omega \in \mathbb{R}^{R}$. The algorithms presented in this chapter are based on the multiplicative weights algorithm, and the resource weights grow proportionally to their usages.
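A minimal Python sketch of this interplay, under the assumption of simple dictionary-based interfaces (all names are illustrative, not from [MRV11]): weights grow exponentially with accumulated resource usage, steering the block solvers away from overloaded resources.

import math

def resource_sharing(resources, block_solvers, rounds=100, eps=0.1):
    # block_solvers maps each customer to a callable that, given resource
    # weights, returns a dict resource -> usage for a feasible solution.
    load = {r: 0.0 for r in resources}
    history = {c: [] for c in block_solvers}
    for _ in range(rounds):
        weights = {r: math.exp(eps * load[r]) for r in resources}
        for c, solve in block_solvers.items():
            usage = solve(weights)   # approx. cheapest feasible solution
            history[c].append(usage)
            for r, u in usage.items():
                load[r] += u
    # Averaging each customer's solutions gives the convex combination
    # whose maximum resource usage the analysis bounds.
    return history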

Grigoriadis and Khachiyan [GK94] presented the first combinatorial fully polynomial approximation scheme for the general problem. The problem was also studied by Khandekar [Kha04] and Jansen and Zhang [JZ08], and the fastest algorithm for the general problem was developed by Müller et al. [MRV11] for application in global routing in chip design. We refer to this paper for a broader overview of previous work.

min-max resource sharing problem

Instance:

• A finite set $R$ of resources

• A finite set $C$ of customers

• For each customer $c \in C$:

  – a convex set $B_c$ of feasible solutions

  – a convex function $g_c : B_c \to \mathbb{R}^{R}_{\ge 0}$ that describes its resource usages

Task: Find feasible solutions $b_c \in B_c$ for each $c \in C$ minimizing the largest resource usage, i.e. such that
\[
\kappa := \inf \Bigl\{ \max_{r \in R} \sum_{c \in C} (g_c(b_c))_r \;\Bigm|\; b_c \in B_c \text{ for all } c \in C \Bigr\}
\]
is approximately attained.