Projection of the Lagrangian Multipliers

4.3 The Timing Driven Placement Problem

4.3.4 Projection of the Lagrangian Multipliers

Projection subproblem

Instance:

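• A graph $G = (V = V_p \cup V^- \cup V^+, E)$ without isolated nodes, where $V^- \overset{\mathrm{def}}{=} \{v \in V \mid \deg^-(v) = 0\}$, $V^+ \overset{\mathrm{def}}{=} \{v \in V \mid \deg^+(v) = 0\}$ and $V_p \overset{\mathrm{def}}{=} V \setminus (V^- \cup V^+)$.

• Arc-weighting $\lambda \in \mathbb{R}^E$.

Goal: Find a nonnegative weighting $\lambda' \in \mathbb{R}^E_{\ge 0}$ of the arcs minimizing $\|\lambda - \lambda'\|_2^2$ subject to the flow conservation equalities on $V_p$:

\[ \sum_{e \in E,\; e^- = v} \lambda'_e \;-\; \sum_{e \in E,\; e^+ = v} \lambda'_e = 0 \qquad \forall v \in V_p. \]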
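As a small illustration (not taken from the source), consider a path with two arcs $e_1 = (u, v)$ and $e_2 = (v, w)$, so that $V_p = \{v\}$. Flow conservation at $v$ forces both arcs to carry the same value $t$, and the projection subproblem reduces to a one-dimensional minimization:

```latex
\min_{t \ge 0} \; (\lambda_{e_1} - t)^2 + (\lambda_{e_2} - t)^2
\quad\Longrightarrow\quad
\lambda'_{e_1} = \lambda'_{e_2}
  = \max\Bigl(0,\ \tfrac{1}{2}\bigl(\lambda_{e_1} + \lambda_{e_2}\bigr)\Bigr).
```

For instance, $\lambda = (3, 1)$ is projected to $\lambda' = (2, 2)$, while $\lambda = (1, -3)$ is projected to $\lambda' = (0, 0)$.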

The computation of a projection to the set of nonnegative flows is a non-trivial task, given that the gate graphs occurring in the layout process of today's chips have several million arcs. The projection problem is well known in combinatorial optimization as the minimum cost flow problem with a quadratic cost function. A polynomial algorithm based on scaling and successive piecewise linear approximation, with run time $O(c()\,|E|\,|V|^2)$, was given in [Minoux 1984]. It is a special case of the more general class of minimum cost flow problems with convex separable cost functions, which have been studied extensively. The practically most efficient algorithm, by [Ibaraki, Fukushima and Ibaraki 1991], is based on the generalized Newton method, where the Hessian is replaced by approximation matrices which happen to be weighted combinatorial Laplacians of the timing graph with suitable weights.

However, exact projections have turned out to be computationally expensive in practice. Keep in mind that the projection is required only to keep the Lagrangian multipliers (which measure the timing criticalities of the arcs) near the feasible region. The “real” optimization of the placement happens while solving the placement subproblem described in the next section. Because the projection steps scale superlinearly, they tend to dominate the run time of the whole algorithm even for moderately large designs. This problem was noticed in [Chen, Chu and Wong 1999], where the projection was also used to solve the gate sizing problem by a similar method. They observed that the practical run time was about $O(|V|^{1.7})$ for the whole subgradient method. This is acceptable for instances with up to 100000 nodes, but beyond that the run time degrades considerably. To cure this problem, several authors ([Muuss 1999] and [Sechen and Tennakoon 2002]) proposed heuristics with linear run time but without theoretical guarantees of convergence.

This was the motivation to study a different variant of the constrained subgradient method in Section 3.5. The main idea is to project $\lambda$ first to the whole flow space and then to the set of nonnegative vectors. Since both sets are polyhedral, Theorem 3.6.8 shows that they intersect nicely, and therefore by Theorem 3.7.4 the method described here converges as well.

Of course, it remains to be shown that $\lambda$ can be projected very efficiently to the whole flow space. This is where the combinatorial Laplacian enters the picture again.

One problem is that the flow equalities have to hold only for the subset of nodes $V_p$. To make the problem more homogeneous, we augment $G$ by a dummy supernode $s$ representing the reference zero point of time. Additionally, for each node $v \in V^-$ an arc from $s$ to $v$, and for each node $v \in V^+$ an arc from $v$ to $s$, is added to the graph. In order to eliminate the arrival time constraints for the nodes in $V^- \cup V^+$, we add delay constraints for the newly added arcs incident with the dummy supernode: for the arcs $e = (v, s)$, we define the delay function over $e$ by $d(e, p) \overset{\mathrm{def}}{=} -a_v$. For the arcs $e = (s, v)$ we put $d(e, p) \overset{\mathrm{def}}{=} a_v$. It is immediate that the new set of constraints⁶ is equivalent to the original set of timing constraints. The resulting graph is denoted by $G' = (V' \overset{\mathrm{def}}{=} V \cup \{s\},\ E' \overset{\mathrm{def}}{=} E \cup \tilde{E})$, where $\tilde{E}$ is the set of newly added arcs. Let $D$ denote the node-arc incidence matrix of $G'$. We assume that $G'$ is connected (which is the typical case in practice); otherwise the problem can be separated into disjoint subproblems. Therefore, the rank of $D$ is $|V'| - 1 = |V|$, and the deletion of any row of $D$ results in a matrix with the same rank. Let $U \subseteq \mathbb{R}^{E'}$ denote the oriented cutset space of $G'$, which is generated by the columns of $D^T$. If we delete an arbitrary row of $D$ (for example the one corresponding to the supernode $s$), then for the resulting matrix $D_s$, the product $D_s D_s^T$ (a submatrix of the combinatorial Laplacian of $G'$) is positive definite and $D_s^T$ is a minimal generating matrix of $U$.

⁶ Note that the arrival time of the supernode can be left variable.
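The augmentation by the supernode can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `augment_with_supernode`, the label `"s"` and the dictionary `arrival` (holding the prescribed arrival times $a_v$) are hypothetical, and $\deg^-$ is read as the number of entering arcs, $\deg^+$ as the number of leaving arcs.

```python
def augment_with_supernode(nodes, arcs, arrival, s="s"):
    """Build G' = (V + {s}, E + E~) and the delays on the new arcs.

    nodes   -- list of node labels (V)
    arcs    -- list of (tail, head) pairs (E)
    arrival -- dict with the prescribed arrival time a_v of every boundary node
    s       -- label of the dummy supernode (hypothetical default)
    """
    heads = {h for _, h in arcs}
    tails = {t for t, _ in arcs}
    v_minus = [v for v in nodes if v not in heads]  # assumed V^-: no entering arc
    v_plus = [v for v in nodes if v not in tails]   # assumed V^+: no leaving arc

    new_arcs, delay = [], {}
    for v in v_minus:            # arc (s, v) gets the delay a_v
        new_arcs.append((s, v))
        delay[(s, v)] = arrival[v]
    for v in v_plus:             # arc (v, s) gets the delay -a_v
        new_arcs.append((v, s))
        delay[(v, s)] = -arrival[v]
    return nodes + [s], arcs + new_arcs, delay


# Usage: a path a -> b -> c with arrival times prescribed at a and c.
new_nodes, new_arcs, new_delay = augment_with_supernode(
    ["a", "b", "c"], [("a", "b"), ("b", "c")], {"a": 0.0, "c": 5.0})
print(new_arcs)   # the original arcs followed by ("s", "a") and ("c", "s")
print(new_delay)
```

After this step every node other than `s` is treated alike, which is what allows the flow equalities to be written with a single incidence matrix.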

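Given $\lambda \in \mathbb{R}^E$, the task is to compute the orthogonal projection $P_{U^\perp}(\lambda)$ to the flow space. It is basic linear algebra that $P_U + P_{U^\perp}$ is the identity function of $\mathbb{R}^E$, that is

\[ P_{U^\perp}(\lambda) = \lambda - P_U(\lambda). \]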

The orthogonal projection to $U$ can be performed by solving

\[ D_s D_s^T x = D_s \lambda. \tag{4.3} \]

Then $D_s^T x = P_U(\lambda)$. Note that $D$ is a sparse matrix having exactly two nonzero entries in each column, so for given $\lambda$ and $x$, both $D_s \lambda$ and $D_s^T x$ can be computed in $O(|E| + |V|)$ time. $D_s D_s^T$ is a sparse matrix with at most $|V'| + 2|E'| \le 3|V| + 2|E|$ nonzero entries, so one iteration of the conjugate gradient method can be performed in $O(|E| + |V|)$ time. To obtain an exact solution of (4.3), up to $|V|$ iterations are needed in exact arithmetic. In practice, however, a small number of iterations (well under 100) is sufficient to get a very good approximate solution if one uses suitable preconditioning techniques. So the observed run time of the overall projection scales almost linearly with the size of the graph.
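The following is a minimal sketch of this projection in plain Python, not the implementation used for the results above. It runs the conjugate gradient method on (4.3) without any preconditioning and never forms $D_s D_s^T$ explicitly: each product is evaluated arc by arc, which is where the $O(|E| + |V|)$ cost per iteration comes from. The function names, the node numbering $0, \dots, n-1$, and the sign convention of $D$ ($-1$ at the tail, $+1$ at the head of an arc) are assumptions made for the example.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def project_to_flow_space(num_nodes, arcs, lam, root, iters=100, tol=1e-12):
    """Approximate P_{U^perp}(lam) = lam - D_s^T x with D_s D_s^T x = D_s lam.

    arcs -- list of (tail, head) node indices; lam -- one value per arc;
    root -- node whose row of D is deleted (e.g. the supernode s).
    """
    def apply_D(y):                 # D_s y, one pass over the arcs
        out = [0.0] * num_nodes
        for (t, h), ye in zip(arcs, y):
            out[t] -= ye
            out[h] += ye
        out[root] = 0.0             # deleted row
        return out

    def apply_DT(x):                # D_s^T x; x[root] is always 0
        return [x[h] - x[t] for t, h in arcs]

    x = [0.0] * num_nodes
    r = apply_D(lam)                # residual of (4.3) for x = 0
    p = r[:]
    rr = dot(r, r)
    for _ in range(iters):
        if rr <= tol:               # squared residual norm small enough
            break
        Ap = apply_D(apply_DT(p))   # Laplacian product without forming it
        alpha = rr / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * ai for ri, ai in zip(r, Ap)]
        rr_new = dot(r, r)
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
    cut = apply_DT(x)               # approximates P_U(lam)
    return [l - c for l, c in zip(lam, cut)]


# Usage: on a directed triangle the flow space is spanned by (1, 1, 1).
flow = project_to_flow_space(3, [(0, 1), (1, 2), (2, 0)], [3.0, 0.0, 0.0], root=0)
print(flow)   # approximately [1.0, 1.0, 1.0]
```

Equation (4.3) is the normal equation of the least-squares problem $\min_x \|\lambda - D_s^T x\|_2^2$, so the residual $\lambda - D_s^T x$ returned here is the component of $\lambda$ in the flow space. The second step of the cyclic projection, the projection to the nonnegative vectors, would then simply replace negative entries by zero.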

A slight annoyance of this method is that there are simple examples in which some Lagrangian multipliers are increased even for timing-feasible instances. The exact projection to $U^\perp_{\ge 0}$ guarantees that the Lagrangian multipliers never increase if the design is feasible in the current step. In the long run, the algorithm with cyclic projection converges as shown in Theorem 3.7.4, but the rate of convergence may be impaired by this phenomenon. It seems, however, that the overall run-time improvement compared to an exact projection compensates for this effect. An interesting idea to cure this shortcoming is to combine the subgradient method with the method of reflection-projection proposed in [Buschke and Kruk 2002] instead of the method of cyclic projections.

Of course, the question arises what to project. The literature ([Chen, Chu and Wong 1999], [Langkau 2000] and [Sechen and Tennakoon 2002]) suggests for the similar gate-sizing problem that the arrival times be propagated by static timing analysis, and that the arc slacks with respect to this propagation be added to the multipliers and used in the projection. This may be based on the idea that the arrival time vector has to become primal feasible once the design becomes feasible. However, it turns out that the propagated arrival times do not matter at all.

It is clear that

\[ P_{U^\perp_{\ge 0}}(\lambda) = P_{U^\perp_{\ge 0}}\bigl(P_{U^\perp}(\lambda)\bigr), \]

since $U^\perp$ is a linear subspace of $\mathbb{R}^E$. On the other hand, choosing a different arrival time vector means the addition of an element of $U$ to $\lambda$. After projecting to $U^\perp$, this contribution vanishes again. This means that one does not need to propagate at all to perform the timing optimization: $\rho d_e$ can be added to $\lambda_e$ for each arc $e$, where $d_e$ is simply the delay over the arc. The projection of the Lagrangian multipliers already “propagates” the design in some sense.
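Both claims can be checked by a short orthogonality argument, sketched here under the assumption that $U^\perp_{\ge 0}$ stands for $U^\perp \cap \mathbb{R}^E_{\ge 0}$. For any $\mu \in U^\perp_{\ge 0}$, the vector $\lambda - P_{U^\perp}(\lambda) = P_U(\lambda)$ lies in $U$, while $P_{U^\perp}(\lambda) - \mu$ lies in $U^\perp$, so the two are orthogonal:

```latex
\|\lambda - \mu\|_2^2
  = \|P_U(\lambda)\|_2^2 + \|P_{U^\perp}(\lambda) - \mu\|_2^2
  \qquad \forall \mu \in U^\perp_{\ge 0},
\\[4pt]
P_{U^\perp}(\lambda + u) = P_{U^\perp}(\lambda) + P_{U^\perp}(u) = P_{U^\perp}(\lambda)
  \qquad \forall u \in U.
```

The first term on the right-hand side of the first line does not depend on $\mu$, so $\lambda$ and $P_{U^\perp}(\lambda)$ have the same nearest point in $U^\perp_{\ge 0}$. The second line uses the linearity of $P_{U^\perp}$ and $P_{U^\perp}(u) = 0$ for $u \in U$, which is why a change of the arrival time vector leaves the projected multipliers unchanged.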

Note that we have transformed the boundary conditions on the arrival times at $V^+ \cup V^-$ into delays over the arcs incident with the newly inserted dummy node, so they affect the overall result of the projection.

Another interesting side effect is that this method is able to optimize designs with variable clock arrival times without performing clock-skew scheduling explicitly. This in turn suggests that clock-skew scheduling itself could be performed by a projection algorithm instead of the combinatorial ones in [Albrecht 2001], [Albrecht et al. 2002] and [Held 2001]. This is possible, but the main complication is that the delays must be projected to the space of nonnegative flows, for which the run times of the best known algorithms scale superlinearly with the size of the graph (projecting only to the flow space could result in suboptimal scheduling even for timing-feasible designs). Methods based on combinatorial potential-balancing algorithms seem to run sufficiently fast in practice, so without additional benefits the use of projection-based methods is not justified. Different analytical methods based on a constrained quadratic programming approach were proposed in [Kourtev and Friedman 1999] to improve the robustness of clock scheduling.