
8.10 Evaluation of the Resource Sharing Model

8.10.4 Conclusion

The resource sharing algorithms described in Section 8.5 and Section 8.6 improve over Algorithm 8.1 and Algorithm 7.4 because their running times are independent of the resource weights and problem widths. We conclude that the resource sharing model for the gate sizing problem leads to a fast approximation of the continuous relaxation and allows timing objectives such as worst slack maximization to be modeled more directly (see also Section 8.3.3).

9 Experimental Results

Having compared the resource sharing approach with the Lagrangian relaxation approach for gate sizing from a theoretical point of view, we now turn to a comparison in practice.

We implemented a Lagrangian relaxation approach and a new resource sharing algorithm with path resources for discrete sizing and V_t optimization. Both algorithms are built around a common oracle algorithm for sizing and V_t optimization that can be run in parallel. Resource weights and Lagrange multipliers are updated sequentially in each iteration and fed into the oracle algorithm, which queries the resource weights and Lagrange multipliers as needed.

Using the same oracle algorithm enables us to directly compare the different weight update schemes. The purpose of this chapter is to obtain a direct comparison between both algorithms and a first evaluation of the resource sharing model for gate sizing and V_t optimization in practice.
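The shared structure of both algorithms can be pictured as a loop in which only the weight update differs. The following is a minimal Python sketch under stated assumptions: `run_iterations`, `oracle` and `update_weights` are hypothetical names, and the toy oracle and update rule are placeholders for illustration only, not the LR or RS update schemes of Sections 9.2 and 9.3. Parallel execution of the oracle is not modeled.

```python
from typing import Callable, Dict, List, Tuple

Weights = Dict[str, float]


def run_iterations(
    initial_solution: int,
    initial_weights: Weights,
    oracle: Callable[[int, Weights], int],
    update_weights: Callable[[Weights, int], Weights],
    num_iterations: int,
) -> Tuple[int, Weights, List[int]]:
    """Alternate between a common oracle and a pluggable weight update.

    Hypothetical driver: the oracle is warm-started with the previous
    solution, and the weights are updated once per iteration.
    """
    solution = initial_solution
    weights = dict(initial_weights)
    history: List[int] = []
    for _ in range(num_iterations):
        solution = oracle(solution, weights)         # same oracle for LR and RS
        weights = update_weights(weights, solution)  # scheme-specific step
        history.append(solution)
    return solution, weights, history


# Toy placeholders (assumptions, not taken from the source):
def toy_oracle(previous: int, weights: Weights) -> int:
    # Pick the "large" option (1) only if delay is weighted heavily enough.
    return 1 if weights["delay"] > 1.0 else 0


def toy_update(weights: Weights, solution: int) -> Weights:
    # Raise the delay weight while the "small" option (0) is chosen.
    factor = 2.0 if solution == 0 else 0.5
    return {"delay": weights["delay"] * factor}
```

Swapping `toy_update` for another update function while keeping `toy_oracle` fixed mirrors the comparison setup described above: any difference in the results is then attributable to the weight update scheme alone.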

As a sizing oracle, we extend the local-search-based sizing tool BonnRefine (Held [Hel09]), which is part of the BonnTools optimization suite for VLSI physical design.

We start with a description of the sizing oracle in Section 9.1, followed by implementation details of our Lagrangian relaxation (LR) algorithm in Section 9.2 and our path resource sharing (RS) algorithm in Section 9.3. The framework for the path resource weights was provided by S. Daboul. We describe our testbed and setup in Section 9.4, including our choice of starting solutions, evaluation metrics and different optimization modes. Finally, both algorithms are compared in Section 9.5 on a testbed consisting of 8 microprocessor units in 22 nm and 14 nm technology provided by our industrial partner IBM, and in Section 9.6 on the ISPD 2013 benchmarks (Ozdal et al. [Ozd+13]). We conclude the chapter with a short summary and an outlook on future research.

9.1 BonnRefine as Oracle Algorithm

Our oracle algorithm returns a solution to the discrete power-delay tradeoff problem (5.2), but in general not an optimal solution. As discussed in Section 5.2.1, no approximation algorithms are known for this problem.

We employ a discretized version of Algorithm 5.1 that is widely used in practice.

This algorithm optimizes gates iteratively in reverse topological order, and for each gate the discrete solution which minimizes its local refine function as defined in (5.4) is chosen. Recall that the local refine function for a gate $g_i \in G$ is the weighted sum of its power consumption and the edge delays in its neighborhood graph $E_{g_i}$ (cf. page 24). More formally,

\[
  \mathrm{tr}_x(x_i, \omega) := \omega_{m+1} \, \mathrm{cost}(x_i) + \sum_{e \in E_{g_i}} \omega_e \, \mathrm{delay}_e(x)
\]

for $x \in X_{\mathrm{disc}}$ and weights $\omega_{m+1} \in \mathbb{R}_{\geq 0}$, $\omega_e \in \mathbb{R}_{\geq 0}$ for $e \in E$. Here all entries of $x$ are fixed except for the $i$-th entry.
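To make the definition concrete, the following Python sketch evaluates the weighted sum for each candidate of a single gate and picks the minimizer. It is only an illustration under stated assumptions: the names `refine_value` and `best_candidate`, the `evaluate` callback and the toy numbers are hypothetical, and in the actual setting power and delays come from a timing engine rather than a lookup table.

```python
from typing import Callable, Dict, Hashable, Iterable, Tuple

Edge = Hashable
Evaluation = Tuple[float, Dict[Edge, float]]  # (power cost, delay per edge)


def refine_value(
    power: float,
    edge_delays: Dict[Edge, float],
    omega_power: float,
    omega_edges: Dict[Edge, float],
) -> float:
    """omega_{m+1} * cost(x_i) + sum over e in E_{g_i} of omega_e * delay_e(x)."""
    weighted_delay = sum(omega_edges[e] * d for e, d in edge_delays.items())
    return omega_power * power + weighted_delay


def best_candidate(
    candidates: Iterable[str],
    evaluate: Callable[[str], Evaluation],
    omega_power: float,
    omega_edges: Dict[Edge, float],
) -> str:
    """Return the candidate of one gate with minimum local refine value.

    All other gates are assumed fixed; `evaluate` (hypothetical) returns the
    power and neighborhood edge delays that result from a candidate.
    """
    def key(candidate: str) -> float:
        power, delays = evaluate(candidate)
        return refine_value(power, delays, omega_power, omega_edges)

    return min(candidates, key=key)


# Toy data (assumed): a gate with two sizes and two neighborhood edges.
TOY = {
    "small": (1.0, {"e1": 4.0, "e2": 2.0}),
    "large": (3.0, {"e1": 2.0, "e2": 2.5}),
}

choice_low = best_candidate(TOY, TOY.__getitem__, 1.0, {"e1": 1.0, "e2": 0.5})
choice_high = best_candidate(TOY, TOY.__getitem__, 1.0, {"e1": 2.0, "e2": 0.5})
```

With the lower weight on edge `e1` the small size wins (6.0 against 6.25). Doubling that weight makes the larger, faster size preferable (8.25 against 10.0), which is the mechanism through which the weights steer the oracle.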

In the LR algorithm, $\omega_{m+1}$ equals 1 and the weights $\omega_e$ for $e \in E$ correspond to the Lagrange multipliers. In the RS algorithm, the weights correspond to the edge weights that are derived from the path resource weights, divided by the resource budgets. This will become clear in Section 9.3. We integrated the budgets into the weights to simplify notation.

Recall that in the LR algorithm we aim to find sizes and V_t levels that are close to the optimal solution, while in the RS algorithm we are interested in sizes and V_t levels such that the value of the power-delay tradeoff function is close to the optimum. In both cases, no approximation algorithms are known, and we do not distinguish between the two objectives in the following.

Furthermore, recall that the local refine function of a gate g_i depends on the sizes of other gates. The oracle algorithm aims to minimize the power-delay tradeoff function for given weights, but in practice it is not clear with which sizes and V_t levels to start when optimizing the gates iteratively. It is reasonable in practice to start with the solution computed in the last iteration of the LR and RS algorithm, which is what we did in our implementations. In the RS algorithm, a convex combination of the solutions computed in the previous iterations can also be used by assigning capacitances and slews appropriately, because existing convergence guarantees refer to convex combinations of solutions.
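As a small illustration of the convex combination mentioned for the RS algorithm, the sketch below averages per-gate values (for example input capacitances) over earlier iterations. It is a hedged sketch only: the function name `convex_combination`, the dictionary representation and the choice of coefficients are assumptions, and the source does not specify how capacitances and slews are assigned in the implementation.

```python
from typing import Dict, Sequence


def convex_combination(
    solutions: Sequence[Dict[str, float]],
    coefficients: Sequence[float],
) -> Dict[str, float]:
    """Combine per-gate values with nonnegative coefficients summing to 1."""
    if not solutions or len(solutions) != len(coefficients):
        raise ValueError("need one coefficient per solution")
    if any(c < 0.0 for c in coefficients):
        raise ValueError("coefficients must be nonnegative")
    if abs(sum(coefficients) - 1.0) > 1e-9:
        raise ValueError("coefficients must sum to 1")

    gates = solutions[0].keys()
    return {
        gate: sum(c * sol[gate] for c, sol in zip(coefficients, solutions))
        for gate in gates
    }


# Toy example (assumed values): input capacitance per gate in two iterations.
iteration_1 = {"g1": 1.0, "g2": 4.0}
iteration_2 = {"g1": 3.0, "g2": 2.0}
averaged = convex_combination([iteration_1, iteration_2], [0.5, 0.5])
```

The combined values generally do not correspond to a size available in a discrete library, which is consistent with the remark above that capacitances and slews are assigned rather than a concrete cell being selected.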

Our oracle uses the infrastructure of the sizing tool BonnRefine, which, in its general setting, computes local slack optima under arbitrary delay models based on local search. As a new solution evaluation metric, we added the value of the local refine function, to which we refer from now on as refine cost. Furthermore, we integrated V_t optimization into BonnRefine and refer to a new size or V_t level for a gate, or a combination of both, as solution candidate.

The industrial timing engine IBM EinsTimer is used for all delay, slew and slack calculations under the Elmore delay model [Elm48]. Wires are estimated as approximately shortest Steiner trees. To bound the running time of the algorithm, delay recalculations are restricted to a bounded number of logic levels around each gate by the timing engine. This prevents propagation of delay and slew changes through the whole timing graph in each sizing and V_t optimization step, but introduces small inaccuracies. In our setting, we restrict delay recalculations to the neighborhood graph of the gate.
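The idea of limiting recalculation to a bounded number of logic levels can be illustrated with a depth-limited breadth-first search. This is a generic sketch, not the mechanism or API of the timing engine: the function name `nodes_within_levels` and the adjacency-dictionary representation of the timing graph are assumptions made for the example.

```python
from collections import deque
from typing import Dict, Hashable, Iterable, Set

Node = Hashable


def nodes_within_levels(
    adjacency: Dict[Node, Iterable[Node]],
    start: Node,
    max_levels: int,
) -> Set[Node]:
    """Return all nodes at most `max_levels` edges away from `start`.

    `adjacency` lists the neighbors of each node (predecessors and
    successors alike), so the search spreads in both directions.
    """
    depth = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if depth[node] == max_levels:
            continue  # do not expand beyond the allowed number of levels
        for neighbor in adjacency.get(node, ()):
            if neighbor not in depth:
                depth[neighbor] = depth[node] + 1
                queue.append(neighbor)
    return set(depth)


# Toy chain a - b - c - d - e (assumed), stored with both directions.
CHAIN = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d"],
    "d": ["c", "e"],
    "e": ["d"],
}
```

For the toy chain, `nodes_within_levels(CHAIN, "c", 1)` covers only `b`, `c` and `d`. Values outside the returned set would be left unchanged after a sizing step, which corresponds to the small inaccuracies mentioned above.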

Nonetheless, it is time-consuming to evaluate the local refine function for each solution candidate of a gate. Therefore we skip some candidates as follows: Let s be the size of the current solution of a gate g ∈ G. Starting with the next smaller size than s, all smaller sizes are tested in order of decreasing size. For each size,

9.2 Implementation of a Discrete Lagrangian Relaxation Algorithm

From the document Algorithms for Circuit Sizing in VLSI Design (pages 145-149)