Flow Description - Faster Circuits for And-Or Paths and Binary Addition

Logic synthesis is one of the first steps in the design process, and the logical description typically remains fixed during all subsequent steps. However, during physical design, it may turn out that the chosen implementation of the logic functionality was not the best choice, e.g., with respect to placement or timing. It would then be desirable to find a better-suited, logically equivalent representation. As described in Section 7.2, other logic optimization techniques used for fixing timing problems during physical design work only locally on small fractions of the netlist and often do not have provable approximation guarantees. In contrast to these methods, BonnLogic can resynthesize combinational paths of arbitrary length and can thus resolve more complex timing problems.

BonnLogic is a timing optimization flow that is used in several steps of IBM's physical design flow: for assertion generation before placement, i.e., generation of (required) arrival times for certain pins; for optimizing timing late in the pre-routing physical design flow; and for executing engineering change orders (ECOs). Here, we focus on the second application.

From a global perspective, timing optimization must be done carefully in order to find good trade-offs between signal speed and area or power consumption. Still, the most timing-critical part of a chip needs to be well optimized, as it limits the clock frequency of the chip. The BonnTools contain other timing optimization algorithms, in particular logic optimization algorithms, that optimize timing globally. BonnLogic, in contrast, is a tool for optimizing the most critical paths.

BonnLogic was originally developed by Werber, Rautenbach, and Szegedy [WRS07] and has been improved significantly in recent years; see a previous publication by Brenner and Hermann [BH20]. The idea of BonnLogic is to improve the worst slack by iteratively restructuring the most critical path. As observed by Werber, Rautenbach, and Szegedy [WRS07] (see also Section 7.3.1), optimizing a path can be reduced to optimizing an And-Or path, cf. Definition 2.5.1. Hence, the essential component of BonnLogic is an And-Or path optimization algorithm. In the original version by Werber, Rautenbach, and Szegedy [WRS07], this is the algorithm from Rautenbach, Szegedy, and Werber [RSW06]; in the current version, it is Algorithm 6.3, which has a much better approximation guarantee (cf. Theorem 6.1.14 and Table 2.2) and performs much better on most instances, see Section 6.2.
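To make the object of optimization concrete, the following Python sketch evaluates an And-Or path as a Boolean function. It assumes the convention t_0 ∧ (t_1 ∨ (t_2 ∧ (t_3 ∨ ...))), i.e., the outermost gate is an And gate and gate types alternate towards the innermost input; Definition 2.5.1 fixes the exact form. The function name `and_or_path` is our own and not part of BonnLogic.

```python
from typing import Sequence

def and_or_path(t: Sequence[bool]) -> bool:
    """Evaluate t[0] AND (t[1] OR (t[2] AND (t[3] OR ...))) (assumed convention)."""
    if not t:
        raise ValueError("need at least one input")
    value = t[-1]
    # Fold from the innermost input outwards: even positions are combined
    # by an And gate, odd positions by an Or gate.
    for i in range(len(t) - 2, -1, -1):
        value = (t[i] and value) if i % 2 == 0 else (t[i] or value)
    return value
```

For example, `and_or_path([True, False, True])` evaluates t_0 ∧ (t_1 ∨ t_2) and returns `True`.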

As BonnLogic is used during physical design, placement and timing information need to be taken into account during optimization. In Section 7.3.1, we hence adapt the simple delay model used in Algorithm 6.3 to respect placement, buffering, and gate sizing effects. As this model does not fully account for the different kinds of gates or gate sizes that might be available, our framework involves a technology mapping step (cf. Section 7.3.2) as well as powerful gate sizing and buffering routines (cf. Section 7.3.3).

As buffering and gate sizing rely on accurate delay computations, which are very complex, they are by far the most time-consuming part of our flow, and we need to limit their usage.

Figure 7.3: Flow chart for our logic optimization framework (cf. Section 7.3) with the path restructuring step in green.

We now outline our logic optimization flow in detail before describing its sub-steps in the subsequent sections. The flow is illustrated in Figure 7.3. All parameters mentioned below are chosen depending on the technology but can be modified by users.

BonnLogic iteratively optimizes the worst slack of the currently most timing-critical combinational path until the overall worst slack no longer improves significantly. A single iteration works as follows:

Let P denote a most critical path. In a preoptimization step, we first try to improve the slack of P without changing its logical structure, in order to minimize disruption of the netlist. To this end, we apply detailed optimization to P as described in Section 7.3.3. If the resulting slack improvement exceeds a threshold δ_min, we keep the changes made by preoptimization and start the next iteration.

Otherwise, we discard the changes made by preoptimization and perform the path restructuring step (central, green part of Figure 7.3). This step works on internal data structures with internal virtual delay models; the netlist is not changed before detailed optimization (Section 7.3.3). Due to the inaccuracy of our timing model, we consider restructuring any sub-path S of P up to a maximum length of m_max. For each such S, we first apply a normalization step (Section 7.3.1) to extract an And-Or path S′ from S, on which we run Algorithm 6.3 to determine the global structure of the replacement circuit. Then, the technology mapping routine from Elbert [Elb17] (see also Section 7.3.2) locally modifies S to benefit from all gates available in the library. After all sub-paths of P have been optimized, we store all restructuring candidates in a list L, sorted by decreasing estimated slack gain.
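The enumeration of restructuring candidates can be sketched as follows. This is a minimal Python illustration under simplifying assumptions: P is a list of gate names, the length of a sub-path is its number of gates, and the hypothetical callback `estimate_gain` stands in for normalization, Algorithm 6.3, and technology mapping, none of which are modeled here. The names `Candidate` and `collect_candidates` are likewise our own.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

@dataclass
class Candidate:
    start: int              # index of the first gate of the sub-path in P
    end: int                # index one past the last gate (half-open interval)
    estimated_gain: float   # slack gain predicted by the internal delay model

def collect_candidates(
    path: Sequence[str],
    m_max: int,
    estimate_gain: Callable[[Sequence[str]], float],  # hypothetical placeholder
) -> List[Candidate]:
    """Enumerate all sub-paths with at most m_max gates, sorted by decreasing estimated gain."""
    candidates: List[Candidate] = []
    for start in range(len(path)):
        for end in range(start + 1, min(start + m_max, len(path)) + 1):
            gain = estimate_gain(path[start:end])
            candidates.append(Candidate(start, end, gain))
    # Python's sort is stable, also with reverse=True, so ties keep their enumeration order.
    candidates.sort(key=lambda c: c.estimated_gain, reverse=True)
    return candidates
```

With `path = ["g0", "g1", "g2", "g3"]`, `m_max = 2` and `estimate_gain=len`, the sketch returns seven candidates, with the three two-gate sub-paths first.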

We apply the time-consuming detailed optimization (cf. Section 7.3.3) only to the most promising fraction of restructuring candidates. First, we tentatively apply detailed optimization to the topmost k candidates in L. If the actual slack gain of the best solution is at least δ_target, we choose this solution; otherwise, we repeatedly relax the target by decreasing it by a fixed amount and try out the next k candidates in L, until the best actual slack gain reaches the relaxed target or L is empty. Afterwards, we choose the restructuring candidate C with the best actual slack gain δ_C for P among all solutions to which detailed optimization was applied. This way, we usually apply detailed optimization to only a few instances but still find a good restructuring option. If δ_C ≥ δ_min and no side path slack has worsened beyond the initial slack of P, we implement this netlist change, possibly retaining parts of P that are needed for side outputs. If the change is implemented and the slack gain over the last num_it iterations exceeds a threshold δ_it, we start the next iteration; otherwise, we stop.
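The candidate selection with a relaxed target can be sketched as below. This is a hedged Python illustration, not the BonnLogic implementation: the callbacks `detailed_gain` (standing in for tentative detailed optimization, returning the actual slack gain) and `side_paths_ok` (standing in for the side path slack check) are hypothetical, and `delta_step` is an assumed name for the fixed decrement of the target.

```python
from typing import Callable, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")

def select_candidate(
    candidates: Sequence[T],                 # sorted by decreasing estimated gain
    detailed_gain: Callable[[T], float],     # hypothetical: actual gain after detailed optimization
    side_paths_ok: Callable[[T], bool],      # hypothetical: side path slack check
    k: int,
    delta_target: float,
    delta_step: float,
    delta_min: float,
) -> Optional[Tuple[T, float]]:
    """Return the chosen candidate and its actual gain, or None if no change is implemented."""
    if k < 1:
        raise ValueError("k must be positive")
    threshold = delta_target
    best: Optional[Tuple[T, float]] = None
    pos = 0
    while pos < len(candidates):
        # Tentatively apply the expensive detailed optimization to the next k candidates.
        for cand in candidates[pos:pos + k]:
            gain = detailed_gain(cand)
            if best is None or gain > best[1]:
                best = (cand, gain)
        pos += k
        if best is not None and best[1] >= threshold:
            break                            # best gain so far reaches the current target
        threshold -= delta_step              # relax the target and try the next batch
    # Implement only if the gain is large enough and no side path became too critical.
    if best is None or best[1] < delta_min or not side_paths_ok(best[0]):
        return None
    return best
```

With k = 2, δ_target = 2.0 and a decrement of 0.5, a first batch whose best gain is 1.0 relaxes the target to 1.5; if the second batch contains a candidate with gain 3.0, the search ends there, and the remaining candidates never undergo detailed optimization.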

Note that this is a simplified flow description. For example, in practice, when P cannot be optimized further, we optimize the second most critical path or the most critical path between register cells.
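The overall iteration, including preoptimization and the stopping rule, can be summarized by the following Python sketch. All callbacks (`worst_slack`, `preoptimize`, `revert_preoptimization`, `restructure`) are hypothetical placeholders for the steps described above, and `restructure` is assumed to return whether a netlist change was implemented. The description does not specify how the stopping rule behaves during the first num_it iterations; here we simply do not stop before enough history exists, and `max_iterations` is an added safeguard.

```python
from typing import Callable, List

def optimize_loop(
    worst_slack: Callable[[], float],
    preoptimize: Callable[[], None],
    revert_preoptimization: Callable[[], None],
    restructure: Callable[[], bool],
    delta_min: float,
    delta_it: float,
    num_it: int,
    max_iterations: int = 1000,
) -> int:
    """Return the number of iterations performed before stopping."""
    history: List[float] = [worst_slack()]   # worst slack after each iteration
    for iteration in range(1, max_iterations + 1):
        before = worst_slack()
        preoptimize()                        # detailed optimization of P, no logic change
        if worst_slack() - before > delta_min:
            history.append(worst_slack())    # keep preoptimization, next iteration
            continue
        revert_preoptimization()
        implemented = restructure()          # path restructuring step
        history.append(worst_slack())
        if not implemented:
            return iteration
        if len(history) > num_it and history[-1] - history[-1 - num_it] <= delta_it:
            return iteration                 # too little progress in the last num_it iterations
    return max_iterations
```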

7.3.1 Delay Model and Normalization

Our And-Or path optimization algorithm, Algorithm 6.3, expects as input an alternating path of And2 and Or2 gates with prescribed input arrival times, and it assumes that gates have unit delay and connections do not impose any delay (see the delay model defined in Definition 2.3.2). However, the most critical path P contains arbitrary gates with varying delays, and the physical locations of the path inputs might be far apart, inducing unavoidably high wire delays even after buffering.
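The following Python sketch illustrates this unit-delay model on an And-Or path. It computes the output arrival time of the straightforward circuit that evaluates the nested formula gate by gate from the innermost input outwards, and compares it with the lower bound ⌈log₂(Σ_i 2^{a_i})⌉, which holds for every circuit of fan-in-2 gates whose output depends on all inputs. Arrival times are assumed to be non-negative integers, and the function names are our own.

```python
from typing import Sequence

def linear_chain_delay(arrival: Sequence[int]) -> int:
    """Output arrival time of the gate-by-gate And-Or path circuit in the unit-delay model."""
    m = len(arrival)
    if m == 0:
        raise ValueError("need at least one input")
    if m == 1:
        return arrival[0]                      # no gate at all
    # Innermost gate combines the last two inputs; every gate adds delay 1, wires add 0.
    t = max(arrival[m - 2], arrival[m - 1]) + 1
    # Each further gate combines input i with the value computed so far.
    for i in range(m - 3, -1, -1):
        t = max(arrival[i], t) + 1
    return t

def delay_lower_bound(arrival: Sequence[int]) -> int:
    """ceil(log2(sum_i 2^a_i)) for non-negative integer arrival times."""
    w = sum(2 ** a for a in arrival)
    return (w - 1).bit_length()                # equals ceil(log2(w)) for integers w >= 1
```

For eight inputs that all arrive at time 0, the linear circuit has delay 7 while the lower bound is 3; for arrival times (3, 0, 0, 0), the linear circuit already attains the bound of 4, since the late input t_0 enters at the last gate. Closing such gaps for general arrival times is the task of Algorithm 6.3.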

A normalization step thus transforms P into a piece of netlist whose core part is an And-Or path with appropriately modified input arrival times.

From the document Faster Circuits for And-Or Paths and Binary Addition (pages 197-200)