Determination of Failing Timing Paths - 5 Approximation at Gate Level for Voltage Scaling

5 Approximation at Gate Level for Voltage Scaling

5.1 Determination of Failing Timing Paths

at once, which reduces the complexity considerably. Furthermore, this allows the computations to be parallelized. In the end, however, the individual scaling results have to be carefully joined, which will also be explained later in this work.

In previous works, as already summarized in Chapter 2, mainly manual methods were presented. Most works presenting practical examples of voltage over-scaling performed the approximation by simply scaling the supply voltage of the whole circuit, simulating the circuit with updated timing information, and observing the effects. Based on this inspection, some of these works then optimized the circuit manually to improve its timing behavior. This is clearly a valid approach; in this work, however, the process is automated wherever possible. Approaches that introduce multiple voltage domains, so that the circuit is approximated only where possible, are rare. Most approaches use Monte-Carlo simulations to find a distribution of supply voltages that meets the demands on the resulting error rate. This again requires lengthy simulations to determine the behavior of voltage-scaled operating points. Only very few approaches determine analytically the error rates that result when the timing constraints are violated, and these are also limited to specific circuits.

In this work, an analytic, generic approach is presented to determine the minimal supply voltage of each gate in the circuit for an approximate operating point. An overview of the methodology is shown in Figure 5.2. The flow consists of basically two loops: the outer loop is executed for each endpoint, and the inner loop scales the voltage for that endpoint. The outer loop can easily be parallelized, as the dependencies between the endpoints do not have to be resolved immediately. Apart from the circuit itself, the algorithm requires the tolerable error rates at the circuit outputs, as determined in the previous chapter. The result of the algorithm is a list denoting the minimum supply voltage of each gate in the circuit.
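The two-loop structure can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the implementation used in this work: the per-endpoint error model is a hypothetical callable standing in for the analytic error-rate estimation, voltages are integers in mV, the whole fanin of an endpoint is scaled uniformly, and the join rule for shared gates (keep the highest per-endpoint minimum) is an assumed conservative choice; the joining procedure of this work is explained later.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# Hypothetical error model: supply voltage in mV -> error rate at the endpoint.
ErrorModel = Callable[[int], float]


def scale_endpoint(fanin: List[str], tolerable: float, error_rate: ErrorModel,
                   v_nom_mv: int = 1200, step_mv: int = 50,
                   v_min_mv: int = 600) -> Dict[str, int]:
    """Inner loop: lower the supply voltage of the whole fanin step by step
    and keep the lowest voltage whose error rate is still tolerable."""
    best = v_nom_mv
    for v in range(v_nom_mv - step_mv, v_min_mv - 1, -step_mv):
        if error_rate(v) > tolerable:
            break
        best = v
    return {gate: best for gate in fanin}


def scale_circuit(endpoints: Dict[str, List[str]], tolerable: Dict[str, float],
                  error_models: Dict[str, ErrorModel]) -> Dict[str, int]:
    """Outer loop: one independent job per endpoint, followed by a join step."""
    with ThreadPoolExecutor() as pool:
        per_endpoint = list(pool.map(
            lambda ep: scale_endpoint(endpoints[ep], tolerable[ep], error_models[ep]),
            endpoints))
    # Join (assumption): a gate shared by several fanins keeps the highest of
    # its per-endpoint minimum voltages, a conservative choice.
    joined: Dict[str, int] = {}
    for result in per_endpoint:
        for gate, v in result.items():
            joined[gate] = max(v, joined.get(gate, v))
    return joined


# Hypothetical example: gate g2 lies in the fanin of both endpoints.
endpoints = {"EP1": ["g1", "g2"], "EP2": ["g2", "g3"]}
tolerable = {"EP1": 1e-2, "EP2": 1e-3}
error_models = {
    "EP1": lambda v: 0.0 if v >= 900 else 0.05,
    "EP2": lambda v: 0.0 if v >= 1000 else 0.02,
}
vdd = scale_circuit(endpoints, tolerable, error_models)
# vdd == {"g1": 900, "g2": 1000, "g3": 1000}
```

Because each endpoint is processed independently, the outer loop maps directly onto a worker pool; only the final join needs all results.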

5.1 Determination of Failing Timing Paths

As already mentioned, in order to divide the problem of determining the effects of voltage over-scaling into manageable pieces, each timing endpoint is analyzed separately. Hence, the group of elements whose behavior has to be analyzed at once is the “fanin” of the endpoint up to the preceding register stage. Figure 5.3 shows the schematic of benchmark circuit “c17”; the fanin of one timing endpoint is marked in orange. Each of these domains is analyzed separately, as the maximum error probability that can be accepted at the endpoint is already known. The fanin consists of several paths, each of which influences the Boolean value at the flip-flop input. Hence, each of these paths, if it fails, can falsify the correct value.
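Extracting the fanin of an endpoint amounts to a backward traversal of the netlist that stops at registers and at primary inputs. The following minimal sketch assumes a hypothetical netlist representation in which `drivers` maps each node to the nodes driving its inputs; the small example netlist is made up for illustration and is not c17.

```python
from collections import deque
from typing import Dict, List, Set


def fanin_cone(drivers: Dict[str, List[str]], registers: Set[str], endpoint: str) -> Set[str]:
    """Combinational gates in the fanin of `endpoint`, traced backwards up to
    the preceding register stage or the primary inputs (nodes without drivers)."""
    cone: Set[str] = set()
    queue = deque(drivers.get(endpoint, []))
    while queue:
        node = queue.popleft()
        if node in cone or node in registers:
            continue  # already visited, or a register of the preceding stage
        preds = drivers.get(node, [])
        if not preds:
            continue  # primary input: the fanin ends here
        cone.add(node)
        queue.extend(preds)
    return cone


# Hypothetical netlist: node -> nodes driving its inputs.
drivers = {
    "FF_out": ["AND1"],        # endpoint register (D input driven by AND1)
    "AND1": ["NAND1", "c"],
    "NAND1": ["a", "FF_in"],   # FF_in belongs to the preceding register stage
    "FF_in": ["x"],
    "NAND2": ["b", "c"],       # drives other logic, not part of this fanin
}
cone = fanin_cone(drivers, registers={"FF_in", "FF_out"}, endpoint="FF_out")
# cone == {"AND1", "NAND1"}
```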

A timing path is defined as the connection between two flip-flops with only combinational logic in between. A timing path is furthermore defined by the type of transition at its input, a rising or a falling edge. Additionally, the transitions at all inputs of all combinational gates comprising the path uniquely define the path. Due to this variety of transitions, the fanin of an endpoint consists of many timing paths.
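The growth in the number of timing paths can be illustrated by counting them. The sketch below counts structural startpoint-to-endpoint paths by dynamic programming and multiplies by two for the rising and falling start edge. Since it ignores the transitions at the side inputs of the gates, it only yields a lower bound on the number of timing paths as defined above. The `ladder` netlist and the wiring assumed for the simple circuit are hypothetical illustrations.

```python
from functools import lru_cache
from typing import Dict, List


def count_timing_paths(drivers: Dict[str, List[str]], endpoint: str) -> int:
    """Lower bound on the number of timing paths ending at `endpoint`:
    structural paths from the startpoints (nodes without drivers), times two
    for a rising or falling edge at the startpoint."""

    @lru_cache(maxsize=None)
    def paths_to(node: str) -> int:
        preds = drivers.get(node, [])
        if not preds:
            return 1  # startpoint
        return sum(paths_to(p) for p in preds)

    return 2 * paths_to(endpoint)


def ladder(stages: int) -> Dict[str, List[str]]:
    """Hypothetical netlist in which both gates of a stage read both gates of
    the previous stage, so the number of structural paths doubles per stage."""
    drivers: Dict[str, List[str]] = {}
    for i in range(1, stages + 1):
        drivers[f"g{i}"] = [f"g{i - 1}", f"h{i - 1}"]
        drivers[f"h{i}"] = [f"g{i - 1}", f"h{i - 1}"]
    return drivers


# Simple circuit with assumed wiring: AND2(NAND2(a, b), c)
simple = {"NAND": ["a", "b"], "AND": ["NAND", "c"]}
print(count_timing_paths(simple, "AND"))       # 6
print(count_timing_paths(ladder(10), "g10"))   # 2048
```

In the ladder, every additional stage doubles the count, which mirrors the rapid growth described in the text.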

Even the connection between one startpoint and one endpoint defines multiple timing paths. The more gates lie between a startpoint and an endpoint, the more timing paths exist, as the number of combinations grows rapidly. This has to be kept in mind when later calculating the resulting error rate due to timing violations.

Figure 5.2: Overview of the voltage over-scaling methodology presented in this work

First, one has to find out which timing paths fail at which over-scaled supply voltage. To do so, knowledge about the propagation delay of the gates of the used technology is required. Usually, the propagation delay is given only for the nominal voltage of the technology; sometimes the technology parameters of a low-voltage and a high-voltage operating point are given as well. Hence, the behavior of the paths at other supply voltages has to be determined by SPICE simulations at transistor level. This allows the propagation delay of the fanin to be simulated for several supply voltages, even outside of the specifications. This approach works well and delivers very accurate results [132, 139]. Care must be taken to account for wire load models. Furthermore, when the fanin of an endpoint is analyzed separately from the rest of the circuit, one must not forget to consider all load capacitances driven by its gates, including those of gates outside the fanin. This approach requires extracting the fanin of an endpoint and converting it to a SPICE netlist, which can be achieved using the TCL interface of Synopsys’ “Design Compiler” and some additional scripts. The main limitation of this approach is the speed of the SPICE simulations. One has to keep in mind that realistic circuits consist of billions of endpoints. Furthermore, the fanin of a typical endpoint is larger than the one in the example circuit “c17”. Simulating all these fanin groups for a variety of supply voltages requires a significant amount of time.

Figure 5.3: Schematic of benchmark circuit “c17”. Marked in orange: the fanin of one output register
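When the fanin is cut out of the circuit, each gate in it must still see the full load it drives in the complete circuit. The sketch below uses hypothetical pin and wire capacitances in fF and, as a simplification, one input-pin capacitance per cell type; it sums the input capacitances of all sinks of a gate, including sinks outside the extracted fanin, plus a wire-load estimate.

```python
from typing import Dict, List, Set


def fanin_loads(cone: Set[str], fanout: Dict[str, List[str]], cell_type: Dict[str, str],
                pin_cap_ff: Dict[str, float], wire_cap_ff: Dict[str, float]) -> Dict[str, float]:
    """Load capacitance (fF) seen by each gate of an extracted fanin: the input
    pins of ALL its sinks (inside or outside the cone) plus a wire-load estimate."""
    loads: Dict[str, float] = {}
    for gate in cone:
        pins = sum(pin_cap_ff[cell_type[sink]] for sink in fanout.get(gate, []))
        loads[gate] = pins + wire_cap_ff.get(gate, 0.0)
    return loads


# Hypothetical values: NAND1 drives AND1 (inside the cone) and NAND2 (outside it).
cone = {"NAND1", "AND1"}
fanout = {"NAND1": ["AND1", "NAND2"], "AND1": ["FF_out"]}
cell_type = {"AND1": "AND2", "NAND2": "NAND2", "FF_out": "DFF"}
pin_cap_ff = {"AND2": 1.5, "NAND2": 1.2, "DFF": 2.0}
wire_cap_ff = {"NAND1": 0.8, "AND1": 0.5}
loads = fanin_loads(cone, fanout, cell_type, pin_cap_ff, wire_cap_ff)
# loads["NAND1"] is about 3.5 fF; ignoring NAND2 would underestimate it as 2.3 fF.
```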

Fortunately, another approach can be used to determine the failing timing paths at an over-scaled supply voltage. The “Composite Current Source” (CCS) model allows interpolating between the operating points of technology libraries with very good accuracy [127]. Synopsys uses this technology, for instance, in its timing analysis tool “PrimeTime”. PrimeTime can be used to assign supply voltages to each gate in the circuit and to perform a timing analysis using the well-established models for noise and parameter variation. This simplifies the determination of failing timing paths considerably.
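The idea of interpolating between characterized operating points can be illustrated with a much simpler model than CCS. The sketch below only interpolates a scalar cell delay linearly between supply-voltage corners, whereas CCS models current waveforms; the corner voltages and delays are hypothetical values, not library data.

```python
from bisect import bisect_left
from typing import List


def interpolate_delay(corners_v: List[float], delays_ps: List[float], v: float) -> float:
    """Piecewise-linear interpolation of a cell delay between characterized
    supply-voltage corners (corners_v sorted in ascending order)."""
    if not corners_v[0] <= v <= corners_v[-1]:
        raise ValueError("supply voltage outside the characterized range")
    i = bisect_left(corners_v, v)
    if corners_v[i] == v:
        return delays_ps[i]  # exactly on a characterized corner
    v0, v1 = corners_v[i - 1], corners_v[i]
    d0, d1 = delays_ps[i - 1], delays_ps[i]
    return d0 + (d1 - d0) * (v - v0) / (v1 - v0)


# Hypothetical corners (V) and delays (ps) of one cell
corners = [0.9, 1.0, 1.1]
delays = [60.0, 45.0, 38.0]
print(interpolate_delay(corners, delays, 0.95))  # about 52.5
```

Raising an error outside the corners reflects that interpolation is only trustworthy within the characterized range.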

First, all endpoints, i.e. all registers, have to be identified in the circuit. This can easily be done using the TCL interface; the same scripts can be used as with “Design Compiler”. Then, the fanin is identified for each endpoint. As already mentioned, only the fanin up to the preceding register stage has to be considered; if there is no preceding register stage, the fanin ends at the circuit input pins. After this step, all elements comprising the fanin of an endpoint are known, and their supply voltage can be scaled. Scaling is simply done by assigning a new (scaled) supply voltage to each gate of the fanin and performing the timing analysis, which is not slower than with nominal voltages. The timing analysis then automatically interpolates the various parameters of the technology library for the assigned supply voltage.
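The effect of assigning a scaled supply voltage to each gate of a fanin can be illustrated with a toy static timing analysis. In place of the library interpolation performed by PrimeTime, this sketch substitutes the alpha-power law delay model, t_d ∝ V_DD / (V_DD - V_th)^α; the threshold voltage, α, the nominal delays, the clock period, and the netlist are all hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Dict, List


def alpha_power_delay(d_nom_ps: float, v: float, v_nom: float = 1.2,
                      v_th: float = 0.35, alpha: float = 1.3) -> float:
    """Scale a nominal gate delay with the alpha-power law d ~ V / (V - Vth)^alpha."""
    scale = (v / (v - v_th) ** alpha) / (v_nom / (v_nom - v_th) ** alpha)
    return d_nom_ps * scale


def arrival_time(drivers: Dict[str, List[str]], d_nom_ps: Dict[str, float],
                 vdd: Dict[str, float], endpoint: str) -> float:
    """Latest arrival time (ps) at `endpoint` when every gate runs at its own
    assigned supply voltage vdd[gate]; inputs and registers add no delay here."""
    at: Dict[str, float] = {}
    for node in TopologicalSorter(drivers).static_order():
        start = max((at[p] for p in drivers.get(node, [])), default=0.0)
        delay = alpha_power_delay(d_nom_ps[node], vdd[node]) if node in d_nom_ps else 0.0
        at[node] = start + delay
    return at[endpoint]


# Hypothetical fanin: AND2(NAND2(a, b), c) -> FF
drivers = {"NAND1": ["a", "b"], "AND1": ["NAND1", "c"], "FF": ["AND1"]}
d_nom_ps = {"NAND1": 20.0, "AND1": 30.0}
nominal = arrival_time(drivers, d_nom_ps, {"NAND1": 1.2, "AND1": 1.2}, "FF")  # 50.0
scaled = arrival_time(drivers, d_nom_ps, {"NAND1": 1.0, "AND1": 1.0}, "FF")   # about 59 ps
clock_ps = 55.0
print(clock_ps - nominal, clock_ps - scaled)  # positive slack, then negative slack
```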

The timing results can then be analyzed, again using the TCL interface, in order to detect which paths fail. By repeating this step for a range of voltages, one can determine which timing paths fail at which supply voltage. This operation can then be performed for all timing endpoints and their fanins. The benefit of this approach compared to the SPICE-based analysis is clearly its speed: the timing analysis in PrimeTime is several orders of magnitude faster than the SPICE simulation. Additionally, PrimeTime utilizes well-established models for estimating parameter variations, thermal noise, temperature variations, and wire loads. To generate trustworthy results in SPICE, these parameters would have to be modeled as well.
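The voltage sweep can be sketched as follows: for each supply voltage in the interval, the paths whose scaled delay exceeds the clock period are listed, which yields curves of the kind shown in Figure 5.4. The alpha-power-law scaling, the uniform voltage for the whole fanin, the path delays, and the clock period are hypothetical assumptions, not results of this work.

```python
from typing import Dict, List


def delay_scale(v: float, v_nom: float = 1.2, v_th: float = 0.35, alpha: float = 1.3) -> float:
    """Alpha-power-law delay factor relative to the nominal supply voltage."""
    return (v / (v - v_th) ** alpha) / (v_nom / (v_nom - v_th) ** alpha)


def failing_paths_per_voltage(paths_ps: Dict[str, List[float]], clock_ps: float,
                              voltages_mv: List[int]) -> Dict[int, List[str]]:
    """For each swept supply voltage (mV), list the paths whose scaled delay
    (sum of nominal gate delays times the scale factor) exceeds the clock period."""
    failing: Dict[int, List[str]] = {}
    for mv in voltages_mv:
        s = delay_scale(mv / 1000)
        failing[mv] = sorted(name for name, gates in paths_ps.items()
                             if sum(gates) * s > clock_ps)
    return failing


# Hypothetical paths of one endpoint (nominal gate delays in ps) and clock period
paths = {"p_long": [20.0, 30.0, 25.0], "p_mid": [20.0, 30.0], "p_short": [15.0]}
sweep = failing_paths_per_voltage(paths, clock_ps=80.0, voltages_mv=[1200, 1000, 800, 600])
for mv, names in sweep.items():
    print(f"{mv / 1000:.1f} V: {len(names)} failing {names}")
```

Recording, for each path, the highest voltage at which it first appears in the list gives the "which path fails at which supply voltage" table described above.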

Figure 5.4 shows a plot of the sum of failing timing paths versus the supply voltage for three benchmark circuits. The results have been determined using Synopsys PrimeTime. Benchmark circuit “simple circuit” is the circuit shown in Figure 5.1a, consisting of 3 inputs and one NAND2 gate connected to one AND2 gate. Benchmark circuit “c17” is shown in Figure 5.3; it has 4 inputs and 2 outputs connected by 6 NAND2 gates. Benchmark circuit “c432” has 36 inputs and 7 outputs connected by 160 logic gates.

[Plot: sum of failing timing paths (logarithmic axis, 10⁰ to 10³) versus supply voltage [V] (0.6 to 1.2); curves: c432 EP 4, c432 EP 3, c432 EP 2, c17 EP 1, simple circuit]

Figure 5.4: Sum of failing timing paths (selected endpoints EP) depending on the supply voltage for the benchmark circuits “simple”, “c17” and “c432” [145]

One can see that for large, realistically sized circuits, the number of failing paths increases very fast. As already mentioned, the reason for this is the rapidly increasing number of timing paths per endpoint due to the increasing number of transition combinations. Even though an endpoint with a large fanin consists of many timing paths, it does not mean that the resulting error rate due to timing violations is increasing equally

In the document: Automated Power Optimization of Sequential Integrated Circuits through Approximate Computing (pages 145-149)