6.1.2 Undetermined-Circuit Dynamic Program

[Figure 6.2(a): Another circuit implementing the And-Or path from Figure 6.1(a), which has been considered by Algorithm 6.1 but has delay 17. The inputs $t_9, \dots, t_0$ have arrival times 10, 9, 12, 6, 4, 0, 5, 1, 12, 14; the four split lines are labeled split 1 to split 4.]

[Figure 6.2(b): Circuit with optimum delay 15 arising from the circuit in Figure 6.2(a) by performing Huffman coding on the group of Or gates.]

Figure 6.2: The circuit on the left-hand side is a candidate solution of Algorithm 6.1 (page 162) for $f_{0,0,9}$ obtained by the recursion formulas (6.4) to (6.6) as described in Example 6.1.6. The circuit on the right-hand side cannot be computed by Algorithm 6.1, but is delay-optimum for the instance in Figure 6.1(a) as the critical input $t_0$ traverses only 1 gate, which is best possible.

There is nothing to be done for the computation of $f_{0,0,0} = t_0$ or $f_{1,1,1} = t_1$.

Split 3: Now, we apply the alternating split with an even prefix (see Equation (6.5)) with $\lambda = 2$:

$f_{2,2,9} = f_{2,2,5} \lor f_{2,6,9}$

The sub-function $f_{2,2,5}$ is realized by the standard circuit.

Split 4: Here, the alternating split with an even prefix (see Equation (6.5)) is applied with $\lambda = 1$:

$f_{2,6,9} = f_{2,6,7} \lor f_{2,8,9}$

As the two arising sub-functions are symmetric, they can be constructed using Huffman coding. Note that the circuit for $t_2 \land t_4 \land t_6$ is used in both sub-circuits.

Figure 6.2(a) shows clearly where Algorithm 6.1 (page 162) lacks flexibility: each drawn split line partitions the circuit into three parts, namely two sub-circuits and a concatenation gate. The algorithm optimizes the two sub-circuits separately and does not cross these split lines during optimization. In this concrete example, Algorithm 6.1 cannot re-arrange the Or concatenation gates as shown in Figure 6.2(b). However, doing so would yield a circuit with delay 15, which is one less than the delay of the circuit in Figure 6.1(b) computed by Algorithm 6.1.

postpone the construction of symmetric trees until all their inputs are known. This motivates the following definition.

Definition 6.1.7. A circuit $C$ with a single output is called an undetermined circuit if

• all its gates are And or Or gates and

• all gate vertices with the possible exception of $\mathrm{out}(C)$ have fan-in two.

We denote the gate type of $\mathrm{out}(C)$ by $\mathrm{gt}(C) \in \{\mathrm{And}, \mathrm{Or}\}$.

Note that any circuit over the basis Ωmon = {And2,Or2} is an undetermined circuit. Another two undetermined circuits are shown in Figure 6.3.

In contrast to Algorithm 6.1 (page 162), we now allow undetermined circuits as implementations of the functions $f_{i,j,k}$ in intermediate solutions. Of course, the circuit we finally compute must still be a circuit over Ωmon = {And2,Or2}. To compare different implementations realizing the same Boolean function, we extend the definition of the weight of circuit inputs to undetermined circuits.

Definition 6.1.8. Given an undetermined circuit $C$ on Boolean inputs $t_0, \dots, t_{m-1}$ with input arrival times $a(t_0), \dots, a(t_{m-1}) \in \mathbb{N}$, we define the weight of $C$ as

$W(C) := \sum_{v \in \delta^-(\mathrm{out}(C))} 2^{a(v)}.$

Given the weight of an undetermined circuit, we can estimate the delay of a canonical logically equivalent circuit over Ωmon.

Lemma 6.1.9. Given an undetermined circuit $C$ on inputs $t_0, \dots, t_{m-1}$ with integral arrival times $a(t_0), \dots, a(t_{m-1}) \in \mathbb{N}$, we can construct an equivalent circuit $\widetilde{C}$ over Ωmon = {And2,Or2} with delay

$\mathrm{delay}(\widetilde{C}) = \lceil \log_2 W(C) \rceil.$

Proof. Initialize $\widetilde{C}$ with the circuit obtained from $C$ by deleting $\mathrm{out}(C)$. Now, $\widetilde{C}$ is a circuit over Ωmon, but has multiple outputs, say $v_0, \dots, v_{k-1}$. Propagate the input arrival times through $\widetilde{C}$ to compute arrival times $a(v_0), \dots, a(v_{k-1})$. Applying Huffman coding (see Theorem 2.3.21) with $v_0, \dots, v_{k-1}$ as inputs and $a(v_0), \dots, a(v_{k-1})$ as input arrival times yields a circuit $H$ with delay $\lceil \log_2 W(C) \rceil$. Adding all gates and edges of $H$ to $\widetilde{C}$ yields the required circuit.
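The relation between the weight and the delay in Lemma 6.1.9 can be checked numerically. The following Python sketch assumes that the Huffman coding of Theorem 2.3.21 is the usual greedy rule that repeatedly combines the two earliest signals into one arriving one time unit after the later of the two; the function names are illustrative and not taken from the text.

```python
import heapq


def weight(arrival_times):
    """Sum of 2^a over the arrival times a of the predecessors of out(C),
    following Definition 6.1.8 (integral arrival times assumed)."""
    return sum(2 ** a for a in arrival_times)


def huffman_delay(arrival_times):
    """Delay of a tree of two-input gates built greedily: repeatedly
    combine the two earliest signals into one arriving at max(a, b) + 1."""
    heap = list(arrival_times)
    heapq.heapify(heap)
    while len(heap) > 1:
        a = heapq.heappop(heap)
        b = heapq.heappop(heap)
        heapq.heappush(heap, max(a, b) + 1)
    return heap[0]


def ceil_log2(w):
    """Exact ceil(log2(w)) for a positive integer w, without floating point."""
    return (w - 1).bit_length()


# Predecessor arrival times read off the weights stated for Figure 6.3.
for arrivals in ([2, 1, 0, 6], [2, 1, 4, 3]):
    print(weight(arrivals), huffman_delay(arrivals), ceil_log2(weight(arrivals)))
```

For the two weights 71 and 30 stated in Figure 6.3, the greedy delay and the rounded-up logarithm agree (7 and 5, respectively).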

The rough idea of Algorithm 6.3 is again to compute a dynamic programming table which contains a circuit $C_{i,j,k}$ for $f_{i,j,k}$ for every $0 \le i \le j \le k \le m-1$ with $j - i$ even, but now, all intermediate solutions $C_{i,j,k}$ are undetermined circuits. Hence, we call Algorithm 6.3 the undetermined-circuit dynamic program in order to highlight the contrast to the binary-circuit dynamic program in Algorithm 6.1.

Again, we apply the alternating splits (6.4) and (6.5) in order to recursively compute $C_{i,j,k}$, but the symmetric split (6.6) is used only with $\lambda = 1$, i.e.,

$f_{i,j,k} = f_{i,i,i} \land f_{i+2,j,k}.$ (6.7)

In the proof of Proposition 6.1.12, we shall see that the other symmetric splits are not needed anymore in Algorithm 6.3.

The three types of splits are extended to undetermined circuits as follows.

[Figure 6.3(a): An undetermined circuit for $f_{0,4,8}$ with weight $2^2 + 2^1 + 2^0 + 2^6 = 71$. The inputs $t_{12}, \dots, t_0$ have arrival times 1, 0, 2, 4, 3, 1, 2, 2, 0, 1, 1, 1, 2.]

[Figure 6.3(b): An undetermined circuit for $f_{5,9,12}$ with weight $2^2 + 2^1 + 2^4 + 2^3 = 30$, on the same inputs and arrival times.]

Figure 6.3: Two undetermined circuits and their weights.

[Figure 6.4(a): Combining the undetermined circuits from Figure 6.3 to a circuit for $f_{0,4,12}$ according to split (6.4) with $\lambda = 2$. This does not yield an undetermined circuit. The inputs are grouped as $(t_i, \dots, t_{j-1})$, $(t_j, \dots, t_{j+2\lambda})$ and $(t_{j+2\lambda+1}, \dots, t_k)$; the gates $\mathrm{out}(C_1)$ and $\mathrm{out}(C_2)$ compute $f_{i,j,j+2\lambda}$ and $f_{j+1,j+2\lambda+1,k}$, and the concatenation gate computes $f_{i,j,k}$.]

[Figure 6.4(b): Undetermined circuit $C$ arising from applying Algorithm 6.2 to the circuits $C_1$ and $C_2$ from Figure 6.4(a). We have $W(C) = 2^2 + 2^1 + 2^0 + 2^6 + 2^5 = 103$.]

Figure 6.4: Computing an undetermined circuit for $f_{i,j,k}$ with $i = 0$, $j = 4$ and $k = 12$ using the alternating split (6.4) with prefix length $5 = 2\lambda + 1$ and Algorithm 6.2 (page 169).

Let us consider a circuit $C$ for $f_{i,j,k}$ arising from the split

$f_{i,j,k} = f_{i,j,j+2\lambda} \land f_{j+1,j+2\lambda+1,k}$

with $\lambda \in \mathbb{N}$ and $0 \le \lambda \le \frac{k-j-1}{2}$ as in Equation (6.4). Now, the circuits $C_{i,j,j+2\lambda}$ for $f_{i,j,j+2\lambda}$ and $C_{j+1,j+2\lambda+1,k}$ for $f_{j+1,j+2\lambda+1,k}$ are both undetermined circuits. According to the split, using an And gate, we can combine them to a circuit $C'$ for $f_{i,j,k}$, but this will not necessarily be an undetermined circuit, see the example in Figure 6.4(a). In Figure 6.4(b), we can see how $C'$ is turned into an undetermined circuit $C$ for $f_{i,j,k}$ in this case.

The general procedure for merging two undetermined circuits $C_1$ and $C_2$ is described in Algorithm 6.2: When $\mathrm{gt}(C_i)$ coincides with the gate type $\circ$ of the concatenation gate, the inputs of $\mathrm{out}(C_i)$ are simply connected to $\mathrm{out}(C)$; otherwise, the symmetric tree at $\mathrm{out}(C_i)$ may be completed using Lemma 6.1.9. This means that we do not decide on an implementation of the symmetric tree computing the logic function at $\mathrm{out}(C_i)$ until we know that no other possible inputs of the symmetric tree will emerge in later steps of the algorithm.

Algorithm 6.2: Merging undetermined circuits

Input: Two undetermined circuits $C_1$ and $C_2$ computing Boolean functions $h_1$ and $h_2$, respectively, depending on inputs with integer arrival times; a gate type $\circ \in \{\mathrm{And}, \mathrm{Or}\}$.

Output: An undetermined circuit $C$ computing $h_1 \circ h_2$.

1 Let $C$ be the union of the circuits $C_1$ and $C_2$.
2 Add a '$\circ$'-gate $v_0$ to $C$.
3 for $i \leftarrow 1$ to $2$ do
4   Let $v_1, \dots, v_k$ be the inputs of $\mathrm{out}(C_i)$.
5   if $\mathrm{gt}(C_i) = \circ$ then
6     Remove $\mathrm{out}(C_i)$ from $C$ and add edges $(v_1, v_0), \dots, (v_k, v_0)$ to $C$.
7   else
8     Construct a circuit $\widetilde{C}_i$ over Ωmon computing $h_i$ using Lemma 6.1.9.
9     Add all edges and gates from $\widetilde{C}_i$ to $C$.
10    Add an edge $(\mathrm{out}(\widetilde{C}_i), v_0)$ to $C$.
11 return $C$

In Algorithm 6.2, we see that the computed circuit $C$ highly depends on whether $\mathrm{gt}(C_i) = \mathrm{And}$ or $\mathrm{gt}(C_i) = \mathrm{Or}$. As a consequence, our dynamic programming table now contains two undetermined circuits for every occurring $f_{i,j,k}$: circuits $A_{i,j,k}$ and $O_{i,j,k}$ which have minimum weight among all recursively constructed circuits with $\mathrm{gt}(A_{i,j,k}) = \mathrm{And}$ and $\mathrm{gt}(O_{i,j,k}) = \mathrm{Or}$, respectively. Here, we use the weight of the undetermined circuits to decide which circuit to store since, by Lemma 6.1.9, for two undetermined circuits $C_1$ and $C_2$ with $W(C_1) \le W(C_2)$, the equivalent circuits over Ωmon fulfill $\mathrm{delay}(\widetilde{C}_1) \le \mathrm{delay}(\widetilde{C}_2)$.
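Since only the weight and the output gate type of a table entry matter for comparisons, the effect of Algorithm 6.2 can be sketched on these two values alone. The Python sketch below is an illustration under assumptions: the `Entry` type and the function names are hypothetical, a single input $t_r$ is modeled as an entry of weight $2^{a(t_r)}$ without gate type, and the gate types used in the example (And for the circuit of weight 71, Or for the circuit of weight 30) are inferred from the weight 103 stated in Figure 6.4(b).

```python
from typing import NamedTuple, Optional


class Entry(NamedTuple):
    weight: int            # W(C); an integer because arrival times are integral
    gate: Optional[str]    # "And", "Or", or None for a single input


def round_up_pow2(w: int) -> int:
    """Smallest power of two that is >= w, i.e. 2 ** ceil(log2(w))."""
    return 1 << (w - 1).bit_length()


def merge(c1: Entry, c2: Entry, op: str) -> Entry:
    """Weight-level view of merging two undetermined circuits with gate `op`."""
    total = 0
    for c in (c1, c2):
        if c.gate == op:
            # The inputs of out(c) are attached directly to the new output gate.
            total += c.weight
        else:
            # The symmetric tree at out(c) is completed (Lemma 6.1.9) and enters
            # the new gate as one signal arriving at ceil(log2(W(c))).
            total += round_up_pow2(c.weight)
    return Entry(total, op)


# Figure 6.4: 71 is kept as is, 30 is rounded up to 32.
print(merge(Entry(71, "And"), Entry(30, "Or"), "And"))
```

For a single input the rounding has no effect because its weight is already a power of two, so the missing gate type does not need special treatment in this sketch.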

Apart from that, Algorithm 6.3 is very similar to Algorithm 6.1, but due to the use of undetermined circuits, we may omit some initial symmetric tree constructions.

Algorithm 6.3: Undetermined-circuit dynamic program for delay optimization of And-Or paths

Input: Boolean input variables $t = (t_0, \dots, t_{m-1})$ with arrival times $a(t_0), \dots, a(t_{m-1}) \in \mathbb{N}$.

Output: A circuit over Ωmon computing $f_{0,0,m-1}$.

1 for $l \leftarrow 1$ to $m$ do
2   for $0 \le i \le j \le k < m$ s.t. $j - i$ even and $N(f_{i,j,k}) = l$ do
3     if $k = j$ then
4       $A_{i,j,j} := \bigwedge_{l \in \{i,i+2,\dots,j-2,j\}} t_l$
5     $\mathcal{C}$ := list of candidate undetermined circuits for $f_{i,j,k}$ arising from applying split (6.4), (6.5) or (6.7) with any valid $\lambda$, followed by application of Algorithm 6.2.
6     $A_{i,j,k} = \arg\min \{W(C) : C \in \mathcal{C} \text{ with } \mathrm{gt}(C) = \mathrm{And}\}$
7     $O_{i,j,k} = \arg\min \{W(C) : C \in \mathcal{C} \text{ with } \mathrm{gt}(C) = \mathrm{Or}\}$
8 if $m \le 2$ then
9   $C := A_{0,0,m-1}$
10 else
11   $C := \arg\min \{W(A_{0,0,m-1}), W(O_{0,0,m-1})\}$
12 Apply Lemma 6.1.9 to transform $C$ into a circuit $\widetilde{C}$ over Ωmon.
13 return $\widetilde{C}$

The following lemma implies that Algorithm 6.3 correctly computes a circuit for $f_{0,0,m-1}$.

Lemma 6.1.10. Consider the application of Algorithm 6.3 to Boolean input variables $t = (t_0, \dots, t_{m-1})$ with arrival times $a(t_0), \dots, a(t_{m-1}) \in \mathbb{N}$. Let $0 \le i \le j \le k < m$ with $j - i$ even be given. If $k - j \le 1$, we have $A_{i,j,k} = \bigwedge_{l \in \{i,i+2,\dots,j-2,j\}} t_l$. Otherwise, both $A_{i,j,k}$ and $O_{i,j,k}$ exist.

Proof. We prove the statement by induction on $N(f_{i,j,k}) \in \mathbb{N}$. When $k - j = 0$, the circuit $A_{i,j,k} = \bigwedge_{l \in \{i,i+2,\dots,j-2,j\}} t_l$ is computed in line 4.

For $k - j = 1$, a candidate circuit for $A_{i,j,k}$ can be obtained in the following way: The realization $f_{i,j,k} = f_{i,j,j} \land f_{j+1,j+1,k}$ arises from the alternating split (6.4) with $\lambda = 0$ in line 5. By the first case, we have $A_{i,j,j} = \bigwedge_{l \in \{i,i+2,\dots,j-2,j\}} t_l$ and $A_{j+1,j+1,k} = \bigwedge_{l = j+1} t_l = t_{j+1}$. After application of Algorithm 6.2, we obtain $A = \bigwedge_{l \in \{i,i+2,\dots,j-2,j\}} t_l$ as a candidate circuit for $A_{i,j,k}$. As $A$ has optimum weight among all possible realizations for $f_{i,j,k}$, we have $A_{i,j,k} = A$.

Now assume that $k - j \ge 2$, which implies $N(f_{i,j,k}) \ge 2$. By induction hypothesis, for all functions $f_{i',j',k'}$ such that $0 \le i' \le j' \le k' \le m-1$ with $j' - i'$ even and $N(f_{i',j',k'}) < N(f_{i,j,k})$, a realization $A_{i',j',k'}$ is computed. Hence, a candidate realization for $A_{i,j,k}$ can be obtained via the alternating split (6.4) with an odd prefix length $2\lambda + 1$ with $\lambda = 0$, i.e., $f_{i,j,k} = f_{i,j,j} \land f_{j+1,j+1,k}$. For $O_{i,j,k}$, a candidate circuit can be obtained via the alternating split (6.5) with an even prefix length $2\lambda$ with $\lambda = 1$, i.e., $f_{i,j,k} = f_{i,j,j+1} \lor f_{i,j+2,k}$. Hence, the list $\mathcal{C}$ in line 5 contains undetermined circuits both with And and with Or as output gate type, and $A_{i,j,k}$ and $O_{i,j,k}$ both exist.

A similar statement as Observation 6.1.2 can be shown for Algorithm 6.3.

Observation 6.1.11. Let input variables $t = (t_0, \dots, t_{m-1})$ with arrival times $a(t_0), \dots, a(t_{m-1}) \in \mathbb{R}$ be given. We call a circuit $C$ for $g(t)$ a split circuit if it arises from the construction of optimum symmetric trees on input vectors of the form $t_i, t_{i+2}, \dots, t_{j-2}, t_j$, the recursive application of (6.4), (6.5) and (6.7) followed by Algorithm 6.2, and the dualization of circuits. Moreover, we call a circuit $C$ for $g(t)$ split-optimum if it is a split circuit that has optimum delay among all split circuits for $g(t)$. For integral arrival times, Algorithm 6.3 computes a split-optimum circuit for $g(t)$.

The following proposition states that, regarding delay, the circuit computed by Algorithm 6.3 is always at least as good as the one computed by Algorithm 6.1.

Proposition 6.1.12. Let Boolean input variables $t = (t_0, \dots, t_{m-1})$ with arrival times $a(t_0), \dots, a(t_{m-1}) \in \mathbb{N}$ be given. Consider the circuits $C$ and $\widetilde{C}$ computed by Algorithm 6.1 and Algorithm 6.3 for this instance, respectively. Then, we have

$\mathrm{delay}(\widetilde{C}) \le \mathrm{delay}(C).$

Proof. For $0 \le i \le j \le k < m$ with $j - i$ even, let $C_{i,j,k}$ be as in Algorithm 6.1 and $A_{i,j,k}$ and $O_{i,j,k}$ as in Algorithm 6.3. We will first prove the following claim.

Claim. For every $0 \le i \le j \le k < m$ with $j - i$ even, we have

$\mathrm{delay}(C_{i,j,k}) \ge \begin{cases} \lceil \log_2 W(A_{i,j,k}) \rceil & \text{if } \mathrm{gt}(C_{i,j,k}) = \mathrm{And}, \\ \lceil \log_2 W(O_{i,j,k}) \rceil & \text{otherwise.} \end{cases}$

Proof of claim: We prove the claim by induction on $N(f_{i,j,k})$.

If $k \in \{j, j+1\}$, Algorithm 6.1 constructs an optimum symmetric tree $C_{i,j,j}$ with $\mathrm{gt}(C_{i,j,k}) = \mathrm{And}$ for $f_{i,j,k}$. In this case, by Lemma 6.1.10, we have $A_{i,j,j} := \bigwedge_{l \in \{i,i+2,\dots,j-2,j\}} t_l$. Thus, we have

$\lceil \log_2 W(A_{i,j,j}) \rceil = \left\lceil \log_2 \sum_{l \in \{i,i+2,\dots,j-2,j\}} 2^{a(t_l)} \right\rceil = \mathrm{delay}(C_{i,j,j}).$

Now assume that $k \ge j + 2$. We have $N(f_{i,j,k}) = \frac{j-i}{2} + k - j + 1 \ge 2$. Hence, by induction hypothesis, we can assume that the statement holds for all $0 \le i' \le j' \le k' \le m-1$ with $j' - i'$ even and $N(f_{i',j',k'}) < N(f_{i,j,k})$.

The circuit $C_{i,j,k}$ is computed by a split of type (6.4), (6.5), or (6.3).

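Case 1: Assume that the split is of type (6.4) or (6.5).

Let $C_1$ and $C_2$ denote the circuits used in the split by Algorithm 6.1. By Lemma 6.1.10, the table computed by Algorithm 6.3 contains circuits $C_1'$ and $C_2'$ where for each $r \in \{1,2\}$, $C_r'$ is equivalent to $C_r$ and $\mathrm{gt}(C_r) = \mathrm{gt}(C_r')$. By induction hypothesis, we have $\lceil \log_2 W(C_r') \rceil \le \mathrm{delay}(C_r)$ for each $r \in \{1,2\}$. Hence, the circuit $C'$ arising from merging $C_1'$ and $C_2'$ by Algorithm 6.2 fulfills

$W(C') \le 2^{\lceil \log_2 W(C_1') \rceil} + 2^{\lceil \log_2 W(C_2') \rceil} \le 2^{\mathrm{delay}(C_1)} + 2^{\mathrm{delay}(C_2)}.$

Thus, assuming without loss of generality that $\mathrm{delay}(C_1) \le \mathrm{delay}(C_2)$, this implies

$\lceil \log_2 W(C') \rceil \le \left\lceil \log_2 \left( 2^{\mathrm{delay}(C_1)} + 2^{\mathrm{delay}(C_2)} \right) \right\rceil \le \left\lceil \log_2 \left( 2 \cdot 2^{\mathrm{delay}(C_2)} \right) \right\rceil \overset{\mathrm{delay}(C_2) \in \mathbb{N}}{=} \mathrm{delay}(C_2) + 1 = \max\{\mathrm{delay}(C_1), \mathrm{delay}(C_2)\} + 1 = \mathrm{delay}(C_{i,j,k}).$ (6.8)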

Case 2: Assume that $C_{i,j,k}$ is computed via a symmetric split (6.3), i.e., $C_{i,j,k} = C_{i,i+2\lambda-2,i+2\lambda-2} \land C_{i+2\lambda,j,k}$ for some $1 \le \lambda \le \frac{j-i}{2}$.

An undetermined circuit $A'_{i,j,k}$ for $f_{i,j,k}$ can be obtained by recursive application of splits of type (6.7), i.e.,

$f_{i,j,k} = f_{i,i,i} \land \Big( f_{i+2,i+2,i+2} \land \big( \dots \land ( f_{i+2\lambda-2,i+2\lambda-2,i+2\lambda-2} \land f_{i+2\lambda,j,k} ) \big) \Big),$ (6.9)

followed by Algorithm 6.2 after every split. As $A_{i,j,k}$ is a split-optimum circuit for $f_{i,j,k}$, we have $W(A_{i,j,k}) \le W(A'_{i,j,k})$, and it suffices to show that

$\lceil \log_2 W(A'_{i,j,k}) \rceil \le \mathrm{delay}(C_{i,j,k}).$ (6.10)

By Lemma 6.1.10, both $A_{i+2\lambda,j,k}$ and $O_{i+2\lambda,j,k}$ have been computed, and a trivial realization $A_{r,r,r} = t_r$ has been computed for all $r \in \{i, i+2, \dots, i+2\lambda-2\}$. Let $C'_{i+2\lambda,j,k}$ be the circuit among $A_{i+2\lambda,j,k}$ and $O_{i+2\lambda,j,k}$ with the same output gate type as $C_{i+2\lambda,j,k}$. Then, by induction hypothesis, we have

$\lceil \log_2 W(C'_{i+2\lambda,j,k}) \rceil \le \mathrm{delay}(C_{i+2\lambda,j,k}).$ (6.11)

As all the outer gates in Equation (6.9) are And gates, we have

$W(A'_{i,j,k}) \le \sum_{r \in \{i,i+2,\dots,i+2\lambda-2\}} W(t_r) + 2^{\lceil \log_2 W(C'_{i+2\lambda,j,k}) \rceil}$
$\overset{\text{Alg. 6.1, l. 4}}{\le} 2^{\mathrm{delay}(C_{i,i+2\lambda-2,i+2\lambda-2})} + 2^{\lceil \log_2 W(C'_{i+2\lambda,j,k}) \rceil}$
$\overset{(6.11)}{\le} 2^{\mathrm{delay}(C_{i,i+2\lambda-2,i+2\lambda-2})} + 2^{\mathrm{delay}(C_{i+2\lambda,j,k})}.$

From this, we can show Equation (6.10) in the same way as in Equation (6.8) of case 1.

This proves the induction step and hence the claim.

In the case that $m \le 2$, Algorithm 6.3 outputs the circuit $\widetilde{A}_{0,0,m-1}$ over Ωmon arising from $A_{0,0,m-1}$ by application of Lemma 6.1.9. This is an optimum symmetric tree for $f_{0,0,m-1}$, so the statement holds.

When $m \ge 3$, Algorithm 6.3 outputs the circuit $\widetilde{C}$ over Ωmon arising from the weight-optimum circuit among $A_{0,0,m-1}$ and $O_{0,0,m-1}$ by application of Lemma 6.1.9. From the claim, we hence deduce

$\mathrm{delay}(\widetilde{C}) \overset{\text{Lem. 6.1.9}}{=} \min\{ \lceil \log_2 W(A_{0,0,m-1}) \rceil, \lceil \log_2 W(O_{0,0,m-1}) \rceil \} \overset{\text{claim}}{\le} \mathrm{delay}(C_{0,0,m-1}).$

As in Proposition 6.1.4, we can show that if the circuits in Algorithm 6.3 are implemented as formula circuits, their size is at most quadratic. In practice, our circuits will have roughly linear size (see Section 6.2) as we heuristically optimize size as described in Section 6.1.4.

Proposition 6.1.13. Let Boolean input variables $t = (t_0, \dots, t_{m-1})$ with arrival times $a(t_0), \dots, a(t_{m-1}) \in \mathbb{N}$ be given. Consider $0 \le i \le j \le k$ with $j - i$ even. Let $A_{i,j,k}$ and $O_{i,j,k}$ (the latter only for $k \ge j + 2$) be the undetermined circuits computed by Algorithm 6.3, and let $\widetilde{A}_{i,j,k}$ and $\widetilde{O}_{i,j,k}$ be the binary circuits arising from applying Lemma 6.1.9 to $A_{i,j,k}$ and $O_{i,j,k}$, respectively. Then, we have $\mathrm{size}(\widetilde{A}_{i,j,k}), \mathrm{size}(\widetilde{O}_{i,j,k}) \le (k-j+1)(k-i+1) - 1$. In particular, the circuit $C_{0,0,m-1}$ for $f_{0,0,m-1}$ computed by Algorithm 6.3 has size at most $m^2 - 1$.

Proof. We prove the statement by induction on $N(f_{i,j,k})$. For $k \le j + 1$ (in particular for $N(f_{i,j,k}) \le 1$), the circuit $A_{i,j,k}$ is a binary tree with size $k - i < (k-j+1)(k-i+1) - 1$ as $k = j$.

Thus, assume now that $N(f_{i,j,k}) \ge 2$ and $k \ge j + 2$, where we construct $A_{i,j,k}$ and $O_{i,j,k}$ in lines 5 to 7. Without loss of generality, we only prove the statement for $A_{i,j,k}$. Assume that we use a split that builds $A_{i,j,k}$ from undetermined circuits $C_1$ and $C_2$. Note that the size of $\widetilde{A}_{i,j,k}$ is independent of the symmetric tree constructed for $\mathrm{out}(A_{i,j,k})$. Thus, we can assume that $A_{i,j,k} = \widetilde{C}_1 \land \widetilde{C}_2$, where $\widetilde{C}_r$ arises from $C_r$ by application of Lemma 6.1.9 for $r \in \{1,2\}$. Hence, we have $\mathrm{size}(\widetilde{A}_{i,j,k}) \le \mathrm{size}(\widetilde{C}_1) + \mathrm{size}(\widetilde{C}_2) + 1$. For $\widetilde{C}_1$ and $\widetilde{C}_2$, the induction hypothesis is fulfilled. From here, the proof of the induction step can be continued as in Proposition 6.1.4 as every split performed in Algorithm 6.3 is also performed in Algorithm 6.1.

In the next theorem, we summarize the characteristics of Algorithm 6.3.

Theorem 6.1.14. Given Boolean input variables $t = (t_0, \dots, t_{m-1})$ with arrival times $a(t_0), \dots, a(t_{m-1}) \in \mathbb{N}$, Algorithm 6.3 computes a split-optimum circuit $\widetilde{C}$ realizing the And-Or path $g(t) = f_{0,0,m-1}$ with delay at most

$\mathrm{delay}(\widetilde{C}) \le \log_2 W + \log_2 \log_2 m + \log_2 \log_2 \log_2 m + 4.3$

and size at most

$\mathrm{size}(\widetilde{C}) \le m^2 - 1$

and can be implemented to run in time $O(m^4)$.

Proof. Split-optimality of $\widetilde{C}$ holds by Observation 6.1.11, and the size bound is proven in Proposition 6.1.13. The delay guarantee follows from combining Proposition 6.1.12 and Theorem 6.1.5.

The running time guarantee is also implied by Theorem 6.1.5 since Algorithm 6.3 performs a strict subset of the splits of Algorithm 6.1; up to a constant number of steps, only the running time of Algorithm 6.2 for combining sub-solutions is added.

Note that during the course of Algorithm 6.3, we only need to know the weight and output gate type of an undetermined circuit, not its concrete structure. Hence, it suffices to actually construct symmetric trees in the final circuit $\widetilde{C}_{0,0,m-1}$ only.

By postponing Huffman coding (Theorem 2.3.21), Algorithm 6.2 boils down to computing the output gate, summing up the weights of $C_1$ and $C_2$ and, where necessary, rounding them up to the next power of 2. These tasks can be performed in constant time. Hence, lines 1 to 7 of Algorithm 6.3 can be implemented to run in time $O(m^4)$ when no circuit is actually constructed.

For the construction of the final circuit $\widetilde{C}$, we now perform backtracking on the performed splits. For each split, we apply Algorithm 6.2, this time with application of Huffman coding [Huf52] (see Theorem 2.3.21), which takes time $O(r \log_2 r)$ for each call on $r$ inputs. As the size of $\widetilde{C}$ has been proven to be at most quadratic in $m$, we have $r \in O(m^2)$ for each Huffman coding call. Hence, all Huffman coding calls together take time at most $O(m^2 \log_2 m)$, which does not increase the overall running time.

We conclude that Algorithm 6.3 can be implemented to run in time $O(m^4)$.
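To make the shape of the dynamic programming table concrete, the following Python sketch enumerates the index triples in the order in which lines 1 and 2 of Algorithm 6.3 visit them. It uses $N(f_{i,j,k}) = \frac{j-i}{2} + k - j + 1$ as in the proof of Proposition 6.1.12; the function names are illustrative. The table has $O(m^3)$ entries, and under the assumption that each entry tries $O(m)$ values of $\lambda$ with constant-time merges, this is consistent with the $O(m^4)$ bound.

```python
from collections import defaultdict


def num_inputs(i: int, j: int, k: int) -> int:
    """N(f_{i,j,k}) = (j - i) / 2 + k - j + 1 for j - i even."""
    return (j - i) // 2 + k - j + 1


def table_indices(m: int):
    """Group all (i, j, k) with 0 <= i <= j <= k < m and j - i even by N."""
    by_size = defaultdict(list)
    for i in range(m):
        for j in range(i, m, 2):      # step 2 keeps j - i even
            for k in range(j, m):
                by_size[num_inputs(i, j, k)].append((i, j, k))
    return by_size


table = table_indices(3)
for size in sorted(table):
    print(size, table[size])
```

For $m = 3$ this lists seven entries, and the only entry with $N = 3$ is $(0, 0, 2)$, which corresponds to the output function $f_{0,0,m-1}$.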

Using Algorithm 6.3, we will in particular compute the optimum solution as depicted in Figure 6.2(b) for the instance from Example 6.1.6.