6.1.2 Undetermined-Circuit Dynamic Program

[Figure 6.2, circuit diagrams omitted. Both panels use the inputs t_9, …, t_0 with arrival times 10, 9, 12, 6, 4, 0, 5, 1, 12, 14 (in this order).]

(a) Another circuit implementing the And-Or path from Figure 6.1(a) which has been considered by Algorithm 6.1, but has delay 17. The drawn split lines are labeled split 1 to split 4.

(b) Circuit with optimum delay 15 arising from the circuit in Figure 6.2(a) by performing Huffman coding on the group of Or gates.

Figure 6.2: The circuit on the left-hand side is a candidate solution of Algorithm 6.1 (page 162) for f_{0,0,9} obtained by the recursion formulas (6.4) to (6.6) as described in Example 6.1.6. The circuit on the right-hand side cannot be computed by Algorithm 6.1, but is delay-optimum for the instance in Figure 6.1(a) as the critical input t_0 traverses only one gate, which is best possible.

There is nothing to be done for the computation of f_{0,0,0} = t_0 or f^*_{1,1,1} = t_1.

Split 3: Now, we apply the alternating split with an even prefix (see Equation (6.5)) with λ = 2:

$$f_{2,2,9} = f_{2,2,5} \lor f_{2,6,9}.$$

The sub-function f_{2,2,5} is realized by the standard circuit.

Split 4: Here, the alternating split with an even prefix (see Equation (6.5)) is applied with λ = 1:

$$f_{2,6,9} = f_{2,6,7} \lor f_{2,8,9}.$$

As the two arising sub-functions are symmetric, they can be constructed using Huffman coding. Note that the circuit for t_2 ∧ t_4 ∧ t_6 is used in both sub-circuits.

In Figure 6.2(a), one can see clearly where Algorithm 6.1 (page 162) lacks flexibility: Each drawn split line partitions the circuit into three parts: two sub-circuits and a concatenation gate. The algorithm optimizes the two sub-circuits separately and does not cross these split lines during optimization. In this concrete example, re-arranging the Or concatenation gates as shown in Figure 6.2(b) is not possible for Algorithm 6.1. However, this would lead to a circuit with delay 15, which is one better than the delay of the circuit in Figure 6.1(b) computed by Algorithm 6.1.

To overcome this limitation, we postpone the construction of symmetric trees until all their inputs are known. This motivates the following definition.

Definition 6.1.7. A circuit C with a single output is called an undetermined circuit if

• all its gates are And or Or gates and

• all gate vertices with the possible exception of out(C) have fan-in two.

We denote the gate type of out(C) by gt(C) ∈ {And, Or}.

Note that any circuit over the basis Ω_mon = {And2,Or2} is an undetermined circuit. Another two undetermined circuits are shown in Figure 6.3.

In contrast to Algorithm 6.1 (page 162), we now allow undetermined circuits as implementations of the functions f_{i,j,k} in intermediate solutions. Of course, the circuit we finally compute still must be a circuit over Ω_mon = {And2, Or2}. We extend the definition of the weight of circuit inputs to undetermined circuits in order to compare different implementations realizing the same Boolean function.

Definition 6.1.8. Given an undetermined circuit C on Boolean inputs t_0, …, t_{m−1} with input arrival times a(t_0), …, a(t_{m−1}) ∈ N, we define the weight of C as

$$W(C) := \sum_{v \in \delta^{-1}(\operatorname{out}(C))} 2^{a(v)}.$$

Given the weight of an undetermined circuit, we can estimate the delay of a canonical logically equivalent circuit over Ω_mon.

Lemma 6.1.9. Given an undetermined circuit C on inputs t_0, …, t_{m−1} with integral arrival times a(t_0), …, a(t_{m−1}) ∈ N, we can construct an equivalent circuit C̃ over Ω_mon = {And2, Or2} with delay

$$\lceil \log_2 W(C) \rceil.$$

Proof. Initialize C̃ with the circuit obtained from C by deleting out(C). Now, C̃ is a circuit over Ω_mon, but has multiple outputs, say v_0, …, v_{k−1}. Propagate the input arrival times through C̃ to compute arrival times a(v_0), …, a(v_{k−1}). Applying Huffman coding (see Theorem 2.3.21) with v_0, …, v_{k−1} as inputs and a(v_0), …, a(v_{k−1}) as input arrival times yields a circuit H with delay ⌈log₂ W(C)⌉. Adding all gates and edges of H to C̃ yields the required circuit.
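The relation between the weight and the Huffman delay used in this proof can be checked numerically. The following is a minimal sketch, assuming unit delay per two-input gate and integral arrival times; the function names `weight`, `ceil_log2` and `huffman_delay` are hypothetical helpers chosen for illustration, not part of the original algorithms. The inputs are the arrival times at the predecessors of the output gate, taken from the weights stated in Figure 6.3.

```python
import heapq


def weight(arrival_times):
    """W(C): sum of 2^a(v) over the predecessors v of the output gate."""
    return sum(2 ** a for a in arrival_times)


def ceil_log2(w):
    """Exact integer ceiling of log2(w) for w >= 1 (no floating point)."""
    return (w - 1).bit_length()


def huffman_delay(arrival_times):
    """Delay of the symmetric tree built by repeatedly joining the two
    earliest signals with one two-input gate (assumed unit gate delay)."""
    heap = list(arrival_times)
    heapq.heapify(heap)
    while len(heap) > 1:
        a = heapq.heappop(heap)
        b = heapq.heappop(heap)
        heapq.heappush(heap, max(a, b) + 1)
    return heap[0]


# Predecessor arrival times read off the weights in Figure 6.3(a) and (b).
fig_a = [2, 1, 0, 6]
fig_b = [2, 1, 4, 3]
```

For both examples, `huffman_delay` agrees with `ceil_log2(weight(...))`, which is the delay stated in Lemma 6.1.9.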

The rough idea of Algorithm 6.3 is again to compute a dynamic programming table which contains a circuit C_{i,j,k} for f_{i,j,k} for every 0 ≤ i ≤ j ≤ k ≤ m − 1 with j − i even, but now, all intermediate solutions C_{i,j,k} are undetermined circuits. Hence, we call Algorithm 6.3 the undetermined-circuit dynamic program in order to highlight the contrast to the binary-circuit dynamic program in Algorithm 6.1.

Again, we apply the alternating splits (6.4) and (6.5) in order to recursively compute C_{i,j,k}, but the symmetric split (6.6) is used only with λ = 1, i.e.,

$$f_{i,j,k} = f_{i,i,i} \land f_{i+2,j,k}. \tag{6.7}$$

In the proof of Proposition 6.1.12, we shall see that the other symmetric splits are not needed anymore in Algorithm 6.3.

The three types of splits are extended to undetermined circuits as follows.

[Figure 6.3, circuit diagrams omitted.]

(a) An undetermined circuit for f_{0,4,8} with weight 2² + 2¹ + 2⁰ + 2⁶ = 71.

(b) An undetermined circuit for f_{5,9,12} with weight 2² + 2¹ + 2⁴ + 2³ = 30.

Figure 6.3: Two undetermined circuits and their weights.

[Figure 6.4, circuit diagrams omitted. In panel (a), out(C_1) computes f_{i,j,j+2λ} on the inputs (t_i, …, t_{j+2λ}) and out(C_2) computes f^*_{j+1,j+2λ+1,k}; in panel (b), out(C) computes f_{i,j,k}.]

(a) Combining the undetermined circuits from Figure 6.3 to a circuit for f_{0,4,12} according to split (6.4) with λ = 2. This does not yield an undetermined circuit.

(b) Undetermined circuit C arising from applying Algorithm 6.2 to the circuits C_1 and C_2 from Figure 6.4(a). We have W(C) = 2² + 2¹ + 2⁰ + 2⁶ + 2⁵ = 103.

Figure 6.4: Computing an undetermined circuit for f_{i,j,k} with i = 0, j = 4 and k = 12 using the alternating split (6.4) with prefix length 5 = 2λ + 1 and Algorithm 6.2 (page 169).

Let us consider a circuit C for f_{i,j,k} arising from the split

$$f_{i,j,k} = f_{i,j,j+2\lambda} \land f^{*}_{j+1,j+2\lambda+1,k}$$

with λ ∈ N and 0 ≤ λ ≤ (k − j − 1)/2 as in Equation (6.4). Now, the circuits C_{i,j,j+2λ} for f_{i,j,j+2λ} and C^*_{j+1,j+2λ+1,k} for f^*_{j+1,j+2λ+1,k} are both undetermined circuits. According to the split, using an And gate, we can combine them to a circuit C′ for f_{i,j,k}, but this will not necessarily be an undetermined circuit; see the example in Figure 6.4(a). In Figure 6.4(b), we can see how C′ is turned into an undetermined circuit C for f_{i,j,k} in this case.

The general procedure for merging two undetermined circuits C_1 and C_2 is described in Algorithm 6.2: When gt(C_i) coincides with the gate type ◦ of the concatenation gate, the inputs of out(C_i) are simply connected to out(C); otherwise, the symmetric tree at out(C_i) is completed using Lemma 6.1.9. This means that we do not decide on an implementation of the symmetric tree computing the logic function at out(C_i) until we know that no other possible inputs of the symmetric tree will emerge in later steps of the algorithm.

Algorithm 6.2: Merging undetermined circuits

Input: Two undetermined circuits C_1 and C_2 computing Boolean functions h_1 and h_2, respectively, depending on inputs with integer arrival times; a gate type ◦ ∈ {And, Or}.

Output: An undetermined circuit C computing h_1 ◦ h_2.

 1  Let C be the union of the circuits C_1 and C_2.
 2  Add a '◦'-gate v_0 to C.
 3  for i ← 1 to 2 do
 4      Let v_1, …, v_k be the inputs of out(C_i).
 5      if gt(C_i) = ◦ then
 6          Remove out(C_i) from C and add edges (v_1, v_0), …, (v_k, v_0) to C.
 7      else
 8          Construct a circuit C̃_i over Ω_mon computing h_i using Lemma 6.1.9.
 9          Add all edges and gates from C̃_i to C.
10          Add an edge (out(C̃_i), v_0) to C.
11  return C

In Algorithm 6.2, we see that the computed circuit C highly depends on whether gt(C_i) = And or gt(C_i) = Or. As a consequence, our dynamic programming table now contains two undetermined circuits for every occurring f_{i,j,k}: circuits A_{i,j,k} and O_{i,j,k} which have minimum weight among all recursively constructed circuits with gt(A_{i,j,k}) = And and gt(O_{i,j,k}) = Or, respectively. Here, we use the weight of the undetermined circuits to decide which circuit to store since, by Lemma 6.1.9, two undetermined circuits C_1 and C_2 with W(C_1) ≤ W(C_2) yield circuits over Ω_mon with delay(C̃_1) ≤ delay(C̃_2).
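Since only the weight and the output gate type matter for comparing candidates, the merge step can be sketched on (gate type, weight) pairs alone. The following is a minimal sketch under that assumption; the class `Undetermined` and the functions `round_up_pow2` and `merge` are hypothetical names for illustration. A sub-circuit whose gate type matches the concatenation gate contributes its weight unchanged; otherwise its symmetric tree is completed, so it contributes 2^⌈log₂ W⌉. In the example, the gate type Or for C_2 is inferred from the term 2⁵ in the weight stated in Figure 6.4(b).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Undetermined:
    gate: str    # output gate type, "And" or "Or"
    weight: int  # W(C) as in Definition 6.1.8


def round_up_pow2(w: int) -> int:
    """Smallest power of two that is >= w (for w >= 1)."""
    return 1 << (w - 1).bit_length()


def merge(c1: Undetermined, c2: Undetermined, op: str) -> Undetermined:
    """Weight-only view of merging two undetermined circuits with gate op."""
    total = 0
    for c in (c1, c2):
        if c.gate == op:
            # Inputs of out(c) are connected directly to the new output gate.
            total += c.weight
        else:
            # The symmetric tree at out(c) is completed first (Lemma 6.1.9).
            total += round_up_pow2(c.weight)
    return Undetermined(op, total)


# Figure 6.4: C_1 has weight 71 (And), C_2 has weight 30 (assumed Or).
merged = merge(Undetermined("And", 71), Undetermined("Or", 30), "And")
```

Here `merged.weight` equals 71 + 32 = 103, matching the weight given for the circuit in Figure 6.4(b).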

Apart from that, Algorithm 6.3 is very similar to Algorithm 6.1, but due to the use of undetermined circuits, we may omit some initial symmetric tree constructions.

The following lemma implies that Algorithm 6.3 correctly computes a circuit for f_0,0,m₋₁.

Algorithm 6.3: Undetermined-circuit dynamic program for delay optimization of And-Or paths

Input: Boolean input variables t = (t_0, …, t_{m−1}) with arrival times a(t_0), …, a(t_{m−1}) ∈ N.

Output: A circuit over Ω_mon computing f_{0,0,m−1}.

 1  for l ← 1 to m do
 2      for 0 ≤ i ≤ j ≤ k < m s.t. j − i even and N(f_{i,j,k}) = l do
 3          if k = j then
 4              A_{i,j,j} := ⋀_{l ∈ {i, i+2, …, j−2, j}} t_l
 5          𝒞 := list of candidate undetermined circuits for f_{i,j,k} arising from applying split (6.4), (6.5) or (6.7) with any valid λ, followed by application of Algorithm 6.2.
 6          A_{i,j,k} := argmin {W(C) : C ∈ 𝒞 with gt(C) = And}
 7          O_{i,j,k} := argmin {W(C) : C ∈ 𝒞 with gt(C) = Or}
 8  if m ≤ 2 then
 9      C := A_{0,0,m−1}
10  else
11      C := argmin {W(A_{0,0,m−1}), W(O_{0,0,m−1})}
12  Apply Lemma 6.1.9 to transform C into a circuit C̃ over Ω_mon.
13  return C̃

Lemma 6.1.10. Consider the application of Algorithm 6.3 to Boolean input variables t = (t_0, …, t_{m−1}) with arrival times a(t_0), …, a(t_{m−1}) ∈ N. Let 0 ≤ i ≤ j ≤ k < m with j − i even be given. If k − j ≤ 1, we have A_{i,j,k} = ⋀_{l ∈ {i, i+2, …, j−2, j}} t_l. Otherwise, both A_{i,j,k} and O_{i,j,k} exist.

Proof. We prove the statement by induction on N(f_{i,j,k}) ∈ N. When k − j = 0, the circuit A_{i,j,k} = ⋀_{l ∈ {i, i+2, …, j−2, j}} t_l is computed in line 4.

For k − j = 1, a candidate circuit for A_{i,j,k} can be obtained in the following way: The realization f_{i,j,k} = f_{i,j,j} ∧ f^*_{j+1,j+1,k} arises from the alternating split (6.4) with λ = 0 in line 6. By the first case, we have A_{i,j,j} = ⋀_{l ∈ {i, i+2, …, j−2, j}} t_l and A_{j+1,j+1,k} = ⋀_{l = j+1} t_l = t_{j+1}. After application of Algorithm 6.2, we obtain A = ⋀_{l ∈ {i, i+2, …, j−2, j}} t_l as a candidate circuit for A_{i,j,k}. As A has optimum weight among all possible realizations for f_{i,j,k}, we have A_{i,j,k} = A.

Now assume that k − j ≥ 2, which implies N(f_{i,j,k}) ≥ 2. By induction hypothesis, for all functions f_{i′,j′,k′} such that 0 ≤ i′ ≤ j′ ≤ k′ ≤ m − 1 with j′ − i′ even and N(f_{i′,j′,k′}) < N(f_{i,j,k}), a realization A_{i′,j′,k′} is computed. Hence, a candidate realization for A_{i,j,k} can be obtained via the alternating split (6.4) with an odd prefix length 2λ + 1 with λ = 0, i.e., f_{i,j,k} = f_{i,j,j} ∧ f^*_{j+1,j+1,k}. For O_{i,j,k}, a candidate circuit can be obtained via the alternating split (6.5) with an even prefix length 2λ with λ = 1, i.e., f_{i,j,k} = f_{i,j,j+1} ∨ f_{i,j+2,k}. Hence, the list 𝒞 in line 5 contains undetermined circuits both with And and with Or as output gate type, and A_{i,j,k} and O_{i,j,k} both exist.
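The order in which lines 1 and 2 of Algorithm 6.3 visit the table entries can be made concrete with a short enumeration. This is a minimal sketch, assuming N(f_{i,j,k}) = (j − i)/2 + k − j + 1 as stated in the proof of Proposition 6.1.12; the generator name `table_entries` is hypothetical and the splits themselves are not modeled.

```python
def table_entries(m):
    """Yield (l, i, j, k) in the order of the two outer loops of the
    dynamic program: increasing l = N(f_{i,j,k}), with j - i even."""
    for l in range(1, m + 1):
        for i in range(m):
            for j in range(i, m, 2):          # j - i must be even
                for k in range(j, m):
                    # Assumed formula: N(f_{i,j,k}) = (j - i)/2 + k - j + 1
                    if (j - i) // 2 + (k - j) + 1 == l:
                        yield l, i, j, k


entries = list(table_entries(3))
```

For m = 3 this visits seven entries, starting with the single inputs (l = 1) and ending with (i, j, k) = (0, 0, 2), the entry for the whole And-Or path. Every sub-function needed by a split has a smaller value of N and has therefore been processed earlier.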

A statement similar to Observation 6.1.2 can be shown for Algorithm 6.3.

Observation 6.1.11. Let input variables t = (t_0, …, t_{m−1}) with arrival times a(t_0), …, a(t_{m−1}) ∈ R be given. We call a circuit C for g(t) a split circuit if it arises from the construction of optimum symmetric trees on input vectors of the form t_i, t_{i+2}, …, t_{j−2}, t_j, the recursive application of (6.4), (6.5) and (6.7) followed by Algorithm 6.2, and the dualization of circuits. Moreover, we call a circuit C for g(t) split-optimum if it is a split circuit that has optimum delay among all split circuits for g(t). For integral arrival times, Algorithm 6.3 computes a split-optimum circuit for g(t).

The following proposition states that, regarding delay, the circuit computed by Algorithm 6.3 is always at least as good as the one computed by Algorithm 6.1.

Proposition 6.1.12. Let Boolean input variables t = (t_0, …, t_{m−1}) with arrival times a(t_0), …, a(t_{m−1}) ∈ N be given. Consider the circuits C and C̃ computed by Algorithm 6.1 and Algorithm 6.3 for this instance, respectively. Then, we have

$$\operatorname{delay}(\widetilde{C}) \le \operatorname{delay}(C).$$

Proof. For 0 ≤ i ≤ j ≤ k < m with j − i even, let C_{i,j,k} be as in Algorithm 6.1 and A_{i,j,k} and O_{i,j,k} as in Algorithm 6.3. We will first prove the following claim.

Claim. For every 0 ≤ i ≤ j ≤ k < m with j − i even, we have

$$\operatorname{delay}(C_{i,j,k}) \ge \begin{cases} \lceil \log_2 W(A_{i,j,k}) \rceil & \text{if } \operatorname{gt}(C_{i,j,k}) = \text{And}, \\ \lceil \log_2 W(O_{i,j,k}) \rceil & \text{otherwise.} \end{cases}$$

Proof of claim: We prove the claim by induction on N(f_{i,j,k}).

If k ∈ {j, j + 1}, Algorithm 6.1 constructs an optimum symmetric tree C_{i,j,j} with gt(C_{i,j,k}) = And for f_{i,j,k}. In this case, by Lemma 6.1.10, we have A_{i,j,j} = ⋀_{l ∈ {i, i+2, …, j−2, j}} t_l. Thus, we have

$$\lceil \log_2 W(A_{i,j,j}) \rceil = \Big\lceil \log_2 \sum_{l \in \{i, i+2, \dots, j-2, j\}} 2^{a(t_l)} \Big\rceil = \operatorname{delay}(C_{i,j,j}).$$

Now assume that k ≥ j + 2. We have N(f_{i,j,k}) = (j − i)/2 + k − j + 1 ≥ 2. Hence, by induction hypothesis, we can assume that the statement holds for all 0 ≤ i′ ≤ j′ ≤ k′ ≤ m − 1 with j′ − i′ even and N(f_{i′,j′,k′}) < N(f_{i,j,k}).

The circuit C_{i,j,k} is computed by a split of type (6.4), (6.5), or (6.6).

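Case 1: Assume that the split is of type (6.4) or (6.5).

Let C_1 and C_2 denote the circuits used in the split by Algorithm 6.1. By Lemma 6.1.10, the table computed by Algorithm 6.3 contains circuits C′_1 and C′_2 where for each r ∈ {1, 2}, C′_r is equivalent to C_r and gt(C_r) = gt(C′_r). By induction hypothesis, we have ⌈log₂ W(C′_r)⌉ ≤ delay(C_r) for each r ∈ {1, 2}. Hence, the circuit C′ arising from merging C′_1 and C′_2 by Algorithm 6.2 fulfills

$$W(C') \le 2^{\lceil \log_2 W(C'_1) \rceil} + 2^{\lceil \log_2 W(C'_2) \rceil} \le 2^{\operatorname{delay}(C_1)} + 2^{\operatorname{delay}(C_2)}.$$

Thus, assuming without loss of generality that delay(C_1) ≤ delay(C_2), this implies

$$
\begin{aligned}
\lceil \log_2 W(C') \rceil
&\le \big\lceil \log_2 \big( 2^{\operatorname{delay}(C_1)} + 2^{\operatorname{delay}(C_2)} \big) \big\rceil \\
&\le \big\lceil \log_2 \big( 2 \cdot 2^{\operatorname{delay}(C_2)} \big) \big\rceil \\
&\overset{\operatorname{delay}(C_2) \in \mathbb{N}}{=} \operatorname{delay}(C_2) + 1 \\
&= \max\{\operatorname{delay}(C_1), \operatorname{delay}(C_2)\} + 1 \\
&= \operatorname{delay}(C_{i,j,k}).
\end{aligned}
\tag{6.8}
$$

As C′ is among the candidates in line 5 of Algorithm 6.3 and has the same output gate type as C_{i,j,k}, the weight-optimum circuit stored for this gate type satisfies the claimed bound.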
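The arithmetic behind the chain (6.8) is that, for natural numbers d_1 and d_2, the value ⌈log₂(2^{d_1} + 2^{d_2})⌉ never exceeds max{d_1, d_2} + 1. The following minimal sketch checks this by brute force over a small range; the helper names `ceil_log2` and `case1_bound_holds` are hypothetical and serve only as a sanity check, not as part of the proof.

```python
def ceil_log2(w: int) -> int:
    """Exact integer ceiling of log2(w) for w >= 1."""
    return (w - 1).bit_length()


def case1_bound_holds(d1: int, d2: int) -> bool:
    """Check ceil(log2(2^d1 + 2^d2)) <= max(d1, d2) + 1."""
    return ceil_log2(2 ** d1 + 2 ** d2) <= max(d1, d2) + 1


# Exhaustive check for all delays below 12 (range chosen arbitrarily).
all_hold = all(case1_bound_holds(d1, d2)
               for d1 in range(12) for d2 in range(12))
```

For instance, with d_1 = 3 and d_2 = 5 the sum is 40 and ⌈log₂ 40⌉ = 6 = max{3, 5} + 1.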

Case 2: Assume that C_{i,j,k} is computed via a symmetric split (6.6), i.e., C_{i,j,k} = C_{i,i+2λ−2,i+2λ−2} ∧ C_{i+2λ,j,k} for some 1 ≤ λ ≤ (j − i)/2.

An undetermined circuit A′_{i,j,k} for f_{i,j,k} can be obtained by recursive application of splits of type (6.7), i.e.,

$$f_{i,j,k} = f_{i,i,i} \land \Big( f_{i+2,i+2,i+2} \land \big( \dots \land ( f_{i+2\lambda-2,i+2\lambda-2,i+2\lambda-2} \land f_{i+2\lambda,j,k} ) \big) \Big), \tag{6.9}$$

followed by Algorithm 6.2 after every split. As A_{i,j,k} is a split-optimum circuit for f_{i,j,k}, we have W(A_{i,j,k}) ≤ W(A′_{i,j,k}), and it suffices to show that

$$\lceil \log_2 W(A'_{i,j,k}) \rceil \le \operatorname{delay}(C_{i,j,k}). \tag{6.10}$$

By Lemma 6.1.10, both A_{i+2λ,j,k} and O_{i+2λ,j,k} have been computed, and a trivial realization A_{r,r,r} = t_r has been computed for all r ∈ {i, i+2, …, i+2λ−2}. Let C′_{i+2λ,j,k} be the circuit among A_{i+2λ,j,k} and O_{i+2λ,j,k} with the same output gate type as C_{i+2λ,j,k}. Then, by induction hypothesis, we have

$$\lceil \log_2 W(C'_{i+2\lambda,j,k}) \rceil \le \operatorname{delay}(C_{i+2\lambda,j,k}). \tag{6.11}$$

As all the outer gates in Equation (6.9) are And gates, we have

$$
\begin{aligned}
W(A'_{i,j,k})
&\le \sum_{r \in \{i, i+2, \dots, i+2\lambda-2\}} W(t_r) + 2^{\lceil \log_2 W(C'_{i+2\lambda,j,k}) \rceil} \\
&\overset{\text{Alg. 6.1, l. 4}}{\le} 2^{\operatorname{delay}(C_{i,i+2\lambda-2,i+2\lambda-2})} + 2^{\lceil \log_2 W(C'_{i+2\lambda,j,k}) \rceil} \\
&\overset{(6.11)}{\le} 2^{\operatorname{delay}(C_{i,i+2\lambda-2,i+2\lambda-2})} + 2^{\operatorname{delay}(C_{i+2\lambda,j,k})}.
\end{aligned}
$$

From this, we can show Equation (6.10) in the same way as in Equation (6.8) of Case 1.

This proves the induction step and hence the claim.

In the case that m ≤ 2, Algorithm 6.3 outputs the circuit Ã_{0,0,m−1} over Ω_mon arising from A_{0,0,m−1} by application of Lemma 6.1.9. This is an optimum symmetric tree for f_{0,0,m−1}, so the statement holds.

When m ≥ 3, Algorithm 6.3 outputs the circuit C̃ over Ω_mon arising from the weight-optimum circuit among A_{0,0,m−1} and O_{0,0,m−1} by application of Lemma 6.1.9. From the claim, we hence deduce

$$\operatorname{delay}(\widetilde{C}) \overset{\text{Lem. 6.1.9}}{=} \min\big\{ \lceil \log_2 W(A_{0,0,m-1}) \rceil, \lceil \log_2 W(O_{0,0,m-1}) \rceil \big\} \overset{\text{claim}}{\le} \operatorname{delay}(C_{0,0,m-1}).$$

As in Proposition 6.1.4, we can show that if the circuits in Algorithm 6.3 are implemented as formula circuits, their size is at most quadratic. In practice, our circuits will have roughly linear size (see Section 6.2) as we heuristically optimize size as described in Section 6.1.4.

Proposition 6.1.13. Let Boolean input variables t = (t_0, …, t_{m−1}) with arrival times a(t_0), …, a(t_{m−1}) ∈ N be given. Consider 0 ≤ i ≤ j ≤ k with j − i even. Let A_{i,j,k} and O_{i,j,k} (the latter only for k ≥ j + 2) be the undetermined circuits computed by Algorithm 6.3, and let Ã_{i,j,k} and Õ_{i,j,k} be the binary circuits arising from applying Lemma 6.1.9 to A_{i,j,k} and O_{i,j,k}, respectively. Then, we have size(Ã_{i,j,k}), size(Õ_{i,j,k}) ≤ (k − j + 1)(k − i + 1) − 1. In particular, the circuit C_{0,0,m−1} for f_{0,0,m−1} computed by Algorithm 6.3 has size at most m² − 1.

Proof. We prove the statement by induction on N(f_{i,j,k}). For k ≤ j + 1 (in particular for N(f_{i,j,k}) ≤ 1), the circuit A_{i,j,k} is a binary tree with size k − i ≤ (k − j + 1)(k − i + 1) − 1 as k ≥ j.

Thus, assume now that N(f_{i,j,k}) ≥ 2 and k ≥ j + 2, where we construct A_{i,j,k} and O_{i,j,k} in lines 5 to 7. Without loss of generality, we only prove the statement for A_{i,j,k}. Assume that we use a split that builds A_{i,j,k} from undetermined circuits C_1 and C_2. Note that the size of Ã_{i,j,k} is independent of the symmetric tree constructed for out(A_{i,j,k}). Thus, we can assume that A_{i,j,k} = C̃_1 ∧ C̃_2, where C̃_r arises from C_r by application of Lemma 6.1.9 for r ∈ {1, 2}. Hence, we have size(Ã_{i,j,k}) ≤ size(C̃_1) + size(C̃_2) + 1. For C̃_1 and C̃_2, the induction hypothesis is fulfilled. From here, the proof of the induction step can be continued as in Proposition 6.1.4 as every split performed in Algorithm 6.3 is also performed in Algorithm 6.1.

In the next theorem, we summarize the characteristics of Algorithm 6.3.

Theorem 6.1.14. Given Boolean input variables t = (t_0, …, t_{m−1}) with arrival times a(t_0), …, a(t_{m−1}) ∈ N, Algorithm 6.3 computes a split-optimum circuit C̃ realizing the And-Or path g(t) = f_{0,0,m−1} with delay at most

$$\operatorname{delay}(\widetilde{C}) \le \log_2 W + \log_2 \log_2 m + \log_2 \log_2 \log_2 m + 4.3$$

and size at most

$$\operatorname{size}(\widetilde{C}) \le m^2 - 1$$

and can be implemented to run in time O(m⁴).

Proof. Split-optimality of C̃ holds by Observation 6.1.11, and the size bound is proven in Proposition 6.1.13. The delay guarantee follows from combining Proposition 6.1.12 and Theorem 6.1.5.

The running time guarantee is also implied by Theorem 6.1.5 since Algorithm 6.3 performs a strict subset of the splits of Algorithm 6.1; only the running time of Algorithm 6.2 for the combination of sub-solutions is additional (up to constant-time steps).

Note that during the course of Algorithm 6.3, we only need to know the weight and output gate type of an undetermined circuit, not its concrete structure. Hence, it suffices to actually construct symmetric trees in the final circuit C̃_{0,0,m−1} only.

By postponing Huffman coding (Theorem 2.3.21), Algorithm 6.2 boils down to computing the output gate, summing up the weights of C_1 and C_2 and, where necessary, rounding them up to the next power of 2. These tasks can be performed in constant time. Hence, lines 1 to 7 of Algorithm 6.3 can be implemented to run in time O(m⁴) when no circuit is actually constructed.

For the construction of the final circuit C̃, we now perform backtracking on the performed splits. For each split, we apply Algorithm 6.2, this time with application of Huffman coding [Huf52] (see Theorem 2.3.21), which takes time O(r log₂ r) for each call on r inputs. As the size of C̃ has been proven to be at most quadratic in m, we have r ∈ O(m²) for each Huffman coding call. Hence, all Huffman coding calls together take time at most O(m² log₂ m), which does not increase the overall running time.

We conclude that Algorithm 6.3 can be implemented to run in time O(m⁴).
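The running time argument can be summarized as a short count. This is a sketch under the assumption that each of the split types (6.4), (6.5) and (6.7) admits at most O(m) valid values of λ per table entry, which holds since all indices lie in {0, …, m − 1}; the Huffman bound is the one stated in the proof above.

```latex
\underbrace{O(m^3)}_{\text{table entries } (i,j,k)}
\cdot
\underbrace{O(m)}_{\text{candidate splits per entry}}
\cdot
\underbrace{O(1)}_{\text{weight-only merge}}
\;+\;
\underbrace{O(m^2 \log_2 m)}_{\text{Huffman coding during backtracking}}
\;=\; O(m^4).
```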

Using Algorithm 6.3, we will in particular compute the optimum solution as depicted in Figure 6.2(b) for the instance from Example 6.1.6.

In the document Faster Circuits for And-Or Paths and Binary Addition (pages 166-173)