
6.1.1 Binary-Circuit Dynamic Program

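Note that, in particular, $f_{0,0,m-1} = g(t)$ is the And-Or path on all inputs $t$. We can now rewrite the three splits (6.1) to (6.3) using this notation. For odd prefix length $l = 2\lambda + 1 \in \mathbb{N}$ with $1 \le l \le k - j$, and thus $\lambda \in \mathbb{N}$ with $0 \le \lambda \le \left\lfloor \frac{k-j-1}{2} \right\rfloor$, we have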

$$f_{i,j,k} = f_{i,j,j+2\lambda} \wedge f^*_{j+1,\,j+2\lambda+1,\,k}, \tag{6.4}$$

for even prefix length $l = 2\lambda$ with $2 \le l \le k - j$, and thus $\lambda \in \mathbb{N}$ with $1 \le \lambda \le \left\lfloor \frac{k-j}{2} \right\rfloor$, we have

$$f_{i,j,k} = f_{i,j,j+2\lambda-1} \vee f_{i,\,j+2\lambda,\,k}, \tag{6.5}$$

and for $1 \le \lambda \le \frac{j-i}{2}$, we have

$$f_{i,j,k} = f_{i,\,i+2\lambda-2,\,i+2\lambda-2} \wedge f_{i+2\lambda,\,j,\,k}. \tag{6.6}$$

Note that if $j - i$ is even, then in each of these splits, the difference between the second and the first index is even for every occurring sub-function. Thus, every split in (6.1) to (6.3) can be represented using the functions $f_{i,j,k}$ for indices $i, j, k$ with $j - i$ even, as defined in Notation 6.1.1. Furthermore, each sub-function occurring in (6.4) to (6.6) has strictly fewer inputs than $f_{i,j,k}$.
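A quick way to gain confidence in (6.4) to (6.6) is to check them by brute force on all 0/1 inputs. The following Python sketch assumes a reading of Notation 6.1.1 that is consistent with the splits above but is not stated in this section: $f_{i,j,k}(t) = t_i \wedge t_{i+2} \wedge \dots \wedge t_{j-2} \wedge g(t_j, \dots, t_k)$, where $g$ is the And-Or path starting with an And gate, and $f^*_{i,j,k}$ is the dual function with $\wedge$ and $\vee$ exchanged. The names `path`, `f` and `splits_hold` are hypothetical.

```python
from itertools import product


def path(bits, dual=False):
    """And-Or path b0 AND (b1 OR (b2 AND ...)); dual=True swaps AND and OR."""
    if len(bits) == 1:
        return bits[0]
    rest = path(bits[1:], not dual)
    return (bits[0] | rest) if dual else (bits[0] & rest)


def f(t, i, j, k, dual=False):
    """Assumed reading of f_{i,j,k}: t_i AND t_{i+2} AND ... AND t_{j-2} AND g(t_j..t_k)."""
    val = path(t[j:k + 1], dual)
    for x in t[i:j:2]:  # symmetric prefix t_i, t_{i+2}, ..., t_{j-2}
        val = (val | x) if dual else (val & x)
    return val


def splits_hold(max_m):
    """Check (6.4), (6.5) and (6.6) on all 0/1 assignments with up to max_m inputs."""
    for m in range(1, max_m + 1):
        for t in product((0, 1), repeat=m):
            for i in range(m):
                for j in range(i, m, 2):  # j - i even
                    for k in range(j, m):
                        lhs = f(t, i, j, k)
                        for lam in range((k - j - 1) // 2 + 1):  # (6.4)
                            rhs = f(t, i, j, j + 2 * lam) & f(t, j + 1, j + 2 * lam + 1, k, True)
                            if lhs != rhs:
                                return False
                        for lam in range(1, (k - j) // 2 + 1):  # (6.5)
                            rhs = f(t, i, j, j + 2 * lam - 1) | f(t, i, j + 2 * lam, k)
                            if lhs != rhs:
                                return False
                        for lam in range(1, (j - i) // 2 + 1):  # (6.6)
                            rhs = f(t, i, i + 2 * lam - 2, i + 2 * lam - 2) & f(t, i + 2 * lam, j, k)
                            if lhs != rhs:
                                return False
    return True


print(splits_hold(6))  # True
```

Under this reading, all three splits hold for every admissible $\lambda$, which also confirms the index ranges stated above.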

Algorithm 6.1: Binary-circuit dynamic program for delay optimization of And-Or paths

Input: Boolean input variables $t = (t_0, \dots, t_{m-1})$ with arrival times $a(t_0), \dots, a(t_{m-1}) \in \mathbb{N}$.
Output: A circuit over $\Omega_{\mathrm{mon}}$ computing $f_{0,0,m-1}$.

1 for $l \leftarrow 1$ to $m$ do
2     for $0 \le i \le j \le k < m$ s.t. $j - i$ even and $N(f_{i,j,k}) = l$ do
3         if $k \in \{j, j+1\}$ then
4             $C_{i,j,k}$ := circuit for $f_{i,j,k}$ computed by Huffman coding [Huf52] (see Theorem 2.3.21)
5         else
6             $\mathcal{C}$ := list of candidate circuits for $f_{i,j,k}$ arising from applying split (6.4), (6.5) or (6.6) with any valid $\lambda$
7             $C_{i,j,k} := \operatorname{argmin}\{d(C) : C \in \mathcal{C}\}$
8 return $C_{0,0,m-1}$
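The dynamic program can be illustrated by a short Python sketch that only tracks delays instead of building circuits. It assumes that a dual function $f^*_{i,j,k}$ has the same optimum delay as $f_{i,j,k}$ (dualizing a circuit only swaps gate types), and it replaces the loop over $N(f_{i,j,k})$ by memoized recursion, which visits the same sub-problems because every split produces sub-functions with fewer inputs. Line 4 is modeled by the greedy variant of Huffman coding that repeatedly merges the two earliest-arriving signals into one gate. All names are hypothetical.

```python
import heapq
from functools import lru_cache


def huffman_delay(arrivals):
    """Optimum delay of a binary tree of one gate type over the given arrival times.

    Greedy max-plus variant of Huffman coding: repeatedly merge the two
    earliest-arriving signals into one gate.
    """
    heap = list(arrivals)
    heapq.heapify(heap)
    while len(heap) > 1:
        first = heapq.heappop(heap)
        second = heapq.heappop(heap)
        heapq.heappush(heap, max(first, second) + 1)
    return heap[0]


def and_or_path_delay(a):
    """Delay of the circuit selected by the split recursion for g(t_0, ..., t_{m-1}).

    Delay-only sketch: a[p] is the arrival time of t_p; circuits are not built.
    """
    @lru_cache(maxsize=None)
    def D(i, j, k):
        if k in (j, j + 1):  # line 4: symmetric function, solved by Huffman coding
            prefix = [a[p] for p in range(i, j, 2)]  # t_i, t_{i+2}, ..., t_{j-2}
            return huffman_delay(prefix + list(a[j:k + 1]))
        cands = []
        for lam in range((k - j - 1) // 2 + 1):  # split (6.4); second part is a dual
            cands.append(max(D(i, j, j + 2 * lam), D(j + 1, j + 2 * lam + 1, k)))
        for lam in range(1, (k - j) // 2 + 1):  # split (6.5)
            cands.append(max(D(i, j, j + 2 * lam - 1), D(i, j + 2 * lam, k)))
        for lam in range(1, (j - i) // 2 + 1):  # split (6.6)
            cands.append(max(D(i, i + 2 * lam - 2, i + 2 * lam - 2),
                             D(i + 2 * lam, j, k)))
        return min(cands) + 1  # line 7: best candidate plus the concatenation gate

    return D(0, 0, len(a) - 1)


print(and_or_path_delay([14, 12, 1, 5, 0, 4, 6, 12, 9, 10]))  # 16
```

The example call uses the arrival times $a(t_0), \dots, a(t_9)$ of the instance in Figure 6.1(a); the result 16 matches the delay of the circuit in Figure 6.1(b) discussed in Example 6.1.6.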

Proof. Assume that Algorithm 4.1 is applied to realize the And-Or path $g(t_0, \dots, t_{m-1}) = f_{0,0,m-1}$. Here, by Observation 4.1.18, the recursive call in line 19 can be avoided by using split (4.27) instead. We show that, with this modification, all recursive constructions of Algorithm 4.1 can be expressed by (possibly recursive applications of) the splits (6.4) to (6.6):

• The symmetric split in line 10 is split (6.6) with $\lambda = \frac{j-i}{2}$.

• The split in line 14 is an alternating split (6.4) with a prefix of length 1.

• The split in line 29 is the alternating split as in Equation (6.4).

• The split (4.27) which is used to avoid the recursive call in line 19 is an alternating split with an even prefix as in Equation (6.5).

This shows that whenever Algorithm 4.1 (modified according to Observation 4.1.18) is applied recursively, the realized function is of the form $f_{i,j,k}$ for some $0 \le i \le j \le k \le m-1$ with $j - i$ even.

Now we verify that each explicit construction in Algorithm 4.1 can also be found by Algorithm 6.1:

• The binary trees in line 4 of Algorithm 4.1 can be computed in line 4 of Algorithm 6.1.

• The realization $g(t) = t_0 \wedge (t_1 \vee t_2)$ computed in line 7 of Algorithm 4.1 can be obtained by applying Equation (6.4) with prefix length 1 to $g(t)$ and then using the symmetric tree $t_1 \vee t_2$, which is dual to one of the symmetric trees constructed in line 4 of Algorithm 6.1.

• The construction in line 26 is an alternating split (6.4) with an odd prefix of length 1.

Hence, all explicit and recursive constructions in Algorithm 4.1 can be performed by Algorithm 6.1. By Observation 6.1.2, this implies delay(C)≤delay(C⁰).

As Algorithm 6.1 is formulated, it constructs a formula circuit for $f_{0,0,m-1}$. In practice, however, we try to avoid building the same sub-circuit twice and instead re-use the function computed by its output gate; see, e.g., the circuit in Figure 6.2(a). Our precise size improvement strategy is presented in Section 6.1.4, and in Section 6.2, we will see that in practice, our circuits seem to have linear size.
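One simple way to re-use identical sub-circuits is structural hashing: a gate is identified by its type and its two predecessors, and a gate that has already been built is not built again. The Python sketch below is only an illustration and not necessarily the strategy of Section 6.1.4. It encodes gates as hypothetical `(op, left, right)` tuples over input indices and counts the gates of the circuit in Figure 6.2(a), as we read it from Example 6.1.6, in which the sub-circuit for $t_2 \wedge t_4 \wedge t_6$ occurs twice.

```python
def formula_size(node):
    """Number of gates when every occurrence of a sub-formula is built separately."""
    if isinstance(node, int):  # an input variable t_node
        return 0
    _, left, right = node
    return 1 + formula_size(left) + formula_size(right)


def shared_size(node, seen=None):
    """Number of gates when structurally identical gates are built only once.

    Gates are compared as tuples, so (op, x, y) and (op, y, x) are not merged.
    """
    if seen is None:
        seen = set()
    if isinstance(node, int) or node in seen:
        return 0
    seen.add(node)
    _, left, right = node
    return 1 + shared_size(left, seen) + shared_size(right, seen)


# Circuit of Figure 6.2(a) as nested (op, left, right) tuples over input indices.
s = ("&", ("&", 2, 4), 6)  # t2 AND t4 AND t6, used by two sub-circuits
f225 = ("&", 2, ("|", 3, ("&", 4, 5)))  # standard circuit for f_{2,2,5}
root = ("&", 0, ("|", 1, ("|", f225, ("|", ("&", s, 7), ("&", ("&", s, 8), 9)))))
print(formula_size(root), shared_size(root))  # 14 12
```

Built as a formula, this circuit needs 14 gates; re-using the sub-circuit for $t_2 \wedge t_4 \wedge t_6$ saves 2 of them.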

From a theoretical point of view, we can show that the size of the formula circuit constructed by Algorithm 6.1 is at most quadratic in the number of inputs.

Proposition 6.1.4. Let Boolean input variables $t = (t_0, \dots, t_{m-1})$ with arrival times $a(t_0), \dots, a(t_{m-1}) \in \mathbb{N}$ be given. Consider $0 \le i \le j \le k$ with $j - i$ even. Then, the circuit $C_{i,j,k}$ computed by Algorithm 6.1 has size at most $(k-j+1)(k-i+1) - 1$.

In particular, the circuit $C_{0,0,m-1}$ for $f_{0,0,m-1}$ computed by Algorithm 6.1 has size at most $m^2 - 1$.

Proof. The second statement is a special case of the first statement. We prove the first statement by induction on $N(f_{i,j,k})$.

If $k \in \{j, j+1\}$, then we construct $C_{i,j,k}$ in line 4 as a binary tree with size at most $k - i$. As $k \ge j$, this can be bounded from above by $(k-j+1)(k-i+1) - 1$, so the size bound is fulfilled.

This covers the case $N(f_{i,j,k}) \le 1$, so assume now that $N(f_{i,j,k}) \ge 2$, in which case $C_{i,j,k}$ is constructed in lines 6 to 7. We consider three cases based on the type of split that is used to construct $C_{i,j,k}$.

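First assume that $C_{i,j,k} = C_{i,j,j+2\lambda} \wedge C^*_{j+1,\,j+2\lambda+1,\,k}$ for some $0 \le \lambda \le \left\lfloor \frac{k-j-1}{2} \right\rfloor$ as in Equation (6.4). Then, we have

$$
\begin{aligned}
\operatorname{size}(C_{i,j,k}) &= \operatorname{size}(C_{i,j,j+2\lambda}) + \operatorname{size}(C^*_{j+1,\,j+2\lambda+1,\,k}) + 1 \\
&\overset{\text{(IH)}}{\le} (j+2\lambda-j+1)(j+2\lambda-i+1) + (k-(j+2\lambda+1)+1)(k-(j+1)+1) - 1 \\
&\le (k-j+1)\max\{j+2\lambda-i+1,\; k-(j+1)+1\} - 1 \\
&\overset{j+2\lambda \le k,\; j \ge i}{\le} (k-j+1)(k-i+1) - 1,
\end{aligned}
$$

where the second inequality uses that the first factors of the two products, $2\lambda + 1$ and $k - j - 2\lambda$, are non-negative and sum to $k - j + 1$.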

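Now assume that $C_{i,j,k} = C_{i,j,j+2\lambda-1} \vee C_{i,\,j+2\lambda,\,k}$ for some $1 \le \lambda \le \left\lfloor \frac{k-j}{2} \right\rfloor$ as in Equation (6.5). Then, similar to the first case, we obtain

$$
\begin{aligned}
\operatorname{size}(C_{i,j,k}) &\le \operatorname{size}(C_{i,j,j+2\lambda-1}) + \operatorname{size}(C_{i,\,j+2\lambda,\,k}) + 1 \\
&\overset{\text{(IH)}}{\le} (j+2\lambda-1-j+1)(j+2\lambda-1-i+1) + (k-(j+2\lambda)+1)(k-i+1) - 1 \\
&\overset{j+2\lambda \le k,\; j \ge i}{\le} (k-j+1)\max\{j+2\lambda-i,\; k-i+1\} - 1 \\
&\overset{j+2\lambda \le k}{=} (k-j+1)(k-i+1) - 1.
\end{aligned}
$$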

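Finally, in case of the split $C_{i,j,k} = C_{i,\,i+2\lambda-2,\,i+2\lambda-2} \wedge C_{i+2\lambda,\,j,\,k}$ with $1 \le \lambda \le \frac{j-i}{2}$ as in Equation (6.6), we have

$$
\begin{aligned}
\operatorname{size}(C_{i,j,k}) &= \operatorname{size}(C_{i,\,i+2\lambda-2,\,i+2\lambda-2}) + \operatorname{size}(C_{i+2\lambda,\,j,\,k}) + 1 \\
&\overset{\text{(IH)}}{\le} 1 \cdot (i+2\lambda-2-i+1) + (k-j+1)(k-(i+2\lambda)+1) - 1 \\
&= (2\lambda-1) + (k-j+1)(k-i-2\lambda+1) - 1 \\
&\overset{k \ge j}{<} (k-j+1)(k-i+1) - 1,
\end{aligned}
$$

where the last step uses $(k-j+1)(k-i-2\lambda+1) = (k-j+1)(k-i+1) - 2\lambda(k-j+1)$ and $2\lambda(k-j+1) \ge 2\lambda > 2\lambda - 1$.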

This proves the induction step and hence the first statement.
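Since the induction above works for every split, the bound of Proposition 6.1.4 holds for every candidate circuit, not only for the one selected in line 7. The following Python sketch (hypothetical names) computes, for every index triple, the largest formula size over all candidates and compares it with $(k-j+1)(k-i+1) - 1$. It assumes that $f_{i,j,k}$ has the $\frac{j-i}{2}$ prefix inputs $t_i, t_{i+2}, \dots, t_{j-2}$ plus the path inputs $t_j, \dots, t_k$, and that a binary tree on $n$ inputs built in line 4 has $n - 1$ gates.

```python
from functools import lru_cache


def max_formula_size(m):
    """Largest formula size over all candidates of Algorithm 6.1, per triple (i, j, k)."""
    @lru_cache(maxsize=None)
    def S(i, j, k):
        if k in (j, j + 1):
            # Assumed input count: prefix t_i, ..., t_{j-2} plus path t_j, ..., t_k.
            n_inputs = (j - i) // 2 + (k - j + 1)
            return n_inputs - 1  # any binary tree on n inputs has n - 1 gates
        best = 0
        for lam in range((k - j - 1) // 2 + 1):  # split (6.4)
            best = max(best, S(i, j, j + 2 * lam) + S(j + 1, j + 2 * lam + 1, k) + 1)
        for lam in range(1, (k - j) // 2 + 1):  # split (6.5)
            best = max(best, S(i, j, j + 2 * lam - 1) + S(i, j + 2 * lam, k) + 1)
        for lam in range(1, (j - i) // 2 + 1):  # split (6.6)
            best = max(best, S(i, i + 2 * lam - 2, i + 2 * lam - 2)
                       + S(i + 2 * lam, j, k) + 1)
        return best

    return {(i, j, k): S(i, j, k)
            for i in range(m) for j in range(i, m, 2) for k in range(j, m)}


def bound_holds(m):
    """Check size <= (k - j + 1)(k - i + 1) - 1 for every triple of an m-input instance."""
    return all(size <= (k - j + 1) * (k - i + 1) - 1
               for (i, j, k), size in max_formula_size(m).items())


print(bound_holds(12), max_formula_size(3)[(0, 0, 2)])  # True 3
```

For $m = 3$, the largest candidate for $f_{0,0,2}$ is $(t_0 \wedge t_1) \vee (t_0 \wedge t_2)$ from split (6.5) with 3 gates, well below $m^2 - 1 = 8$.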

The following theorem summarizes all important properties of Algorithm 6.1.

Theorem 6.1.5. Given Boolean input variables $t = (t_0, \dots, t_{m-1})$ with arrival times $a(t_0), \dots, a(t_{m-1}) \in \mathbb{N}$, Algorithm 6.1 computes a circuit $C$ realizing the And-Or path $g(t) = f_{0,0,m-1}$ with

$$\operatorname{delay}(C) \le \log_2 W + \log_2 \log_2 m + \log_2 \log_2 \log_2 m + 4.3 \quad \text{and} \quad \operatorname{size}(C) \le m^2 - 1$$

in running time $\mathcal{O}(m^4)$.

Proof. The size bound is proven in Proposition 6.1.4. We now prove the delay bound.

Let $C^0$ denote the circuit computed by Theorem 4.2.4 on the same instance. We show that $\operatorname{delay}(C; a) \le \operatorname{delay}(C^0; a)$, following the proof of Theorem 4.2.4.

For $m < 500$, $C^0$ is either the standard And-Or path circuit or the circuit computed by the algorithm of Held and Spirkl [HS17b] (on modified arrival times).

First assume that $C^0$ is the standard And-Or path circuit. Note that the standard And-Or path circuit for $f_{0,0,m-1}$ can be created by recursive application of Equation (6.4) with $\lambda = 0$ and finally constructing the symmetric circuit for $f_{m-2,m-2,m-1}$. Hence, by Observation 6.1.2, we have $\operatorname{delay}(C; a) \le \operatorname{delay}(C^0; a)$.

Now assume that $C^0$ is computed by the algorithm of Held and Spirkl [HS17b]. Recall from Section 2.6.4 that this circuit arises from recursive application of split (2.42), which is a special case of the alternating split (6.4). Note that Held and Spirkl [HS17b] construct special binary trees that follow the recursion of the alternating split, but as we always compute optimum binary trees in line 4, Observation 6.1.2 still implies $\operatorname{delay}(C; a) \le \operatorname{delay}(C^0; a)$.

For $m \ge 500$, the circuit $C^0$ is computed by Algorithm 4.1 (page 119) (on modified arrival times). In the proof of Proposition 6.1.3, we have seen that all initial and recursive circuit constructions used to construct $C^0$ can also be performed by Algorithm 6.1. Hence, by Observation 6.1.2, we have $\operatorname{delay}(C; a) \le \operatorname{delay}(C^0; a)$.

From delay(C;a)≤delay(C⁰;a) and Theorem 4.2.4, the delay bound follows.

In order to derive the running time bound, note that line 4 is executed $\mathcal{O}(m^2)$ times, while lines 6 and 7 are executed $\mathcal{O}(m^3)$ times. By Theorem 2.3.21, Huffman coding can be implemented in time $\mathcal{O}(m)$ after sorting and in time $\mathcal{O}(m \log_2 m)$ if sorting is needed. A single execution of lines 6 and 7 takes time $\mathcal{O}(m)$, as there are 3 types of splits and at most $m$ choices for $\lambda$ per split.

Hence, the total running time is in $\mathcal{O}(m^2 \cdot m \log_2 m + m^3 \cdot m) = \mathcal{O}(m^4)$.
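The counting argument can be made concrete with a small Python sketch (hypothetical names) that enumerates the index triples of Algorithm 6.1, separates those handled in line 4 from those handled in lines 6 and 7, and counts the candidates evaluated in line 7 via the $\lambda$-ranges of (6.4) to (6.6).

```python
def count_work(m):
    """Count executions of line 4, of lines 6-7, and candidates compared in line 7."""
    base = recursive = candidates = 0
    for i in range(m):
        for j in range(i, m, 2):  # j - i even
            for k in range(j, m):
                if k in (j, j + 1):
                    base += 1  # Huffman coding in line 4
                else:
                    recursive += 1
                    # (6.4): (k-j-1)//2 + 1 choices, (6.5): (k-j)//2, (6.6): (j-i)//2
                    candidates += (k - j - 1) // 2 + 1 + (k - j) // 2 + (j - i) // 2
    return base, recursive, candidates


print(count_work(3))  # (6, 1, 2)
```

For every triple with $k \ge j + 2$, the number of candidates is $(k - j) + \frac{j-i}{2} \le k - i < m$, which is where the factor $m$ in the term $m^3 \cdot m$ comes from.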

In Section 6.2, we shall see that for most small instances, the solution computed by Algorithm 6.1 is much better than the solutions computed by Theorem 4.2.4, by the algorithm of Held and Spirkl [HS17b], or by the algorithm of Rautenbach, Szegedy, and Werber [RSW06]; see Figure 6.6 (page 179). However, the comparison with optimum delays in Figure 6.8 (page 182) shows that there is still much room for improvement. Thus, we will present a refined algorithm in Section 6.1.2. The following example shows that binary trees occurring in the middle of the circuit cannot be optimized by Algorithm 6.1; it also explains the structure of circuits computed by Algorithm 6.1 on an example instance.

[Figure 6.1: Applying Algorithm 6.1 (page 162) to compute the And-Or path $f_{0,0,9}$. Input arrival times: $a(t_9), \dots, a(t_0) = 10, 9, 12, 6, 4, 0, 5, 1, 12, 14$.
(a) The standard And-Or path circuit for $f_{0,0,9}$ with prescribed input arrival times and computed gate arrival times.
(b) A circuit for the instance on the left with delay 16 computed by Algorithm 6.1 as described in Example 6.1.6.]

Example 6.1.6. Figure 6.1(b) depicts the solution computed by Algorithm 6.1 when run on the instance from Figure 6.1(a). The structure of the solution can be described as follows: At the output gate, the alternating split with an odd prefix (see Equation (6.4)) has been applied with $\lambda = 2$:

$$f_{0,0,9} = f_{0,0,4} \wedge f^*_{1,5,9}$$

The sub-function $f_{0,0,4}$ is realized by the standard circuit, while $f^*_{1,5,9}$ is realized by the alternating split with an odd prefix (see Equation (6.4)) with $\lambda = 0$:

$$f^*_{1,5,9} = f^*_{1,5,5} \vee f_{6,6,9}$$

Figure 6.2(a) depicts a candidate solution for $f_{0,0,9}$ contained in $\mathcal{C}$ which has delay 17 and is thus not output by the algorithm. To simplify explanations, we have marked important splits in the picture.

Splits 1 and 2: In these cases, the alternating split with an odd prefix (see Equation (6.4)) is used with $\lambda = 0$:

$$f_{0,0,9} = f_{0,0,0} \wedge f^*_{1,1,9}, \qquad f^*_{1,1,9} = f^*_{1,1,1} \vee f_{2,2,9}$$

[Figure 6.2: The circuit on the left-hand side is a candidate solution of Algorithm 6.1 (page 162) for $f_{0,0,9}$ obtained by the recursion formulas (6.4) to (6.6) as described in Example 6.1.6. The circuit on the right-hand side cannot be computed by Algorithm 6.1, but is delay-optimum for the instance in Figure 6.1(a), as the critical input $t_0$ traverses only 1 gate, which is best possible.
(a) Another circuit implementing the And-Or path from Figure 6.1(a) which has been considered by Algorithm 6.1, but has delay 17; splits 1 to 4 are marked.
(b) Circuit with optimum delay 15 arising from the circuit in Figure 6.2(a) by performing Huffman coding on the group of Or gates.]

There is nothing to be done for the computation of $f_{0,0,0} = t_0$ or $f^*_{1,1,1} = t_1$.

Split 3: Now, we apply the alternating split with an even prefix (see Equation (6.5)) with $\lambda = 2$:

$$f_{2,2,9} = f_{2,2,5} \vee f_{2,6,9}$$

The sub-function $f_{2,2,5}$ is realized by the standard circuit.

Split 4: Here, the alternating split with an even prefix (see Equation (6.5)) is applied with $\lambda = 1$:

$$f_{2,6,9} = f_{2,6,7} \vee f_{2,8,9}$$

As the two arising sub-functions are symmetric, they can be constructed using Huffman coding. Note that the circuit for $t_2 \wedge t_4 \wedge t_6$ is used in both sub-circuits.

In Figure 6.2(a), one can clearly see where Algorithm 6.1 (page 162) lacks flexibility: each drawn split line partitions the circuit into three parts, namely two sub-circuits and a concatenation gate. The algorithm optimizes the two sub-circuits separately and does not cross these split lines during optimization. In this concrete example, Algorithm 6.1 cannot re-arrange the Or concatenation gates as shown in Figure 6.2(b). However, this would lead to a circuit with delay 15, which is one better than the delay of the circuit in Figure 6.1(b) computed by Algorithm 6.1.
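The delays quoted in Example 6.1.6 and in Figures 6.1 and 6.2 can be re-checked with a small Python sketch. It encodes circuits as hypothetical `(op, left, right)` tuples over input indices, uses the arrival times read off Figure 6.1(a), verifies by brute force that each circuit computes $g(t_0, \dots, t_9)$, and evaluates its delay. Where the text does not fix the structure, we assume it: $f_{6,6,9}$ is taken to be the standard circuit, $f^*_{1,5,5}$ to be $t_1 \vee (t_3 \vee t_5)$, and the regrouping of Figure 6.2(b) to follow Huffman coding on the Or operands.

```python
from itertools import product

# Arrival times a(t_0), ..., a(t_9), read off Figure 6.1(a).
A = [14, 12, 1, 5, 0, 4, 6, 12, 9, 10]


def delay(node):
    """Formula delay: an input t_p arrives at A[p], each gate adds 1."""
    if isinstance(node, int):
        return A[node]
    _, left, right = node
    return max(delay(left), delay(right)) + 1


def evaluate(node, t):
    if isinstance(node, int):
        return t[node]
    op, left, right = node
    u, v = evaluate(left, t), evaluate(right, t)
    return (u & v) if op == "&" else (u | v)


def standard(lo, hi, dual=False):
    """Standard And-Or path circuit on t_lo, ..., t_hi (first gate And unless dual)."""
    if lo == hi:
        return lo
    return ("|" if dual else "&", lo, standard(lo + 1, hi, not dual))


def and_or_path(t):
    """Reference function g(t) = t0 AND (t1 OR (t2 AND ...))."""
    val = t[-1]
    for p in range(len(t) - 2, -1, -1):
        val = (t[p] & val) if p % 2 == 0 else (t[p] | val)
    return val


# Figure 6.1(b): f_{0,0,9} = f_{0,0,4} AND (f*_{1,5,5} OR f_{6,6,9}).
fig61b = ("&", standard(0, 4), ("|", ("|", 1, ("|", 3, 5)), standard(6, 9)))

# Figure 6.2(a): splits 1 to 4, sharing s = t2 AND t4 AND t6.
s = ("&", ("&", 2, 4), 6)
f225, f267, f289 = standard(2, 5), ("&", s, 7), ("&", ("&", s, 8), 9)
fig62a = ("&", 0, ("|", 1, ("|", f225, ("|", f267, f289))))

# Figure 6.2(b): the same Or operands regrouped as Huffman coding would.
fig62b = ("&", 0, ("|", f267, ("|", 1, ("|", f225, f289))))

circuits = {"standard": standard(0, 9), "fig61b": fig61b,
            "fig62a": fig62a, "fig62b": fig62b}
for name, circuit in circuits.items():
    assert all(evaluate(circuit, t) == and_or_path(t) for t in product((0, 1), repeat=10))
    print(name, delay(circuit))
```

The printed delays are 20 for the standard circuit of Figure 6.1(a), 16 for Figure 6.1(b), 17 for Figure 6.2(a) and 15 for Figure 6.2(b), in line with the values stated in the text.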

From the document Faster Circuits for And-Or Paths and Binary Addition (pages 161-166)