General Algorithm - Faster Circuits for And-Or Paths and Binary Addition

Let D^Or := {t₀, …, t_{m−1}} \ S^Or. As h(t) is an And-Or path and m ≥ 3, we have D^Or ≠ ∅. As h(t) is an And-Or path and t_{m−1} ∈ S^Or, for every t_i ∈ D^Or, we have t_{i+1} ∈ S^Or. Hence, the function

ϑ: D^Or → S^Or, t_i ↦ t_{i+1}

is well-defined. For k ∈ {1, 2}, let D_k^Or := ϑ⁻¹(S_k^Or). Note that D^Or = D₁^Or ∪̇ D₂^Or. Now, for each k ≠ l ∈ {1, 2}, let B_k denote the reduced circuit arising from C_k by fixing all inputs t_i ∈ D_l^Or to α := 1, and let g_k := f(B_k). Then, as all inputs in D_l^Or are propagate signals, by considering the standard circuit for h(t), we observe that g_k is the restriction of f_k in which all inputs in D_l^Or are fixed to α. By construction, the essential variables of g_k are the variables of S_k^Or and D_k^Or. Let t_{j_k} be the essential variable of g_k with j_k maximum.

Consider k ∈ {1, 2}. We show that g_k is an And-Or path: First note that by Observation 5.2.2 and the choice of α, every input of g_k except for t_{j_k} is a propagate signal (generate signal) of g_k if and only if it is a propagate signal (generate signal) of h(t). By definition of ϑ, for any two generate signals t_i, t_j of g_k with i < j < j_k, the propagate signal t_{j−1} = ϑ⁻¹(t_j) of h(t) is an input of g_k. Furthermore, for any two propagate signals t_i ≠ t_j of g_k with i < j < j_k, the generate signal t_{i+1} = ϑ(t_i) of h(t) is an input of g_k. Hence, the inputs of g_k (except for t_{j_k}) are alternately propagate and generate signals, and g_k is an And-Or path.

Let m₁ and m₂ be the numbers of inputs of B₁ and B₂, respectively. As

{t₀, …, t_{m−1}} = S^Or ∪̇ D^Or = S₁^Or ∪̇ S₂^Or ∪̇ D₁^Or ∪̇ D₂^Or,

we have m₁ + m₂ = m. As B₁ and B₂ are both And-Or path circuits with depth at most d, we have m₁, m₂ ≤ m(d, 0). Together, this implies

m = m₁ + m₂ ≤ 2m(d, 0).

For the special case when all input arrival times are equal, we conjecture that partitions of the same-gate inputs into two “non-overlapping” sets are always best for the delay.

Conjecture 5.2.11. Consider Theorem 5.2.9 for the case of uniform input arrival times and let S^◦ = S₁^◦ ∪̇ S₂^◦ be a partition as in the theorem. Then, for all inputs t_i ∈ S₁^◦ and t_j ∈ S₂^◦, we have i < j.

We will see in Section 6.3 why we assume this statement to be satisfied. For non-uniform arrival times, we already know that the conjecture is not fulfilled, see Figure 6.12 (page 189).

Algorithm 5.1: Exact algorithm for delay optimization of generalized And-Or paths

Input: Boolean input variables t = (t₀, …, t_{m−1}) with arrival times a(t₀), …, a(t_{m−1}) ∈ ℝ, and gate types Γ = (◦₀, …, ◦_{m−2}).

Output: Optimum delay of any circuit over Ω_mon computing h(t; Γ).

1 foreach ∅ ≠ I ⊆ {t₀, …, t_{m−1}} do
2     Set d(I) := ∞.
3 return compute_opt({t₀, …, t_{m−1}})

4 procedure compute_opt(I)    // Assume that ∅ ≠ I ⊆ {t₀, …, t_{m−1}}.
5     Assume that I = {t_{i₀}, …, t_{i_{r−1}}} with 0 ≤ i₀ < … < i_{r−1} ≤ m−1 and let Γ′ := (◦_{i₀}, …, ◦_{i_{r−2}}).
6     if d(I) < ∞ then
7         return d(I)
8     if r = 1 then
9         Set d(I) = a(t_{i_{r−1}}).
10        return d(I)
11    foreach ◦ ∈ {And, Or} do
12        Let S^◦ ⊆ I consist of all signals t_{i_j} with ◦_{i_j} = ◦, together with t_{i_{r−1}}.
13        foreach partition S^◦ = S₁^◦ ∪̇ S₂^◦ with S₁^◦, S₂^◦ ≠ ∅ do
14            foreach k ∈ {1, 2} do
15                Let I_k denote the input set of h((t_{i₀}, …, t_{i_{r−1}}); Γ′)_{S_k^◦}.
16                Let d_k := compute_opt(I_k).
17            Set d(I) = min{d(I), max{d₁, d₂} + 1}.
18    return d(I)
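The recursion of Algorithm 5.1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation discussed in Section 5.5. It assumes that h(t; Γ)_{S_k^◦} arises from h(t; Γ) by fixing the same-gate inputs outside S_k^◦ to the neutral element of ◦, so that its essential inputs are S_k^◦ together with all other inputs whose index is smaller than the largest index in S_k^◦ (cf. Observation 5.2.2). The names optimum_delay, arrival and gates are hypothetical, and inputs are represented by their indices.

```python
from functools import lru_cache
from itertools import combinations


def optimum_delay(arrival, gates):
    """arrival[i] = a(t_i); gates[i] in {"And", "Or"} is the gate type of t_i
    for i = 0, ..., m-2 (the last input has no gate type)."""
    m = len(arrival)
    assert len(gates) == m - 1

    @lru_cache(maxsize=None)          # plays the role of the table d(I)
    def compute_opt(I):               # I: sorted tuple of input indices
        if len(I) == 1:
            return arrival[I[0]]
        last = I[-1]
        best = float("inf")
        for op in ("And", "Or"):
            # S^op is `same` plus the last input; `other` are the remaining inputs.
            same = [i for i in I[:-1] if gates[i] == op]
            other = [i for i in I[:-1] if gates[i] != op]
            # Keep the last input in S2 so each partition is enumerated once.
            for size in range(1, len(same) + 1):
                for S1 in combinations(same, size):
                    S2 = [i for i in same if i not in S1] + [last]
                    delays = []
                    for Sk in (S1, S2):
                        top = max(Sk)
                        # Assumed essential inputs of the sub-path (see lead-in).
                        Ik = tuple(sorted(list(Sk) + [i for i in other if i < top]))
                        delays.append(compute_opt(Ik))
                    best = min(best, max(delays) + 1)
        return best

    return compute_opt(tuple(range(m)))
```

For example, optimum_delay([0] * 5, ["And", "Or", "And", "Or"]) evaluates the And-Or path on 5 inputs with uniform arrival times. Memoizing on the tuple of indices mirrors the identification of a sub-path with its set of essential inputs.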

Consider the map κ which maps each generalized And-Or path arising from recursive application of Theorem 5.2.9 to the set of its essential inputs. This map is injective, as for h(t; Γ), by Observation 5.2.2, every input t_i with i < i_k (with i_k as in Observation 5.2.2) that is essential for the generalized And-Or path h(t; Γ)_{S_k^◦} is a propagate signal (generate signal) of h(t; Γ)_{S_k^◦} if and only if it is a propagate signal (generate signal) of h(t; Γ).

Hence, we may identify a generalized And-Or path considered during recursive applications of Theorem 5.2.9 with the set of its essential inputs. Algorithm 5.1 describes our algorithm, which recursively applies Theorem 5.2.9 and stores the computed delays d(I) for subsets I of {t₀, …, t_{m−1}} in a dynamic programming table of size at most 2^m − 1.
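The identification of a sub-path with its essential inputs can be checked by brute force on a small instance. The sketch below assumes that the restricted path is obtained by fixing the same-gate inputs outside S_k^◦ to the neutral element of ◦ (1 for And), which is an assumption about Theorem 5.2.9 rather than a statement taken from it. The helper names eval_path and essential_inputs are hypothetical.

```python
from itertools import product


def eval_path(values, gates):
    """Evaluate h(t; Γ) = t0 ◦0 (t1 ◦1 (... ◦_{m-2} t_{m-1}))."""
    acc = values[-1]
    for v, op in zip(reversed(values[:-1]), reversed(gates)):
        acc = (v and acc) if op == "And" else (v or acc)
    return acc


def essential_inputs(gates, fixed):
    """Indices of free inputs on which h still depends once the inputs in
    `fixed` (a dict index -> bool) are held constant."""
    m = len(gates) + 1
    free = [i for i in range(m) if i not in fixed]
    essential = set()
    for bits in product([False, True], repeat=len(free)):
        values = [fixed.get(i) for i in range(m)]
        for i, b in zip(free, bits):
            values[i] = b
        base = eval_path(values, gates)
        for i in free:
            flipped = list(values)
            flipped[i] = not flipped[i]
            if eval_path(flipped, gates) != base:
                essential.add(i)
    return essential


gates = ["And", "Or", "And", "Or"]                 # And-Or path on t0, ..., t4
# S^And = {t0, t2, t4}, split as S1 = {t0, t2} and S2 = {t4}.
I1 = essential_inputs(gates, {4: True})            # fix S^And \ S1 to 1
I2 = essential_inputs(gates, {0: True, 2: True})   # fix S^And \ S2 to 1
print(sorted(I1), sorted(I2))                      # [0, 1, 2] [1, 3, 4]
```

In both cases the essential inputs are S_k together with the other inputs of smaller index than the largest index in S_k, which is the rule the dynamic program relies on.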

It is not hard to see that κ is actually a bijection: Given some subset ∅ ≠ I ⊊ {t₀, …, t_{m−1}}, we need to find a series of partitions according to Theorem 5.2.9 such that the generalized And-Or path with essential inputs I arises. Choose i ∈ {0, …, m−1} maximum with t_i ∈ I. Assume that t_i is a generate signal (the other case follows by duality). First, use an Or gate and partition the same-gate signals S^Or of h(t; Γ) and Or into those contained in I and the rest. Then, h(t; Γ)_{S^Or ∩ I} is a generalized And-Or path with the generate signals contained in I, plus all propagate signals t_j of h with j < i. Afterwards, partition the propagate signals of h(t; Γ)_{S^Or ∩ I} into those contained in I and the rest. This yields the generalized And-Or path with essential input set I.

In the following theorem, we estimate the running time of Algorithm 5.1.

Theorem 5.3.1. Let input variables t = (t₀, …, t_{m−1}) with arrival times a(t₀), …, a(t_{m−1}) ∈ ℝ and gate types Γ = (◦₀, …, ◦_{m−2}) be given. Then, Algorithm 5.1 computes the optimum delay of any circuit realizing the generalized And-Or path h(t; Γ). The dynamic programming table needed to store the delay of all generalized And-Or paths considered during the computation has 2^m − 1 entries.

Denoting by g and p the number of generate signals and propagate signals among t₀, …, t_{m−2}, the algorithm can be implemented to run in time O(3^g 2^p + 2^g 3^p). In particular, if h(t; Γ) is an And-Or path, then the running time is O(√6^n). By backtracking, we can obtain a delay-optimum formula circuit for h(t; Γ).

Proof. We have already argued that a generalized And-Or path arising from recursive application of Theorem 5.2.9 can be identified with the set of its essential inputs via a bijection κ. Hence, by induction on m and Theorem 5.2.9, we can see that Algorithm 5.1 computes the optimum delay of any formula circuit for h(t; Γ). By Theorem 2.3.11, this is the optimum delay of any circuit for h(t; Γ). The dynamic programming table has size exactly 2^m − 1.

Let T := {t₀, …, t_{m−1}}. The running time of Algorithm 5.1 is dominated by enumerating all partitions of the respective set S^◦ in line 13 for the two cases that ◦ = And or ◦ = Or for all subsets ∅ ≠ I ⊆ T. A partition of S^◦ into 2 non-empty subsets corresponds to choosing a subset S₁^◦ ⊆ S^◦ \ {t_{i_k}} and setting S₂^◦ := S^◦ \ S₁^◦. By Observation 5.2.2, the generalized And-Or paths h(t; Γ)_{S₁^Or} and h(t; Γ)_{S₂^Or} are uniquely determined by I, S^Or and S₁^Or.

Hence, it remains to bound the number of sets S₁^Or ⊊ S^Or ⊊ I considered during the algorithm. For fixed I, S^Or and S₁^Or, the following holds: A propagate signal of h may be in I or in T \ I. Each generate signal of h has three options: it is contained in S₁^Or, in S^Or \ S₁^Or, or in {t₀, …, t_{m−1}} \ S^Or. Hence, there are at most 3^g 2^p partitions for the case that the split gate is an Or.

Similarly, when ◦ = And, we have 3^p 2^g partitions. Summing up yields the running time bound.

When h(t; Γ) is an And-Or path, we have p, g ∈ {⌊n/2⌋, ⌈n/2⌉}. Hence, the running time follows from the previous statement.
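As a small numerical illustration of the bound, the following sketch evaluates 3^g 2^p + 2^g 3^p for the And-Or path on m = 14 inputs and compares it with √6^(m−1). Reading generate signals as the inputs of Or gates and propagate signals as the inputs of And gates follows the usage in the text, while the function name partition_bound and the choice of m − 1 as the exponent are assumptions made for this example.

```python
import math


def partition_bound(gates):
    """Return 3^g 2^p + 2^g 3^p, where g and p count the Or-type (generate)
    and And-type (propagate) signals among t_0, ..., t_{m-2}."""
    g = sum(1 for op in gates if op == "Or")
    p = sum(1 for op in gates if op == "And")
    return 3**g * 2**p + 2**g * 3**p


m = 14
# Alternating gate types of an And-Or path starting with an And gate.
gates = ["And" if i % 2 == 0 else "Or" for i in range(m - 1)]
bound = partition_bound(gates)
print(bound)                                  # 233280
print(bound / math.sqrt(6) ** (m - 1))        # constant factor relative to sqrt(6)^(m-1)
```

Here g = 6 and p = 7, so the bound equals 5 · 6^6, which is a constant multiple of √6^13.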

We call a circuit C strongly delay-optimum if each sub-circuit of C has optimum delay. Note that the formula circuit constructed by our algorithm is strongly delay-optimum. Our algorithm can naturally be adapted to compute a size-optimum circuit among all strongly delay-optimum circuits by storing both delay and size for each generalized And-Or path in line 17 and updating them accordingly.
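One possible reading of this adaptation is to store a pair (delay, size) per input set and to compare pairs lexicographically, so that size only breaks ties among splits with optimum delay. The sketch below is an illustration under that assumption and under the same assumption on essential inputs as in Observation 5.2.2 (the sub-path for S_k keeps S_k plus all other inputs of smaller index than the largest index in S_k). The name optimum_delay_and_size is hypothetical.

```python
from functools import lru_cache
from itertools import combinations


def optimum_delay_and_size(arrival, gates):
    """Return (delay, size) of a smallest strongly delay-optimum formula
    circuit found by the recursion; size counts gates."""

    @lru_cache(maxsize=None)
    def solve(I):                       # I: sorted tuple of input indices
        if len(I) == 1:
            return (arrival[I[0]], 0)   # a single input needs no gate
        last, best = I[-1], (float("inf"), 0)
        for op in ("And", "Or"):
            same = [i for i in I[:-1] if gates[i] == op]
            other = [i for i in I[:-1] if gates[i] != op]
            for size in range(1, len(same) + 1):
                for S1 in combinations(same, size):
                    S2 = [i for i in same if i not in S1] + [last]
                    parts = [
                        solve(tuple(sorted(list(Sk) + [i for i in other if i < max(Sk)])))
                        for Sk in (S1, S2)
                    ]
                    cand = (max(d for d, _ in parts) + 1,
                            sum(s for _, s in parts) + 1)
                    best = min(best, cand)   # lexicographic: delay first, then size
        return best

    return solve(tuple(range(len(arrival))))
```

Because every stored pair already has optimum delay for its input set, the lexicographic minimum only combines sub-circuits that are themselves strongly delay-optimum.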

However, for computing a circuit with minimum size among all delay-optimum circuits, we would need to store multiple candidate circuits for each sub-circuit (cf. Section 6.1.4, where this is done for another algorithm), which we have not implemented so far.

In Figure 5.4, we show two depth-optimum formula circuits for the And-Or path g((t₀, …, t₁₃)). The circuit in Figure 5.4(a) is a circuit with optimum depth and size 17 computed by Algorithm 6.3, while the circuit in Figure 5.4(b) is size-optimum among all strongly delay-optimum formula circuits, hence a possible output of Algorithm 5.1.

[Figure 5.4, panel (a): A size-optimum formula circuit for g(t) with size 17.]

[Figure 5.4, panel (b): A size-optimum circuit among all strongly delay-optimum formula circuits for g(t) with size 18.]

Figure 5.4: Two formula circuits for the And-Or path g(t) with t = (t₀, …, t₁₃) with optimum depth 5. They only differ in the left sub-circuit of the final output.

Note that in Figure 5.4(a), the left predecessor of the output gate computes an And-Or path on 5 inputs with a depth of 4 and a size of 4. In Figure 5.4(b), we instead use an implementation with depth 3 and size 5, which increases the size by 1, but makes the circuit strongly delay-optimum. This can be verified using the lower bound of ⌈log₂ n⌉ on the depth of each sub-circuit with n inputs.
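The trade-off between the two implementations can be reproduced on a generic And-Or path on 5 inputs. The sketch below compares the standard formula (depth 4, size 4) with one depth-3 formula of size 5 by exhaustive evaluation. The depth-3 formula is one possible realization chosen for illustration and is not necessarily the sub-circuit drawn in Figure 5.4(b); all function names are hypothetical.

```python
from itertools import product


def and_or_path(t):
    """Standard formula t0 ∧ (t1 ∨ (t2 ∧ (t3 ∨ t4))): depth 4, size 4."""
    t0, t1, t2, t3, t4 = t
    return t0 and (t1 or (t2 and (t3 or t4)))


def shallow(t):
    """Depth-3 formula with 5 gates; the input t1 is used twice."""
    t0, t1, t2, t3, t4 = t
    return (t0 and (t1 or t2)) and ((t1 or t3) or t4)


def ceil_log2(n):
    """Exact ⌈log2 n⌉ for integers n ≥ 1."""
    return (n - 1).bit_length()


equal = all(and_or_path(t) == shallow(t)
            for t in product([False, True], repeat=5))
print(equal, ceil_log2(5))   # True 3
```

Since ⌈log₂ 5⌉ = 3, the shallow formula meets the depth lower bound for 5 inputs, at the cost of one additional gate.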

Note that in Figure 5.4(b), Conjecture 5.2.11 is fulfilled. For instance, for the outermost partition, we have S^And = {t₀, t₂, t₄} ∪̇ {t₆, t₈, t₁₀, t₁₂, t₁₃}.
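The condition of Conjecture 5.2.11 for a single partition is easy to test mechanically. A minimal sketch, representing inputs by their indices and using the hypothetical helper is_non_overlapping:

```python
def is_non_overlapping(s1, s2):
    """True if i < j for all i in s1 and j in s2 (both sets non-empty)."""
    return max(s1) < min(s2)


# Outermost partition of S^And in Figure 5.4(b), as stated in the text.
print(is_non_overlapping({0, 2, 4}, {6, 8, 10, 12, 13}))   # True
# An interleaved partition, which the conjecture would rule out as optimal.
print(is_non_overlapping({0, 4}, {2, 6}))                  # False
```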

There are two other exact algorithms for the special case of depth optimization of And-Or paths. Grinchuk [Gri13] provides an exact algorithm for depth optimization of And-Or paths, but with a running time of Ω(4^m), see Section 2.6.4.

The theoretical running time of our algorithm for the special case of And-Or paths coincides with the running time of the formula enumeration algorithm by Hegerfeld [Heg18] for depth optimization, see his Theorem 4.2.16. However, for depth optimization of And-Or paths, we shall improve our algorithm to obtain a running time of O(m · 2.02^m) in Theorem 5.4.6. In his algorithm, Hegerfeld does not directly enumerate formula circuits for And-Or paths, but so-called rectangle-good protocol trees for Karchmer-Wigderson games (see Karchmer and Wigderson [KW90]) for And-Or paths, which originate from the area of communication complexity. From these, he derives his formula circuits.

Hegerfeld [Heg18] computes a formula circuit with optimum size among all strongly delay-optimum formula circuits, although he states that he even computes a size-optimum formula circuit among all delay-optimum formula circuits. For instance, for the And-Or path on 14 inputs, Hegerfeld reports a size of 18 (see Table 5.4), but in Figure 5.4(a), we saw a depth-optimum formula circuit with size 17.

We shall see in Table 5.4 that the practical running times of Hegerfeld’s algorithm are much worse than ours. One reason for this is our more efficient practical implementation, which we present in Section 5.5. Another reason is that for depth optimization of And-Or paths, the algorithm and its running time can be improved further, as described in the following section.
