
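Let $D^{\mathrm{Or}} := \{t_0, \dots, t_{m-1}\} \setminus S^{\mathrm{Or}}$. As $h(t)$ is an And-Or path and $m \geq 3$, we have $D^{\mathrm{Or}} \neq \emptyset$. As $h(t)$ is an And-Or path and $t_{m-1} \in S^{\mathrm{Or}}$, for every $t_i \in D^{\mathrm{Or}}$, we have $t_{i+1} \in S^{\mathrm{Or}}$. Hence, the function

$$\vartheta \colon D^{\mathrm{Or}} \to S^{\mathrm{Or}}, \quad t_i \mapsto t_{i+1}$$

is well-defined. For $k \in \{1,2\}$, let $D_k^{\mathrm{Or}} := \vartheta^{-1}(S_k^{\mathrm{Or}})$. Note that $D^{\mathrm{Or}} = D_1^{\mathrm{Or}} \mathbin{\dot\cup} D_2^{\mathrm{Or}}$. Now, for each $k \neq l \in \{1,2\}$, let $B_k$ denote the reduced circuit arising from $C_k$ by fixing all inputs $t_i \in D_l^{\mathrm{Or}}$ to $\alpha := 1$, and let $g_k := f(B_k)$. Then, as all inputs in $D_l^{\mathrm{Or}}$ are propagate signals, by considering the standard circuit for $h(t)$, we observe that $g_k$ arises from $f_k$ by fixing the inputs in $D_l^{\mathrm{Or}}$ to $1$. By construction, the essential variables of $g_k$ are the variables of $S_k^{\mathrm{Or}}$ and $D_k^{\mathrm{Or}}$. Let $t_{j_k}$ be the essential variable of $g_k$ with $j_k$ maximum.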

Consider $k \in \{1,2\}$. We show that $g_k$ is an And-Or path: First note that by Observation 5.2.2 and the choice of $\alpha$, every input of $g_k$ except for $t_{j_k}$ is a propagate signal (generate signal) of $g_k$ if and only if it is a propagate signal (generate signal) of $h(t)$. By definition of $\vartheta$, for any two generate signals $t_i, t_j$ of $g_k$ with $i < j < j_k$, the propagate signal $t_{j-1} = \vartheta^{-1}(t_j)$ of $h(t)$ is an input of $g_k$. Furthermore, for any two propagate signals $t_i \neq t_j$ of $g_k$ with $i < j < j_k$, the generate signal $t_{i+1} = \vartheta(t_i)$ of $h(t)$ is an input of $g_k$. Hence, the inputs of $g_k$ (except for $t_{j_k}$) are alternatingly propagate and generate signals, and $g_k$ is an And-Or path.

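Let $m_1$ and $m_2$ be the numbers of inputs of $B_1$ and $B_2$, respectively. As

$$\{t_0, \dots, t_{m-1}\} = S^{\mathrm{Or}} \mathbin{\dot\cup} D^{\mathrm{Or}} = S_1^{\mathrm{Or}} \mathbin{\dot\cup} S_2^{\mathrm{Or}} \mathbin{\dot\cup} D_1^{\mathrm{Or}} \mathbin{\dot\cup} D_2^{\mathrm{Or}},$$

we have $m_1 + m_2 = m$. As $B_1$ and $B_2$ are both And-Or path circuits with depth at most $d$, we have $m_1, m_2 \leq m(d,0)$. Together, this implies

$$m = m_1 + m_2 \leq 2\, m(d,0).$$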
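The counting step $m_1 + m_2 = m$ can be checked on a small instance. The following Python sketch is illustrative only. It assumes that inputs are represented by their indices, that gate types are given as the strings "And" and "Or", and that $S^{\mathrm{Or}}$ consists of the inputs with gate type Or together with the last input. The function name `split_inputs` is hypothetical and does not appear in the text.

```python
def split_inputs(gates, S1):
    """Split the inputs of an And-Or path along a partition of S^Or.

    gates[i] is the gate type ("And" or "Or") following input t_i, so the
    path has m = len(gates) + 1 inputs (assumed representation).
    S1 is one part of the partition of S^Or; the other part is S^Or minus S1.
    """
    m = len(gates) + 1
    # Assumption: S^Or = inputs followed by an Or gate, plus the last input.
    S = {i for i in range(m - 1) if gates[i] == "Or"} | {m - 1}
    D = set(range(m)) - S                      # D^Or
    theta = {i: i + 1 for i in D}              # theta: t_i -> t_{i+1}
    # Well-definedness of theta: every image must lie in S^Or.
    assert all(theta[i] in S for i in D)
    S1 = set(S1)
    S2 = S - S1
    assert S1 and S2 and S1 <= S               # a proper partition of S^Or
    D1 = {i for i in D if theta[i] in S1}      # preimage of S1 under theta
    D2 = {i for i in D if theta[i] in S2}      # preimage of S2 under theta
    return (S1, D1), (S2, D2)


# And-Or path on m = 6 inputs with alternating gate types.
gates = ["And", "Or", "And", "Or", "And"]
(S1, D1), (S2, D2) = split_inputs(gates, {1, 5})
m1 = len(S1) + len(D1)
m2 = len(S2) + len(D2)
```

Here `m1 + m2` equals the number of inputs because the four sets form a partition of all indices.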

For the special case when all input arrival times are equal, we conjecture that partitions of the same-gate inputs into two “non-overlapping” sets are always best for the delay.

Conjecture 5.2.11. Consider Theorem 5.2.9 for the case of uniform input arrival times and let $S = S_1 \mathbin{\dot\cup} S_2$ be a partition as in the theorem. Then, for all inputs $t_i \in S_1$ and $t_j \in S_2$, we have $i < j$.

We will see in Section 6.3 why we assume this statement to be satisfied. For non-uniform arrival times, we already know that the conjecture is not fulfilled, see Figure 6.12 (page 189).

Algorithm 5.1: Exact algorithm for delay optimization of generalized And-Or paths

Input: Boolean input variables $t = (t_0, \dots, t_{m-1})$ with arrival times $a(t_0), \dots, a(t_{m-1}) \in \mathbb{R}$, and gate types $\Gamma = (\circ_0, \dots, \circ_{m-2})$.

Output: Optimum delay of any circuit over $\Omega_{\mathrm{mon}}$ computing $h(t; \Gamma)$.

 1  foreach $\emptyset \neq I \subseteq \{t_0, \dots, t_{m-1}\}$ do
 2      Set $d(I) := \infty$.
 3  return compute_opt($\{t_0, \dots, t_{m-1}\}$)
 4  procedure compute_opt($I$)    // Assume that $\emptyset \neq I \subseteq \{t_0, \dots, t_{m-1}\}$.
 5      Assume that $I = \{t_{i_0}, \dots, t_{i_{r-1}}\}$ with $0 \leq i_0 < \dots < i_{r-1} \leq m-1$ and let $\Gamma' := (\circ_{i_0}, \dots, \circ_{i_{r-2}})$.
 6      if $d(I) < \infty$ then
 7          return $d(I)$
 8      if $r = 1$ then
 9          Set $d(I) = a(t_{i_{r-1}})$.
10          return $d(I)$
11      foreach $\circ \in \{\mathrm{And}, \mathrm{Or}\}$ do
12          Let $S \subseteq I$ consist of all signals $t_{i_j}$ with $\circ_{i_j} = \circ$, as well as $t_{i_{r-1}}$.
13          foreach partition $S = S_1 \mathbin{\dot\cup} S_2$ with $S_1, S_2 \neq \emptyset$ do
14              foreach $k \in \{1,2\}$ do
15                  Let $I_k$ denote the input set of $h\bigl((t_{i_0}, \dots, t_{i_{r-1}}); \Gamma'\bigr)_{S_k}$.
16                  Let $d_k :=$ compute_opt($I_k$).
17              Set $d(I) = \min\bigl\{d(I), \max\{d_1, d_2\} + 1\bigr\}$.
18      return $d(I)$
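The following Python sketch mirrors Algorithm 5.1 with memoization in place of the explicit table $d(I)$. It is a minimal illustration under stated assumptions, not the implementation referred to in the text: inputs are represented by indices, gate types by the strings "And" and "Or", and the input set $I_k$ of the sub-path for $S_k$ is taken to be $S_k$ together with all opposite-type inputs of $I$ whose index is smaller than the largest index in $S_k$. That rule is read off the bijection argument in this section, since Theorem 5.2.9 itself is not restated here. The name `optimum_delay` is hypothetical.

```python
from functools import lru_cache
from itertools import combinations


def optimum_delay(arrival, gates):
    """Optimum delay of a generalized And-Or path (sketch of Algorithm 5.1).

    arrival[i] is a(t_i); gates[i] in {"And", "Or"} is the gate type of t_i
    for i < m - 1 (assumed representation).
    """
    m = len(arrival)
    assert len(gates) == m - 1

    @lru_cache(maxsize=None)           # plays the role of the table d(I)
    def compute_opt(I):                # I: sorted tuple of input indices
        if len(I) == 1:                # lines 8-10: a single input
            return arrival[I[0]]
        best = float("inf")
        for op in ("And", "Or"):       # line 11
            # line 12: same-gate inputs of type op, plus the last input
            S = [i for i in I[:-1] if gates[i] == op] + [I[-1]]
            rest = [i for i in I if i not in S]
            # line 13: the last element of S always goes to S2, so every
            # partition into two non-empty parts is enumerated exactly once
            for size in range(1, len(S)):
                for S1 in combinations(S[:-1], size):
                    S2 = [i for i in S if i not in S1]
                    delays = []
                    for Sk in (S1, S2):            # lines 14-16
                        top = max(Sk)
                        # assumed rule for the input set I_k of the sub-path
                        Ik = set(Sk) | {j for j in rest if j < top}
                        delays.append(compute_opt(tuple(sorted(Ik))))
                    best = min(best, max(delays) + 1)   # line 17
        return best

    return compute_opt(tuple(range(m)))


# And-Or path t0 AND (t1 OR (t2 AND (t3 OR t4))) with uniform arrival times.
depth5 = optimum_delay([0] * 5, ["And", "Or", "And", "Or"])
```

With uniform arrival times, `depth5` evaluates to 3, which matches the lower bound $\lceil \log_2 5 \rceil$.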

which maps each generalized And-Or path to its essential inputs. This map is injective as for $h(t; \Gamma)$ by Observation 5.2.2, every input $t_i$ with $i < i_k$ (with $i_k$ as in Observation 5.2.2) that is essential for the generalized And-Or path $h(t; \Gamma)_{S_k}$ is a propagate signal (generate signal) of $h(t; \Gamma)_{S_k}$ if and only if it is a propagate signal (generate signal) of $h(t; \Gamma)$.

Hence, we may identify a generalized And-Or path considered during recursive applications of Theorem 5.2.9 with the set of its essential inputs. Algorithm 5.1 describes our algorithm, which recursively applies Theorem 5.2.9 and stores the computed delays $d(I)$ for subsets $I$ of $\{t_0, \dots, t_{m-1}\}$ in a dynamic programming table of size at most $2^m - 1$.

It is not hard to see that $\kappa$ is actually a bijection: Given some subset $\emptyset \neq I \subsetneq \{t_0, \dots, t_{m-1}\}$, we need to find a series of partitions according to Theorem 5.2.9 such that the generalized And-Or path with essential inputs $I$ arises. Choose $i \in \{0, \dots, m-1\}$ maximum with $t_i \in I$. Assume that $t_i$ is a generate signal (the other case follows by duality). First, use an Or gate and partition the same-gate signals $S^{\mathrm{Or}}$ of $h(t; \Gamma)$ and Or into those contained in $I$ and the rest. Then, $h(t; \Gamma)_{S^{\mathrm{Or}} \cap I}$ is a generalized And-Or path with the generate signals contained in $I$, plus all propagate signals $t_j$ of $h$ with $j < i$. Afterwards, partition the propagate signals of $h(t; \Gamma)_{S^{\mathrm{Or}} \cap I}$ into those contained in $I$ and the rest. This yields the generalized And-Or path with essential input set $I$.
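The surjectivity argument can be checked by brute force for small $m$. The sketch below is an illustration under assumptions: inputs are indices, gate types are the strings "And" and "Or", and the essential inputs of the sub-path for a part $S_k$ are $S_k$ plus all opposite-type inputs with index below the largest index in $S_k$, as in the argument above. The function name `reachable_input_sets` is hypothetical.

```python
def reachable_input_sets(gates):
    """Collect all input sets that arise from repeated partitioning.

    gates[i] in {"And", "Or"} is the gate type of input t_i for i < m - 1
    (assumed representation). Returns a set of sorted index tuples.
    """
    m = len(gates) + 1
    seen = set()
    stack = [tuple(range(m))]
    while stack:
        I = stack.pop()
        if I in seen:
            continue
        seen.add(I)
        if len(I) == 1:
            continue
        for op in ("And", "Or"):
            # same-gate inputs of type op, plus the last input of I
            S = [i for i in I[:-1] if gates[i] == op] + [I[-1]]
            rest = [i for i in I if i not in S]
            # every non-empty proper subset of S is one part of a partition
            for mask in range(1, 2 ** len(S) - 1):
                Sk = [s for b, s in enumerate(S) if (mask >> b) & 1]
                top = max(Sk)
                Ik = tuple(sorted(Sk + [j for j in rest if j < top]))
                stack.append(Ik)
    return seen


# And-Or path on m = 5 inputs: all non-empty subsets should be reachable.
sets5 = reachable_input_sets(["And", "Or", "And", "Or"])
```

For this instance, the number of reachable sets equals $2^5 - 1 = 31$, in line with the table size stated above.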

In the following theorem, we estimate the running time of Algorithm 5.1.

Theorem 5.3.1. Let input variables $t = (t_0, \dots, t_{m-1})$ with arrival times $a(t_0), \dots, a(t_{m-1}) \in \mathbb{R}$ and gate types $\Gamma = (\circ_0, \dots, \circ_{m-2})$ be given. Then, Algorithm 5.1 computes the optimum delay of any circuit realizing the generalized And-Or path $h(t; \Gamma)$. The dynamic programming table needed to store the delay of all generalized And-Or paths considered during the computation has $2^m - 1$ entries.

Denoting by $g$ and $p$ the number of generate signals and propagate signals among $t_0, \dots, t_{m-2}$, the algorithm can be implemented to run in time $O(3^g 2^p + 2^g 3^p)$. In particular, if $h(t; \Gamma)$ is an And-Or path, then the running time is $O\bigl(\sqrt{6}^{\,n}\bigr)$. By backtracking, we can obtain a delay-optimum formula circuit for $h(t; \Gamma)$.

Proof. We have already argued that a generalized And-Or path arising from recursive application of Theorem 5.2.9 can be identified with the set of its essential inputs via a bijection $\kappa$. Hence, by induction on $m$ and Theorem 5.2.9, we can see that Algorithm 5.1 computes the optimum delay of any formula circuit for $h(t; \Gamma)$. By Theorem 2.3.11, this is the optimum delay of any circuit for $h(t; \Gamma)$. The dynamic programming table has size exactly $2^m - 1$.

Let $T := \{t_0, \dots, t_{m-1}\}$. The running time of Algorithm 5.1 is dominated by enumerating all partitions of the respective set $S$ in line 13 for the two cases that $\circ = \mathrm{And}$ or $\circ = \mathrm{Or}$ for all subsets $\emptyset \neq I \subseteq T$. A partition of $S$ into 2 non-empty subsets corresponds to choosing a subset $S_1 \subseteq S \setminus \{t_{i_k}\}$ and setting $S_2 := S \setminus S_1$. By Observation 5.2.2, the generalized And-Or paths $h(t; \Gamma)_{S_1^{\mathrm{Or}}}$ and $h(t; \Gamma)_{S_2^{\mathrm{Or}}}$ are uniquely determined by $I$, $S^{\mathrm{Or}}$ and $S_1^{\mathrm{Or}}$.

Hence, it remains to bound the number of sets $S_1^{\mathrm{Or}} \subsetneq S^{\mathrm{Or}} \subsetneq I$ considered during the algorithm. For fixed $I$, $S^{\mathrm{Or}}$ and $S_1^{\mathrm{Or}}$, the following holds: A propagate signal of $h$ may be in $I$ or in $T \setminus I$. Each generate signal of $h$ has three options: it is contained in $S_1^{\mathrm{Or}}$, in $S^{\mathrm{Or}} \setminus S_1^{\mathrm{Or}}$ or in $\{t_0, \dots, t_{m-1}\} \setminus S^{\mathrm{Or}}$. Hence, there are at most $3^g 2^p$ partitions for the case that the split gate is an Or.

Similarly, when $\circ = \mathrm{And}$, we have $3^p 2^g$ partitions. Summing up yields the running time bound.

When $h(t; \Gamma)$ is an And-Or path, we have $p, g \in \left[\frac{n}{2}, \frac{n}{2}\right]$. Hence, the running time follows from the previous statement.
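The last step can be spelled out as follows. This is a short worked bound under the assumption, suggested by the preceding statement but not written out in the text, that both $g$ and $p$ are at most $n/2$:

```latex
\[
  3^{g} 2^{p} + 2^{g} 3^{p}
  \;\le\; 2 \cdot 3^{n/2} \cdot 2^{n/2}
  \;=\; 2 \cdot 6^{n/2}
  \;=\; 2 \cdot \sqrt{6}^{\,n}
  \;\in\; O\bigl(\sqrt{6}^{\,n}\bigr).
\]
```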

We call a circuit $C$ strongly delay-optimum if each sub-circuit of $C$ has optimum delay. Note that the formula circuit constructed by our algorithm is strongly delay-optimum. Our algorithm can naturally be adapted to compute a size-optimum circuit among all strongly delay-optimum circuits by storing both delay and size for each generalized And-Or path in line 17 and updating them accordingly.

However, for computing a circuit with minimum size among all delay-optimum circuits, we would need to store multiple candidate circuits for each sub-circuit (cf. Section 6.1.4, where this is done for another algorithm), which we have not implemented so far.
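The adaptation to strongly delay-optimum circuits described above can be sketched by storing a pair (delay, size) per input set and comparing pairs lexicographically. This is an illustrative Python sketch, not the implementation from the text. It assumes inputs are indices, gate types are the strings "And" and "Or", size counts gates, and the input set of the sub-path for $S_k$ is $S_k$ plus all opposite-type inputs with smaller index than the largest index in $S_k$. The name `optimum_delay_and_size` is hypothetical.

```python
from functools import lru_cache
from itertools import combinations


def optimum_delay_and_size(arrival, gates):
    """Return (delay, size) of a size-minimal strongly delay-optimum formula.

    arrival[i] is a(t_i); gates[i] in {"And", "Or"} is the gate type of t_i
    for i < m - 1 (assumed representation). Size is the number of gates.
    """

    @lru_cache(maxsize=None)
    def compute_opt(I):                    # I: sorted tuple of input indices
        if len(I) == 1:
            return (arrival[I[0]], 0)      # an input needs no gate
        best = (float("inf"), float("inf"))
        for op in ("And", "Or"):
            S = [i for i in I[:-1] if gates[i] == op] + [I[-1]]
            rest = [i for i in I if i not in S]
            for size in range(1, len(S)):
                for S1 in combinations(S[:-1], size):
                    S2 = [i for i in S if i not in S1]
                    sub = []
                    for Sk in (S1, S2):
                        top = max(Sk)
                        Ik = set(Sk) | {j for j in rest if j < top}
                        sub.append(compute_opt(tuple(sorted(Ik))))
                    (d1, s1), (d2, s2) = sub
                    # lexicographic update: delay first, then size
                    best = min(best, (max(d1, d2) + 1, s1 + s2 + 1))
        return best

    return compute_opt(tuple(range(len(arrival))))


# And-Or path on 5 inputs with uniform arrival times.
result5 = optimum_delay_and_size([0] * 5, ["And", "Or", "And", "Or"])
```

Because both sub-results are themselves optimal pairs for their input sets, every sub-circuit of the resulting formula has optimum delay, which is what the lexicographic comparison relies on.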

In Figure 5.4, we show two depth-optimum formula circuits for the And-Or path $g\bigl((t_0, \dots, t_{13})\bigr)$. The circuit in Figure 5.4(a) is a circuit with best depth and size 17 computed by Algorithm 6.3, while the circuit in Figure 5.4(b) is size-optimum among all strongly delay-optimum formula circuits, hence a possible output of Algorithm 5.1.

Figure 5.4: Two formula circuits for the And-Or path $g(t)$ with $t = (t_0, \dots, t_{13})$ with optimum depth 5. They only differ in the left sub-circuit of the final output. (a) A size-optimum formula circuit for $g(t)$ with size 17. (b) A size-optimum circuit among all strongly delay-optimum formula circuits for $g(t)$ with size 18.

Note that in Figure 5.4(a), the left predecessor of the output gate computes an And-Or path on 5 inputs with a depth of 4 and a size of 4. In Figure 5.4(b), we instead use an implementation with depth 3 and size 5, which increases the size by 1, but makes the circuit strongly delay-optimum. This can be verified using the lower bound of $\lceil \log_2 n \rceil$ on the depth of each sub-circuit with $n$ inputs.

Note that in Figure 5.4(b), Conjecture 5.2.11 is fulfilled. For instance, for the outermost partition, we have $S^{\mathrm{And}} = \{t_0, t_2, t_4\} \mathbin{\dot\cup} \{t_6, t_8, t_{10}, t_{12}, t_{13}\}$.
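The outermost partition can be examined with a few lines of Python. The sketch is illustrative and rests on assumptions: inputs are indices, the gate type of $t_0$ is And (consistent with $t_0 \in S^{\mathrm{And}}$), and the sub-path for a part $S_k$ uses $S_k$ plus all inputs outside $S^{\mathrm{And}}$ with index below the largest index in $S_k$. The helper name `sub_path_inputs` is hypothetical.

```python
def sub_path_inputs(I, S, Sk):
    """Input indices of the sub-path for the part Sk of the same-gate set S.

    Assumed rule: keep Sk and every input of I outside S whose index is
    smaller than the largest index in Sk.
    """
    top = max(Sk)
    return sorted(set(Sk) | {j for j in I if j not in S and j < top})


m = 14
# Alternating gate types, starting with And at t_0 (assumption).
gates = ["And" if i % 2 == 0 else "Or" for i in range(m - 1)]
# Same-gate inputs for a split by an And gate: And-type inputs plus t_13.
S_and = [i for i in range(m - 1) if gates[i] == "And"] + [m - 1]

S1, S2 = [0, 2, 4], [6, 8, 10, 12, 13]
I1 = sub_path_inputs(range(m), S_and, S1)
I2 = sub_path_inputs(range(m), S_and, S2)

# Conjecture 5.2.11 asks that every index in S1 is below every index in S2.
non_overlapping = max(S1) < min(S2)
```

Under these assumptions, `I1` consists of the five inputs $t_0, \dots, t_4$, matching the And-Or path on 5 inputs computed by the left predecessor of the output gate, and `I2` has 11 inputs.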

There are two other exact algorithms for the special case of depth optimization of And-Or paths. Grinchuk [Gri13] provides an exact algorithm for depth optimization of And-Or paths, but with a running time of $\Omega(4^m)$, see Section 2.6.4.

The theoretical running time of our algorithm for the special case of And-Or paths coincides with the running time of the formula enumeration algorithm by Hegerfeld [Heg18] for depth optimization, see his Theorem 4.2.16. However, for depth optimization of And-Or paths, we shall improve our algorithm to obtain a running time of $O(m\, 2.02^m)$ in Theorem 5.4.6. In his algorithm, Hegerfeld does not directly enumerate formula circuits for And-Or paths, but so-called rectangle-good protocol trees for Karchmer-Wigderson games (see Karchmer and Wigderson [KW90]) for And-Or paths, which originate from the area of communication complexity. From these, he derives his formula circuits.

Hegerfeld [Heg18] computes a formula circuit with optimum size among all strongly delay-optimum formula circuits, although he states that he even computes a size-optimum formula circuit among all delay-optimum formula circuits. For instance, for the And-Or path on 14 inputs, Hegerfeld reports a size of 18 (see Table 5.4), but in Figure 5.4(a), we saw a depth-optimum formula circuit with size 17.

We shall see in Table 5.4 that the practical running times of Hegerfeld’s algorithm are much worse than ours. One reason for this is our more efficient practical implementation, which we present in Section 5.5. Another reason is that for depth optimization of And-Or paths, the algorithm and its running time can be improved further, as described in the following section.