1.8 Application: Loop-invariant Code
Example:
for (i = 0; i < n; i++) a[i] = b + 3;
// The expression b + 3 is recomputed in every iteration :-(
// This should be avoided :-)
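In C, the intended effect can be sketched as follows; the temporary t and the function names are introduced here for illustration and are not part of the original program:

```c
#include <assert.h>

/* Hoisting the loop-invariant expression b + 3: the temporary t is a
   name introduced for this sketch only. */
void fill_naive(int a[], int b, int n) {
    for (int i = 0; i < n; i++)
        a[i] = b + 3;              /* b + 3 recomputed in every iteration */
}

void fill_hoisted(int a[], int b, int n) {
    int t = b + 3;                 /* computed once, before the loop */
    for (int i = 0; i < n; i++)
        a[i] = t;
}
```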
The Control-flow Graph:

[CFG: i = 0; enters the loop head; Pos(i < n) leads into the body
y = b + 3; A1 = A + i; M[A1] = y; which loops back to the head;
Neg(i < n) leads to the exit]
Warning:

T = b + 3; may not be placed before the loop:

[CFG: i = 0; enters the loop head; Pos(i < n) leads into the body
A1 = A + i; T = b + 3; y = T; M[A1] = y; i = i + 1; back to the head;
Neg(i < n) leads to the exit]
==⇒ There is no decent place for T = b + 3; :-(
Idea:

Transform into a do-while-loop ...

[CFG after rotation: i = 0; Pos(i < n) guards entry into the body
A1 = A + i; i = i + 1; y = b + 3; M[A1] = y; at the bottom the condition
is tested again: Pos(i < n) repeats the body, Neg(i < n) exits]
... now there is a place for T = e; :-)

[CFG: an extra node on the loop entry edge provides a place for the
insertion; the body computes T = b + 3; y = T; A1 = A + i; M[A1] = y;
i = i + 1; with Pos/Neg(i < n) both at the entry and at the back edge]
Application of T5 (PRE):

[CFG of the rotated loop: i = 0; entry check Pos/Neg(i < n); body
A1 = A + i; i = i + 1; y = b + 3; M[A1] = y; bottom check Pos/Neg(i < n)]

     A          B
0    ∅          ∅
1    ∅          ∅
2    ∅          {b + 3}
3    {b + 3}    ∅
4    {b + 3}    ∅
5    {b + 3}    ∅
6    {b + 3}    ∅
7    ∅          ∅
Application of T5 (PRE):

[the same CFG, now with the additional node 7 on the loop entry edge
where T = b + 3; can be inserted]

     A          B
0    ∅          ∅
1    ∅          ∅
2    ∅          {b + 3}
3    {b + 3}    ∅
4    {b + 3}    ∅
5    {b + 3}    ∅
6    {b + 3}    ∅
7    ∅          ∅
Conclusion:
• Elimination of partial redundancies may move loop-invariant code out of the loop :-))
• This only works properly for do-while-loops :-(
• To optimize other loops, we transform them into do-while-loops before-hand:
        while (b) stmt   ==⇒   if (b) do stmt while (b);
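The rotation preserves semantics; a minimal C illustration (the function names are invented for this sketch):

```c
#include <assert.h>

/* while (b) stmt  ==⇒  if (b) do stmt while (b);
   Both variants below compute the same sum. */
int sum_while(int n) {
    int s = 0, i = 0;
    while (i < n) { s += i; i++; }
    return s;
}

int sum_rotated(int n) {
    int s = 0, i = 0;
    if (i < n)                       /* entry check, executed once */
        do { s += i; i++; } while (i < n);
    return s;
}
```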
Problem:
If we do not have the source program at hand, we must re-construct potential loop headers ;-)
==⇒ Pre-dominators
u pre-dominates v , if every path π : start →∗ v contains u. We write: u ⇒ v .
“⇒” is reflexive, transitive and anti-symmetric :-)
Computation:

We collect the nodes along paths by means of the analysis:

        P = 2^Nodes ,   ⊑ = ⊇
        [[(_, _, v)]]♯ P = P ∪ {v}

Then the set P[v] of pre-dominators is given by:

        P[v] = ⋂ { [[π]]♯ {start} | π : start →∗ v }

Since the [[k]]♯ are distributive, the P[v] can be computed by means of fixpoint iteration :-)
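As a minimal sketch, the fixpoint iteration can be implemented as follows; node sets are bitmasks, and the hard-coded edge list is an assumption chosen to match the example below, not taken verbatim from the slides:

```c
/* Fixpoint iteration for pre-dominators:
       P[v] = {v} ∪ ⋂ { P[u] | (u, v) an edge },   P[start] = {start}.
   Node sets are bitmasks; the edge list is an assumed reconstruction of
   the example graph. */
enum { NODES = 6, EDGES = 6 };
static const int src[EDGES] = {0, 1, 2, 3, 4, 1};
static const int dst[EDGES] = {1, 2, 3, 4, 1, 5};
unsigned P[NODES];

void predominators(void) {
    for (int v = 0; v < NODES; v++)
        P[v] = (1u << NODES) - 1;          /* start with the full set */
    P[0] = 1u << 0;                        /* only start dominates start */
    int changed = 1;
    while (changed) {                      /* iterate until stable */
        changed = 0;
        for (int e = 0; e < EDGES; e++) {
            /* effect of edge (u, v): P[v] := P[v] ∩ (P[u] ∪ {v}) */
            unsigned pv = P[dst[e]] & (P[src[e]] | (1u << dst[e]));
            if (pv != P[dst[e]]) { P[dst[e]] = pv; changed = 1; }
        }
    }
}
```

With these edges the iteration reproduces the table of the example, e.g. P[5] = {0, 1, 5}.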
Example:

[CFG with nodes 0–5]

     P
0    {0}
1    {0, 1}
2    {0, 1, 2}
3    {0, 1, 2, 3}
4    {0, 1, 2, 3, 4}
5    {0, 1, 5}
The partial ordering “⇒” in the example:

[the pre-dominator relation forms a tree: 0 above 1; 1 above 2 and 5;
2 above 3; 3 above 4]

     P
0    {0}
1    {0, 1}
2    {0, 1, 2}
3    {0, 1, 2, 3}
4    {0, 1, 2, 3, 4}
5    {0, 1, 5}
Apparently, the result is a tree :-) In fact, we have:

Theorem:

Every node v has at most one immediate pre-dominator.

Proof:

Assume there are u1 ≠ u2 which both immediately pre-dominate v.
If u1 ⇒ u2, then u1 is not immediate.
Consequently, u1, u2 are incomparable :-)

Now for every π : start →∗ v :

        π = π1 π2   with   π1 : start →∗ u1 ,   π2 : u1 →∗ v

If, however, u1, u2 are incomparable, then there is a path start →∗ v
which avoids u2, in contradiction to u2 ⇒ v :-)

[figure: a path from start through u1 into v, bypassing the occurrences of u2]
Observation:

The loop head of a while-loop pre-dominates every node in the body.
A back edge from the exit u to the loop head v can be identified through
v ∈ P[u] :-)
Accordingly, we define:

Transformation 6:

[rule: the entry check of the loop head v, with outgoing edges
lab Pos(e) and Neg(e), is copied onto every back edge (u, v); side
conditions: u2, v ∈ P[u] and u1 ∉ P[u]]

We duplicate the entry check to all back edges :-)
... in the Example:

[CFG of the while-loop: i = 0; loop head with Neg(i < n) to the exit and
Pos(i < n) into the body A1 = A + i; i = i + 1; y = b + 3; M[A1] = y;]
... in the Example:

[the same CFG, annotated with the pre-dominator sets:
0: {0};  1: {0, 1};  2: {0, 1, 2};  3: {0, 1, 2, 3};  4: {0, 1, 2, 3, 4};
5: {0, 1, 2, 3, 4, 5};  6: {0, 1, 2, 3, 4, 5, 6};  7: {0, 1, 7}]
... in the Example:

[the same CFG and pre-dominator sets; the back edge into the loop head 1
is identified, since 1 ∈ P[u] for its source node u]
... in the Example:

[CFG after T6: the entry check Pos(i < n) / Neg(i < n) has been duplicated
at the back edge, so the loop is now a do-while-loop]
Warning:

There are unusual loops which cannot be rotated:

[left: a CFG with nodes 0–4; right: its pre-dominators]
... but also common ones which cannot be rotated:

[left: a CFG with nodes 0–5; right: its pre-dominators]

Here, the complete block between back edge and conditional jump should be duplicated :-(
1.9 Eliminating Partially Dead Code

Example:

[CFG with nodes 0–4: T = x + 1; is computed before a branch, but
M[x] = T; uses T on only one of the branches]
Idea:

[before/after CFGs: the assignment T = x + 1; is moved from before the
branch onto the branch whose M[x] = T; actually uses T]
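The effect can be sketched in C; the function and variable names here are invented for illustration:

```c
#include <assert.h>

/* Partially dead code: t = x + 1 is needed on one branch only.  Sinking
   the assignment into that branch saves the computation on the other
   path. */
int before(int x, int flag) {
    int t = x + 1;            /* dead whenever flag == 0 */
    if (flag) return t;
    return 0;
}

int after(int x, int flag) {
    if (flag) {
        int t = x + 1;        /* moved to the branch that uses it */
        return t;
    }
    return 0;
}
```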
Problem:

• The definition x = e; (x ∉ Vars(e)) may only be moved to an edge where e is safe ;-)
• The definition must still be available for uses of x ;-)
==⇒

We define an analysis which maximally delays computations:

        [[;]]♯ D = D

        [[x = e;]]♯ D = { D \ (Use(e) ∪ Def(x)) ∪ {x = e;}   if x ∉ Vars(e)
                        { D \ (Use(e) ∪ Def(x))              if x ∈ Vars(e)

... where:

        Use(e) = { y = e′; | y ∈ Vars(e) }
        Def(x) = { y = e′; | y ≡ x ∨ x ∈ Vars(e′) }

For the remaining edges, we define:

        [[x = M[e];]]♯ D   = D \ (Use(e) ∪ Def(x))
        [[M[e1] = e2;]]♯ D = D \ (Use(e1) ∪ Use(e2))
        [[Pos(e)]]♯ D = [[Neg(e)]]♯ D = D \ Use(e)
Warning:

We may move y = e; beyond a join only if y = e; can be delayed along all joining edges:

[CFG: T = x + 1; reaches the join node 1 along only one of the two
incoming edges; behind the join, x = M[T]; uses T]

Here, T = x + 1; cannot be moved beyond 1 !!!
We conclude:

• The partial ordering of the lattice for delayability is given by “⊇”.
• At program start: D0 = ∅.
  Therefore, the sets D[u] of assignments delayable at u can be computed by solving a system of constraints.
• We delay only assignments a where a a has the same effect as a alone.
• The extra insertions turn the original assignments into assignments to dead variables ...
Transformation 7:

[rules: along an edge (u, v) labeled lab, the assignment a is inserted
before the edge if a ∈ D[u] \ [[lab]]♯(D[u]), and behind it if
a ∈ [[lab]]♯(D[u]) \ D[v]; at a branch u with edges Pos(e) and Neg(e) to
v1, v2, a is inserted before the branch if a ∈ D[u] \ [[Pos(e)]]♯(D[u]),
on the edge to v1 if a ∈ [[Neg(e)]]♯(D[u]) \ D[v1], and on the edge to v2
if a ∈ [[Pos(e)]]♯(D[u]) \ D[v2]]
Note:
Transformation T7 is only meaningful, if we subsequently eliminate assignments to dead variables by means of transformation T2 :-)
In the example, the partially dead code is eliminated:

[CFG: T = x + 1; before the branch; M[x] = T; on one branch]

     D
0    ∅
1    {T = x + 1;}
2    {T = x + 1;}
3    ∅
4    ∅
[CFG after T7: T = x + 1; is additionally inserted on the branch leading
to M[x] = T; the original occurrence of T = x + 1; is still present]

     D
0    ∅
1    {T = x + 1;}
2    {T = x + 1;}
3    ∅
4    ∅
[CFG after T7 and T2: the original T = x + 1; has become an assignment to
a dead variable and is replaced by the empty statement ;]

     L
0    {x}
1    {x}
2    {x}
2′   {x, T}
3    ∅
4    ∅
Remarks:

• After T7, all original assignments y = e; with y ∉ Vars(e) are assignments to dead variables and thus can always be eliminated :-)
• By this, it can be proven that the transformation is guaranteed not to degrade the efficiency of the code :-))
• Similar to the elimination of partial redundancies, the transformation can be repeated :-}
Conclusion:
→ The design of a meaningful optimization is non-trivial.
→ Many transformations are advantageous only in connection with other optimizations :-)
→ The ordering of applied optimizations matters !!
→ Some optimizations can be iterated !!!
... a meaningful ordering:

        T4           Constant Propagation
                     Interval Analysis
                     Alias Analysis
        T6           Loop Rotation
        T1, T3, T2   Available Expressions
        T2           Dead Variables
        T7, T2       Partially Dead Code
        T5, T3, T2   Partially Redundant Code
2 Replacing Expensive Operations by Cheaper Ones

2.1 Reduction of Strength

(1) Evaluation of Polynomials

        f(x) = an · xⁿ + an−1 · xⁿ⁻¹ + . . . + a1 · x + a0

                        Multiplications    Additions
        naive           ½ n(n + 1)         n
        re-use          2n − 1             n
        Horner-Scheme   n                  n
Idea:
f (x) = (. . .((an · x + an−1) · x + an−2). . .) · x + a0
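A sketch of the Horner scheme in C (the function name horner is made up for this illustration; coefficients are passed from aₙ down to a₀):

```c
#include <assert.h>

/* Horner's scheme: n multiplications and n additions for a polynomial
   of degree n. */
double horner(const double a[], int n, double x) {
    double r = a[0];                  /* a_n */
    for (int i = 1; i <= n; i++)
        r = r * x + a[i];             /* one mult, one add per step */
    return r;
}
```

For example, for the cubic 3x³ − 5x² + 4x + 13 at x = 2 it yields 25.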
(2) Tabulation of a polynomial f(x) of degree n :

→ To recompute f(x) for every argument x is too expensive :-)
→ Luckily, the n-th differences are constant !!!

Example: f(x) = 3x³ − 5x² + 4x + 13

        n    f(n)    ∆     ∆²    ∆³
        0    13      2     8     18
        1    15      10    26
        2    25      36
        3    61
        4    . . .

Here, the n-th difference is always

        ∆ⁿₕ(f) = n! · aₙ · hⁿ        (h step width)
Costs:

• n times evaluation of f ;
• ½ · (n − 1) · n subtractions to determine the ∆k ;
• n additions for every further value :-)
==⇒
Number of multiplications only depends on n :-))
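The scheme can be sketched for the cubic of the example; the function names f and tabulate are invented here, and step width h = 1 is assumed:

```c
#include <assert.h>

/* Tabulation by differences for f(x) = 3x^3 - 5x^2 + 4x + 13 with
   h = 1: after seeding the difference table with n+1 exact values,
   each further table entry costs only n = 3 additions. */
double f(double x) { return ((3 * x - 5) * x + 4) * x + 13; }

void tabulate(double y[], int count) {
    double d0 = f(0);                                  /* f(0)  = 13 */
    double d1 = f(1) - f(0);                           /* ∆     = 2  */
    double d2 = (f(2) - f(1)) - (f(1) - f(0));         /* ∆²    = 8  */
    double d3 = ((f(3) - f(2)) - (f(2) - f(1))) - d2;  /* ∆³ = 3!·3 = 18 */
    for (int i = 0; i < count; i++) {
        y[i] = d0;
        d0 += d1; d1 += d2; d2 += d3;                  /* three additions */
    }
}
```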
Simple Case:    f(x) = a1 · x + a0

• ... naturally occurs in many numerical loops :-)
• The first differences are already constant:

        f(x + h) − f(x) = a1 · h

• Instead of the sequence  yi = f(x0 + i · h), i ≥ 0  we compute:

        y0 = f(x0) ,   ∆ = a1 · h ,
        yi = yi−1 + ∆ ,   i > 0
Example:

for (i = i0; i < n; i = i + h) {
    A = A0 + b · i;
    M[A] = . . .;
}

[CFG: i = i0; loop head with Pos(i < n) into the body A = A0 + b · i;
M[A] = . . .; i = i + h; and Neg(i < n) to the exit]
... or, after loop rotation:

i = i0;
if (i < n) do {
    A = A0 + b · i;
    M[A] = . . .;
    i = i + h;
} while (i < n);

[CFG of the rotated loop: the check Pos/Neg(i < n) appears both at the
entry and at the back edge]
... and reduction of strength:

i = i0;
if (i < n) {
    ∆ = b · h;
    A = A0 + b · i0;
    do {
        M[A] = . . .;
        i = i + h;
        A = A + ∆;
    } while (i < n);
}

[CFG: ∆ = b · h; and A = A0 + b · i0; are computed once on the entry edge;
inside the loop, the multiplication is replaced by A = A + ∆;]
Warning:

• The values b, h, A0 must not change their values during the loop.
• i, A may be modified at exactly one position in the loop :-(
• One may try to eliminate the variable i altogether:
  → i may not be used elsewhere.
  → The initialization must be transformed into:  A = A0 + b · i0 .
  → The loop condition i < n must be transformed into:  A < N  for  N = A0 + b · n .
  → b must always be different from zero !!!
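Under the conditions above, the fully reduced loop can be sketched in C; the function fill and the concrete array M are illustrative stand-ins, and b > 0 is additionally assumed so that A < N is equivalent to i < n:

```c
#include <assert.h>

/* Fully reduced loop: the multiplication b * i and the loop variable i
   are gone; the address A is advanced by ∆ = b * h and compared against
   N = A0 + b * n.  Assumes b > 0, h > 0, and that i is unused elsewhere. */
void fill(int M[], int A0, int b, int h, int i0, int n) {
    int A = A0 + b * i0;       /* transformed initialization */
    int N = A0 + b * n;        /* transformed loop bound */
    if (A < N) {
        int delta = b * h;
        do {
            M[A] = 1;          /* stands for M[A] = ...; */
            A = A + delta;
        } while (A < N);
    }
}
```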