1.8 Application: Loop-invariant Code
Example:
for (i = 0; i < n; i++) a[i] = b + 3;
// The expression b + 3 is recomputed in every iteration :-(
// This should be avoided :-)
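In C, the intended effect can be sketched as follows; the temporary t and the function names are introduced here for illustration and are not part of the original program:

```c
#include <assert.h>

/* Hoisting the loop-invariant expression b + 3: the temporary t is a
   name introduced for this sketch only. */
void fill_naive(int a[], int b, int n) {
    for (int i = 0; i < n; i++)
        a[i] = b + 3;              /* b + 3 recomputed in every iteration */
}

void fill_hoisted(int a[], int b, int n) {
    int t = b + 3;                 /* computed once, before the loop */
    for (int i = 0; i < n; i++)
        a[i] = t;
}
```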
The Control-flow Graph:

[CFG: i = 0; enters the loop head; Pos(i < n) leads into the body
y = b + 3; A1 = A + i; M[A1] = y; which loops back to the head;
Neg(i < n) leads to the exit]
Warning:

T = b + 3; may not be placed before the loop:

[CFG: i = 0; enters the loop head; Pos(i < n) leads into the body
A1 = A + i; T = b + 3; y = T; M[A1] = y; i = i + 1; back to the head;
Neg(i < n) leads to the exit]
==⇒ There is no decent place for T = b + 3; :-(
Idea:

Transform into a do-while-loop ...

[CFG after rotation: i = 0; Pos(i < n) guards entry into the body
A1 = A + i; i = i + 1; y = b + 3; M[A1] = y; at the bottom the condition
is tested again: Pos(i < n) repeats the body, Neg(i < n) exits]
... now there is a place for T = e; :-)

[CFG: an extra node on the loop entry edge provides a place for the
insertion; the body computes T = b + 3; y = T; A1 = A + i; M[A1] = y;
i = i + 1; with Pos/Neg(i < n) both at the entry and at the back edge]
Application of T5 (PRE):

[CFG of the rotated loop: i = 0; entry check Pos/Neg(i < n); body
A1 = A + i; i = i + 1; y = b + 3; M[A1] = y; bottom check Pos/Neg(i < n)]

     A          B
0    ∅          ∅
1    ∅          ∅
2    ∅          {b + 3}
3    {b + 3}    ∅
4    {b + 3}    ∅
5    {b + 3}    ∅
6    {b + 3}    ∅
7    ∅          ∅
Application of T5 (PRE):

[the same CFG, now with the additional node 7 on the loop entry edge
where T = b + 3; can be inserted]

     A          B
0    ∅          ∅
1    ∅          ∅
2    ∅          {b + 3}
3    {b + 3}    ∅
4    {b + 3}    ∅
5    {b + 3}    ∅
6    {b + 3}    ∅
7    ∅          ∅
Conclusion:
• Elimination of partial redundancies may move loop-invariant code out of the loop :-))
• This only works properly for do-while-loops :-(
• To optimize other loops, we transform them into do-while-loops before-hand:
        while (b) stmt   ==⇒   if (b) do stmt while (b);
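The rotation preserves semantics; a minimal C illustration (the function names are invented for this sketch):

```c
#include <assert.h>

/* while (b) stmt  ==⇒  if (b) do stmt while (b);
   Both variants below compute the same sum. */
int sum_while(int n) {
    int s = 0, i = 0;
    while (i < n) { s += i; i++; }
    return s;
}

int sum_rotated(int n) {
    int s = 0, i = 0;
    if (i < n)                       /* entry check, executed once */
        do { s += i; i++; } while (i < n);
    return s;
}
```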
Problem:
If we do not have the source program at hand, we must re-construct potential loop headers ;-)
==⇒ Pre-dominators
u pre-dominates v , if every path π : start →∗ v contains u. We write: u ⇒ v .
“⇒” is reflexive, transitive and anti-symmetric :-)
Computation:

We collect the nodes along paths by means of the analysis:

        P = 2^Nodes ,   ⊑ = ⊇
        [[(_, _, v)]]♯ P = P ∪ {v}

Then the set P[v] of pre-dominators is given by:

        P[v] = ⋂ { [[π]]♯ {start} | π : start →∗ v }

Since the [[k]]♯ are distributive, the P[v] can be computed by means of fixpoint iteration :-)
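As a minimal sketch, the fixpoint iteration can be implemented as follows; node sets are bitmasks, and the hard-coded edge list is an assumption chosen to match the example below, not taken verbatim from the slides:

```c
/* Fixpoint iteration for pre-dominators:
       P[v] = {v} ∪ ⋂ { P[u] | (u, v) an edge },   P[start] = {start}.
   Node sets are bitmasks; the edge list is an assumed reconstruction of
   the example graph. */
enum { NODES = 6, EDGES = 6 };
static const int src[EDGES] = {0, 1, 2, 3, 4, 1};
static const int dst[EDGES] = {1, 2, 3, 4, 1, 5};
unsigned P[NODES];

void predominators(void) {
    for (int v = 0; v < NODES; v++)
        P[v] = (1u << NODES) - 1;          /* start with the full set */
    P[0] = 1u << 0;                        /* only start dominates start */
    int changed = 1;
    while (changed) {                      /* iterate until stable */
        changed = 0;
        for (int e = 0; e < EDGES; e++) {
            /* effect of edge (u, v): P[v] := P[v] ∩ (P[u] ∪ {v}) */
            unsigned pv = P[dst[e]] & (P[src[e]] | (1u << dst[e]));
            if (pv != P[dst[e]]) { P[dst[e]] = pv; changed = 1; }
        }
    }
}
```

With these edges the iteration reproduces the table of the example, e.g. P[5] = {0, 1, 5}.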
Example:

[CFG with nodes 0–5]

     P
0    {0}
1    {0, 1}
2    {0, 1, 2}
3    {0, 1, 2, 3}
4    {0, 1, 2, 3, 4}
5    {0, 1, 5}
The partial ordering “⇒” in the example:

[the pre-dominator relation forms a tree: 0 above 1; 1 above 2 and 5;
2 above 3; 3 above 4]

     P
0    {0}
1    {0, 1}
2    {0, 1, 2}
3    {0, 1, 2, 3}
4    {0, 1, 2, 3, 4}
5    {0, 1, 5}
Apparently, the result is a tree :-) In fact, we have:

Theorem:

Every node v has at most one immediate pre-dominator.

Proof:

Assume there are u1 ≠ u2 which both immediately pre-dominate v.
If u1 ⇒ u2, then u1 is not immediate.
Consequently, u1, u2 are incomparable :-)

Now for every π : start →∗ v :

        π = π1 π2   with   π1 : start →∗ u1 ,   π2 : u1 →∗ v

If, however, u1, u2 are incomparable, then there is a path start →∗ v
which avoids u2, in contradiction to u2 ⇒ v :-)

[figure: a path from start through u1 into v, bypassing the occurrences of u2]
Observation:

The loop head of a while-loop pre-dominates every node in the body.
A back edge from the exit u to the loop head v can be identified through
v ∈ P[u] :-)
Accordingly, we define:

Transformation 6:

[rule: the entry check of the loop head v, with outgoing edges
lab Pos(e) and Neg(e), is copied onto every back edge (u, v); side
conditions: u2, v ∈ P[u] and u1 ∉ P[u]]

We duplicate the entry check to all back edges :-)
... in the Example:

[CFG of the while-loop: i = 0; loop head with Neg(i < n) to the exit and
Pos(i < n) into the body A1 = A + i; i = i + 1; y = b + 3; M[A1] = y;]
... in the Example:

[the same CFG, annotated with the pre-dominator sets:
0: {0};  1: {0, 1};  2: {0, 1, 2};  3: {0, 1, 2, 3};  4: {0, 1, 2, 3, 4};
5: {0, 1, 2, 3, 4, 5};  6: {0, 1, 2, 3, 4, 5, 6};  7: {0, 1, 7}]
... in the Example:

[the same CFG and pre-dominator sets; the back edge into the loop head 1
is identified, since 1 ∈ P[u] for its source node u]
... in the Example:

[CFG after T6: the entry check Pos(i < n) / Neg(i < n) has been duplicated
at the back edge, so the loop is now a do-while-loop]
Warning:

There are unusual loops which cannot be rotated:

[left: a CFG with nodes 0–4; right: its pre-dominators]
... but also common ones which cannot be rotated:

[left: a CFG with nodes 0–5; right: its pre-dominators]

Here, the complete block between back edge and conditional jump should be duplicated :-(
1.9 Eliminating Partially Dead Code

Example:

[CFG with nodes 0–4: T = x + 1; is computed before a branch, but
M[x] = T; uses T on only one of the branches]
Idea:

[before/after CFGs: the assignment T = x + 1; is moved from before the
branch onto the branch whose M[x] = T; actually uses T]
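The effect can be sketched in C; the function and variable names here are invented for illustration:

```c
#include <assert.h>

/* Partially dead code: t = x + 1 is needed on one branch only.  Sinking
   the assignment into that branch saves the computation on the other
   path. */
int before(int x, int flag) {
    int t = x + 1;            /* dead whenever flag == 0 */
    if (flag) return t;
    return 0;
}

int after(int x, int flag) {
    if (flag) {
        int t = x + 1;        /* moved to the branch that uses it */
        return t;
    }
    return 0;
}
```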
Problem:

• The definition x = e; (x ∉ Vars(e)) may only be moved to an edge where e is safe ;-)
• The definition must still be available for uses of x ;-)
==⇒

We define an analysis which maximally delays computations:

        [[;]]♯ D = D

        [[x = e;]]♯ D = { D \ (Use(e) ∪ Def(x)) ∪ {x = e;}   if x ∉ Vars(e)
                        { D \ (Use(e) ∪ Def(x))              if x ∈ Vars(e)

... where:

        Use(e) = { y = e′; | y ∈ Vars(e) }
        Def(x) = { y = e′; | y ≡ x ∨ x ∈ Vars(e′) }

For the remaining edges, we define:

        [[x = M[e];]]♯ D   = D \ (Use(e) ∪ Def(x))
        [[M[e1] = e2;]]♯ D = D \ (Use(e1) ∪ Use(e2))
        [[Pos(e)]]♯ D = [[Neg(e)]]♯ D = D \ Use(e)
Warning:

We may move y = e; beyond a join only if y = e; can be delayed along all joining edges:

[CFG: T = x + 1; reaches the join node 1 along only one of the two
incoming edges; behind the join, x = M[T]; uses T]

Here, T = x + 1; cannot be moved beyond 1 !!!
We conclude:

• The partial ordering of the lattice for delayability is given by “⊇”.
• At program start: D0 = ∅.
  Therefore, the sets D[u] of assignments delayable at u can be computed by solving a system of constraints.
• We delay only assignments a where a a has the same effect as a alone.
• The extra insertions turn the original assignments into assignments to dead variables ...
Transformation 7:

[rules: along an edge (u, v) labeled lab, the assignment a is inserted
before the edge if a ∈ D[u] \ [[lab]]♯(D[u]), and behind it if
a ∈ [[lab]]♯(D[u]) \ D[v]; at a branch u with edges Pos(e) and Neg(e) to
v1, v2, a is inserted before the branch if a ∈ D[u] \ [[Pos(e)]]♯(D[u]),
on the edge to v1 if a ∈ [[Neg(e)]]♯(D[u]) \ D[v1], and on the edge to v2
if a ∈ [[Pos(e)]]♯(D[u]) \ D[v2]]
Note:
Transformation T7 is only meaningful, if we subsequently eliminate assignments to dead variables by means of transformation T2 :-)
In the example, the partially dead code is eliminated:

[CFG: T = x + 1; before the branch; M[x] = T; on one branch]

     D
0    ∅
1    {T = x + 1;}
2    {T = x + 1;}
3    ∅
4    ∅
[CFG after T7: T = x + 1; is additionally inserted on the branch leading
to M[x] = T; the original occurrence of T = x + 1; is still present]

     D
0    ∅
1    {T = x + 1;}
2    {T = x + 1;}
3    ∅
4    ∅
[CFG after T7 and T2: the original T = x + 1; has become an assignment to
a dead variable and is replaced by the empty statement ;]

     L
0    {x}
1    {x}
2    {x}
2′   {x, T}
3    ∅
4    ∅
Remarks:

• After T7, all original assignments y = e; with y ∉ Vars(e) are assignments to dead variables and thus can always be eliminated :-)
• By this, it can be proven that the transformation is guaranteed not to degrade the efficiency of the code :-))
• Similar to the elimination of partial redundancies, the transformation can be repeated :-}
Conclusion:
→ The design of a meaningful optimization is non-trivial.
→ Many transformations are advantageous only in connection with other optimizations :-)
→ The ordering of applied optimizations matters !!
→ Some optimizations can be iterated !!!
... a meaningful ordering:

        T4           Constant Propagation
                     Interval Analysis
                     Alias Analysis
        T6           Loop Rotation
        T1, T3, T2   Available Expressions
        T2           Dead Variables
        T7, T2       Partially Dead Code
        T5, T3, T2   Partially Redundant Code
2 Replacing Expensive Operations by Cheaper Ones

2.1 Reduction of Strength

(1) Evaluation of Polynomials

        f(x) = an · xⁿ + an−1 · xⁿ⁻¹ + . . . + a1 · x + a0

                        Multiplications    Additions
        naive           ½ n(n + 1)         n
        re-use          2n − 1             n
        Horner-Scheme   n                  n
Idea:
f (x) = (. . .((an · x + an−1) · x + an−2). . .) · x + a0
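A sketch of the Horner scheme in C (the function name horner is made up for this illustration; coefficients are passed from aₙ down to a₀):

```c
#include <assert.h>

/* Horner's scheme: n multiplications and n additions for a polynomial
   of degree n. */
double horner(const double a[], int n, double x) {
    double r = a[0];                  /* a_n */
    for (int i = 1; i <= n; i++)
        r = r * x + a[i];             /* one mult, one add per step */
    return r;
}
```

For example, for the cubic 3x³ − 5x² + 4x + 13 at x = 2 it yields 25.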
(2) Tabulation of a polynomial f(x) of degree n :

→ To recompute f(x) for every argument x is too expensive :-)
→ Luckily, the n-th differences are constant !!!

Example: f(x) = 3x³ − 5x² + 4x + 13

        n    f(n)    ∆     ∆²    ∆³
        0    13      2     8     18
        1    15      10    26
        2    25      36
        3    61
        4    . . .

Here, the n-th difference is always

        ∆ⁿₕ(f) = n! · aₙ · hⁿ        (h step width)
Costs:

• n times evaluation of f ;
• ½ · (n − 1) · n subtractions to determine the ∆k ;
• n additions for every further value :-)
==⇒
Number of multiplications only depends on n :-))
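The scheme can be sketched for the cubic of the example; the function names f and tabulate are invented here, and step width h = 1 is assumed:

```c
#include <assert.h>

/* Tabulation by differences for f(x) = 3x^3 - 5x^2 + 4x + 13 with
   h = 1: after seeding the difference table with n+1 exact values,
   each further table entry costs only n = 3 additions. */
double f(double x) { return ((3 * x - 5) * x + 4) * x + 13; }

void tabulate(double y[], int count) {
    double d0 = f(0);                                  /* f(0)  = 13 */
    double d1 = f(1) - f(0);                           /* ∆     = 2  */
    double d2 = (f(2) - f(1)) - (f(1) - f(0));         /* ∆²    = 8  */
    double d3 = ((f(3) - f(2)) - (f(2) - f(1))) - d2;  /* ∆³ = 3!·3 = 18 */
    for (int i = 0; i < count; i++) {
        y[i] = d0;
        d0 += d1; d1 += d2; d2 += d3;                  /* three additions */
    }
}
```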
Simple Case:    f(x) = a1 · x + a0

• ... naturally occurs in many numerical loops :-)
• The first differences are already constant:

        f(x + h) − f(x) = a1 · h

• Instead of the sequence  yi = f(x0 + i · h), i ≥ 0  we compute:

        y0 = f(x0) ,   ∆ = a1 · h ,
        yi = yi−1 + ∆ ,   i > 0
Example:

for (i = i0; i < n; i = i + h) {
    A = A0 + b · i;
    M[A] = . . .;
}

[CFG: i = i0; loop head with Pos(i < n) into the body A = A0 + b · i;
M[A] = . . .; i = i + h; and Neg(i < n) to the exit]
... or, after loop rotation:

i = i0;
if (i < n) do {
    A = A0 + b · i;
    M[A] = . . .;
    i = i + h;
} while (i < n);

[CFG of the rotated loop: the check Pos/Neg(i < n) appears both at the
entry and at the back edge]
... and reduction of strength:

i = i0;
if (i < n) {
    ∆ = b · h;
    A = A0 + b · i0;
    do {
        M[A] = . . .;
        i = i + h;
        A = A + ∆;
    } while (i < n);
}

[CFG: ∆ = b · h; and A = A0 + b · i0; are computed once on the entry edge;
inside the loop, the multiplication is replaced by A = A + ∆;]
Warning:

• The values b, h, A0 must not change their values during the loop.
• i, A may be modified at exactly one position in the loop :-(
• One may try to eliminate the variable i altogether:
  → i may not be used elsewhere.
  → The initialization must be transformed into:  A = A0 + b · i0 .
  → The loop condition i < n must be transformed into:  A < N  for  N = A0 + b · n .
  → b must always be different from zero !!!
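Under the conditions above, the fully reduced loop can be sketched in C; the function fill and the concrete array M are illustrative stand-ins, and b > 0 is additionally assumed so that A < N is equivalent to i < n:

```c
#include <assert.h>

/* Fully reduced loop: the multiplication b * i and the loop variable i
   are gone; the address A is advanced by ∆ = b * h and compared against
   N = A0 + b * n.  Assumes b > 0, h > 0, and that i is unused elsewhere. */
void fill(int M[], int A0, int b, int h, int i0, int n) {
    int A = A0 + b * i0;       /* transformed initialization */
    int N = A0 + b * n;        /* transformed loop bound */
    if (A < N) {
        int delta = b * h;
        do {
            M[A] = 1;          /* stands for M[A] = ...; */
            A = A + delta;
        } while (A < N);
    }
}
```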