Compilers and Language Processing Tools
Summer Term 2011
Prof. Dr. Arnd Poetzsch-Heffter
Software Technology Group TU Kaiserslautern
Content of Lecture
1. Introduction
2. Syntax and Type Analysis
2.1 Lexical Analysis
2.2 Context-Free Syntax Analysis
2.3 Context-Dependent Analysis
3. Translation to Target Language
3.1 Translation of Imperative Language Constructs
3.2 Translation of Object-Oriented Language Constructs
4. Selected Topics in Compiler Construction
4.1 Intermediate Languages
4.2 Optimization
4.3 Register Allocation
4.4 Just-in-time Compilation
4.5 Further Aspects of Compilation
4. Selected Topics in Compiler Construction
Chapter Outline
4. Selected Topics in Compiler Construction
4.1 Intermediate Languages
4.1.1 3-Address Code
4.1.2 Other Intermediate Languages
4.2 Optimization
4.2.1 Classical Optimization Techniques
4.2.2 Potential of Optimizations
4.2.3 Data Flow Analysis
4.2.4 Non-local Optimization
4.3 Register Allocation
4.3.1 Sethi-Ullman Algorithm
4.3.2 Register Allocation by Graph Coloring
4.4 Just-in-time Compilation
4.5 Further Aspects of Compilation
Selected topics in compiler construction
Focus:
• Techniques that go beyond the direct translation of source languages to target languages
• Concentrate on concepts instead of language-dependent details
• Use program representations tailored for the considered tasks (instead of source language syntax):
  – simplifies the representation
  – (but needs more work to integrate tasks)
Selected topics in compiler construction (2)
Learning objectives:
• Intermediate languages for translation and optimization of imperative languages
• Different optimization techniques
• Different static analysis techniques for (intermediate) programs
• Register allocation
• Some aspects of code generation
4.1 Intermediate languages
Intermediate languages
• Intermediate languages are used as
  – appropriate program representation for certain language implementation tasks
  – common representation of programs of different source languages

[Diagram: source languages 1 to n are translated into the common intermediate language, from which code for target languages 1 to m is generated.]
Intermediate languages (2)
• Intermediate languages for translation are comparable to data structures in algorithm design, i.e., for each task, an intermediate language is more or less suitable.
• Intermediate languages can conceptually be seen as abstract machines.
4.1.1 3-Address Code
3-address code
3-address code (3AC) is a common intermediate language with many variants.
Properties:
• only elementary data types (but often arrays)
• no nested expressions
• sequential execution, jumps and procedure calls as statements
• named variables as in a high level language
• unbounded number of temporary variables
3-address code (2)
A program in 3AC consists of
• a list of global variables
• a list of procedures with parameters and local variables
• a main procedure
• each procedure has a sequence of 3AC commands as body
3AC commands
Syntax and explanation:

• x := y bop z,  x := uop z,  x := y
  x: variable (global, local, parameter, temporary)
  y, z: variable or constant
  bop: binary operator, uop: unary operator

• goto L,  if x cop y goto L
  jump or conditional jump to label L
  cop: comparison operator
  only procedure-local jumps

• x := a[i],  a[i] := y
  a: one-dimensional array

• x := &a,  x := *y
  a: global or local variable or parameter
  &a: address of a, *y: dereference of y
3AC commands (2)
Syntax and explanation:

• param x,  call p,  return y
  a call p(x1, ..., xn) is encoded as:
    param x1
    ...
    param xn
    call p
  (the whole block is considered as one command)
  return y causes a jump to the return address with (optional) result y
We assume that 3AC contains only labels that are actually used as jump targets in the program.
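As an illustration (our own sketch, not from the lecture; all class and field names are invented), the 3AC commands above can be modeled as a small class hierarchy; a procedure body is then a list of such instructions:

  // One class per 3AC command form (array access and address commands omitted).
  abstract class Instr { }

  class Assign extends Instr {          // x := y bop z,  x := uop y,  x := y
      String target;                    // x
      String op;                        // bop/uop, or null for a plain copy
      String left, right;               // right == null for unary and copy
      Assign(String target, String op, String left, String right) {
          this.target = target; this.op = op; this.left = left; this.right = right;
      }
  }

  class Goto extends Instr {            // goto L
      String label;
      Goto(String label) { this.label = label; }
  }

  class CondGoto extends Instr {        // if x cop y goto L
      String x, cop, y, label;
      CondGoto(String x, String cop, String y, String label) {
          this.x = x; this.cop = cop; this.y = y; this.label = label;
      }
  }

  class Param extends Instr {           // param x
      String x;
      Param(String x) { this.x = x; }
  }

  class Call extends Instr {            // call p
      String proc;
      Call(String proc) { this.proc = proc; }
  }

  class Return extends Instr {          // return y; result may be null
      String result;
      Return(String result) { this.result = result; }
  }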
Basic blocks
A sequence of 3AC commands can be uniquely partitioned into basic blocks.
A basic block B is a maximal sequence of commands such that
• at the end of B, exactly one jump, procedure call, or return command occurs
• labels only occur at the first command of a basic block
Basic blocks (2)
Remarks:
• The commands of a basic block are always executed sequentially; there are no jumps into the middle of a block.
• Often, a designated exit block for a procedure, containing the return at its end, is required. This is handled by additional transformations.
• The transitions between basic blocks are often denoted by flow charts.
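The partitioning itself can be computed with the usual "leader" scan. The following sketch is our own simplification (commands as plain strings, jump targets flagged in a separate array), not code from the lecture:

  import java.util.*;

  public class BasicBlocks {

      // A command ends a block if it is a jump, call, or return.
      static boolean endsBlock(String cmd) {
          return cmd.startsWith("goto") || cmd.startsWith("if")
              || cmd.startsWith("call") || cmd.startsWith("return");
      }

      // labelled[i] is true iff command i carries a label, i.e., is a jump target.
      static List<List<String>> partition(List<String> cmds, boolean[] labelled) {
          List<List<String>> blocks = new ArrayList<>();
          List<String> current = new ArrayList<>();
          for (int i = 0; i < cmds.size(); i++) {
              if (labelled[i] && !current.isEmpty()) {  // a label starts a new block
                  blocks.add(current);
                  current = new ArrayList<>();
              }
              current.add(cmds.get(i));
              if (endsBlock(cmds.get(i))) {             // a jump ends the current block
                  blocks.add(current);
                  current = new ArrayList<>();
              }
          }
          if (!current.isEmpty()) blocks.add(current);
          return blocks;
      }

      public static void main(String[] args) {
          List<String> cmds = List.of(
              "res := 0", "ix := 0",
              "t0 := lng-1", "if ix<=t0 goto L2",   // label L1 at index 2
              "return res",
              "t1 := i1+ix", "res := res+t1", "ix := ix+1", "goto L1"); // L2 at index 5
          boolean[] labelled = new boolean[cmds.size()];
          labelled[2] = true;   // L1
          labelled[5] = true;   // L2
          partition(cmds, labelled).forEach(System.out::println);
      }
  }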
Example: 3AC and basic blocks
Consider the following C program:
int a[2];
int b[7];
int skprod(int i1, int i2, int lng) { ... }
int main() {
  a[0] = 1; a[1] = 2;
  b[0] = 4; b[1] = 5; b[2] = 6;
  skprod(0,1,2);
  return 0;
}
Example: 3AC and basic blocks (2)
3AC with basic block partitioning for the main procedure:

main:
  a[0] := 1
  a[1] := 2
  b[0] := 4
  b[1] := 5
  b[2] := 6
  param 0
  param 1
  param 2
  call skprod
  return 0
Example: 3AC and basic blocks (3)
Procedure skprod:

int skprod(int i1, int i2, int lng) {
  int ix, res = 0;
  for (ix = 0; ix <= lng-1; ix++) {
    res += a[i1+ix] * b[i2+ix];
  }
  return res;
}

3AC with basic block partitioning for skprod:

skprod:
  B0: res := 0
      ix  := 0
      → B1
  B1: t0 := lng-1
      if ix<=t0      (true → B2, false → B3)
  B2: t1 := i1+ix
      t2 := a[t1]
      t1 := i2+ix
      t3 := b[t1]
      t1 := t2*t3
      res := res+t1
      ix  := ix+1
      → B1
  B3: return res
Intermediate Language Variations
3AC after elimination of array operations (for the above example):

skprod:
  B0: res := 0
      ix  := 0
      → B1
  B1: t0 := lng-1
      if ix<=t0      (true → B2, false → B3)
  B2: t1 := i1+ix
      tx := t1*4
      ta := a+tx
      t2 := *ta
      t1 := i2+ix
      tx := t1*4
      tb := b+tx
      t3 := *tb
      t1 := t2*t3
      res := res+t1
      ix  := ix+1
      → B1
  B3: return res
Characteristics of 3-Address Code
• Control flow is explicit.
• Only elementary operations
• Rearrangement and exchange of commands can be handled relatively easily.
4.1.2 Other Intermediate Languages
Further Intermediate Languages
We consider
• 3AC in Static Single Assignment (SSA) representation
• Stack Machine Code
Static Single Assignment Form
If a variable a is read at a program position, this is a use of a.
If a variable a is written at a program position, this is a definition of a.
For optimizations, the relationship between uses and definitions of variables is important.

In SSA representation, each variable has exactly one definition. Thus, the relationship between use and definition is explicit in the intermediate language.
Static Single Assignment Form (2)
SSA is essentially a refinement of 3AC.
The different definitions of one variable are represented by indexing the variable.
For sequential command lists, this means that
• at each definition position, the variable gets a different index.
• at the use position, the variable has the index of its last definition.
Example: SSA

In SSA representation, each variable has exactly one definition; the relation between uses and definitions is thereby explicit in the intermediate language, i.e., no additional def-use or use-def chaining is needed.

Original sequence:       SSA form:
  a := x + y               a1 := x0 + y0
  b := a - 1               b1 := a1 - 1
  a := y + b               a2 := y0 + b1
  b := x * 4               b2 := x0 * 4
  a := a + b               a3 := a2 + b2

SSA - Join Points of Control Flow
At join points of control flow, an additional mechanism is required:
Consider two branches whose last definitions are a1 := x0 + y0 and a3 := a2 - b2. After the join, a use of a cannot be indexed uniquely:

    a1 := x0 + y0        a3 := a2 - b2
             \              /
              b3 := a?   ...
SSA - Join Points of Control Flow (2)
Introduce an "oracle"Φthat selects the value of the variable of the use branch:
An Stellen, an denen der Kontrollfluß zusammen- führt bedarf es eines zusätzlichen Mechanismus:
führt, bedarf es eines zusätzlichen Mechanismus:
3 2 2
a := x + y
1 0 0a := a – b
b := a
3 ?...
Einführung der fiktiven Orakelfunktion“ ! die Einführung der fiktiven „Orakelfunktion ! , die quasi den Wert der Variable im zutreffenden Zweig auswählt:
3 2 2
a := x + y
1 0 0a := a – b
a := ! (a ,a )
b := a
34 4 1 3...
SSA - Remarks
• The construction of an SSA representation with a minimal number of applications of the Φ oracle is a non-trivial task (cf. Appel, Sect. 19.1 and 19.2).
• The term static single assignment form reflects that for each variable in the program text, there is only one assignment.
Dynamically, a variable in SSA representation can be assigned arbitrarily often (e.g., in loops).
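For straight-line code, the renaming scheme above is easy to implement. A minimal sketch (our own code, assuming commands of the form x := y op z and ignoring join points, where Φ placement would be needed):

  import java.util.*;

  public class SsaRename {

      // Constants keep their name; variables get the index of their last definition.
      static String use(String x, Map<String, Integer> version) {
          if (x.matches("\\d+")) return x;
          return x + version.getOrDefault(x, 0);
      }

      public static void main(String[] args) {
          String[][] cmds = {                  // {target, left, op, right}
              {"a", "x", "+", "y"},
              {"b", "a", "-", "1"},
              {"a", "y", "+", "b"},
              {"b", "x", "*", "4"},
              {"a", "a", "+", "b"},
          };
          Map<String, Integer> version = new HashMap<>();
          for (String[] c : cmds) {
              String l = use(c[1], version);   // read the uses before renaming the target
              String r = use(c[3], version);
              int v = version.merge(c[0], 1, Integer::sum);  // fresh index per definition
              System.out.println(c[0] + v + " := " + l + " " + c[2] + " " + r);
          }
          // prints the indexed sequence a1 := x0 + y0, ..., a3 := a2 + b2 from the example
      }
  }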
Further intermediate languages
While 3AC and SSA representation are mostly used as intermediate languages within compilers, intermediate languages and abstract machines are increasingly used as the connection between compilers and runtime environments.

Java ByteCode and CIL (Common Intermediate Language, cf. .NET) are examples of stack machine code, i.e., intermediate results are stored on a runtime stack.

Further intermediate languages are, for instance, used for optimizations.
Stack machine code as intermediate language
Homogeneous scenario for Java:

[Diagram: Java sources (C1.java, C2.java, C3.java) are compiled, possibly by different compilers (jikes, javac), to .class files containing Java ByteCode, which are executed by the JVM.]
Stack machine code as intermediate language (2)
Possibly inhomogeneous scenario for .NET:

[Diagram: programs of different high-level languages (prog1.cs, prog2.cs via a C# compiler; prog3.hs via a Haskell compiler) are compiled to Intermediate Language files (prog1.il, prog2.il, prog3.il), which are executed by the CLR.]
Example: Stack machine code
package beisp;
class Weltklasse extends Superklasse implements BesteBohnen {
  Qualifikation studieren( Arbeit schweiss ) {
    return new Qualifikation();
  }
}
Example: Stack machine code (2)
Disassembled byte code (javap output):

Compiled from Weltklasse.java
class beisp.Weltklasse extends beisp.Superklasse implements beisp.BesteBohnen {
    beisp.Weltklasse();
    beisp.Qualifikation studieren(beisp.Arbeit);
}

Method beisp.Weltklasse()
  0 aload_0
  1 invokespecial #6 <Method beisp.Superklasse()>
  4 return

Method beisp.Qualifikation studieren(beisp.Arbeit)
  0 new #2 <Class beisp.Qualifikation>
  3 dup
  4 invokespecial #5 <Method beisp.Qualifikation()>
  7 areturn
4.2 Optimization
Optimization
Optimization refers to improving the code with respect to the following aspects:
• Runtime behavior
• Memory consumption
• Size of code
• Energy consumption
Optimization (2)
We distinguish the following kinds of optimizations:
• machine-independent optimizations
• machine-dependent optimizations (exploit properties of a particular real machine)
and
• local optimizations
• intra-procedural optimizations
• inter-procedural/global optimizations
Remark on Optimization
Appel (Chap. 17, p. 350):
"In fact, there can never be a complete list [of optimizations]."
"Computability theory shows that it will always be possible to invent new optimizing transformations."
4.2.1 Classical Optimization Techniques
Constant Propagation
If the value of a variable is constant, the variable can be replaced with the constant.
Constant Folding
Evaluate all expressions with constants as operands at compile time.
Iterating constant folding and constant propagation extends them to non-local constant optimization across basic blocks (a sketch follows below).
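The following sketch (our own simplification, not lecture code) shows one pass of constant propagation and folding over straight-line 3AC; on a full flow graph the pass is iterated and combined with the data flow analyses of Section 4.2.3:

  import java.util.*;

  public class ConstOpt {

      // Returns the constant value of an operand, or null if unknown.
      static Integer value(String operand, Map<String, Integer> env) {
          if (operand.matches("-?\\d+")) return Integer.parseInt(operand);
          return env.get(operand);
      }

      public static void main(String[] args) {
          String[][] cmds = {            // {target, left, op, right}; op == null: copy
              {"a", "3", null, null},    // a := 3
              {"b", "a", "+", "4"},      // b := a + 4   becomes  b := 7
              {"c", "b", "+", "b"},      // c := b + b   becomes  c := 14
              {"d", "c", "+", "u"},      // u unknown:   becomes  d := 14 + u
          };
          Map<String, Integer> env = new HashMap<>();  // variables known to be constant
          for (String[] c : cmds) {
              Integer l = value(c[1], env);
              Integer r = c[3] == null ? null : value(c[3], env);
              if (c[2] == null && l != null) {         // propagate a constant copy
                  env.put(c[0], l);
                  System.out.println(c[0] + " := " + l);
              } else if (c[2] != null && l != null && r != null) {  // fold (only "+" here)
                  env.put(c[0], l + r);
                  System.out.println(c[0] + " := " + (l + r));
              } else {
                  env.remove(c[0]);                    // result not constant
                  String left = l != null ? l.toString() : c[1];
                  String right = c[3] == null ? "" : (r != null ? " " + r : " " + c[3]);
                  System.out.println(c[0] + " := " + left
                          + (c[2] == null ? "" : " " + c[2]) + right);
              }
          }
      }
  }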
Copy Propagation
Eliminate copies of variables, i.e., if several variables x, y, z at a program position are known to have the same value, all uses of y and z are replaced by x.
Copy Propagation (2)
This can also be done at join points of control flow or for loops:
for each program point, the information which variables have the same value is required.
Common Subexpression Elimination
If an expression or a statement contains the same partial expression several times, the goal is to evaluate this subexpression only once.
Common Subexpression Elimination (2)
Optimization of a basic block is done after transformation to SSA and construction of a DAG (directed acyclic graph) for its expressions.
Common Subexpression Elimination (3)
Remarks:
• The elimination of repeated computations is often done before transformation to 3AC, but can also be reasonable following other transformations.
• The DAG representation of expressions is also used as intermediate language by some authors.
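A common way to realize this within one basic block is local value numbering on the SSA form; the following is our own minimal sketch (syntactic equality only, no algebraic identities):

  import java.util.*;

  public class ValueNumbering {

      public static void main(String[] args) {
          String[][] cmds = {               // {target, left, op, right}, SSA form
              {"t1", "i1", "+", "ix"},
              {"t2", "i1", "+", "ix"},      // recomputes t1's right-hand side
              {"t3", "t1", "*", "4"},
              {"t4", "t2", "*", "4"},       // after replacing t2 by t1: same as t3
          };
          Map<String, String> canon = new HashMap<>();     // var -> canonical variable
          Map<String, String> rhsToVar = new HashMap<>();  // canonical RHS -> first definer
          for (String[] c : cmds) {
              String l = canon.getOrDefault(c[1], c[1]);
              String r = canon.getOrDefault(c[3], c[3]);
              String key = l + " " + c[2] + " " + r;
              String prev = rhsToVar.get(key);
              if (prev != null) {           // subexpression already computed
                  canon.put(c[0], prev);
                  System.out.println(c[0] + " := " + prev);   // plain copy, removable later
              } else {
                  rhsToVar.put(key, c[0]);
                  System.out.println(c[0] + " := " + key);
              }
          }
      }
  }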
Algebraic Optimizations
Algebraic laws can be applied in order to enable other optimizations, e.g., associativity and commutativity of addition.
Caution: For finite data types, common algebraic laws are not valid in general (e.g., floating-point addition is not associative).
Strength Reduction
Replace expensive operations by more efficient operations (partially machine-dependent).
For example, y := 2*x can be replaced by y := x + x
or by y := x << 1
Inline Expansion of Procedure Calls
Replace a call to a non-recursive procedure by its body, with appropriate substitution of the parameters.
Note: This reduces execution time, but increases code size.
Inline Expansion of Procedure Calls (2)
Remarks:
• Expansion is in general more than text replacement: for example, name clashes between the procedure's variables and variables at the call site must be avoided by renaming.
Inline Expansion of Procedure Calls (3)
• In OO programs with relatively short methods, expansion is an important optimization technique. However, precise information about the target object is required.
• A refinement of inline expansion is the specialization of procedures/functions when some of the actual parameters are known. This technique can also be applied to recursive procedures/functions.
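A hand-written before/after illustration (our own, not from the lecture) of what expansion does at a call site:

  class Inlining {
      static int sq(int x) { return x * x; }

      static int before(int a, int b) {
          return sq(a) + sq(b);
      }

      // After expansion: each call is replaced by the body of sq, with the formal
      // parameter bound to a fresh temporary so the two call sites do not clash.
      static int after(int a, int b) {
          int x1 = a;
          int t1 = x1 * x1;
          int x2 = b;
          int t2 = x2 * x2;
          return t1 + t2;
      }
  }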
Dead Code Elimination
Remove code that is not reached during execution or that has no influence on execution.
In one of the above examples, constant folding and propagation left behind assignments that no longer influence the result; such code can be removed.
Dead Code Elimination (2)
A typical example of non-reachable and thus dead code is an unlabeled statement directly following an unconditional jump or return.
Dead Code Elimination (3)
Remarks:
• Dead code is often caused by optimizations.
• Another source of dead code is program modification.
• In the first case, liveness information is the prerequisite for dead code elimination.
Code motion
Move commands over branching points in the control flow graph such that they end up in basic blocks that are less often executed.
We consider two cases:
• Move commands into succeeding or preceding branches
• Move code out of loops
Optimization of loops is very profitable, because code inside loops is executed more often than code not contained in a loop.
Move code over branching points
If a sequential computation branches, the branches are less often executed than the sequence.
Move code over branching points (2)
A prerequisite for this optimization is that the defined variable is used in only one branch.
Moving a command over a preceding join point can be advisable if the command can be eliminated by optimization in one of the branches.
Partial redundancy elimination
Definition (Partial Redundancy)
An assignment is redundant at a program position s if it has already been executed on all paths to s.
An expression e is redundant at s if the value of e has already been calculated on all paths to s.
An assignment/expression is partially redundant at s if it is redundant with respect to some of the execution paths leading to s.
Partial redundancy elimination (2)
Example:
Partial redundancy elimination (3)
Elimination of partial redundancy:
Partial redundancy elimination (4)
Remarks:
• PRE can be seen as a combination and extension of common subexpression elimination and code motion.
• Extension: Elimination of partial redundancy according to estimated probability for execution of specific paths.
Code motion from loops
Idea: Computations in loops whose operands are not changed inside the loop should be done outside the loop.
This requires that the moved target variable (t1 in the example) is not live at the end of the block preceding the loop.
Optimization of loop variables
Variables and expressions that are not changed during the execution of a loop are called loop invariant.
Loops often have variables that are increased/decreased systematically in each loop execution, e.g., for-loops.
Often, a loop variable depends on another loop variable, e.g., a relative address depends on the loop counter variable.
Optimization of loop variables (2)
Definition (Loop Variables)
A variable i is called an explicit loop variable of a loop S if there is exactly one definition of i in S, of the form i := i + c where c is loop invariant.
A variable k is called a derived loop variable of a loop S if there is exactly one definition of k in S, of the form k := j * c or k := j + d, where j is a loop variable and c and d are loop invariant.
Induction variable analysis
Compute derived loop variables inductively, i.e., instead of computing them from the value of the loop variable, compute them from their value in the previous loop iteration.
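A before/after illustration in source form (our own sketch): the derived loop variable k = base + ix*c is updated inductively, replacing one multiplication per iteration by an addition:

  class InductionVariables {
      static int before(int n, int c, int base) {
          int sum = 0;
          for (int ix = 0; ix < n; ix++) {
              int k = base + ix * c;   // derived loop variable, recomputed from ix
              sum += k;
          }
          return sum;
      }

      static int after(int n, int c, int base) {
          int sum = 0;
          int k = base - c;            // chosen so that k == base + ix*c holds
          for (int ix = 0; ix < n; ix++) {
              k = k + c;               // inductive update of the derived variable
              sum += k;
          }
          return sum;
      }
  }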
Loop unrolling
If the number of loop executions is known statically or properties about the number of loop executions (e.g., always an even number) can be inferred, the loop body can be copied several times to save comparisons and jumps.
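A before/after sketch (our own), assuming the trip count is known to be even:

  class Unrolling {
      static int before(int[] a, int n) {   // n even by assumption
          int sum = 0;
          for (int i = 0; i < n; i++) {
              sum += a[i];
          }
          return sum;
      }

      static int after(int[] a, int n) {    // half as many tests and jumps
          int sum = 0;
          for (int i = 0; i < n; i += 2) {
              sum += a[i];
              sum += a[i + 1];
          }
          return sum;
      }
  }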
Loop unrolling (2)
Remarks:
• Partial loop unrolling aims at obtaining larger basic blocks in loops to have more optimization options.
• Loop unrolling is in particular important for parallel processor architectures and pipelined processing (machine-dependent).
Optimization for other language classes
The discussed optimizations aim at imperative languages. For optimizing programs of other language classes, special techniques have been developed.
For example:
• Object-oriented languages: Optimization of dynamic binding (type analysis)
• Non-strict functional languages: Optimization of lazy function calls (strictness analysis)
• Logic programming languages: Optimization of unification
4.2.2 Potential of Optimizations
Potential of optimizations - Example
Using the procedure skprod, we demonstrate some of the techniques above and the improvement that optimizations can achieve; we also sketch its evaluation.

skprod:
  B0: res := 0
      ix  := 0
      → B1
  B1: t0 := lng-1
      if ix<=t0      (true → B2, false → B3)
  B2: t1 := i1+ix
      tx := t1*4
      ta := a+tx
      t2 := *ta
      t1 := i2+ix
      tx := t1*4
      tb := b+tx
      t3 := *tb
      t1 := t2*t3
      res := res+t1
      ix  := ix+1
      → B1
  B3: return res

Evaluation: number of steps depending on lng:
  2 + 2 + 13*lng + 1 = 13*lng + 5
  lng = 100:  1305
  lng = 1000: 13005
Potential of optimizations - Example (2)
Moving the computation of the loop-invariant t0 out of the loop:

skprod:
  B0: res := 0
      ix  := 0
      t0  := lng-1
      → B1
  B1: if ix<=t0      (true → B2, false → B3)
  B2: t1 := i1+ix
      tx := t1*4
      ta := a+tx
      t2 := *ta
      t1 := i2+ix
      tx := t1*4
      tb := b+tx
      t3 := *tb
      t1 := t2*t3
      res := res+t1
      ix  := ix+1
      → B1
  B3: return res
Potential of optimizations - Example (3)
Optimization of loop variables (1): At first, there are no derived loop variables, because t1 and tx have several definitions; introducing SSA indices for t1 and tx makes t11, tx1, ta, t12, tx2, tb derived loop variables:

skprod:
  B0: res := 0
      ix  := 0
      t0  := lng-1
      → B1
  B1: if ix<=t0      (true → B2, false → B3)
  B2: t11 := i1+ix
      tx1 := t11*4
      ta  := a+tx1
      t2  := *ta
      t12 := i2+ix
      tx2 := t12*4
      tb  := b+tx2
      t3  := *tb
      t13 := t2*t3
      res := res+t13
      ix  := ix+1
      → B1
  B3: return res
Potential of optimizations - Example (4)
Optimization of loop variables (2): Initialization and inductive definition of the loop variables:

skprod:
  B0: res := 0
      ix  := 0
      t0  := lng-1
      t11 := i1-1
      tx1 := t11*4
      ta  := a+tx1
      t12 := i2-1
      tx2 := t12*4
      tb  := b+tx2
      → B1
  B1: if ix<=t0      (true → B2, false → B3)
  B2: t11 := t11+1
      tx1 := tx1+4
      ta  := ta+4
      t2  := *ta
      t12 := t12+1
      tx2 := tx2+4
      tb  := tb+4
      t3  := *tb
      t13 := t2*t3
      res := res+t13
      ix  := ix+1
      → B1
  B3: return res
Potential of optimizations - Example (5)
Dead code elimination: The loop assignments to t11, tx1, t12, tx2 are dead code, since they do not influence the result:

skprod:
  B0: res := 0
      ix  := 0
      t0  := lng-1
      t11 := i1-1
      tx1 := t11*4
      ta  := a+tx1
      t12 := i2-1
      tx2 := t12*4
      tb  := b+tx2
      → B1
  B1: if ix<=t0      (true → B2, false → B3)
  B2: ta  := ta+4
      t2  := *ta
      tb  := tb+4
      t3  := *tb
      t13 := t2*t3
      res := res+t13
      ix  := ix+1
      → B1
  B3: return res
Potential of optimizations - Example (6)
Algebraic optimization: Exploit the invariant ta = 4*(i1-1+ix) + a to replace the comparison ix <= t0 by ta <= 4*(i1-1+t0) + a:

skprod:
  B0: res := 0
      ix  := 0
      t0  := lng-1
      t11 := i1-1
      tx1 := t11*4
      ta  := a+tx1
      t12 := i2-1
      tx2 := t12*4
      tb  := b+tx2
      t4  := t11+t0
      t5  := 4*t4
      t6  := t5+a
      → B1
  B1: if ta<=t6      (true → B2, false → B3)
  B2: ta  := ta+4
      t2  := *ta
      tb  := tb+4
      t3  := *tb
      t13 := t2*t3
      res := res+t13
      ix  := ix+1
      → B1
  B3: return res
Potential of optimizations - Example (7)
Dead code elimination: Through the transformation of the loop condition, the assignment to ix has become dead code and can be eliminated:

skprod:
  B0: res := 0
      t0  := lng-1
      t11 := i1-1
      tx1 := t11*4
      ta  := a+tx1
      t12 := i2-1
      tx2 := t12*4
      tb  := b+tx2
      t4  := t11+t0
      t5  := 4*t4
      t6  := t5+a
      → B1
  B1: if ta<=t6      (true → B2, false → B3)
  B2: ta  := ta+4
      t2  := *ta
      tb  := tb+4
      t3  := *tb
      t13 := t2*t3
      res := res+t13
      → B1
  B3: return res
Potential of optimizations - Example (8)
Remarks:
• Reduction of the number of execution steps by almost half, where the most significant reductions are achieved by loop optimization.
• Combination of optimization techniques is important. Determining the order of optimizations is in general difficult.
• We have considered optimizations only by example. The difficulty is to find algorithms and heuristics for detecting optimization potential automatically and for performing the optimizing transformations.
4.2.3 Data flow analysis
Data flow analysis
For optimizations, data flow information is required that can be obtained by data flow analysis.
Goal: Explain the basic concepts of data flow analysis by examples.
Outline:
• Liveness analysis (Typical example of data flow analysis)
• Data flow equations
• Important classes of analyses
Each analysis has an exact specification of which information it provides.
Liveness analysis
Definition (Liveness Analysis)
Let P be a program. A variable v is live at a program position S in P if there is an execution path π from S to a use of v such that there is no definition of v on π.
The liveness analysis determines for all positions S in P which variables are live at S.
Liveness analysis (2)
Remarks:
• The definition of liveness of variables is static/syntactic. We have defined dead code dynamically/semantically.
• The result of the liveness analysis for a program P can be represented as a function live mapping positions in P to bit vectors, where a bit vector contains an entry for each variable in P. Let i be the index of a variable v in P; then:
  live(S)[i] = 1 iff v is live at position S
Liveness analysis (3)
Idea:
• In a procedure-local analysis, exactly the global variables are live at the end of the exit block of the procedure.
• If the live variables out(B) at the end of a basic block B are known, the live variables in(B) at the beginning of B are computed by:
  in(B) = gen(B) ∪ (out(B) \ kill(B))
  where
  – gen(B) is the set of variables v that are used in B without a prior definition of v
  – kill(B) is the set of variables that are defined in B
Liveness analysis (4)
As the set in(B) is computed from out(B), we have a backward analysis.
For B not the exit block of the procedure, out(B) is obtained by
  out(B) = ⋃ in(Bi)  over all successors Bi of B
Thus, for a program without loops, in(B) and out(B) are defined for all basic blocks B. Otherwise, we obtain a system of recursive equations.
Liveness analysis - Example
Data flow equations
Theory:
• There is always a solution for equations of the considered form.
• There is always a smallest solution; it is obtained by an iteration starting from empty in and out sets.
Note: The equations may have several solutions.
Ambiguity of solutions - Example
B0: a := a     (successors: B0 and B1)
B1: b := 7     (exit block)

out(B0) = in(B0) ∪ in(B1)
out(B1) = { }
in(B0) = gen(B0) ∪ (out(B0) \ kill(B0)) = {a} ∪ out(B0)
in(B1) = gen(B1) ∪ (out(B1) \ kill(B1)) = { }

Thus, out(B0) = in(B0), and hence in(B0) = {a} ∪ in(B0).

Possible solutions: in(B0) = {a} or in(B0) = {a, b}
Computation of smallest fixpoint
1. Compute gen(B) and kill(B) for all B.
2. Set out(B) = ∅ for all B except the exit block. For the exit block, out(B) comes from the program context.
3. While out(B) or in(B) changes for some B:
   compute in(B) from the current out(B) for all B;
   compute out(B) from the in(B) of the successors of B.
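This iteration can be written down directly; the following sketch (our own code) computes the smallest solution for the ambiguity example above, with B0: a := a (successors B0 and B1) and B1: b := 7:

  import java.util.*;

  public class Liveness {

      public static void main(String[] args) {
          int n = 2;
          List<List<Integer>> succ = List.of(List.of(0, 1), List.of());
          List<Set<String>> gen  = List.of(Set.of("a"), Set.<String>of());
          List<Set<String>> kill = List.of(Set.of("a"), Set.of("b"));

          List<Set<String>> in  = new ArrayList<>();
          List<Set<String>> out = new ArrayList<>();
          for (int i = 0; i < n; i++) { in.add(new HashSet<>()); out.add(new HashSet<>()); }
          // out of the exit block stays as given by the program context (empty here)

          boolean changed = true;
          while (changed) {
              changed = false;
              for (int b = 0; b < n; b++) {
                  Set<String> newOut = new HashSet<>();
                  for (int s : succ.get(b)) newOut.addAll(in.get(s));  // backward analysis
                  Set<String> newIn = new HashSet<>(newOut);
                  newIn.removeAll(kill.get(b));                        // out(B) \ kill(B)
                  newIn.addAll(gen.get(b));                            // ∪ gen(B)
                  changed |= !newIn.equals(in.get(b)) || !newOut.equals(out.get(b));
                  in.set(b, newIn);
                  out.set(b, newOut);
              }
          }
          System.out.println("in(B0) = " + in.get(0));   // the smallest solution: [a]
      }
  }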
Further analyses and classes of analyses
Many data flow analyses can be described as bit vector problems:
• Reaching definitions: Which definitions reach a position S?
• Available expressions: for the elimination of repeated computations
• Very busy expressions: Which expressions are needed on all subsequent computation paths?
The corresponding analyses can be treated analogously to liveness analysis, but differ in
• the definition of the data flow information
• the definition of gen and kill
• the direction of the analysis and the equations
Further analyses and classes of analyses (2)
For backward analyses, the data flow information at the entry of a basic block B is obtained from the information at the exit of B:
  in(B) = gen(B) ∪ (out(B) \ kill(B))
Analyses are distinguished by whether they take the union or the intersection of the successor information:
  out(B) = ⋃_{Bi ∈ succ(B)} in(Bi)
or
  out(B) = ⋂_{Bi ∈ succ(B)} in(Bi)
Further analyses and classes of analyses (3)
For forward analyses, the dependency is the other way round:
  out(B) = gen(B) ∪ (in(B) \ kill(B))
with
  in(B) = ⋃_{Bi ∈ pred(B)} out(Bi)
or
  in(B) = ⋂_{Bi ∈ pred(B)} out(Bi)
Further analyses and classes of analyses (4)
Overview of classes of analyses:
             union                  intersection
  forward    reaching definitions   available expressions
  backward   live variables         very busy expressions
Further analyses and classes of analyses (5)
For bit vector problems, data flow information consists of subsets of finite sets.
For other analyses, the collected information is more complex, e.g., for constant propagation, we consider mappings from variables to values.
For interprocedural analyses, complexity increases because the flow graph is not static.
The formal basis for the development and correctness of optimizations is provided by the theory of abstract interpretation.
4.2.4 Non-Local Program Analysis
Non-local program analysis
We use a points-to analysis to demonstrate:
• interprocedural aspects: The analysis crosses the borders of single procedures.
• constraints: Program analysis very often involves solving or refining constraints.
• complex analysis results: The analysis result cannot be represented locally for a statement.
• analysis as abstraction: The result of the analysis is an abstraction of all possible program executions.
Points-to analysis
Analysis for programs with pointers and for object-oriented programs.
Goal: Compute which records/objects a variable can hold references to.
Applications of Analysis Results:
Basis for optimizations
• Alias information (e.g., important for code motion)
  – Can p.f = x cause changes to an object referenced by q?
  – Can z = p.f read information that is written by p.f = x?
• Call graph construction
• Resolution of virtual method calls
Alias Information

Example (use of points-to analysis information):

A. Use of alias information:

  (1) p.f = x;
  (2) y = q.f;
  (3) q.f = z;

If p == q, statement (2) can be replaced by y = x;
if p != q, the first statement can be exchanged with the other two.
Elimination of Dynamic Binding
B. Elimination of dynamic binding:

  class A { void m( ... ) { ... } }
  class B extends A { void m( ... ) { ... } }
  ...
  A p;
  p = new B();
  p.m(...)   // call of B::m

If the analysis shows that p can only reference B objects, the dynamically bound call can be replaced by a direct call of B::m.
Escape Analysis
C. Escape analysis:

  R m( A p ) {
    B q;
    q = new B();   // stack allocation possible
    q.f = p;
    q.g = p.n();
    return q.g;
  }

The object created for q does not escape m, so it can be stored on the stack.
A Points-to Analysis for Java
Simplifications and assumptions about the underlying language:
• Complete program is known.
• Only assignments and method calls of the following forms are used:
  – Direct assignment: l = r
  – Write to instance variables: l.f = r
  – Read of instance variables: l = r.f
  – Object creation: l = new C()
  – Simple method call: l = r0.m(r1, ...)
• Expressions are free of side effects
• Compound statements are decomposed into the simple forms above
A Points-to Analysis for Java (2)
Analysis type
• Flow-insensitive: The control flow of the program has no influence on the analysis result. The states of the variables at different program points are combined.
• Context-insensitive: Method calls at different program points are not distinguished.
A Points-to Analysis for Java (3)
Points-to graph as abstraction
The result of the analysis is a so-called points-to graph having
• abstract variables and abstract objects as nodes
• edges representing that an abstract variable may hold a reference to an abstract object

Abstract variables V represent sets of concrete variables at runtime.
Abstract objects O represent sets of concrete objects at runtime.
An edge between V and O means that in some program state, a variable represented by V may hold a reference to an object represented by O.
Points-to Graph - Example
class Y { ... }
class X {
  Y f;
  void set( Y r ) { this.f = r; }
  static void main() {
    X p = new X(); // s1 "creates" o1
    Y q = new Y(); // s2 "creates" o2
    p.set(q);
  }
}

[Points-to graph: p → o1, this → o1, q → o2, r → o2, and o1 has an f-labeled edge to o2.]
Definition of the Points-to Graph
For all method implementations,
• create a node o for each object creation site
• create nodes for
  – each local variable v
  – each formal parameter p of any method (incl. this and the result ret)
  – each static variable s
(Instance variables are modeled by labeled edges.)
Definition of the Points-to Graph (2)
Edges: smallest fixpoint of f : PtGraph × Stmt → PtGraph with

• f(G, l = new C()) = G ∪ {(l, oi)}
• f(G, l = r) = G ∪ {(l, oi) | oi ∈ Pt(G, r)}
• f(G, l.f = r) = G ∪ {(<oi, f>, oj) | oi ∈ Pt(G, l), oj ∈ Pt(G, r)}
• f(G, l = r.f) = G ∪ {(l, oi) | ∃ oj ∈ Pt(G, r). oi ∈ Pt(G, <oj, f>)}
• f(G, l = r0.m(r1, ..., rn)) = G ∪ ⋃_{oi ∈ Pt(G, r0)} resolve(G, m, oi, r1, ..., rn, l)

where Pt(G, x) is the points-to set of x in G and

resolve(G, m, oi, r1, ..., rn, l) =
  let mj(p0, p1, ..., pn, retj) = dispatch(oi, m) in
    {(p0, oi)} ∪ f(G, p1 = r1) ∪ ... ∪ f(G, pn = rn) ∪ f(G, l = retj)
  end
Definition of the Points-to Graph (3)
Remark:
The main problem for the practical use of the analysis is the efficient implementation of the computation of the points-to graph.
Literature:
A. Rountev, A. Milanova, B. Ryder: Points-to Analysis for Java Using Annotated Constraints. OOPSLA 2001.
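To give a feel for the computation, here is our own toy sketch of the fixpoint iteration for the statement forms without method calls (resolve/dispatch omitted); it is an illustration only, not the algorithm of the paper cited above:

  import java.util.*;

  public class PointsTo {

      static Set<String> pt(Map<String, Set<String>> g, String key) {
          return g.getOrDefault(key, Set.of());
      }

      static boolean add(Map<String, Set<String>> g, String key, Set<String> objs) {
          return g.computeIfAbsent(key, k -> new HashSet<>()).addAll(objs);
      }

      public static void main(String[] args) {
          String[][] stmts = {
              {"new", "p", "o1"},          // p = new X()   creation site o1
              {"new", "q", "o2"},          // q = new Y()   creation site o2
              {"copy", "r", "q"},          // r = q
              {"store", "p", "f", "r"},    // p.f = r
              {"load", "s", "p", "f"},     // s = p.f
          };
          Map<String, Set<String>> g = new HashMap<>();  // variable or <o.f> -> objects

          boolean changed = true;
          while (changed) {                // flow-insensitive fixpoint iteration
              changed = false;
              for (String[] s : stmts) {
                  switch (s[0]) {
                      case "new":   changed |= add(g, s[1], Set.of(s[2])); break;
                      case "copy":  changed |= add(g, s[1], pt(g, s[2])); break;
                      case "store": // l.f = r: edges <oi,f> -> Pt(r) for oi in Pt(l)
                          for (String o : pt(g, s[1]))
                              changed |= add(g, o + "." + s[2], pt(g, s[3]));
                          break;
                      case "load":  // l = r.f: Pt(l) gains Pt(<oj,f>) for oj in Pt(r)
                          for (String o : pt(g, s[2]))
                              changed |= add(g, s[1], pt(g, o + "." + s[3]));
                          break;
                  }
              }
          }
          System.out.println(g);   // e.g. s -> [o2] via the edge o1.f -> [o2]
      }
  }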
4.3 Register Allocation
Register allocation
Efficient code has to make good use of the available registers of the target machine: accessing registers is much faster than accessing memory (the same holds for caches).
Register allocationhas two aspects:
• Determine which variables are implemented by registers at which positions.
• Determine which register implements which variable at which positions (register assignment).
Register allocation (2)
Goals of register allocation
1. Generate code that requires as few registers as possible.
2. Avoid unnecessary memory accesses, i.e., implement not only temporaries but also program variables by registers.
3. Allocate registers preferably to variables that are used often (do not use them for variables that are accessed only rarely).
4. Obey programmer’s requirements.
Register allocation (3)
Outline
• Algorithm interleaving code generation and register allocation for nested expressions (cf. Goal 1)
• Algorithm for procedure-local register allocation (cf. Goals 2 and 3)
• Combination and other aspects
4.3.1 Sethi-Ullman Algorithm
Evaluation ordering with minimal registers
The algorithm by Sethi and Ullman is an example of an integrated approach to register allocation and code generation.
(cf. Wilhelm, Maurer, Sect. 12.4.1, p. 584 ff)

Input: an assignment with a nested expression on the right-hand side:

  Assign ( Var, Exp )
  Exp = BinExp | Var
  BinExp ( Exp, Op, Exp )
  Var ( Ident )
Evaluation ordering with minimal registers (2)
Output: machine code or intermediate language code with assigned registers.

We consider two-address code, i.e., code with at most one memory access per command:

  Ri := M[V]
  M[V] := Ri
  Ri := Ri op M[V]
  Ri := Ri op Rj

The machine has r registers, represented by R0, ..., Rr-1.
Example: Code generation w/ register allocation
Consider f := (a+b) - (c-(d+e))

Assume that only two registers R0 and R1 are available for the translation.

Result of the direct translation:

  R0 := M[a]
  R0 := R0 + M[b]
  R1 := M[d]
  R1 := R1 + M[e]
  M[t1] := R1
  R1 := M[c]
  R1 := R1 - M[t1]
  R0 := R0 - R1
  M[f] := R0
Example: Code generation w/ register allocation (2)
Result of the Sethi-Ullman algorithm:

  R0 := M[c]
  R1 := M[d]
  R1 := R1 + M[e]
  R0 := R0 - R1
  R1 := M[a]
  R1 := R1 + M[b]
  R1 := R1 - R0
  M[f] := R1

Better, because it uses one instruction less and needs no temporary storage in memory.
Sethi-Ullman algorithm

Goal: Minimize the number of registers used and the number of temporaries.

Idea: Generate code for the subexpression requiring more registers first.

Procedure:
• Define a function regbed that computes the number of registers needed to evaluate an expression (a sketch follows below).
• Generate code for an expression E = BinExp(L, OP, R) by a case distinction on the register needs of L and R.
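A sketch of regbed (our own code) for the two-address machine used here: a leaf needs a register only as a left operand, because a right operand can be taken directly from memory (Ri := Ri op M[V]); for other instruction sets the base cases differ:

  class SethiUllman {
      interface Exp { }
      record Var(String name) implements Exp { }
      record BinExp(Exp left, String op, Exp right) implements Exp { }

      static int regbed(Exp e, boolean leftOperand) {
          if (e instanceof Var) return leftOperand ? 1 : 0;  // right leaf: memory operand
          BinExp b = (BinExp) e;
          int l = regbed(b.left(), true);
          int r = regbed(b.right(), false);
          return l == r ? l + 1 : Math.max(l, r);            // bigger side is evaluated first
      }

      public static void main(String[] args) {
          // f := (a+b) - (c-(d+e)) from the example
          Exp e = new BinExp(
              new BinExp(new Var("a"), "+", new Var("b")),
              "-",
              new BinExp(new Var("c"), "-",
                  new BinExp(new Var("d"), "+", new Var("e"))));
          System.out.println(regbed(e, true));   // 2: two registers suffice
      }
  }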
Sethi-Ullman algorithm (2)
We use the following notations:
• v_reg(E): the set of registers available for the translation of E
• v_tmp(E): the set of memory addresses where values can be stored temporarily when translating E
• cell(E): the register or memory cell where the result of E is stored

Now, let
• E be an expression
• L the left subexpression of E
• R the right subexpression of E
• vr abbreviate |v_reg(E)|
Sethi-Ullman algorithm (3)

We distinguish the following cases:
1. regbed(L) < vr
2. regbed(L) ≥ vr and regbed(R) < vr
3. regbed(L) ≥ vr and regbed(R) ≥ vr
Sethi-Ullman algorithm (4)

Case 1: regbed(L) < vr
• Generate code for R using v_reg(E) and v_tmp(E), with result in cell(R)
• Generate code for L using v_reg(E) \ {cell(R)} and v_tmp(E), with result in cell(L)
• Generate code for the operation cell(L) := cell(L) OP cell(R)
• Set cell(E) = cell(L)
Sethi-Ullman algorithm (5)

Case 2: regbed(L) ≥ vr and regbed(R) < vr
• Generate code for L using v_reg(E) and v_tmp(E), with result in cell(L)
• Generate code for R using v_reg(E) \ {cell(L)} and v_tmp(E), with result in cell(R)
• Generate code for the operation cell(L) := cell(L) OP cell(R)
• Set cell(E) = cell(L)