Compilers and Language Processing Tools
Summer Term 2011
Prof. Dr. Arnd Poetzsch-Heffter
Software Technology Group TU Kaiserslautern
Content of Lecture
1. Introduction
2. Syntax and Type Analysis
2.1 Lexical Analysis
2.2 Context-Free Syntax Analysis
2.3 Context-Dependent Analysis
3. Translation to Target Language
3.1 Translation of Imperative Language Constructs
3.2 Translation of Object-Oriented Language Constructs
4. Selected Topics in Compiler Construction
4.1 Intermediate Languages
4.2 Optimization
4.3 Register Allocation
4.4 Just-in-time Compilation
4.5 Further Aspects of Compilation
4. Selected Topics in Compiler Construction
Chapter Outline
4. Selected Topics in Compiler Construction
4.1 Intermediate Languages
4.1.1 3-Address Code
4.1.2 Other Intermediate Languages
4.2 Optimization
4.2.1 Classical Optimization Techniques
4.2.2 Potential of Optimizations
4.2.3 Data Flow Analysis
4.2.4 Non-local Optimization
4.3 Register Allocation
4.3.1 Sethi-Ullman Algorithm
4.3.2 Register Allocation by Graph Coloring
4.4 Just-in-time Compilation
4.5 Further Aspects of Compilation
Selected topics in compiler construction
Focus:
• Techniques that go beyond the direct translation of source languages to target languages
• Concentrate on concepts instead of language-dependent details
• Use program representations tailored for the considered tasks (instead of source language syntax):
  – simplifies the representation
  – (but needs more work to integrate tasks)
Selected topics in compiler construction (2)
Learning objectives:
• Intermediate languages for translation and optimization of imperative languages
• Different optimization techniques
• Different static analysis techniques for (intermediate) programs
• Register allocation
• Some aspects of code generation
4.1 Intermediate languages
Intermediate languages
• Intermediate languages are used as
  – appropriate program representation for certain language implementation tasks
  – common representation of programs of different source languages

[Diagram: source languages 1 to n are translated into the common intermediate language, from which code for target languages 1 to m is generated.]
Intermediate languages (2)
• Intermediate languages for translation are comparable to data structures in algorithm design, i.e., for each task, an intermediate language is more or less suitable.
• Intermediate languages can conceptually be seen as abstract machines.
4.1.1 3-Address Code
3-address code
3-address code (3AC) is a common intermediate language with many variants.
Properties:
• only elementary data types (but often arrays)
• no nested expressions
• sequential execution, jumps and procedure calls as statements
• named variables as in a high level language
• unbounded number of temporary variables
3-address code (2)
A program in 3AC consists of
• a list of global variables
• a list of procedures with parameters and local variables
• a main procedure
• each procedure has a sequence of 3AC commands as body
3AC commands
Syntax and explanation:

• x := y bop z,  x := uop z,  x := y
  x: variable (global, local, parameter, temporary)
  y, z: variable or constant
  bop: binary operator, uop: unary operator

• goto L,  if x cop y goto L
  jump or conditional jump to label L
  cop: comparison operator
  only procedure-local jumps

• x := a[i],  a[i] := y
  a: one-dimensional array

• x := &a,  x := *y
  a: global or local variable or parameter
  &a: address of a, *y: dereference of y
3AC commands (2)
Syntax and explanation:

• param x,  call p,  return y
  a call p(x1, ..., xn) is encoded as:
    param x1
    ...
    param xn
    call p
  (the whole block is considered as one command)
  return y causes a jump to the return address with (optional) result y
We assume that 3AC contains only labels that are actually used as jump targets in the program.
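As an illustration (our own sketch, not from the lecture; all class and field names are invented), the 3AC commands above can be modeled as a small class hierarchy; a procedure body is then a list of such instructions:

  // One class per 3AC command form (array access and address commands omitted).
  abstract class Instr { }

  class Assign extends Instr {          // x := y bop z,  x := uop y,  x := y
      String target;                    // x
      String op;                        // bop/uop, or null for a plain copy
      String left, right;               // right == null for unary and copy
      Assign(String target, String op, String left, String right) {
          this.target = target; this.op = op; this.left = left; this.right = right;
      }
  }

  class Goto extends Instr {            // goto L
      String label;
      Goto(String label) { this.label = label; }
  }

  class CondGoto extends Instr {        // if x cop y goto L
      String x, cop, y, label;
      CondGoto(String x, String cop, String y, String label) {
          this.x = x; this.cop = cop; this.y = y; this.label = label;
      }
  }

  class Param extends Instr {           // param x
      String x;
      Param(String x) { this.x = x; }
  }

  class Call extends Instr {            // call p
      String proc;
      Call(String proc) { this.proc = proc; }
  }

  class Return extends Instr {          // return y; result may be null
      String result;
      Return(String result) { this.result = result; }
  }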
Basic blocks
A sequence of 3AC commands can be uniquely partitioned into basic blocks.
A basic block B is a maximal sequence of commands such that
• at the end of B, exactly one jump, procedure call, or return command occurs
• labels only occur at the first command of a basic block
Basic blocks (2)
Remarks:
• The commands of a basic block are always executed sequentially; there are no jumps into the middle of a block.
• Often, a designated exit block for a procedure, containing the return at its end, is required. This is handled by additional transformations.
• The transitions between basic blocks are often denoted by flow charts.
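The partitioning itself can be computed with the usual "leader" scan. The following sketch is our own simplification (commands as plain strings, jump targets flagged in a separate array), not code from the lecture:

  import java.util.*;

  public class BasicBlocks {

      // A command ends a block if it is a jump, call, or return.
      static boolean endsBlock(String cmd) {
          return cmd.startsWith("goto") || cmd.startsWith("if")
              || cmd.startsWith("call") || cmd.startsWith("return");
      }

      // labelled[i] is true iff command i carries a label, i.e., is a jump target.
      static List<List<String>> partition(List<String> cmds, boolean[] labelled) {
          List<List<String>> blocks = new ArrayList<>();
          List<String> current = new ArrayList<>();
          for (int i = 0; i < cmds.size(); i++) {
              if (labelled[i] && !current.isEmpty()) {  // a label starts a new block
                  blocks.add(current);
                  current = new ArrayList<>();
              }
              current.add(cmds.get(i));
              if (endsBlock(cmds.get(i))) {             // a jump ends the current block
                  blocks.add(current);
                  current = new ArrayList<>();
              }
          }
          if (!current.isEmpty()) blocks.add(current);
          return blocks;
      }

      public static void main(String[] args) {
          List<String> cmds = List.of(
              "res := 0", "ix := 0",
              "t0 := lng-1", "if ix<=t0 goto L2",   // label L1 at index 2
              "return res",
              "t1 := i1+ix", "res := res+t1", "ix := ix+1", "goto L1"); // L2 at index 5
          boolean[] labelled = new boolean[cmds.size()];
          labelled[2] = true;   // L1
          labelled[5] = true;   // L2
          partition(cmds, labelled).forEach(System.out::println);
      }
  }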
Example: 3AC and basic blocks
Consider the following C program:
int a[2];
int b[7];
int skprod(int i1, int i2, int lng) { ... }
int main() {
  a[0] = 1; a[1] = 2;
  b[0] = 4; b[1] = 5; b[2] = 6;
  skprod(0,1,2);
  return 0;
}
Example: 3AC and basic blocks (2)
3AC with basic block partitioning for the main procedure:

main:
  a[0] := 1
  a[1] := 2
  b[0] := 4
  b[1] := 5
  b[2] := 6
  param 0
  param 1
  param 2
  call skprod
  return 0
Example: 3AC and basic blocks (3)
Procedure skprod:

int skprod(int i1, int i2, int lng) {
  int ix, res = 0;
  for (ix = 0; ix <= lng-1; ix++) {
    res += a[i1+ix] * b[i2+ix];
  }
  return res;
}

3AC with basic block partitioning for skprod:

skprod:
  B0: res := 0
      ix  := 0
      → B1
  B1: t0 := lng-1
      if ix<=t0      (true → B2, false → B3)
  B2: t1 := i1+ix
      t2 := a[t1]
      t1 := i2+ix
      t3 := b[t1]
      t1 := t2*t3
      res := res+t1
      ix  := ix+1
      → B1
  B3: return res
Intermediate Language Variations
3AC after elimination of array operations (for the above example):

skprod:
  B0: res := 0
      ix  := 0
      → B1
  B1: t0 := lng-1
      if ix<=t0      (true → B2, false → B3)
  B2: t1 := i1+ix
      tx := t1*4
      ta := a+tx
      t2 := *ta
      t1 := i2+ix
      tx := t1*4
      tb := b+tx
      t3 := *tb
      t1 := t2*t3
      res := res+t1
      ix  := ix+1
      → B1
  B3: return res
Characteristics of 3-Address Code
• Control flow is explicit.
• Only elementary operations
• Rearrangement and exchange of commands can be handled relatively easily.
4.1.2 Other Intermediate Languages
Further Intermediate Languages
We consider
• 3AC in Static Single Assignment (SSA) representation
• Stack Machine Code
Static Single Assignment Form
If a variable a is read at a program position, this is a use of a.
If a variable a is written at a program position, this is a definition of a.
For optimizations, the relationship between uses and definitions of variables is important.

In SSA representation, each variable has exactly one definition. Thus, the relationship between use and definition is explicit in the intermediate language.
Static Single Assignment Form (2)
SSA is essentially a refinement of 3AC.
The different definitions of one variable are represented by indexing the variable.
For sequential command lists, this means that
• at each definition position, the variable gets a different index.
• at the use position, the variable has the index of its last definition.
Example: SSA

In SSA representation, each variable has exactly one definition; the relation between uses and definitions is thereby explicit in the intermediate language, i.e., no additional def-use or use-def chaining is needed.

Original sequence:       SSA form:
  a := x + y               a1 := x0 + y0
  b := a - 1               b1 := a1 - 1
  a := y + b               a2 := y0 + b1
  b := x * 4               b2 := x0 * 4
  a := a + b               a3 := a2 + b2

SSA - Join Points of Control Flow
At join points of control flow, an additional mechanism is required:
Consider two branches whose last definitions are a1 := x0 + y0 and a3 := a2 - b2. After the join, a use of a cannot be indexed uniquely:

    a1 := x0 + y0        a3 := a2 - b2
             \              /
              b3 := a?   ...
SSA - Join Points of Control Flow (2)
Introduce an "oracle"Φthat selects the value of the variable of the use branch:
An Stellen, an denen der Kontrollfluß zusammen- führt bedarf es eines zusätzlichen Mechanismus:
führt, bedarf es eines zusätzlichen Mechanismus:
3 2 2
a := x + y
1 0 0a := a – b
b := a
3 ?...
Einführung der fiktiven Orakelfunktion“ ! die Einführung der fiktiven „Orakelfunktion ! , die quasi den Wert der Variable im zutreffenden Zweig auswählt:
3 2 2
a := x + y
1 0 0a := a – b
a := ! (a ,a )
b := a
34 4 1 3...
SSA - Remarks
• The construction of an SSA representation with a minimal number of applications of the Φ oracle is a non-trivial task (cf. Appel, Sect. 19.1 and 19.2).
• The term static single assignment form reflects that for each variable in the program text, there is only one assignment.
Dynamically, a variable in SSA representation can be assigned arbitrarily often (e.g., in loops).
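For straight-line code, the renaming scheme above is easy to implement. A minimal sketch (our own code, assuming commands of the form x := y op z and ignoring join points, where Φ placement would be needed):

  import java.util.*;

  public class SsaRename {

      // Constants keep their name; variables get the index of their last definition.
      static String use(String x, Map<String, Integer> version) {
          if (x.matches("\\d+")) return x;
          return x + version.getOrDefault(x, 0);
      }

      public static void main(String[] args) {
          String[][] cmds = {                  // {target, left, op, right}
              {"a", "x", "+", "y"},
              {"b", "a", "-", "1"},
              {"a", "y", "+", "b"},
              {"b", "x", "*", "4"},
              {"a", "a", "+", "b"},
          };
          Map<String, Integer> version = new HashMap<>();
          for (String[] c : cmds) {
              String l = use(c[1], version);   // read the uses before renaming the target
              String r = use(c[3], version);
              int v = version.merge(c[0], 1, Integer::sum);  // fresh index per definition
              System.out.println(c[0] + v + " := " + l + " " + c[2] + " " + r);
          }
          // prints the indexed sequence a1 := x0 + y0, ..., a3 := a2 + b2 from the example
      }
  }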
Further intermediate languages
While 3AC and SSA representation are mostly used as intermediate languages within compilers, intermediate languages and abstract machines are increasingly used as the connection between compilers and runtime environments.

Java ByteCode and CIL (Common Intermediate Language, cf. .NET) are examples of stack machine code, i.e., intermediate results are stored on a runtime stack.

Further intermediate languages are, for instance, used for optimizations.
Stack machine code as intermediate language
Homogeneous scenario for Java:

[Diagram: Java sources (C1.java, C2.java, C3.java) are compiled, possibly by different compilers (jikes, javac), to .class files containing Java ByteCode, which are executed by the JVM.]
Stack machine code as intermediate language (2)
Possibly inhomogeneous scenario for .NET:

[Diagram: programs of different high-level languages (prog1.cs, prog2.cs via a C# compiler; prog3.hs via a Haskell compiler) are compiled to Intermediate Language files (prog1.il, prog2.il, prog3.il), which are executed by the CLR.]
Example: Stack machine code
package beisp;
class Weltklasse extends Superklasse implements BesteBohnen {
  Qualifikation studieren( Arbeit schweiss ) {
    return new Qualifikation();
  }
}
Example: Stack machine code (2)
Disassembled byte code (javap output):

Compiled from Weltklasse.java
class beisp.Weltklasse extends beisp.Superklasse implements beisp.BesteBohnen {
    beisp.Weltklasse();
    beisp.Qualifikation studieren(beisp.Arbeit);
}

Method beisp.Weltklasse()
  0 aload_0
  1 invokespecial #6 <Method beisp.Superklasse()>
  4 return

Method beisp.Qualifikation studieren(beisp.Arbeit)
  0 new #2 <Class beisp.Qualifikation>
  3 dup
  4 invokespecial #5 <Method beisp.Qualifikation()>
  7 areturn
4.2 Optimization
Optimization
Optimization refers to improving the code with respect to the following aspects:
• Runtime behavior
• Memory consumption
• Size of code
• Energy consumption
Optimization (2)
We distinguish the following kinds of optimizations:
• machine-independent optimizations
• machine-dependent optimizations (exploit properties of a particular real machine)
and
• local optimizations
• intra-procedural optimizations
• inter-procedural/global optimizations
Remark on Optimization
Appel (Chap. 17, p. 350):
"In fact, there can never be a complete list [of optimizations]."
"Computability theory shows that it will always be possible to invent new optimizing transformations."
4.2.1 Classical Optimization Techniques
Constant Propagation
If the value of a variable is constant, the variable can be replaced with the constant.
Constant Folding
Evaluate all expressions with constants as operands at compile time.
Iterating constant folding and constant propagation extends them to non-local constant optimization across basic blocks (a sketch follows below).
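The following sketch (our own simplification, not lecture code) shows one pass of constant propagation and folding over straight-line 3AC; on a full flow graph the pass is iterated and combined with the data flow analyses of Section 4.2.3:

  import java.util.*;

  public class ConstOpt {

      // Returns the constant value of an operand, or null if unknown.
      static Integer value(String operand, Map<String, Integer> env) {
          if (operand.matches("-?\\d+")) return Integer.parseInt(operand);
          return env.get(operand);
      }

      public static void main(String[] args) {
          String[][] cmds = {            // {target, left, op, right}; op == null: copy
              {"a", "3", null, null},    // a := 3
              {"b", "a", "+", "4"},      // b := a + 4   becomes  b := 7
              {"c", "b", "+", "b"},      // c := b + b   becomes  c := 14
              {"d", "c", "+", "u"},      // u unknown:   becomes  d := 14 + u
          };
          Map<String, Integer> env = new HashMap<>();  // variables known to be constant
          for (String[] c : cmds) {
              Integer l = value(c[1], env);
              Integer r = c[3] == null ? null : value(c[3], env);
              if (c[2] == null && l != null) {         // propagate a constant copy
                  env.put(c[0], l);
                  System.out.println(c[0] + " := " + l);
              } else if (c[2] != null && l != null && r != null) {  // fold (only "+" here)
                  env.put(c[0], l + r);
                  System.out.println(c[0] + " := " + (l + r));
              } else {
                  env.remove(c[0]);                    // result not constant
                  String left = l != null ? l.toString() : c[1];
                  String right = c[3] == null ? "" : (r != null ? " " + r : " " + c[3]);
                  System.out.println(c[0] + " := " + left
                          + (c[2] == null ? "" : " " + c[2]) + right);
              }
          }
      }
  }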
Copy Propagation
Eliminate copies of variables, i.e., if several variables x, y, z at a program position are known to have the same value, all uses of y and z are replaced by x.
Copy Propagation (2)
This can also be done at join points of control flow or for loops:
for each program point, the information which variables have the same value is required.
Common Subexpression Elimination
If an expression or a statement contains the same partial expression several times, the goal is to evaluate this subexpression only once.
Common Subexpression Elimination (2)
Optimization of a basic block is done after transformation to SSA and construction of a DAG (directed acyclic graph) for its expressions.
Common Subexpression Elimination (3)
Remarks:
• The elimination of repeated computations is often done before transformation to 3AC, but can also be reasonable following other transformations.
• The DAG representation of expressions is also used as intermediate language by some authors.
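A common way to realize this within one basic block is local value numbering on the SSA form; the following is our own minimal sketch (syntactic equality only, no algebraic identities):

  import java.util.*;

  public class ValueNumbering {

      public static void main(String[] args) {
          String[][] cmds = {               // {target, left, op, right}, SSA form
              {"t1", "i1", "+", "ix"},
              {"t2", "i1", "+", "ix"},      // recomputes t1's right-hand side
              {"t3", "t1", "*", "4"},
              {"t4", "t2", "*", "4"},       // after replacing t2 by t1: same as t3
          };
          Map<String, String> canon = new HashMap<>();     // var -> canonical variable
          Map<String, String> rhsToVar = new HashMap<>();  // canonical RHS -> first definer
          for (String[] c : cmds) {
              String l = canon.getOrDefault(c[1], c[1]);
              String r = canon.getOrDefault(c[3], c[3]);
              String key = l + " " + c[2] + " " + r;
              String prev = rhsToVar.get(key);
              if (prev != null) {           // subexpression already computed
                  canon.put(c[0], prev);
                  System.out.println(c[0] + " := " + prev);   // plain copy, removable later
              } else {
                  rhsToVar.put(key, c[0]);
                  System.out.println(c[0] + " := " + key);
              }
          }
      }
  }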
Algebraic Optimizations
Algebraic laws can be applied in order to enable other optimizations, e.g., associativity and commutativity of addition.
Caution: For finite data types, common algebraic laws are not valid in general (e.g., floating-point addition is not associative).
Strength Reduction
Replace expensive operations by more efficient operations (partially machine-dependent).
For example, y := 2*x can be replaced by y := x + x
or by y := x << 1
Inline Expansion of Procedure Calls
Replace a call to a non-recursive procedure by its body, with appropriate substitution of the parameters.
Note: This reduces execution time, but increases code size.
Inline Expansion of Procedure Calls (2)
Remarks:
• Expansion is in general more than text replacement: for example, name clashes between the procedure's variables and variables at the call site must be avoided by renaming.
Inline Expansion of Procedure Calls (3)
• In OO programs with relatively short methods, expansion is an important optimization technique. However, precise information about the target object is required.
• A refinement of inline expansion is the specialization of procedures/functions when some of the actual parameters are known. This technique can also be applied to recursive procedures/functions.
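A hand-written before/after illustration (our own, not from the lecture) of what expansion does at a call site:

  class Inlining {
      static int sq(int x) { return x * x; }

      static int before(int a, int b) {
          return sq(a) + sq(b);
      }

      // After expansion: each call is replaced by the body of sq, with the formal
      // parameter bound to a fresh temporary so the two call sites do not clash.
      static int after(int a, int b) {
          int x1 = a;
          int t1 = x1 * x1;
          int x2 = b;
          int t2 = x2 * x2;
          return t1 + t2;
      }
  }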
Dead Code Elimination
Remove code that is not reached during execution or that has no influence on execution.
In one of the above examples, constant folding and propagation left behind assignments that no longer influence the result; such code can be removed.
Dead Code Elimination (2)
A typical example of non-reachable and thus dead code is an unlabeled statement directly following an unconditional jump or return.
Dead Code Elimination (3)
Remarks:
• Dead code is often caused by optimizations.
• Another source of dead code is program modification.
• In the first case, liveness information is the prerequisite for dead code elimination.
Code motion
Move commands over branching points in the control flow graph such that they end up in basic blocks that are less often executed.
We consider two cases:
• Move commands into succeeding or preceding branches
• Move code out of loops
Optimization of loops is very profitable, because code inside loops is executed more often than code not contained in a loop.
Move code over branching points
If a sequential computation branches, the branches are less often executed than the sequence.
Move code over branching points (2)
A prerequisite for this optimization is that the defined variable is used in only one branch.
Moving a command over a preceding join point can be advisable if the command can be eliminated by optimization in one of the branches.
Partial redundancy elimination
Definition (Partial Redundancy)
An assignment is redundant at a program position s if it has already been executed on all paths to s.
An expression e is redundant at s if the value of e has already been calculated on all paths to s.
An assignment/expression is partially redundant at s if it is redundant with respect to some of the execution paths leading to s.
Partial redundancy elimination (2)
Example:
Partial redundancy elimination (3)
Elimination of partial redundancy:
Partial redundancy elimination (4)
Remarks:
• PRE can be seen as a combination and extension of common subexpression elimination and code motion.
• Extension: Elimination of partial redundancy according to estimated probability for execution of specific paths.
Code motion from loops
Idea: Computations in loops whose operands are not changed inside the loop should be done outside the loop.
This requires that the moved target variable (t1 in the example) is not live at the end of the block preceding the loop.
Optimization of loop variables
Variables and expressions that are not changed during the execution of a loop are called loop invariant.
Loops often have variables that are increased/decreased systematically in each loop execution, e.g., for-loops.
Often, a loop variable depends on another loop variable, e.g., a relative address depends on the loop counter variable.
Optimization of loop variables (2)
Definition (Loop Variables)
A variable i is called an explicit loop variable of a loop S if there is exactly one definition of i in S, of the form i := i + c where c is loop invariant.
A variable k is called a derived loop variable of a loop S if there is exactly one definition of k in S, of the form k := j * c or k := j + d, where j is a loop variable and c and d are loop invariant.
Induction variable analysis
Compute derived loop variables inductively, i.e., instead of computing them from the value of the loop variable, compute them from their value in the previous loop iteration.
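A before/after illustration in source form (our own sketch): the derived loop variable k = base + ix*c is updated inductively, replacing one multiplication per iteration by an addition:

  class InductionVariables {
      static int before(int n, int c, int base) {
          int sum = 0;
          for (int ix = 0; ix < n; ix++) {
              int k = base + ix * c;   // derived loop variable, recomputed from ix
              sum += k;
          }
          return sum;
      }

      static int after(int n, int c, int base) {
          int sum = 0;
          int k = base - c;            // chosen so that k == base + ix*c holds
          for (int ix = 0; ix < n; ix++) {
              k = k + c;               // inductive update of the derived variable
              sum += k;
          }
          return sum;
      }
  }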
Loop unrolling
If the number of loop executions is known statically or properties about the number of loop executions (e.g., always an even number) can be inferred, the loop body can be copied several times to save comparisons and jumps.
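A before/after sketch (our own), assuming the trip count is known to be even:

  class Unrolling {
      static int before(int[] a, int n) {   // n even by assumption
          int sum = 0;
          for (int i = 0; i < n; i++) {
              sum += a[i];
          }
          return sum;
      }

      static int after(int[] a, int n) {    // half as many tests and jumps
          int sum = 0;
          for (int i = 0; i < n; i += 2) {
              sum += a[i];
              sum += a[i + 1];
          }
          return sum;
      }
  }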
Loop unrolling (2)
Remarks:
• Partial loop unrolling aims at obtaining larger basic blocks in loops to have more optimization options.
• Loop unrolling is in particular important for parallel processor architectures and pipelined processing (machine-dependent).
Optimization for other language classes
The discussed optimizations aim at imperative languages. For optimizing programs of other language classes, special techniques have been developed.
For example:
• Object-oriented languages: Optimization of dynamic binding (type analysis)
• Non-strict functional languages: Optimization of lazy function calls (strictness analysis)
• Logic programming languages: Optimization of unification
4.2.2 Potential of Optimizations
Potential of optimizations - Example
Using the procedure skprod, we demonstrate some of the techniques above and the improvement that optimizations can achieve; we also sketch its evaluation.

skprod:
  B0: res := 0
      ix  := 0
      → B1
  B1: t0 := lng-1
      if ix<=t0      (true → B2, false → B3)
  B2: t1 := i1+ix
      tx := t1*4
      ta := a+tx
      t2 := *ta
      t1 := i2+ix
      tx := t1*4
      tb := b+tx
      t3 := *tb
      t1 := t2*t3
      res := res+t1
      ix  := ix+1
      → B1
  B3: return res

Evaluation: number of steps depending on lng:
  2 + 2 + 13*lng + 1 = 13*lng + 5
  lng = 100:  1305
  lng = 1000: 13005
Potential of optimizations - Example (2)
Moving the computation of the loop-invariant t0 out of the loop:

skprod:
  B0: res := 0
      ix  := 0
      t0  := lng-1
      → B1
  B1: if ix<=t0      (true → B2, false → B3)
  B2: t1 := i1+ix
      tx := t1*4
      ta := a+tx
      t2 := *ta
      t1 := i2+ix
      tx := t1*4
      tb := b+tx
      t3 := *tb
      t1 := t2*t3
      res := res+t1
      ix  := ix+1
      → B1
  B3: return res
Potential of optimizations - Example (3)
Optimization of loop variables (1): At first, there are no derived loop variables, because t1 and tx have several definitions; introducing SSA indices for t1 and tx makes t11, tx1, ta, t12, tx2, tb derived loop variables:

skprod:
  B0: res := 0
      ix  := 0
      t0  := lng-1
      → B1
  B1: if ix<=t0      (true → B2, false → B3)
  B2: t11 := i1+ix
      tx1 := t11*4
      ta  := a+tx1
      t2  := *ta
      t12 := i2+ix
      tx2 := t12*4
      tb  := b+tx2
      t3  := *tb
      t13 := t2*t3
      res := res+t13
      ix  := ix+1
      → B1
  B3: return res
Potential of optimizations - Example (4)
Optimization of loop variables (2): Initialization and inductive definition of the loop variables:

skprod:
  B0: res := 0
      ix  := 0
      t0  := lng-1
      t11 := i1-1
      tx1 := t11*4
      ta  := a+tx1
      t12 := i2-1
      tx2 := t12*4
      tb  := b+tx2
      → B1
  B1: if ix<=t0      (true → B2, false → B3)
  B2: t11 := t11+1
      tx1 := tx1+4
      ta  := ta+4
      t2  := *ta
      t12 := t12+1
      tx2 := tx2+4
      tb  := tb+4
      t3  := *tb
      t13 := t2*t3
      res := res+t13
      ix  := ix+1
      → B1
  B3: return res
Potential of optimizations - Example (5)
Dead code elimination: The loop assignments to t11, tx1, t12, tx2 are dead code, since they do not influence the result:

skprod:
  B0: res := 0
      ix  := 0
      t0  := lng-1
      t11 := i1-1
      tx1 := t11*4
      ta  := a+tx1
      t12 := i2-1
      tx2 := t12*4
      tb  := b+tx2
      → B1
  B1: if ix<=t0      (true → B2, false → B3)
  B2: ta  := ta+4
      t2  := *ta
      tb  := tb+4
      t3  := *tb
      t13 := t2*t3
      res := res+t13
      ix  := ix+1
      → B1
  B3: return res
Potential of optimizations - Example (6)
Algebraic optimization: Exploit the invariant ta = 4*(i1-1+ix) + a to replace the comparison ix <= t0 by ta <= 4*(i1-1+t0) + a:

skprod:
  B0: res := 0
      ix  := 0
      t0  := lng-1
      t11 := i1-1
      tx1 := t11*4
      ta  := a+tx1
      t12 := i2-1
      tx2 := t12*4
      tb  := b+tx2
      t4  := t11+t0
      t5  := 4*t4
      t6  := t5+a
      → B1
  B1: if ta<=t6      (true → B2, false → B3)
  B2: ta  := ta+4
      t2  := *ta
      tb  := tb+4
      t3  := *tb
      t13 := t2*t3
      res := res+t13
      ix  := ix+1
      → B1
  B3: return res
Potential of optimizations - Example (7)
Dead code elimination: Through the transformation of the loop condition, the assignment to ix has become dead code and can be eliminated:

skprod:
  B0: res := 0
      t0  := lng-1
      t11 := i1-1
      tx1 := t11*4
      ta  := a+tx1
      t12 := i2-1
      tx2 := t12*4
      tb  := b+tx2
      t4  := t11+t0
      t5  := 4*t4
      t6  := t5+a
      → B1
  B1: if ta<=t6      (true → B2, false → B3)
  B2: ta  := ta+4
      t2  := *ta
      tb  := tb+4
      t3  := *tb
      t13 := t2*t3
      res := res+t13
      → B1
  B3: return res
Potential of optimizations - Example (8)
Remarks:
• Reduction of the number of execution steps by almost half, where the most significant reductions are achieved by loop optimization.
• Combination of optimization techniques is important. Determining the order of optimizations is in general difficult.
• We have considered optimizations only by example. The difficulty is to find algorithms and heuristics for detecting optimization potential automatically and for performing the optimizing transformations.
4.2.3 Data flow analysis
Data flow analysis
For optimizations, data flow information is required that can be obtained by data flow analysis.
Goal: Explain the basic concepts of data flow analysis by examples.
Outline:
• Liveness analysis (Typical example of data flow analysis)
• Data flow equations
• Important classes of analyses
Each analysis has an exact specification of which information it provides.
Liveness analysis
Definition (Liveness Analysis)
Let P be a program. A variable v is live at a program position S in P if there is an execution path π from S to a use of v such that there is no definition of v on π.
The liveness analysis determines for all positions S in P which variables are live at S.
Liveness analysis (2)
Remarks:
• The definition of liveness of variables is static/syntactic. We have defined dead code dynamically/semantically.
• The result of the liveness analysis for a program P can be represented as a function live mapping positions in P to bit vectors, where a bit vector contains an entry for each variable in P. Let i be the index of a variable v in P; then:
  live(S)[i] = 1 iff v is live at position S
Liveness analysis (3)
Idea:
• In a procedure-local analysis, exactly the global variables are live at the end of the exit block of the procedure.
• If the live variables out(B) at the end of a basic block B are known, the live variables in(B) at the beginning of B are computed by:
  in(B) = gen(B) ∪ (out(B) \ kill(B))
  where
  – gen(B) is the set of variables v that are used in B without a prior definition of v
  – kill(B) is the set of variables that are defined in B
Liveness analysis (4)
As the set in(B) is computed from out(B), we have a backward analysis.
For B not the exit block of the procedure, out(B) is obtained by
  out(B) = ⋃ in(Bi)  over all successors Bi of B
Thus, for a program without loops, in(B) and out(B) are defined for all basic blocks B. Otherwise, we obtain a system of recursive equations.
Liveness analysis - Example
Data flow equations
Theory:
• There is always a solution for equations of the considered form.
• There is always a smallest solution; it is obtained by an iteration starting from empty in and out sets.
Note: The equations may have several solutions.
Ambiguity of solutions - Example
B0: a := a     (successors: B0 and B1)
B1: b := 7     (exit block)

out(B0) = in(B0) ∪ in(B1)
out(B1) = { }
in(B0) = gen(B0) ∪ (out(B0) \ kill(B0)) = {a} ∪ out(B0)
in(B1) = gen(B1) ∪ (out(B1) \ kill(B1)) = { }

Thus, out(B0) = in(B0), and hence in(B0) = {a} ∪ in(B0).

Possible solutions: in(B0) = {a} or in(B0) = {a, b}
Computation of smallest fixpoint
1. Compute gen(B) and kill(B) for all B.
2. Set out(B) = ∅ for all B except the exit block. For the exit block, out(B) comes from the program context.
3. While out(B) or in(B) changes for some B:
   compute in(B) from the current out(B) for all B;
   compute out(B) from the in(B) of the successors of B.
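This iteration can be written down directly; the following sketch (our own code) computes the smallest solution for the ambiguity example above, with B0: a := a (successors B0 and B1) and B1: b := 7:

  import java.util.*;

  public class Liveness {

      public static void main(String[] args) {
          int n = 2;
          List<List<Integer>> succ = List.of(List.of(0, 1), List.of());
          List<Set<String>> gen  = List.of(Set.of("a"), Set.<String>of());
          List<Set<String>> kill = List.of(Set.of("a"), Set.of("b"));

          List<Set<String>> in  = new ArrayList<>();
          List<Set<String>> out = new ArrayList<>();
          for (int i = 0; i < n; i++) { in.add(new HashSet<>()); out.add(new HashSet<>()); }
          // out of the exit block stays as given by the program context (empty here)

          boolean changed = true;
          while (changed) {
              changed = false;
              for (int b = 0; b < n; b++) {
                  Set<String> newOut = new HashSet<>();
                  for (int s : succ.get(b)) newOut.addAll(in.get(s));  // backward analysis
                  Set<String> newIn = new HashSet<>(newOut);
                  newIn.removeAll(kill.get(b));                        // out(B) \ kill(B)
                  newIn.addAll(gen.get(b));                            // ∪ gen(B)
                  changed |= !newIn.equals(in.get(b)) || !newOut.equals(out.get(b));
                  in.set(b, newIn);
                  out.set(b, newOut);
              }
          }
          System.out.println("in(B0) = " + in.get(0));   // the smallest solution: [a]
      }
  }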
Further analyses and classes of analyses
Many data flow analyses can be described as bit vector problems:
• Reaching definitions: Which definitions reach a position S?
• Available expressions: for the elimination of repeated computations
• Very busy expressions: Which expressions are needed on all subsequent computation paths?
The corresponding analyses can be treated analogously to liveness analysis, but differ in
• the definition of the data flow information
• the definition of gen and kill
• the direction of the analysis and the equations
Further analyses and classes of analyses (2)
For backward analyses, the data flow information at the entry of a basic block B is obtained from the information at the exit of B:
  in(B) = gen(B) ∪ (out(B) \ kill(B))
Analyses are distinguished by whether they take the union or the intersection of the successor information:
  out(B) = ⋃_{Bi ∈ succ(B)} in(Bi)
or
  out(B) = ⋂_{Bi ∈ succ(B)} in(Bi)
Further analyses and classes of analyses (3)
For forward analyses, the dependency is the other way round:
  out(B) = gen(B) ∪ (in(B) \ kill(B))
with
  in(B) = ⋃_{Bi ∈ pred(B)} out(Bi)
or
  in(B) = ⋂_{Bi ∈ pred(B)} out(Bi)
Further analyses and classes of analyses (4)
Overview of classes of analyses:
             union                  intersection
  forward    reaching definitions   available expressions
  backward   live variables         very busy expressions
Further analyses and classes of analyses (5)
For bit vector problems, data flow information consists of subsets of finite sets.
For other analyses, the collected information is more complex, e.g., for constant propagation, we consider mappings from variables to values.
For interprocedural analyses, complexity increases because the flow graph is not static.
The formal basis for the development and correctness of optimizations is provided by the theory of abstract interpretation.
4.2.4 Non-Local Program Analysis
Non-local program analysis
We use a points-to analysis to demonstrate:
• interprocedural aspects: The analysis crosses the borders of single procedures.
• constraints: Program analysis very often involves solving or refining constraints.
• complex analysis results: The analysis result cannot be represented locally for a statement.
• analysis as abstraction: The result of the analysis is an abstraction of all possible program executions.
Points-to analysis
Analysis for programs with pointers and for object-oriented programs.
Goal: Compute which records/objects a variable can hold references to.
Applications of Analysis Results:
Basis for optimizations
• Alias information (e.g., important for code motion)
  – Can p.f = x cause changes to an object referenced by q?
  – Can z = p.f read information that is written by p.f = x?
• Call graph construction
• Resolution of virtual method calls
Alias Information

Example (use of points-to analysis information):

A. Use of alias information:

  (1) p.f = x;
  (2) y = q.f;
  (3) q.f = z;

If p == q, statement (2) can be replaced by y = x;
if p != q, the first statement can be exchanged with the other two.
Elimination of Dynamic Binding
B. Elimination of dynamic binding:

  class A { void m( ... ) { ... } }
  class B extends A { void m( ... ) { ... } }
  ...
  A p;
  p = new B();
  p.m(...)   // call of B::m

If the analysis shows that p can only reference B objects, the dynamically bound call can be replaced by a direct call of B::m.
Escape Analysis
C. Escape analysis:

  R m( A p ) {
    B q;
    q = new B();   // stack allocation possible
    q.f = p;
    q.g = p.n();
    return q.g;
  }

The object created for q does not escape m, so it can be stored on the stack.
A Points-to Analysis for Java
Simplifications and assumptions about the underlying language:
• Complete program is known.
• Only assignments and method calls of the following forms are used:
  – Direct assignment: l = r
  – Write to instance variables: l.f = r
  – Read of instance variables: l = r.f
  – Object creation: l = new C()
  – Simple method call: l = r0.m(r1, ...)
• Expressions are free of side effects
• Compound statements are decomposed into the simple forms above
A Points-to Analysis for Java (2)
Analysis type
• Flow-insensitive: The control flow of the program has no influence on the analysis result. The states of the variables at different program points are combined.
• Context-insensitive: Method calls at different program points are not distinguished.
A Points-to Analysis for Java (3)
Points-to graph as abstraction
The result of the analysis is a so-called points-to graph having
• abstract variables and abstract objects as nodes
• edges representing that an abstract variable may hold a reference to an abstract object

Abstract variables V represent sets of concrete variables at runtime.
Abstract objects O represent sets of concrete objects at runtime.
An edge between V and O means that in some program state, a variable represented by V may hold a reference to an object represented by O.
Points-to Graph - Example
class Y { ... }
class X {
  Y f;
  void set( Y r ) { this.f = r; }
  static void main() {
    X p = new X(); // s1 "creates" o1
    Y q = new Y(); // s2 "creates" o2
    p.set(q);
  }
}

[Points-to graph: p → o1, this → o1, q → o2, r → o2, and o1 has an f-labeled edge to o2.]
Definition of the Points-to Graph
For all method implementations,
• create a node o for each object creation site
• create nodes for
  – each local variable v
  – each formal parameter p of any method (incl. this and the result ret)
  – each static variable s
(Instance variables are modeled by labeled edges.)
Definition of the Points-to Graph (2)
Edges: smallest fixpoint of f : PtGraph × Stmt → PtGraph with

• f(G, l = new C()) = G ∪ {(l, oi)}
• f(G, l = r) = G ∪ {(l, oi) | oi ∈ Pt(G, r)}
• f(G, l.f = r) = G ∪ {(<oi, f>, oj) | oi ∈ Pt(G, l), oj ∈ Pt(G, r)}
• f(G, l = r.f) = G ∪ {(l, oi) | ∃ oj ∈ Pt(G, r). oi ∈ Pt(G, <oj, f>)}
• f(G, l = r0.m(r1, ..., rn)) = G ∪ ⋃_{oi ∈ Pt(G, r0)} resolve(G, m, oi, r1, ..., rn, l)

where Pt(G, x) is the points-to set of x in G and

resolve(G, m, oi, r1, ..., rn, l) =
  let mj(p0, p1, ..., pn, retj) = dispatch(oi, m) in
    {(p0, oi)} ∪ f(G, p1 = r1) ∪ ... ∪ f(G, pn = rn) ∪ f(G, l = retj)
  end
Definition of the Points-to Graph (3)
Remark:
The main problem for the practical use of the analysis is the efficient implementation of the computation of the points-to graph.
Literature:
A. Rountev, A. Milanova, B. Ryder: Points-to Analysis for Java Using Annotated Constraints. OOPSLA 2001.
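To give a feel for the computation, here is our own toy sketch of the fixpoint iteration for the statement forms without method calls (resolve/dispatch omitted); it is an illustration only, not the algorithm of the paper cited above:

  import java.util.*;

  public class PointsTo {

      static Set<String> pt(Map<String, Set<String>> g, String key) {
          return g.getOrDefault(key, Set.of());
      }

      static boolean add(Map<String, Set<String>> g, String key, Set<String> objs) {
          return g.computeIfAbsent(key, k -> new HashSet<>()).addAll(objs);
      }

      public static void main(String[] args) {
          String[][] stmts = {
              {"new", "p", "o1"},          // p = new X()   creation site o1
              {"new", "q", "o2"},          // q = new Y()   creation site o2
              {"copy", "r", "q"},          // r = q
              {"store", "p", "f", "r"},    // p.f = r
              {"load", "s", "p", "f"},     // s = p.f
          };
          Map<String, Set<String>> g = new HashMap<>();  // variable or <o.f> -> objects

          boolean changed = true;
          while (changed) {                // flow-insensitive fixpoint iteration
              changed = false;
              for (String[] s : stmts) {
                  switch (s[0]) {
                      case "new":   changed |= add(g, s[1], Set.of(s[2])); break;
                      case "copy":  changed |= add(g, s[1], pt(g, s[2])); break;
                      case "store": // l.f = r: edges <oi,f> -> Pt(r) for oi in Pt(l)
                          for (String o : pt(g, s[1]))
                              changed |= add(g, o + "." + s[2], pt(g, s[3]));
                          break;
                      case "load":  // l = r.f: Pt(l) gains Pt(<oj,f>) for oj in Pt(r)
                          for (String o : pt(g, s[2]))
                              changed |= add(g, s[1], pt(g, o + "." + s[3]));
                          break;
                  }
              }
          }
          System.out.println(g);   // e.g. s -> [o2] via the edge o1.f -> [o2]
      }
  }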
4.3 Register Allocation
Register allocation
Efficient code has to make good use of the available registers of the target machine: accessing registers is much faster than accessing memory (the same holds for caches).
Register allocationhas two aspects:
• Determine which variables are implemented by registers at which positions.
• Determine which register implements which variable at which positions (register assignment).
Register allocation (2)
Goals of register allocation
1. Generate code that requires as few registers as possible.
2. Avoid unnecessary memory accesses, i.e., implement not only temporaries but also program variables by registers.
3. Allocate registers preferably to variables that are used often (do not use them for variables that are accessed only rarely).
4. Obey programmer’s requirements.
Register allocation (3)
Outline
• Algorithm interleaving code generation and register allocation for nested expressions (cf. Goal 1)
• Algorithm for procedure-local register allocation (cf. Goals 2 and 3)
• Combination and other aspects
4.3.1 Sethi-Ullman Algorithm
Evaluation ordering with minimal registers
The algorithm by Sethi and Ullman is an example of an integrated approach to register allocation and code generation.
(cf. Wilhelm, Maurer, Sect. 12.4.1, p. 584 ff)

Input: an assignment with a nested expression on the right-hand side:

  Assign ( Var, Exp )
  Exp = BinExp | Var
  BinExp ( Exp, Op, Exp )
  Var ( Ident )
Evaluation ordering with minimal registers (2)
Output: machine code or intermediate language code with assigned registers.

We consider two-address code, i.e., code with at most one memory access per command:

  Ri := M[V]
  M[V] := Ri
  Ri := Ri op M[V]
  Ri := Ri op Rj

The machine has r registers, represented by R0, ..., Rr-1.
Example: Code generation w/ register allocation
Consider f := (a+b) - (c-(d+e))

Assume that only two registers R0 and R1 are available for the translation.

Result of the direct translation:

  R0 := M[a]
  R0 := R0 + M[b]
  R1 := M[d]
  R1 := R1 + M[e]
  M[t1] := R1
  R1 := M[c]
  R1 := R1 - M[t1]
  R0 := R0 - R1
  M[f] := R0
Example: Code generation w/ register allocation (2)
Result of the Sethi-Ullman algorithm:

  R0 := M[c]
  R1 := M[d]
  R1 := R1 + M[e]
  R0 := R0 - R1
  R1 := M[a]
  R1 := R1 + M[b]
  R1 := R1 - R0
  M[f] := R1

Better, because it uses one instruction less and needs no temporary storage in memory.
Sethi-Ullman algorithm

Goal: Minimize the number of registers used and the number of temporaries.

Idea: Generate code for the subexpression requiring more registers first.

Procedure:
• Define a function regbed that computes the number of registers needed to evaluate an expression (a sketch follows below).
• Generate code for an expression E = BinExp(L, OP, R) by a case distinction on the register needs of L and R.
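A sketch of regbed (our own code) for the two-address machine used here: a leaf needs a register only as a left operand, because a right operand can be taken directly from memory (Ri := Ri op M[V]); for other instruction sets the base cases differ:

  class SethiUllman {
      interface Exp { }
      record Var(String name) implements Exp { }
      record BinExp(Exp left, String op, Exp right) implements Exp { }

      static int regbed(Exp e, boolean leftOperand) {
          if (e instanceof Var) return leftOperand ? 1 : 0;  // right leaf: memory operand
          BinExp b = (BinExp) e;
          int l = regbed(b.left(), true);
          int r = regbed(b.right(), false);
          return l == r ? l + 1 : Math.max(l, r);            // bigger side is evaluated first
      }

      public static void main(String[] args) {
          // f := (a+b) - (c-(d+e)) from the example
          Exp e = new BinExp(
              new BinExp(new Var("a"), "+", new Var("b")),
              "-",
              new BinExp(new Var("c"), "-",
                  new BinExp(new Var("d"), "+", new Var("e"))));
          System.out.println(regbed(e, true));   // 2: two registers suffice
      }
  }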
Sethi-Ullman algorithm (2)
We use the following notations:
• v_reg(E): the set of registers available for the translation of E
• v_tmp(E): the set of memory addresses where values can be stored temporarily when translating E
• cell(E): the register or memory cell where the result of E is stored

Now, let
• E be an expression
• L the left subexpression of E
• R the right subexpression of E
• vr abbreviate |v_reg(E)|
Sethi-Ullman algorithm (3)

We distinguish the following cases:
1. regbed(L) < vr
2. regbed(L) ≥ vr and regbed(R) < vr
3. regbed(L) ≥ vr and regbed(R) ≥ vr
Sethi-Ullman algorithm (4)

Case 1: regbed(L) < vr
• Generate code for R using v_reg(E) and v_tmp(E), with result in cell(R)
• Generate code for L using v_reg(E) \ {cell(R)} and v_tmp(E), with result in cell(L)
• Generate code for the operation cell(L) := cell(L) OP cell(R)
• Set cell(E) = cell(L)
Sethi-Ullman algorithm (5)

Case 2: regbed(L) ≥ vr and regbed(R) < vr
• Generate code for L using v_reg(E) and v_tmp(E), with result in cell(L)
• Generate code for R using v_reg(E) \ {cell(L)} and v_tmp(E), with result in cell(R)
• Generate code for the operation cell(L) := cell(L) OP cell(R)
• Set cell(E) = cell(L)