Compilers and Language Processing Tools
Summer Term 2013
Arnd Poetzsch-Heffter Annette Bieniusa
Software Technology Group TU Kaiserslautern
Content of Lecture
1. Introduction
2. Syntax and Type Analysis
2.1 Lexical Analysis
2.2 Context-Free Syntax Analysis
2.3 Context-Dependent Analysis (Semantic Analysis)
3. Translation to Intermediate Representation
3.1 Languages for Intermediate Representation
3.2 Translation of Imperative Language Constructs
3.3 Translation of Object-Oriented Language Constructs
3.4 Translation of Procedures
4. Optimization and Code Generation
4.1 Assembly and Machine Code
4.2 Optimization
4.3 Register Allocation
4.4 Further Aspects
Content of Lecture (2)
5. Selected Topics in Compiler Construction
5.1 Just-in-time Compilation
5.2 Garbage Collection
5.3 XML Processing (DOM, SAX, XSLT)
3. Translation to Intermediate Representation
Translation of Imperative Language Constructs
3.1 Translation of Imperative Language Constructs
Introduction
Focus:
• Differences between source languages and target languages/target machines
• Most important translation techniques for different programming paradigms (procedural/object-oriented)
Learning Objectives:
• Overview of imperative and procedural language constructs
• Languages for intermediate representation
• Translation of object-oriented language constructs
• Translation techniques for procedural language constructs
Language Constructs of Procedural Languages
From a conceptual and semantic viewpoint, procedural languages have the following constructs:
• Domains with operations (often typed)
I pre-defined: int, boolean, ...
I user-defined: records, classes, ...
I implicitly defined: field types, address types, function types
• Variables
I simple and compound types
I global, local, statically/dynamically allocated
I define memory state
• Expressions
I computation of values with implicit intermediate results
I possibly in combination with execution control and state modification
Language Constructs of Procedural Languages (2)
• Statements
I simple and combined statements
I define execution control and state modification
• Procedures
I abstraction of parametrized statements
I may be recursive
I may be nested
Modules usually do not have a semantic meaning and are only relevant for the name analysis and for binding and loading.
Nested Procedures
Example from [Wilhelm, Maurer; Fig. 2.9]
Nested (local) procedures are supported, for example, by Pascal and Ada.
proc P(a)
  var b
  var c
  proc Q
    var a
    proc R
      var b
      begin ... b ... a ... c ... end
    begin ... a ... b ... call Q ... end
  proc S
    var a
    begin ... a ... call Q ... end
  begin ... a ... call Q ... end
3.1 Languages for Intermediate Representation
Languages for Intermediate Representation
Motivation
We could go directly from the AST to machine code, but ...
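[Figure: without an IR, each source language (Java, C, C++, Pascal, ML) is translated directly to each target machine (Sparc, MIPS, Pentium); with an IR, each source language is translated to the IR, and the IR is translated to each target machine.]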
Intermediate representation
• front end: lexical analysis, parsing, semantic analysis
• back end: machine specific optimization, translation to machine language
• intermediate code: machine and language independent optimization
Specifics of intermediate representation
A good IR is
• convenient to produce from source language
• convenient to translate into machine language
• small, with clear and simple semantics
Design of IR:
• IR languages are comparable to data structures in algorithm design, i.e., for each task, an intermediate language is more or less suitable.
• Intermediate languages can conceptually be seen as abstract machines.
Typical differences: Source language vs. IR
• Data types and memory:
array and field dereferencing vs. load/store on heap or stack
• Expressions:
simpler
• Statements:
compound statements vs. (conditional) jumps
• Method calls:
I varying number of arguments vs. simple call
I explicit management of recursion (→stack frames)
3.1.1 SIRL: A simple IR language
SIRL: Introduction
SIRL is very similar to the IR language Piglet of the compiler project.
Data types and memory:
• Values in SIRL are integers and addresses
• SIRL programs work on a byte-addressable memory
SIRL: Expressions
CONST(i)                integer constant i or address i
NAME(n)                 symbolic constant n [code label]
TEMP(t)                 temporary t, similar to a machine register
BINOP(o,e1,e2)          binary operator o with operands e1 and e2
MEM(e)                  contents of a word of memory at address e
CALL(f,[e1,...,en])     procedure call
ESEQ(s,e)               statement expression; execute statement s for side effects, expression e for the result
SIRL: Operators
Binary arithmetic and logical operators:
PLUS, MINUS, MUL, DIV   integer arithmetic operators
AND, OR, XOR            integer bitwise logical operators
LSHIFT, RSHIFT          integer logical shift operators
ARSHIFT                 integer arithmetic right-shift
Relational operators:
EQ, NE                  integer equality and non-equality (for both signed and unsigned)
LT, GT, LE, GE          integer inequalities (signed)
ULT, UGT, ULE, UGE      integer inequalities (unsigned)
SIRL: Statements
MOVE(TEMP(t),e)         Evaluate e and move it into t.
MOVE(MEM(e1),e2)        Evaluate e1 yielding address a; evaluate e2 and move it into the memory word at address a.
EXP(e)                  Evaluate e and discard the result.
JUMP(e,[l1,...,ln])     Transfer control (jump) to address e; l1,...,ln are all possible values for e. Often used: JUMP(l).
CJUMP(o,e1,e2,t,f)      Evaluate e1, then e2; compare their results using relational operator o. If true, jump to label t, else jump to label f.
SEQ(s1,s2)              Statement s1 followed by statement s2.
LABEL(n)                Define the constant value of name n as the current code address. NAME(n) can then be used as the target of jumps, calls, etc.
NOOP                    skip statement
SIRL: Program structure
Program   ::= MAIN StmtList END Procedure*
StmtList  ::= ( Label? Stmt )*
Procedure ::= Label [ IntLiteral ] ESEQ
where Stmt, ESEQ, Label are as defined above
Examples
Translate the following MiniJava statements to SIRL:
1. if (x < y) x = y; else x = 0;
2. y = z[5];
Examples
1. if (x < y) x = y; else x = 0;
• Assume x corresponds to TEMP 5 and y corresponds to TEMP 27.
• Define three (new) label names L1, L2, and L3.
     CJUMP (LT, TEMP 5, TEMP 27, L1, L2)
L1   MOVE (TEMP 5, TEMP 27)
     JUMP L3
L2   MOVE (TEMP 5, CONST 0)
L3   ...
Examples
2. y = z[5];
• Assume y corresponds to TEMP 27, and the base address of array z is a.
• Let w be the word size of MiniJava (e.g. 4 bytes).
• Calculate the offset for the array at index 5:
MOVE (TEMP 27, MEM (+(a, *(CONST 5, CONST w))))
Here, we use o(e1,e2) as an abbreviation for BINOP(o,e1,e2).
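The array access above can be tried out with a small sketch. It is not part of SIRL or of the compiler project: the tuple encoding of expressions, the helper names (CONST, BINOP, MEM, eval_exp, array_access), the base address 1000, and the memory contents are all assumptions made for illustration.

```python
# Minimal sketch: SIRL-like expressions as nested tuples, evaluated against
# a toy byte-addressed memory that is modelled as a dict (address -> word).
W = 4  # assumed word size in bytes


def CONST(i):
    return ("CONST", i)


def BINOP(op, e1, e2):
    return ("BINOP", op, e1, e2)


def MEM(e):
    return ("MEM", e)


def eval_exp(e, memory):
    """Evaluate a SIRL-like expression tree (only CONST, BINOP, MEM)."""
    kind = e[0]
    if kind == "CONST":
        return e[1]
    if kind == "BINOP":
        _, op, e1, e2 = e
        v1 = eval_exp(e1, memory)
        v2 = eval_exp(e2, memory)
        if op == "PLUS":
            return v1 + v2
        if op == "MUL":
            return v1 * v2
        raise ValueError("unsupported operator: " + op)
    if kind == "MEM":
        return memory[eval_exp(e[1], memory)]
    raise ValueError("unsupported expression: " + kind)


def array_access(base, index_exp):
    """Build MEM(+(base, *(index, w))) for a one-dimensional array."""
    return MEM(BINOP("PLUS", CONST(base), BINOP("MUL", index_exp, CONST(W))))


a = 1000  # hypothetical base address of array z
memory = {a + i * W: 10 * i for i in range(8)}  # z[i] holds 10*i
temps = {}
temps[27] = eval_exp(array_access(a, CONST(5)), memory)  # y = z[5]
```

The MOVE into TEMP 27 is modelled here by a plain dictionary assignment; a fuller interpreter would also cover the statement forms.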
3.1.2 3-Address Code
3-address code
3-address code (3AC) is a common intermediate language with many variants.
Properties:
• only elementary data types (but often arrays)
• no nested expressions
• sequential execution, jumps and procedure calls as statements
• named variables as in a high level language
• unbounded number of temporary variables
3-address code (2)
A program in 3AC consists of
• a list of global variables
• a list of procedures with parameters and local variables
• a main procedure
• each procedure has a sequence of 3AC commands as body
3AC commands
Syntax                 Explanation
x := y bop z           x: variable (global, local, parameter, temporary)
x := uop z             y, z: variable or constant
x := y                 bop: binary operator; uop: unary operator
goto L                 jump or conditional jump to label L
if x cop y goto L      cop: comparison operator; only procedure-local jumps
x := a[i]              a: one-dimensional array
a[i] := y
x := &a                a: global variable, local variable, or parameter
x := *y                &a: address of a
*x := y                *: dereferencing operator
3AC commands (2)
Syntax                 Explanation
param x                call p(x1, ..., xn) is encoded as:
call p                     param x1
                           ...
                           param xn
                           call p
                       (the block is considered as one command)
return y               causes a jump to the return address with (optional) result y
We assume that 3AC only contains labels for which jumps are used in the program.
Basic blocks
A sequence of 3AC commands can be uniquely partitioned into basic blocks.
A basic block B is a maximal sequence of commands such that
• at the end of B, exactly one jump, procedure call, or return command occurs
• labels only occur at the first command of a basic block
Basic blocks (2)
Remarks:
• The commands of a basic block are always executed sequentially; there are no jumps into the inside of a block.
• Often, a designated exit-block for a procedure containing the return jump at its end is required. This is handled by additional transformations.
• The transitions between basic blocks are often denoted by flow charts.
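The partitioning into basic blocks can be sketched in a few lines. This is only an illustration under stated assumptions: 3AC commands are plain strings, a label is written as a separate entry ending in ":", and the function names are hypothetical. A block is closed after a jump, call, or return, and a new block is opened at every label.

```python
def ends_block(cmd):
    """A jump, conditional jump, call, or return closes a basic block."""
    return cmd.split()[0] in ("goto", "if", "call", "return")


def is_label(cmd):
    """Assumed convention: labels are separate entries such as 'main:'."""
    return cmd.endswith(":")


def basic_blocks(cmds):
    """Partition a list of 3AC commands into basic blocks."""
    blocks, current = [], []
    for cmd in cmds:
        # A label may only occur at the first command of a block.
        if is_label(cmd) and current:
            blocks.append(current)
            current = []
        current.append(cmd)
        if ends_block(cmd):
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks


main_3ac = [
    "main:",
    "a[0] := 1", "a[1] := 2",
    "b[0] := 4", "b[1] := 5", "b[2] := 6",
    "param 0", "param 1", "param 2", "call skprod",
    "return 0",
]
blocks = basic_blocks(main_3ac)
```

For the 3AC of the main procedure from the example, this yields two blocks: everything up to and including call skprod, and return 0 on its own.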
Example: 3AC and basic blocks
Consider the following C program:
int a[2];
int b[7];
int skprod(int i1, int i2, int lng) { ... }
int main() {
  a[0] = 1; a[1] = 2;
  b[0] = 4; b[1] = 5; b[2] = 6;
  skprod(0,1,2);
  return 0;
}
© Arnd Poetzsch-Heffter, Translation to Intermediate Representation
Example: 3AC and basic blocks (2)
3AC with basic block partitioning for the procedure main:
main:
Block 1:
  a[0] := 1
  a[1] := 2
  b[0] := 4
  b[1] := 5
  b[2] := 6
  param 0
  param 1
  param 2
  call skprod
Block 2:
  return 0
28.06.2007 © A. Poetzsch-Heffter, TU Kaiserslautern
Example: 3AC and basic blocks (3)
Procedure skprod:
int skprod(int i1, int i2, int lng) {
  int ix, res = 0;
  for (ix = 0; ix <= lng-1; ix++) {
    res += a[i1+ix] * b[i2+ix];
  }
  return res;
}
Example: 3AC and basic blocks (4)
Procedure skprod as 3AC with basic blocks:
skprod:
Block 1:
  res := 0
  ix := 0
Block 2 (loop test):
  t0 := lng-1
  if ix<=t0          (true: continue with block 3; false: continue with block 4)
Block 3 (loop body):
  t1 := i1+ix
  t2 := a[t1]
  t1 := i2+ix
  t3 := b[t1]
  t1 := t2*t3
  res := res+t1
  ix := ix+1         (then continue with block 2)
Block 4:
  return res
Intermediate Language Variations
3AC after elimination of array operations (for the above example):
skprod:
Block 1:
  res := 0
  ix := 0
Block 2 (loop test):
  t0 := lng-1
  if ix<=t0          (true: continue with block 3; false: continue with block 4)
Block 3 (loop body):
  t1 := i1+ix
  tx := t1*4
  ta := a+tx
  t2 := *ta
  t1 := i2+ix
  tx := t1*4
  tb := b+tx
  t3 := *tb
  t1 := t2*t3
  res := res+t1
  ix := ix+1         (then continue with block 2)
Block 4:
  return res
Characteristics of 3-Address Code
• Control flow is explicit.
• Only elementary operations
• Rearrangement and exchange of commands can be handled relatively easily.
3.1.3 Other Intermediate Languages
Further Intermediate Languages
We consider
• 3AC in Static Single Assignment (SSA) representation
• Stack Machine Code
Static Single Assignment Form
If a variable a is read at a program position, this is a use of a.
If a variable a is written at a program position, this is a definition of a.
For optimizations, the relationship between use and definition of variables is important.
In SSA representation, each variable has exactly one definition. Thus, the relationship between use and definition is explicit in the intermediate language.
Static Single Assignment Form (2)
SSA is essentially a refinement of 3AC.
The different definitions of one variable are represented by indexing the variable.
For sequential command lists, this means that
• at each definition position, the variable gets a different index.
• at the use position, the variable has the index of its last definition.
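The two rules for sequential command lists can be sketched as follows. This is a minimal illustration, not a full SSA construction: it handles only straight-line commands of the form "x := expression", writes the index as a suffix (a1, x0), and assumes that variable names contain no digits so that the suffix stays unambiguous. The function name to_ssa is hypothetical.

```python
import re


def to_ssa(commands):
    """Rename variables in a straight-line 3AC command list to SSA form."""
    version = {}  # variable name -> index of its last definition

    def use(name):
        # A use gets the index of the last definition (0 if none so far).
        return name + str(version.get(name, 0))

    result = []
    for cmd in commands:
        target, rhs = [part.strip() for part in cmd.split(":=")]
        # Rename the uses first: they refer to the definitions seen so far.
        new_rhs = re.sub(r"[A-Za-z_]\w*", lambda m: use(m.group(0)), rhs)
        # Each definition gets a fresh index.
        version[target] = version.get(target, 0) + 1
        result.append(target + str(version[target]) + " := " + new_rhs)
    return result


ssa = to_ssa([
    "a := x + y",
    "b := a - 1",
    "a := y + b",
    "b := x * 4",
    "a := a + b",
])
```

Control flow with join points is deliberately left out of this sketch, since it needs the additional mechanism described later in this section.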
Example: SSA
Because the relationship between use and definition is explicit, no additional def-use or use-def chaining is needed.
Command list          SSA representation
a := x + y            a1 := x0 + y0
b := a - 1            b1 := a1 - 1
a := y + b            a2 := y0 + b1
b := x * 4            b2 := x0 * 4
a := a + b            a3 := a2 + b2
SSA - Join Points of Control Flow
At join points of control flow, an additional mechanism is required:
Branch 1:    a1 := x0 + y0
Branch 2:    a3 := a2 - b2
Join point:  b3 := a?
             ...
It is not clear which definition of a the use at the join point refers to.
SSA - Join Points of Control Flow (2)
Introduce a fictitious "oracle" function Φ that selects the value of the variable from the branch that was actually taken:
Branch 1:    a1 := x0 + y0
Branch 2:    a3 := a2 - b2
Join point:  a4 := Φ(a1, a3)
             b3 := a4
             ...
SSA - Remarks
• The construction of an SSA representation with a minimal number of applications of the Φ oracle is a non-trivial task (cf. Appel, Sect. 19.1 and 19.2).
• The term static single assignment form reflects that for each variable in the program text, there is only one assignment. Dynamically, a variable in SSA representation can be assigned arbitrarily often (e.g., in loops).
Further intermediate languages
While 3AC and SSA representation are mostly used as intermediate languages in compilers, intermediate languages and abstract machines are more and more often used as connections between compilers and runtime environments.
Java Byte Code and CIL (Common Intermediate Language, cf. .NET) are examples of stack machine code, i.e., intermediate results are stored on a runtime stack.
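How intermediate results live on a runtime stack can be illustrated with a toy stack machine. The instruction names (push, add, sub, mul) and the tuple encoding of expressions are assumptions made for this sketch; they are not Java Byte Code or CIL instructions.

```python
def compile_exp(e):
    """Translate an expression tree to stack code (postorder traversal)."""
    if isinstance(e, int):
        return [("push", e)]
    op, left, right = e
    # Operands first, operator last: the operator finds both values on the stack.
    return compile_exp(left) + compile_exp(right) + [(op,)]


def run(code):
    """Execute stack code; intermediate results are kept on the stack."""
    stack = []
    for instr in code:
        if instr[0] == "push":
            stack.append(instr[1])
            continue
        right = stack.pop()
        left = stack.pop()
        if instr[0] == "add":
            stack.append(left + right)
        elif instr[0] == "sub":
            stack.append(left - right)
        elif instr[0] == "mul":
            stack.append(left * right)
        else:
            raise ValueError("unknown instruction: " + instr[0])
    return stack.pop()


# (1 + 2) * (7 - 3)
code = compile_exp(("mul", ("add", 1, 2), ("sub", 7, 3)))
result = run(code)
```

Note that the generated code needs no names for intermediate results, in contrast to the temporaries of 3AC.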
Further intermediate languages are, for instance, used for optimizations.
Stack machine code as intermediate language
Homogeneous scenario for Java:
[Figure: the Java sources C1.java, C2.java, and C3.java are compiled by different Java compilers (jikes, javac) to the class files C1.class, C2.class, and C3.class, which contain Java ByteCode and are executed by the JVM.]
Stack machine code as intermediate language (2)
Inhomogeneous scenario for .NET:
[Figure: programs written in different high-level languages (prog1.cs, prog2.cs, prog3.hs) are translated by their compilers (C# compiler, Haskell compiler) to the Intermediate Language (prog1.il, prog2.il, prog3.il), which is executed by the CLR.]
Example: Stack machine code
package beisp;
class Weltklasse extends Superklasse implements BesteBohnen {
  Qualifikation studieren(Arbeit schweiss) {
    return new Qualifikation();
  }
}
Example: Stack machine code (2)
Compiled from Weltklasse.java
class beisp.Weltklasse extends beisp.Superklasse
    implements beisp.BesteBohnen {
  beisp.Weltklasse();
  beisp.Qualifikation studieren(beisp.Arbeit);
}
Method beisp.Weltklasse()
  0 aload_0
  1 invokespecial #6 <Method beisp.Superklasse()>
  4 return
Method beisp.Qualifikation studieren(beisp.Arbeit)
  0 new #2 <Class beisp.Qualifikation>
  3 dup
  4 invokespecial #5 <Method beisp.Qualifikation()>
  7 areturn
3.2 Translation of Imperative Language Constructs
3.2.1 Basic Concepts and Memory Organization
Introduction
Difficulties of learning about translation:
• Translation is source language dependent
• Translation is target language dependent
Explanation approach:
• Basic concepts in detail using
I the procedural language TOYC as source language
I SIRL as target language
• Other language features in a less detailed manner
TOYC - a sublanguage of C (Decls & Statements)
Program ( GlobDeclList )
GlobDeclList * GlobDecl
GlobDecl = Var | Array | Proc
Var ( Ident id )
Array ( Ident id, int size )
Proc ( Ident id, ParamList parl, LocDeclList ldl, Stmt body )
ParamList * Param
Param ( Ident id )
LocDeclList * Var
Stmt = VarAssign | ArrAssign | Call | StmtList | If | While
VarAssign ( UsedId uid, Exp rhs )
ArrAssign ( UsedId uid, Exp ixe, Exp rhs )
Call ( UsedId uid, ExpList )
StmtList * Stmt
If ( Exp c, Stmt then, Stmt else )
While ( Exp c, Stmt body )
TOYC - a sublanguage of C (Expressions)
Exp = ArithmExp | Relation | BoolExp | IntConst | ArrayAccess | VarExp
ArithmExp = Add | Sub
Add, Sub ( Exp left, Exp right )
Relation = Lt | Eq
Lt, Eq ( Exp left, Exp right )
BoolExp = And | Or | Not
And, Or ( Exp left, Exp right )
Not ( Exp e )
IntConst ( int i )
ArrayAccess ( UsedId uid, Exp e )
VarExp ( UsedId uid )
ExpList * Exp
UsedId ( Ident id )
TOYC - Context conditions
Every TOYC program declares
• an array input containing the program input at the start of the program
• an array output for the program results
• a parameterless procedure main
Translation of TOYC to SIRL
Main aspects of the translation:
• mapping global variables, arrays, parameters, local variables to the “storing facilities” of SIRL:
I main memory
I temporaries
• realizing array accesses by address computations
• translation of expressions
• translation of statements
• translation of procedures
Memory organization - basic ideas
Memory is organized into segments:
• static data: for global variables and arrays (this segment also stores constant strings and other static data)
• stack: because of recursion, actual parameters, local variables and arrays need storage for every active procedure incarnation:
I stack grows with a call: a new stack frame is pushed
I stack shrinks after a call: the last stack frame is popped (managed by SIRL’s CALL expression)
• heap: for dynamically allocated variables (not needed for TOYC)
• code: for storing the program code (not needed for SIRL)
Memory organization - example
Memory segments, from high address to low address:
• stack
• free memory
• heap
• static data
• code
Layout of typical stack frame
For procedures with results, additional memory is needed (where?)
Memory organization for TOYC to SIRL
• global variables and arrays are stored in memory
• parameters and local variables are stored in temporaries;
a procedure with n parameters uses
I TEMP(0),..., TEMP(n−1) for the parameters
I TEMP(n), ... for the local variables
• CALL expression implicitly manages the different copies of the temporaries
3.2.2 Translation of Variables and Data Types
Translation of variables and data types
The translation of variables and data types comprises:
• handling of primitive data types
• conversion of data types (e.g. int→float)
• memory organisation
• translation of arrays
• translation of records and classes
• implementation of dynamic objects
Primitive data types
Usually, the primitive data types of source languages are supported by the target language:
• int, long → 4 byte word with integer arithmetic
• float, double → accordingly
Potentially, data types have to be encoded:
• boolean → 1 byte or 4 byte word
Problems arise if the target language does not comply with the requirements of the source language, e.g.
• floating point arithmetic is not handled according to the IEEE standard
• overflows are not dealt with correctly (cf. Java FP-strict expressions)
• operations for conversion are missing on the target machine
Primitive data types in TOYC
• TOYC only supports four byte integers as primitive values
• true and false are handled by 1 and 0
Translation of arrays
Efficient translation of arrays is important for many tasks.
One-dimensional static arrays
• Allocate memory in the segment for static data (starting at sd)
• Address computation with base address of array, index of array element and size of element type
Consider the array declaration T tarr[57]:
• Let Rrel contain the relative address of the array tarr
• Let Ri contain the index i of the array component
If k = sizeof(T), then the address of tarr[i] is sd + Rrel + k ∗ Ri.
Translation of arrays for TOYC
• every array in TOYC has an attribute baseAddr holding its base address sd + Rrel
• an access myarr[i] to an array myarr is translated to MEM (+(baseAddr(myarr), *(e(i), CONST 4))) where e(i) is the expression for i
• Code example for array access in SIRL: see lecture
• Translation example from TOYCarray to SIRL: see lecture
TOYCarray - a sublanguage of TOYC
Program ( GlobDeclList )
GlobDeclList * GlobDecl
GlobDecl = Var | Array | Proc
Var ( Ident id )
Array ( Ident id, int size )
Proc ( Ident id, StmtList body )
Stmt = VarAssign | ArrAssign
VarAssign ( UsedId uid, Exp rhs )
ArrAssign ( UsedId uid, Exp ixe, Exp rhs )
StmtList * Stmt
Exp = IntConst | ArrayAccess | VarExp
IntConst ( int i )
ArrayAccess ( UsedId uid, Exp e )
VarExp ( UsedId uid )
UsedId ( Ident id )
More about translation of arrays
Multi-dimensional static arrays
Consider as example the Pascal declaration
var a: array[-5..5][1..9] of integer;
which corresponds to 99 integer variables:
a[-5,1] ... a[-5,9]
...
a[5,1] ... a[5,9]
The matrix is stored row by row in memory. Storing rows is usually more efficient than storing columns because the second index is often the one that is incremented in inner loops.
More about translation of arrays(2)
Translation of access to a[E1,E2]:
Assume the results of evaluating E1 and E2 are stored in t1 and t2.
As a is a static array, we know the dimensions at compile time.
a[t1,t2] is the r-th component of a linear array with
$$
\begin{aligned}
r &= (t_1 - (-5)) \cdot ((9 - 1) + 1) + (t_2 - 1) \\
  &= 9 \cdot t_1 + 45 + t_2 - 1 \\
  &= 9 \cdot t_1 + t_2 + 44
\end{aligned}
$$
Translation:
Store the address of the 44th component as the base address of the array in the symbol table. Then it suffices to add 9 ∗ t1 + t2 to the base address.
General Translation of Arrays
General array declaration of dimension k
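var a: array [u1..o1, ..., uk..ok] of T;
Storing rows yields the following address for accessing a[R1, ..., Rk]:
$$
\begin{aligned}
r ={} & (R_1 - u_1) \cdot \mathit{size}(\text{array}[u_2..o_2, \ldots, u_k..o_k]\ \text{of}\ T) \\
      & + (R_2 - u_2) \cdot \mathit{size}(\text{array}[u_3..o_3, \ldots, u_k..o_k]\ \text{of}\ T) \\
      & + \ldots \\
      & + (R_k - u_k) \cdot \mathit{size}(T)
\end{aligned}
$$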
General Translation of Arrays (2)
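For i = 1, ..., k−1, define
$$
\mathit{size}(i) := \mathit{size}(\text{array}[u_{i+1}..o_{i+1}, \ldots, u_k..o_k]\ \text{of}\ T), \qquad \mathit{size}(k) := \mathit{size}(T)
$$
This implies
$$
\mathit{size}(i-1) = \mathit{size}(i) \cdot (o_i - u_i + 1)
$$
Simplification yields:
$$
r = \sum_{i=1}^{k} R_i \cdot \mathit{size}(i) - \sum_{i=1}^{k} u_i \cdot \mathit{size}(i)
$$
Only the first sum depends on runtime values, so code has to be generated only for it; the second sum is a constant that is known at compile time.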
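The formula can be checked with a short sketch. The function names (sizes, constant_part, offset) and the representation of the bounds as a list of (u, o) pairs are assumptions for illustration; the element size of 1 in the example counts components instead of bytes, matching the Pascal example above.

```python
def sizes(bounds, elem_size):
    """Return [size(1), ..., size(k)] for bounds [(u1, o1), ..., (uk, ok)]."""
    k = len(bounds)
    result = [0] * k
    result[k - 1] = elem_size  # size(k) = size(T)
    # size(i-1) = size(i) * (o_i - u_i + 1), computed from the last dimension.
    for j in range(k - 1, 0, -1):
        u, o = bounds[j]
        result[j - 1] = result[j] * (o - u + 1)
    return result


def constant_part(bounds, elem_size):
    """Second sum of the formula: known at compile time."""
    s = sizes(bounds, elem_size)
    return sum(u * s_i for (u, _), s_i in zip(bounds, s))


def offset(indices, bounds, elem_size):
    """r = sum(R_i * size(i)) - sum(u_i * size(i))."""
    s = sizes(bounds, elem_size)
    runtime_part = sum(r_i * s_i for r_i, s_i in zip(indices, s))
    return runtime_part - constant_part(bounds, elem_size)


# var a: array[-5..5][1..9] of integer, counted in components
bounds = [(-5, 5), (1, 9)]
first = offset((-5, 1), bounds, 1)
last = offset((5, 9), bounds, 1)
```

For these bounds the constant part is −44, so the offset is 9 ∗ t1 + t2 + 44, in agreement with the two-dimensional example.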
Array Access
Remarks:
• Computation of array indices offers great potential for optimizations.
• For translation of dynamic arrays, addressing has to be generalized appropriately. (cf. Wilhelm/Maurer, Sect. 2.6.2)
Translation of Records
Translation of records is similar to translation of arrays:
• Determine size and memory layout
• Compute addresses for selection of record components and pointer dereferencing
• Translation of record operations, e.g. assignments to record components
Recommended Reading: Wilhelm, Maurer, Section 2.6.2
Implementation of Dynamic Objects
Dynamic objects = dynamically allocated variables and objects in the sense of OO programming
Dynamic objects are stored on the heap:
• number/size of dynamic objects is not known at compile time, objects are created at runtime
• dynamic objects often have a lifetime that does not follow a stack discipline, which rules out handling them on the stack
Memory representation and addressing of components is similar to static records.
Implementation of Dynamic Objects (2)
Example:
#include <stdlib.h>
typedef struct listelem {
  int head;
  struct listelem* tail;
}* list;
#define listelemSIZE sizeof(struct listelem)
/* Prepend i to list l; the new element is allocated on the heap. */
list append(int i, list l) {
  list lvar = (list) calloc(1, listelemSIZE);
  lvar->head = i;
  lvar->tail = l;
  return lvar;
}
Dynamic Memory Management
Dynamic memory management
• is handled by runtime environment
• can be supported by compiler
• can partially be handled by user program
Runtime environment provides operations for dynamic memory management:
• for the programmer, e.g. in C: malloc, calloc, realloc, free
• for the compiler, as in Pascal, Java, Ada
• in some languages, e.g. Java, the programmer cannot deallocate memory; the runtime environment performs garbage collection instead
Dynamic Memory Management (2)
General Problem: Provide memory blocks of different sizes from a linear memory and reuse memory after it has been freed
Simple memory management by a linear list of free memory areas
Structure of a free memory area of variable length:
[Figure: a memory area consisting of a header that contains the size, followed by the user data; freelist refers to the area.]
Dynamic Memory Management (3)
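List of free memory areas:
[Figure: a linear memory with alternating free and used areas, each with a header containing the size followed by the user data; freelist links the free areas.]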
Procedure to allocate and deallocate memory:
• Allocate memory
I Search for a memory area B of appropriate size
I Update references:
• If the area has exactly the required size, remove it from the list.
• Else update the header of the area, create a header for the rest of the free memory, and add this rest to the list in place of the old area.
Dynamic Memory Management (4)
I Return a pointer to the memory cell after the header (the size information has to be kept).
I If no memory area of the required size is found, new memory has to be requested from the OS
• Free memory
I Find the header of the memory area to be freed via the pointer to this area
I If the previous or next memory areas are free, join the areas
I Add the resulting memory area to the list
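The allocation and deallocation procedure can be simulated as follows. This is a sketch under simplifying assumptions: addresses are plain integers, the free list is a Python list of (start, size) pairs sorted by address, the header is modelled by a dictionary that remembers the block size for each returned pointer, and the search is first fit. The class name, the header size of 4 bytes, and the method names are hypothetical.

```python
HEADER = 4  # assumed header size in bytes (holds the size of the area)


class FreeListAllocator:
    def __init__(self, total):
        self.free = [(0, total)]  # free areas as (start, size incl. header)
        self.size_of = {}         # user pointer -> area size (stands in for the header)

    def allocate(self, n):
        need = n + HEADER
        for idx, (start, size) in enumerate(self.free):  # first fit
            if size < need:
                continue
            if size == need:
                del self.free[idx]  # exact fit: remove the area from the list
            else:
                # Split: the rest replaces the old area in the list.
                self.free[idx] = (start + need, size - need)
            self.size_of[start + HEADER] = need
            return start + HEADER   # pointer to the cell after the header
        return None  # a real allocator would request memory from the OS here

    def release(self, ptr):
        start = ptr - HEADER        # find the header via the pointer
        size = self.size_of.pop(ptr)
        self.free.append((start, size))
        self.free.sort()
        merged = []
        for s, sz in self.free:     # join adjacent free areas
            if merged and merged[-1][0] + merged[-1][1] == s:
                merged[-1] = (merged[-1][0], merged[-1][1] + sz)
            else:
                merged.append((s, sz))
        self.free = merged


heap = FreeListAllocator(64)
p = heap.allocate(12)
q = heap.allocate(12)
free_after_alloc = list(heap.free)
heap.release(p)
free_after_first_release = list(heap.free)
heap.release(q)
```

After both areas are released, the joining step restores a single free area that covers the whole memory.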
Dynamic Memory Management (5)
Remarks:
• If a program writes beyond its assigned memory area, references or size information can be destroyed, with severe consequences.
• If memory cannot be allocated at byte granularity, alignment restrictions have to be obeyed.
• For practical use, the above principle can be improved by
I non-linear search
I search for memory areas of exactly matching size, avoiding fragmentation
I support for joining memory areas after deallocation
3.2.3 Translation of Expressions and Statements
Translation of Expressions
Aspects for translation of expressions
• Management of intermediate results
• Translation of source language operations
I no counterpart in target language
I addressing
I context-dependent, e.g.: boolean expression in condition is handled differently from boolean expression in assignment
• Treatment of procedure calls (next subsection)
Translation of Statements
Aspects for translation of statements
• Translation of compound statements to jumps
• Generation of unique labels
• Treatment of procedure calls (next subsection)
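As a rough illustration of the first two aspects, the following C sketch turns a while loop into jumps using freshly generated labels. The textual target syntax ("ifz ... goto", "goto", "L0:") and the function names new_label and emit_while are hypothetical and chosen for this sketch only; they are not SIRL syntax.

```c
#include <stdio.h>
#include <string.h>

static int label_counter = 0;

/* Writes a fresh, unique label name ("L0", "L1", ...) into buf. */
static void new_label(char *buf, size_t n) {
    snprintf(buf, n, "L%d", label_counter++);
}

/* Emits jump code for "while (cond) body" into out.
   cond and body are the already translated condition and loop body. */
static void emit_while(char *out, size_t n, const char *cond, const char *body) {
    char start[16], end[16];
    new_label(start, sizeof start);
    new_label(end, sizeof end);
    snprintf(out, n,
             "%s:\n"                 /* loop entry */
             "ifz %s goto %s\n"      /* leave loop if condition is 0 */
             "%s\n"                  /* loop body */
             "goto %s\n"             /* back to the condition */
             "%s:\n",                /* loop exit */
             start, cond, end, body, start, end);
}
```

Because every call draws new names from the counter, nested or consecutive loops never share a label.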
Example: Translation of simplified TOYC - fragment
Stmt = VarAssign | ArrAssign | Call | StmtList | If | While
VarAssign ( UsedId uid, Exp rhs )
ArrAssign ( UsedId uid, Exp ixe, Exp rhs )
Call ( UsedId uid, ExpList )
StmtList * Stmt
If ( Exp c, Stmt then, Stmt else )
While ( Exp c, Stmt body )
Exp = ArithmExp | Relation | BoolExp | IntConst | ArrayAccess | VarExp
ArithmExp = Add | Sub
Add, Sub ( Exp left, Exp right )
Relation = Lt | Eq
Lt, Eq ( Exp left, Exp right )
BoolExp = And | Or | Not
And, Or ( Exp left, Exp right )
Not ( Exp e )
Example: Translation of simplified TOYC - fragment (2)
In the first version, we consider a simplified translation where
• TOYC expressions are translated to SIRL expressions
• boolean expressions for “and” (&&), “or” (||), “not” (!) are omitted
• the context of the expression is not considered
The attributions for
IntConst ( int i )
ArrayAccess ( UsedId uid, Exp e )
UsedId ( Ident id )
are as described in the previous subsection.
For the extended attribution of
VarExp ( UsedId uid )
and of the expressions: see lecture
Discussion and extension
Shortcomings:
• bad code for conditionals
• we haven’t covered boolean expressions
Approach to overcome shortcomings:
• context-dependent translation for relational expressions
• translation of non-strict boolean expressions using jumps
Context-dependent translation
We illustrate a translation that uses context information by the translation of non-strict boolean expressions.
More precisely, we translate TOYC to SIRL’ where
SIRL’ = SIRL - { CJUMP(o,e1,e2,t,f) } + { CJUMP(e,label) }
i.e., we use a different conditional jump instruction.
Semantics: (similar to Piglet)
CJUMP(e,label): Jump to label if e evaluates to 0; otherwise continue execution with the next statement
Context-dependent translation of TOYC
The translation of expressions has to distinguish two contexts:
• In contexts: If, While, BoolExp
generate a conditional jump (jcx = true: jump context)
• In contexts: VarAssign, ArrAssign, ArithmExp, Relation, ArrayAccess, or ExpList
generate a SIRL-expression that returns a value (jcx = false: value context)
where jcx is the inherited attribute denoting the kind of context.
Examples for context-dependent translation
Let’s assume a, b, and c are stored in temporaries TEMP(a), TEMP(b), and TEMP(c), respectively.
Translate:
a = (a || b) && (c+1);
if( a && ((!b) || c) ) Stmt1 else Stmt2
Result: see lecture
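The lecture's own result is not reproduced here. As an illustration of the general idea only, the following C sketch translates non-strict boolean expressions in a jump context by passing a true label and a false label down the tree. It is not the lecture's attribution: the types Kind and Exp, the functions emit, fresh and trans_jump, and the instructions JUMP(label) and LABEL(label) are assumptions of this sketch. Only CJUMP(e,label), which jumps if e evaluates to 0, is taken from the text above.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

typedef enum { VAR, NOT, AND, OR } Kind;

typedef struct Exp {
    Kind kind;
    const char *name;          /* VAR: name of the temporary */
    const struct Exp *left;    /* NOT uses left only */
    const struct Exp *right;
} Exp;

static char code[1024];        /* generated instructions, one per line */
static int next_label = 0;

/* Appends formatted text to the code buffer. */
static void emit(const char *fmt, ...) {
    size_t used = strlen(code);
    va_list ap;
    va_start(ap, fmt);
    vsnprintf(code + used, sizeof code - used, fmt, ap);
    va_end(ap);
}

/* Writes a fresh label name into buf. */
static void fresh(char *buf, size_t n) {
    snprintf(buf, n, "L%d", next_label++);
}

/* Jump context: continue at label t if e is true, at label f otherwise. */
static void trans_jump(const Exp *e, const char *t, const char *f) {
    char mid[16];
    switch (e->kind) {
    case VAR:
        emit("CJUMP(TEMP(%s),%s)\n", e->name, f);   /* jumps if value is 0 */
        emit("JUMP(%s)\n", t);
        break;
    case NOT:                       /* negation just swaps the targets */
        trans_jump(e->left, f, t);
        break;
    case AND:                       /* right operand only if left is true */
        fresh(mid, sizeof mid);
        trans_jump(e->left, mid, f);
        emit("LABEL(%s)\n", mid);
        trans_jump(e->right, t, f);
        break;
    case OR:                        /* right operand only if left is false */
        fresh(mid, sizeof mid);
        trans_jump(e->left, t, mid);
        emit("LABEL(%s)\n", mid);
        trans_jump(e->right, t, f);
        break;
    }
}
```

Calling trans_jump on the condition a && ((!b) || c) with the targets Lthen and Lelse emits no code at all for the negation, and the right operands are skipped whenever the left operand already decides the result. The sketch makes no attempt to remove jumps to the immediately following label.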
Attribute definitions
Attribution for VarExp
Attribution for IntConst
Attribution for ArrayAccess
Attribution for Not
Attribution for And
Attribution for Or
Attribution for Add
Attribution for Lt
Attribution for If
Attribution for While
Attribution for VarAssign
Attribution for ArrAssign
Recommended Reading:
• Wilhelm, Maurer: Sec. 2.4, pp. 12–16
• Appel: Sec. 7.2
3.2.4 Simplifying the Intermediate Representation
Goals and techniques
Goals:
• Simplification of the IR for later phases
• Translation of one IR language to another one
Techniques:
• Attribute grammars
• Term rewriting:
I Define rules of how source patterns are replaced by target patterns
I Apply the rules as long as possible
Example: Source language
Stmt = Move | CJump | Label | StmtList
Move ( Temp Exp )
CJump ( Exp String )
Label ( String )
StmtList * Stmt
Exp = Temp | BinExp | StmtExp
Temp ( String )
BinExp ( Exp Exp )
StmtExp ( StmtList Exp )
Example: Target language
The simpler language has:
• no recursive expressions, simplified jump
• no statement expression
SStmt = SMove | SCJump | Label | SStmtList
SMove ( Temp SExp )
SCJump ( SExp String )
SExp = Temp | SBinExp
SBinExp ( Temp Temp )
Example: Attributes
Idea:
• Statements have an attribute of type SStmtList
• Expressions have an attribute of type
I SStmtList: for the statements needed for the evaluation
I SExp: for the expression evaluating the result
I String: for the generation of unique temporary names
More precisely:
syn SStmt Stmt.code()
syn SStmt Exp.code()
syn SExp Exp.sexp()
syn String Exp.uniqueStr()
Example: Attribute rules
The following slides present the attribute rules for Move, CJump, StmtExp, BinExp.
The rules for StmtList, Label, and Temp are straightforward.
Notation: We use
• @ as infix operator to append two statement lists
• [e1, ..., en] to construct a list from elements e1, ..., en
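To make the idea behind the code and sexp attributes concrete, here is a minimal C sketch for Temp and BinExp only. It is an assumption-laden illustration, not the lecture's attribute rules: the types Kind and Exp and the functions emit, to_temp and flatten are hypothetical, the statement list is represented as text in a buffer (appending to the buffer plays the role of @), and a global counter replaces the uniqueStr attribute. The generated names t0, t1, ... are assumed not to clash with temporaries of the source program.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

typedef enum { TEMP, BINEXP } Kind;

typedef struct Exp {
    Kind kind;
    const char *name;                  /* TEMP: name of the temporary */
    const struct Exp *left, *right;    /* BINEXP: operands */
} Exp;

static char code[1024];    /* flattened statements, one per line */
static int next_temp = 0;  /* source of unique temporary names */

/* Appends formatted text to the code buffer. */
static void emit(const char *fmt, ...) {
    size_t used = strlen(code);
    va_list ap;
    va_start(ap, fmt);
    vsnprintf(code + used, sizeof code - used, fmt, ap);
    va_end(ap);
}

static void flatten(const Exp *e, char *sexp, size_t n);

/* Makes sure the value of e is available in a temporary and writes the
   name of that temporary to t. */
static void to_temp(const Exp *e, char *t, size_t n) {
    char s[64];
    if (e->kind == TEMP) {             /* already a temporary: reuse it */
        snprintf(t, n, "%s", e->name);
        return;
    }
    flatten(e, s, sizeof s);           /* emits the code for subexpressions */
    snprintf(t, n, "t%d", next_temp++);
    emit("SMove(Temp(%s), %s)\n", t, s);
}

/* Appends the statements needed to evaluate e to the code buffer and
   writes the non-recursive expression denoting the result to sexp. */
static void flatten(const Exp *e, char *sexp, size_t n) {
    char l[16], r[16];
    if (e->kind == TEMP) {
        snprintf(sexp, n, "Temp(%s)", e->name);
        return;
    }
    to_temp(e->left, l, sizeof l);
    to_temp(e->right, r, sizeof r);
    snprintf(sexp, n, "SBinExp(Temp(%s), Temp(%s))", l, r);
}
```

For BinExp(BinExp(Temp(a), Temp(b)), Temp(c)) the sketch emits one SMove that stores the inner expression in t0 and yields SBinExp(Temp(t0), Temp(c)) as the resulting expression.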
Attribution for
Attribution for
Attribution for
Attribution for
3.3 Translation of Object-Oriented Language Constructs
3.3.1 Concepts of Object-Oriented Programming Languages
Concepts of Object-Oriented Programming Languages
We consider a class-based language and use Java as an example.
Important Concepts:
• Classes and Object Creation
• Encapsulation
• Subtyping and Inheritance
• Dynamic Method Binding
Example: Object-Oriented Language Concepts
class Person {
  String name;
  int gebdatum; /* in the form YYYYMMDD */
  Person( String n, int gd ) {
    name = n;
    gebdatum = gd;
  }
  public void drucken() {
    System.out.println("Name:" + this.name);
    System.out.println("Geb:" + this.gebdatum);
  }
  boolean hat_geburtstag( int datum ) {
    return (this.gebdatum % 10000) == (datum % 10000);
  }
}
class Student extends Person {
  int matrikelnr;
  int semester;
  Student( String n, int gd, int mnr, int sem ) {
    super( n, gd );
    matrikelnr = mnr;
    semester = sem;
  }
  public void drucken() {
    super.drucken();
    System.out.println("Mnr:" + matrikelnr);
    System.out.println("Sem:" + semester);
  }
}
Example: Object-Oriented Language Concepts (3)
class Test {
  public static void main( String[] argv ) {
    int i;
    Person[] pf = new Person[3];
    pf[0] = new Person( "Meyer", 19631007 );
    pf[1] = new Student( "Müller", 19641223, 758475, 5 );
    pf[2] = new Student( "Planck", 18580423, 3454545, 47 );
    for( i = 0; i<3; i = i+1 ) {
      pf[i].drucken();
    }
  }
}
The example demonstrates classes, object creation, inheritance (with subtyping and specialization) and dynamic method binding.
3.3.2 Translation into Procedural Languages
Translation into Procedural Languages
Translation Schemes:
• Classes, class types → record types, pointer types
• Object creation → allocation of dynamic variables/objects
• Methods, constructors → procedures
• Dynamic binding → use of procedure pointers with selection of record components
We illustrate these schemes using the example above. The target language considered is C.
Translation of Types and Methods
• Basic data types of Java → basic data types of C, for example:
I int → int
I boolean → int (typedef int boolean;)
• Reference types of Java → pointer types of C, for example:
I String → String*
I Person → Person*
where String and Person are record types in C.
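One possible shape of such a translation for the Person/Student example is sketched below in C. It is a simplified sketch under stated assumptions, not necessarily the scheme developed in the lecture: the procedure pointers are stored directly in each object record, const char* stands in for String*, and the names Person_init, new_Person and new_Student are hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>

typedef int boolean;
typedef struct Person Person;

struct Person {                      /* class Person -> record type */
    const char *name;                /* stand-in for String* */
    int gebdatum;
    void    (*drucken)(Person *self);              /* procedure pointers */
    boolean (*hat_geburtstag)(Person *self, int datum);
};

typedef struct {                     /* class Student extends Person */
    Person base;                     /* inherited part comes first */
    int matrikelnr;
    int semester;
} Student;

static void Person_drucken(Person *self) {
    printf("Name:%s\n", self->name);
    printf("Geb:%d\n", self->gebdatum);
}

static boolean Person_hat_geburtstag(Person *self, int datum) {
    return (self->gebdatum % 10000) == (datum % 10000);
}

/* Constructor body of Person; also used for super(n, gd). */
static void Person_init(Person *self, const char *n, int gd) {
    self->name = n;
    self->gebdatum = gd;
    self->drucken = Person_drucken;
    self->hat_geburtstag = Person_hat_geburtstag;
}

Person *new_Person(const char *n, int gd) {         /* new Person(...) */
    Person *p = malloc(sizeof *p);
    if (p != NULL) Person_init(p, n, gd);
    return p;
}

static void Student_drucken(Person *self) {
    Student *s = (Student *)self;    /* valid because base is the first member */
    Person_drucken(self);            /* super.drucken() is bound statically */
    printf("Mnr:%d\n", s->matrikelnr);
    printf("Sem:%d\n", s->semester);
}

Person *new_Student(const char *n, int gd, int mnr, int sem) {
    Student *s = malloc(sizeof *s);
    if (s == NULL) return NULL;
    Person_init(&s->base, n, gd);    /* super( n, gd ) */
    s->matrikelnr = mnr;
    s->semester = sem;
    s->base.drucken = Student_drucken;   /* overriding method */
    return &s->base;                 /* Student* used as Person* (subtyping) */
}
```

With this layout, the Java call pf[i].drucken() corresponds to pf[i]->drucken(pf[i]): the procedure is selected from the record at run time, and the receiver object is passed as an explicit first parameter.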