Academic year: 2022


Compilers and Language Processing Tools

Summer Term 2013

Arnd Poetzsch-Heffter Annette Bieniusa

Software Technology Group TU Kaiserslautern

(2)

Content of Lecture

1. Introduction
2. Syntax and Type Analysis
   2.1 Lexical Analysis
   2.2 Context-Free Syntax Analysis
   2.3 Context-Dependent Analysis (Semantic Analysis)
3. Translation to Intermediate Representation
   3.1 Languages for Intermediate Representation
   3.2 Translation of Imperative Language Constructs
   3.3 Translation of Object-Oriented Language Constructs
   3.4 Translation of Procedures
4. Optimization and Code Generation
   4.1 Assembly and Machine Code
   4.2 Optimization
   4.3 Register Allocation
   4.4 Further Aspects
5. Selected Topics in Compiler Construction
   5.1 Just-in-time Compilation
   5.2 Garbage Collection
   5.3 XML Processing (DOM, SAX, XSLT)

(4)

3. Translation to Intermediate Representation

(5)

Translation of Imperative Language Constructs

3.1 Translation of Imperative Language Constructs

(6)

Introduction

Focus:

Differences between source languages and target languages/target machines

Most important translation techniques for different programming paradigms (procedural/object-oriented)

Learning Objectives:

Overview of imperative and procedural language constructs

Languages for intermediate representation

Translation of object-oriented language constructs

Translation techniques for procedural language constructs

(7)

Language Constructs of Procedural Languages

From a conceptual and semantic point of view, procedural languages have the following constructs:

Domains with operations (often typed)
  - pre-defined: int, boolean, ...
  - user-defined: records, classes, ...
  - implicitly defined: field types, address types, function types

Variables
  - simple and compound types
  - global, local, statically/dynamically allocated
  - define memory state

Expressions
  - computation of values with implicit intermediate results
  - possibly in combination with execution control and state modification

(8)

Language Constructs of Procedural Languages (2)

Statements
  - simple and compound statements
  - define execution control and state modification

Procedures
  - abstraction of parameterized statements
  - may be recursive
  - may be nested

Modules usually have no semantic meaning of their own; they are relevant only for name analysis and for binding and loading.

(9)

Nested Procedures

Nested (local) procedures are supported, for example, by Pascal and Ada.

Example from [Wilhelm, Maurer; Fig. 2.9]:

proc P(a)
  var b
  var c
  proc Q
    var a
    proc R
      var b
      begin
        ... b ...
        ... a ...
        ... c ...
      end
    begin
      ... a ...
      ... b ...
      ... call Q ...
    end
  proc S
    var a
    begin
      ... a ...
      ... call Q ...
    end
  begin
    ... a ...
    ... call Q ...
  end

(10)

3.1 Languages for Intermediate Representation

(11)

Motivation

We could go directly from the AST to machine code, but ...

[Figure: without an IR, each source language (Java, C, C++, Pascal, ML) is connected directly to each target machine (Sparc, MIPS, Pentium); with an IR, every source language is translated to the IR, and the IR is translated to every target machine.]

Intermediate representation

front end: lexical analysis, parsing, semantic analysis

back end: machine-specific optimization, translation to machine language

intermediate code: machine- and language-independent optimization

(12)

Specifics of intermediate representation

A good IR is

convenient to produce from source language

convenient to translate into machine language

small, with clear and simple semantics

Design of IR:

IR languages are comparable to data structures in algorithm design: each intermediate language is more or less suitable for a given task.

Intermediate languages can conceptually be seen as abstract machines.

(13)

Typical differences: Source language vs. IR

Data types and memory:
  array and field dereferencing vs. load/store on heap or stack

Expressions:
  simpler

Statements:
  compound statements vs. (conditional) jumps

Method calls:
  - varying number of arguments vs. simple call
  - explicit management of recursion (→ stack frames)

(14)

3.1.1 SIRL: A simple IR language

(15)

SIRL: Introduction

SIRL is very similar to the IR language Piglet of the compiler project.

Data types and memory:

Values in SIRL are integers and addresses

SIRL programs work on a byte-addressable memory

(16)

SIRL: Expressions

CONST(i)                 integer constant i or address i
NAME(n)                  symbolic constant n [code label]
TEMP(t)                  temporary t, similar to a machine register
BINOP(o,e1,e2)           binary operator o with operands e1 and e2
MEM(e)                   contents of a word of memory at address e
CALL(f,[e1, ..., en])    procedure call
ESEQ(s,e)                statement expression; execute statement s for side effects, then expression e for the result

(17)

SIRL: Operators

Binary arithmetic and logical operators:

PLUS, MINUS, MUL, DIV    integer arithmetic operators
AND, OR, XOR             integer bitwise logical operators
LSHIFT, RSHIFT           integer logical shift operators
ARSHIFT                  integer arithmetic right-shift

Relational operators:

EQ, NE                   integer equality and non-equality (for both signed and unsigned)
LT, GT, LE, GE           integer inequalities (signed)
ULT, UGT, ULE, UGE       integer inequalities (unsigned)

(18)

SIRL: Statements

MOVE(TEMP(t),e)          Evaluate e and move it into t.
MOVE(MEM(e1),e2)         Evaluate e1 yielding address a; evaluate e2 and move it into a.
EXP(e)                   Evaluate e and discard the result.
JUMP(e,[l1, ..., ln])    Transfer control (jump) to address e; l1, ..., ln are all possible values for e. Often used: JUMP(l).
CJUMP(o,e1,e2,t,f)       Evaluate e1, then e2; compare their results using relational operator o. If true, jump to label t, else jump to label f.
SEQ(s1,s2)               Statement s1 followed by statement s2.
LABEL(n)                 Define the constant value of name n as the current code address. NAME(n) can then be used as the target of jumps, calls, etc.
NOOP                     skip statement

(19)

SIRL: Program structure

Program   ::= MAIN StmtList END Procedure*
StmtList  ::= ( Label? Stmt )*
Procedure ::= Label [ IntLiteral ] ESEQ

where Stmt, ESEQ, Label are as defined above

(20)

Examples

Translate the following MiniJava statements to SIRL:

1. if (x < y) x = y; else x = 0;

2. y = z[5];

(21)

Examples

1. if (x < y) x = y; else x = 0;

Assume x corresponds to TEMP 5 and y corresponds to TEMP 27.

Define three (new) label names L1, L2, and L3.

   CJUMP (LT, TEMP 5, TEMP 27, L1, L2)
L1 MOVE (TEMP 5, TEMP 27)
   JUMP L3
L2 MOVE (TEMP 5, CONST 0)
L3 ...
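To make the correspondence between the structured statement and the jump-based SIRL code easier to see, the following sketch writes the same if/else in C with explicit gotos, one C line per SIRL statement. The function name if_as_jumps and the idea of returning x are assumptions made only for illustration; they are not part of the lecture.

```c
/* Hypothetical helper: the if/else of example 1 with explicit jumps.
   x plays the role of TEMP 5, y the role of TEMP 27. */
int if_as_jumps(int x, int y) {
    if (x < y) goto L1; else goto L2;   /* CJUMP(LT, TEMP 5, TEMP 27, L1, L2) */
L1: x = y;                              /* MOVE(TEMP 5, TEMP 27)              */
    goto L3;                            /* JUMP L3                            */
L2: x = 0;                              /* MOVE(TEMP 5, CONST 0)              */
L3: return x;                           /* code following the if statement    */
}
```

Calling if_as_jumps(1, 2) takes the branch at L1, while if_as_jumps(3, 2) takes the branch at L2.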

(22)

Examples

2. y = z[5];

Assume y corresponds to TEMP 27, and the base address of array z is a.

Let w be the word size of MiniJava (e.g. 4 bytes).

Calculate the offset for the array at index 5:

MOVE (TEMP 27, MEM (+(a, *(CONST 5, CONST w))))

Here, we use o(e1,e2) as an abbreviation for BINOP(o,e1,e2).
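The address computation in this example can be tried out with a very small evaluator for SIRL-like expression trees. The sketch below is only an illustration under several assumptions: it supports just CONST, TEMP, BINOP with PLUS and MUL, and MEM; the type and function names (Exp, eval, assign_y_from_z5), the 64-byte toy memory and the word size W = 4 are all made up for this example and are not the representation used in the lecture or in the compiler project.

```c
#include <stdint.h>
#include <string.h>

enum { W = 4 };                 /* assumed word size in bytes */

typedef enum { E_CONST, E_TEMP, E_BINOP, E_MEM } ExpKind;
typedef enum { OP_PLUS, OP_MUL } BinOp;

typedef struct Exp {
    ExpKind kind;
    int32_t value;              /* E_CONST: the constant; E_TEMP: temporary number */
    BinOp op;                   /* used by E_BINOP only */
    const struct Exp *e1, *e2;  /* E_BINOP: operands; E_MEM: e1 is the address */
} Exp;

uint8_t memory[64];             /* toy byte-addressable memory */
int32_t temps[32];              /* toy set of temporaries */

/* Evaluate an expression tree to an integer (or address) value. */
int32_t eval(const Exp *e) {
    switch (e->kind) {
    case E_CONST: return e->value;
    case E_TEMP:  return temps[e->value];
    case E_BINOP: {
        int32_t l = eval(e->e1), r = eval(e->e2);
        return e->op == OP_PLUS ? l + r : l * r;
    }
    case E_MEM: {
        int32_t word;           /* load one word starting at the computed address */
        memcpy(&word, &memory[eval(e->e1)], W);
        return word;
    }
    }
    return 0;
}

/* MOVE(TEMP 27, MEM(+(a, *(CONST 5, CONST w)))) for an array z at address a. */
void assign_y_from_z5(int32_t a) {
    Exp idx  = { .kind = E_CONST, .value = 5 };
    Exp w    = { .kind = E_CONST, .value = W };
    Exp base = { .kind = E_CONST, .value = a };
    Exp off  = { .kind = E_BINOP, .op = OP_MUL,  .e1 = &idx,  .e2 = &w };
    Exp addr = { .kind = E_BINOP, .op = OP_PLUS, .e1 = &base, .e2 = &off };
    Exp mem  = { .kind = E_MEM, .e1 = &addr };
    temps[27] = eval(&mem);
}
```

With z placed at address 8, the element z[5] is read from bytes 28 to 31 of the toy memory. The sketch does no bounds checking, which a real interpreter would need.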

(23)


3.1.2 3-Address Code

(24)

3-address code

3-address code (3AC) is a common intermediate language with many variants.

Properties:

only elementary data types (but often arrays)

no nested expressions

sequential execution, jumps and procedure calls as statements

named variables as in a high level language

unbounded number of temporary variables

(25)


3-address code (2)

A program in 3AC consists of

a list of global variables

a list of procedures with parameters and local variables

a main procedure

each procedure has a sequence of 3AC commands as body

(26)

3AC commands

Syntax                 Explanation

x := y bop z           x: variable (global, local, parameter, temporary)
x := uop z             y, z: variable or constant
x := y                 bop: binary operator
                       uop: unary operator

goto L                 jump or conditional jump to label L
if x cop y goto L      cop: comparison operator
                       only procedure-local jumps

x := a[i]              a: one-dimensional array
a[i] := y

x := &a                a: global variable, local variable or parameter
x := *y                &a: address of a
*x := y                *: dereferencing operator

(27)

3AC commands (2)

Syntax                 Explanation

param x                call p(x1, ..., xn) is encoded as
call p                   param x1
                         ...
                         param xn
                         call p
                       (the block is considered as one command)

return y               causes a jump to the return address with (optional) result y

We assume that 3AC only contains labels for which jumps are used in the program.

(28)

Basic blocks

A sequence of 3AC commands can be uniquely partitioned into basic blocks.

A basic block B is a maximal sequence of commands such that

at the end of B, exactly one jump, procedure call, or return command occurs

labels only occur at the first command of a basic block

(29)

Basic blocks (2)

Remarks:

The commands of a basic block are always executed sequentially; there are no jumps into its interior.

Often, a designated exit-block for a procedure containing the return jump at its end is required. This is handled by additional transformations.

The transitions between basic blocks are often denoted by flow charts.
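The partitioning into basic blocks can be sketched with the usual "leader" rule: a command starts a new block if it is the first command, carries a label, or follows a jump, call or return. The code below is a minimal sketch under that assumption; the types Cmd and CmdKind and the function basic_blocks are hypothetical names, and a param/call group is simply modelled as separate commands that stay in the same block.

```c
#include <stddef.h>

typedef enum {
    CMD_ASSIGN, CMD_PARAM, CMD_GOTO, CMD_IF_GOTO, CMD_CALL, CMD_RETURN
} CmdKind;

typedef struct {
    CmdKind kind;
    int has_label;   /* nonzero if a label is attached to this command */
} Cmd;

/* Jumps, procedure calls and returns end a basic block. */
static int ends_block(CmdKind k) {
    return k == CMD_GOTO || k == CMD_IF_GOTO || k == CMD_CALL || k == CMD_RETURN;
}

/* Stores the index of the first command of every basic block in starts[]
   (which must have room for n entries) and returns the number of blocks. */
size_t basic_blocks(const Cmd *cmds, size_t n, size_t *starts) {
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        int leader = (i == 0) || cmds[i].has_label || ends_block(cmds[i - 1].kind);
        if (leader) starts[count++] = i;
    }
    return count;
}
```

A block that ends only because the next command carries a label does not end in a jump; making such fall-through transitions explicit would be one of the additional transformations, which this sketch leaves out.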

(30)

Example: 3AC and basic blocks

Consider the following C program:

int a[2];
int b[7];

int skprod(int i1, int i2, int lng) { ... }

int main() {
  a[0] = 1; a[1] = 2;
  b[0] = 4; b[1] = 5; b[2] = 6;
  skprod(0, 1, 2);
  return 0;
}

3AC with basic block partitioning for the procedure main (the commands up to and including call skprod form the first basic block, return 0 forms the second):

main:
  a[0] := 1
  a[1] := 2
  b[0] := 4
  b[1] := 5
  b[2] := 6
  param 0
  param 1
  param 2
  call skprod

  return 0

© Arnd Poetzsch-Heffter, Translation to Intermediate Representation

28.06.2007 © A. Poetzsch-Heffter, TU Kaiserslautern

(32)

Example: 3AC and basic blocks (3)

Procedure skprod:

int skprod(int i1, int i2, int lng) {
  int ix, res = 0;
  for (ix = 0; ix <= lng-1; ix++) {
    res += a[i1+ix] * b[i2+ix];
  }
  return res;
}

Procedure skprod as 3AC with basic blocks (the flow chart is rendered as a list; B1 to B4 name its boxes):

skprod:
B1:  res := 0
     ix := 0
B2:  t0 := lng-1
     if ix<=t0         true: B3, false: B4
B3:  t1 := i1+ix
     t2 := a[t1]
     t1 := i2+ix
     t3 := b[t1]
     t1 := t2*t3
     res := res+t1
     ix := ix+1        continue with B2
B4:  return res


(34)

Intermediate Language Variations

3AC after elimination of array operations (for the above example; the flow chart is rendered as a list, and B1 to B4 name its boxes):

skprod:
B1:  res := 0
     ix := 0
B2:  t0 := lng-1
     if ix<=t0         true: B3, false: B4
B3:  t1 := i1+ix
     tx := t1*4
     ta := a+tx
     t2 := *ta
     t1 := i2+ix
     tx := t1*4
     tb := b+tx
     t3 := *tb
     t1 := t2*t3
     res := res+t1
     ix := ix+1        continue with B2
B4:  return res
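The effect of eliminating array operations can also be shown at the C level. The sketch below places the indexed version of skprod next to a hand-lowered version in which every array access is replaced by explicit byte-offset arithmetic and a dereference, following the 3AC above command by command. The name skprod_lowered and the initial array contents are assumptions for illustration; sizeof(int) stands in for the constant 4 of the slide.

```c
int a[2] = {1, 2};
int b[7] = {4, 5, 6};

/* Version with array indexing, as in the lecture example. */
int skprod(int i1, int i2, int lng) {
    int ix, res = 0;
    for (ix = 0; ix <= lng - 1; ix++) {
        res += a[i1 + ix] * b[i2 + ix];
    }
    return res;
}

/* Hand-lowered version: one C statement per 3AC command. */
int skprod_lowered(int i1, int i2, int lng) {
    int res = 0;
    int ix = 0;
    for (;;) {
        int t0 = lng - 1;                         /* t0 := lng-1      */
        if (!(ix <= t0)) break;                   /* if ix<=t0        */
        int t1 = i1 + ix;                         /* t1 := i1+ix      */
        int tx = t1 * (int)sizeof(int);           /* tx := t1*4       */
        const char *ta = (const char *)a + tx;    /* ta := a+tx       */
        int t2 = *(const int *)ta;                /* t2 := *ta        */
        t1 = i2 + ix;                             /* t1 := i2+ix      */
        tx = t1 * (int)sizeof(int);               /* tx := t1*4       */
        const char *tb = (const char *)b + tx;    /* tb := b+tx       */
        int t3 = *(const int *)tb;                /* t3 := *tb        */
        t1 = t2 * t3;                             /* t1 := t2*t3      */
        res = res + t1;                           /* res := res+t1    */
        ix = ix + 1;                              /* ix := ix+1       */
    }
    return res;                                   /* return res       */
}
```

Both functions compute the same scalar product; for the call skprod(0, 1, 2) from main this is a[0]*b[1] + a[1]*b[2].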

(35)


Characteristics of 3-Address Code

Control flow is explicit.

Only elementary operations

Rearrangement and exchange of commands can be handled relatively easily.

(36)

3.1.3 Other Intermediate Languages

(37)

Further Intermediate Languages

We consider

3AC in Static Single Assignment (SSA) representation

Stack Machine Code

(38)

Static Single Assignment Form

If a variable a is read at a program position, this is a use of a.

If a variable a is written at a program position, this is a definition of a.

For optimizations, the relationship between use and definition of variables is important.

In SSA representation, each variable has exactly one definition. Thus, the relationship between use and definition is explicit in the intermediate language.

(39)

Static Single Assignment Form (2)

SSA is essentially a refinement of 3AC.

The different definitions of one variable are represented by indexing the variable.

For sequential command lists, this means that

at each definition position, the variable gets a different index.

at each use position, the variable has the index of its last definition.
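For a sequential command list, the two indexing rules can be implemented with one counter per variable. The following sketch assumes variables named by single lowercase letters and commands with at most two operands; the type Assign and the function to_ssa are hypothetical names, and operators are omitted because they do not influence the renaming. Variables that are used before any definition keep index 0.

```c
#include <stddef.h>

enum { NVARS = 26 };   /* assumption: variables are the letters 'a' to 'z' */

typedef struct {
    char dst, src1, src2;              /* a source of 0 means: constant operand */
    int dst_idx, src1_idx, src2_idx;   /* SSA indices, filled in by to_ssa      */
} Assign;

/* Index the variables of a straight-line command list (no control flow). */
void to_ssa(Assign *code, size_t n) {
    int last_def[NVARS] = {0};         /* index of the last definition so far */
    for (size_t i = 0; i < n; i++) {
        Assign *c = &code[i];
        /* Uses are resolved first: they refer to the last definition. */
        c->src1_idx = c->src1 ? last_def[c->src1 - 'a'] : -1;
        c->src2_idx = c->src2 ? last_def[c->src2 - 'a'] : -1;
        /* Every definition gets a fresh index. */
        c->dst_idx = ++last_def[c->dst - 'a'];
    }
}
```

Resolving the uses before incrementing the counter matters for commands such as a := a + b, where the right-hand side must still see the previous definition of a.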

(40)

Example: SSA

Because each variable has exactly one definition in SSA representation, additional def-use or use-def chaining becomes unnecessary.

3AC                SSA

a := x + y         a1 := x0 + y0
b := a - 1         b1 := a1 - 1
a := y + b         a2 := y0 + b1
b := x * 4         b2 := x0 * 4
a := a + b         a3 := a2 + b2

(41)

SSA - Join Points of Control Flow

At join points of control flow, an additional mechanism is required:

[Figure: two branches, one ending in a1 := x0 + y0 and the other in a3 := a2 - b2, join in front of the command b3 := a?; it is unclear which index the use of a should get.]

Introduce a fictitious "oracle" function Φ that selects the value of the variable from the branch that was actually taken:

[Figure: the same two branches; at the join point, a4 := Φ(a1, a3) is inserted, followed by b3 := a4.]

(43)

SSA - Remarks

The construction of an SSA representation with a minimal number of applications of the Φ oracle is a non-trivial task (cf. Appel, Sect. 19.1 and 19.2).

The term static single assignment form reflects that for each variable in the program text, there is only one assignment.

Dynamically, a variable in SSA representation can be assigned arbitrarily often (e.g., in loops).

(44)

Further intermediate languages

While 3AC and SSA representation are mostly used as intermediate languages inside compilers, intermediate languages and abstract machines are increasingly used as the interface between compilers and runtime environments.

Java bytecode and CIL (Common Intermediate Language, cf. .NET) are examples of stack machine code, i.e., intermediate results are stored on a runtime stack.

Further intermediate languages are used, for instance, for optimizations.
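What "intermediate results are stored on a runtime stack" means can be illustrated with a tiny stack machine. The sketch below is not JVM or CIL code: the three instructions I_PUSH, I_ADD and I_MUL and the function run are invented for this example, and the fixed stack of 64 entries is used without overflow or underflow checks.

```c
#include <stddef.h>

typedef enum { I_PUSH, I_ADD, I_MUL } OpCode;

typedef struct {
    OpCode op;
    int arg;        /* operand of I_PUSH, ignored otherwise */
} Instr;

/* Execute a straight-line program and return the value left on top. */
int run(const Instr *prog, size_t n) {
    int stack[64];
    size_t sp = 0;                      /* number of values on the stack */
    for (size_t pc = 0; pc < n; pc++) {
        switch (prog[pc].op) {
        case I_PUSH:
            stack[sp++] = prog[pc].arg;
            break;
        case I_ADD: {                   /* pop two operands, push their sum */
            int r = stack[--sp];
            int l = stack[--sp];
            stack[sp++] = l + r;
            break;
        }
        case I_MUL: {                   /* pop two operands, push their product */
            int r = stack[--sp];
            int l = stack[--sp];
            stack[sp++] = l * r;
            break;
        }
        }
    }
    return stack[sp - 1];
}
```

The expression (2 + 3) * 4 becomes the instruction sequence push 2, push 3, add, push 4, mul. Unlike 3AC, no named temporaries are needed; the operand positions are implicit in the stack.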

(45)

Stack machine code as intermediate language

Homogeneous scenario for Java:

[Figure: the Java sources C1.java, C2.java and C3.java are translated by different Java compilers (jikes, javac2) into the class files C1.class, C2.class and C3.class; the resulting Java bytecode is executed by the JVM.]

Possibly inhomogeneous scenario for .NET:

[Figure: programs in different high-level languages (prog1.cs, prog2.cs, prog3.hs) are translated by a C# compiler and a Haskell compiler into Intermediate Language files (prog1.il, prog2.il, prog3.il), which are executed by the CLR.]

(47)

Example: Stack machine code

package beisp;

class Weltklasse extends Superklasse implements BesteBohnen {
  Qualifikation studieren(Arbeit schweiss) {
    return new Qualifikation();
  }
}

Compiled from Weltklasse.java

class beisp.Weltklasse extends beisp.Superklasse implements beisp.BesteBohnen {
  beisp.Weltklasse();
  beisp.Qualifikation studieren(beisp.Arbeit);
}

Method beisp.Weltklasse()
  0 aload_0
  1 invokespecial #6 <Method beisp.Superklasse()>
  4 return

Method beisp.Qualifikation studieren(beisp.Arbeit)
  0 new #2 <Class beisp.Qualifikation>
  3 dup
  4 invokespecial #5 <Method beisp.Qualifikation()>
  7 areturn

(49)


3.2 Translation of Imperative Language Constructs

(50)

3.2.1 Basic Concepts and Memory Organization

(51)

Introduction

Difficulties of learning about translation:

Translation is source language dependent

Translation is target language dependent

Explanation approach:

Basic concepts in detail using
  - TOYC, a procedural language, as source
  - SIRL as target language

Other language features in a less detailed manner

(52)

TOYC - a sublanguage of C (Decls & Statements)

Program      ( GlobDeclList )
GlobDeclList * GlobDecl
GlobDecl     = Var | Array | Proc
Var          ( Ident id )
Array        ( Ident id, int size )
Proc         ( Ident id, ParamList parl, LocDeclList ldl, Stmt body )
ParamList    * Param
Param        ( Ident id )
LocDeclList  * Var
Stmt         = VarAssign | ArrAssign | Call | StmtList | If | While
VarAssign    ( UsedId uid, Exp rhs )
ArrAssign    ( UsedId uid, Exp ixe, Exp rhs )
Call         ( UsedId uid, ExpList )
StmtList     * Stmt
If           ( Exp c, Stmt then, Stmt else )
While        ( Exp c, Stmt body )

(53)

TOYC - a sublanguage of C (Expressions)

Exp          = ArithmExp | Relation | BoolExp | IntConst | ArrayAccess | VarExp
ArithmExp    = Add | Sub
Add, Sub     ( Exp left, Exp right )
Relation     = Lt | Eq
Lt, Eq       ( Exp left, Exp right )
BoolExp      = And | Or | Not
And, Or      ( Exp left, Exp right )
Not          ( Exp e )
IntConst     ( int i )
ArrayAccess  ( UsedId uid, Exp e )
VarExp       ( UsedId uid )
ExpList      * Exp
UsedId       ( Ident id )

(54)

TOYC - Context conditions

Every TOYC program declares

an array input containing the program input at the start of the program

an array output for the program results

a parameterless procedure main

(55)

Translation of TOYC to SIRL

Main aspects of the translation:

mapping global variables, arrays, parameters, local variables to the “storing facilities” of SIRL:
  - main memory
  - temporaries

realizing array accesses by address computations

translation of expressions

translation of statements

translation of procedures

(56)

Memory organization - basic ideas

Memory is organized into segments:

static data: for global variables and arrays (this segment also stores constant strings and other static data)

stack: because of recursion, actual parameters, local variables and arrays need storage for every active procedure incarnation:
  - stack grows with a call: a new stack frame is pushed
  - stack shrinks after a call: the last stack frame is popped (managed by SIRL’s CALL expression)

heap: for dynamically allocated variables (not needed for TOYC)

code: for storing the program code (not needed for SIRL)

(57)

Memory organization - example

high address
  stack
  free memory
  heap
  static data
  code
low address

(58)

Layout of typical stack frame

For procedures with results, additional memory is needed (where?)

(59)

Memory organization for TOYC to SIRL

global variables and arrays are stored in memory

parameters and local variables are stored in temporaries; a procedure with n parameters uses
  - TEMP(0), ..., TEMP(n-1) for the parameters
  - TEMP(n), ... for the local variables

the CALL expression implicitly manages the different copies of the temporaries

(60)

3.2.2 Translation of Variables and Data Types

(61)


Translation of variables and data types

The translation of variables and data types comprises:

handling of primitive data types

conversion of data types (e.g. int→float)

memory organisation

translation of arrays

translation of records and classes

implementation of dynamic objects

(62)

Primitive data types

Usually, the primitive data types of source languages are supported by the target language:

int, long → 4 byte word with integer arithmetic

float, double → accordingly

Potentially, data types have to be encoded:

boolean → 1 byte or 4 byte words

Problems arise if the target language does not meet the requirements of the source language, e.g.

floating point arithmetic is not handled according to the IEEE standard

overflows are not dealt with correctly (cf. Java FP-strict expressions)

operations for conversion are missing on the target machine

(63)

Primitive data types in TOYC

TOYC only supports four byte integers as primitive values

true and false are handled by 1 and 0

(64)

Translation of arrays

Efficient translation of arrays is important for many tasks.

One-dimensional static arrays

Allocate memory in the segment for static data (starting at sd)

Address computation with base address of array, index of array element and size of element type

Consider the array declaration T tarr[57]:

Let $R_{rel}$ contain the relative address of the array tarr

Let $R_i$ contain the index i of the array component

If $k = \mathrm{sizeof}(T)$, then the address of tarr[i] is $sd + R_{rel} + k \cdot R_i$.

(65)

Translation of arrays for TOYC

every array in TOYC has an attribute baseAddr holding its base address $sd + R_{rel}$

an access myarr[i] to an array myarr is translated to

MEM (+(baseAddr(myarr), *(e(i), CONST 4)))

where e(i) is the expression for i

Code example for array access in SIRL: see lecture

Translation example from TOYCarray to SIRL: see lecture

(66)

TOYCarray - a sublanguage of TOYC

Program      ( GlobDeclList )
GlobDeclList * GlobDecl
GlobDecl     = Var | Array | Proc
Var          ( Ident id )
Array        ( Ident id, int size )
Proc         ( Ident id, StmtList body )
Stmt         = VarAssign | ArrAssign
VarAssign    ( UsedId uid, Exp rhs )
ArrAssign    ( UsedId uid, Exp ixe, Exp rhs )
StmtList     * Stmt
Exp          = IntConst | ArrayAccess | VarExp
IntConst     ( int i )
ArrayAccess  ( UsedId uid, Exp e )
VarExp       ( UsedId uid )
UsedId       ( Ident id )

(67)

More about translation of arrays

Multi-dimensional static arrays

Consider as an example the Pascal declaration

var a: array[-5..5, 1..9] of integer;

which corresponds to 99 integer variables:

a[-5,1] ... a[-5,9]
...
a[5,1] ... a[5,9]

The matrix is stored row by row in memory. Row-wise storage is usually more efficient than column-wise storage because the second index is often the one incremented in inner loops.

(68)

More about translation of arrays (2)

Translation of access to a[E1,E2]:

Assume the results of evaluating E1 and E2 are stored in t1 and t2.

As a is a static array, we know the dimensions at compile time.

a[t1,t2] is the r-th component of a linear array with

\[
\begin{aligned}
r &= (t_1 - (-5)) \cdot ((9 - 1) + 1) + (t_2 - 1) \\
  &= 9 \cdot t_1 + 45 + t_2 - 1 \\
  &= 9 \cdot t_1 + t_2 + 44
\end{aligned}
\]

Translation:

Store the address of the 44-th component as the base address of the array in the symbol table. Then it suffices to add $9 \cdot t_1 + t_2$ to the base address.
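The folding of the constant 44 into the base address can be checked with a few lines of C. The sketch assumes that components are counted from 0 and that the 99 integers are held in a plain C array; the function names index_naive, index_folded and element are made up for this illustration.

```c
/* Component number computed directly from the declared bounds
   array[-5..5, 1..9]. */
int index_naive(int t1, int t2) {
    return (t1 - (-5)) * ((9 - 1) + 1) + (t2 - 1);
}

/* The same number after simplification. */
int index_folded(int t1, int t2) {
    return 9 * t1 + t2 + 44;
}

/* Access via a shifted base address: the constant 44 is added once,
   so that only 9*t1 + t2 has to be computed for each access. */
const int *element(const int *storage, int t1, int t2) {
    const int *shifted_base = storage + 44;   /* address of component 44 */
    return shifted_base + (9 * t1 + t2);
}
```

For a[-5,1] the runtime part 9*t1 + t2 is -44, which leads back to component 0; for a[5,9] it is 54, which leads to component 98, the last one.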

(69)

General Translation of Arrays

General array declaration of dimension k:

var a: array [u1..o1, ..., uk..ok] of T;

Storing by rows yields the following relative address for accessing a[R1, ..., Rk]:

\[
\begin{aligned}
r ={}& (R_1 - u_1) \cdot \mathrm{size}(\mathrm{array}[u_2..o_2, \ldots, u_k..o_k]\ \mathrm{of}\ T) \\
   &+ (R_2 - u_2) \cdot \mathrm{size}(\mathrm{array}[u_3..o_3, \ldots, u_k..o_k]\ \mathrm{of}\ T) \\
   &+ \ldots \\
   &+ (R_k - u_k) \cdot \mathrm{size}(T)
\end{aligned}
\]

(70)

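General Translation of Arrays (2)

For $i = 1, \ldots, k-1$, define

\[
\mathrm{size}(i) := \mathrm{size}(\mathrm{array}[u_{i+1}..o_{i+1}, \ldots, u_k..o_k]\ \mathrm{of}\ T), \qquad \mathrm{size}(k) := \mathrm{size}(T)
\]

This implies

\[
\mathrm{size}(i-1) = \mathrm{size}(i) \cdot (o_i - u_i + 1)
\]

Simplification yields:

\[
r = \sum_{i=1}^{k} R_i \cdot \mathrm{size}(i) - \sum_{i=1}^{k} u_i \cdot \mathrm{size}(i)
\]

Only the first sum depends on runtime values, so code has to be generated only for it; the second sum can be computed at compile time.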
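The split into a compile-time part and a runtime part can be sketched as two C functions: one that a compiler could run once per array declaration, and one that corresponds to the generated code. The names Bounds, array_prepare and array_offset are assumptions for this example; strides[i] holds size(i+1) because C arrays are indexed from 0.

```c
#include <stddef.h>

typedef struct {
    long lower, upper;   /* bounds u_i .. o_i of one dimension */
} Bounds;

/* "Compile time": fill strides[i] with size(i+1) and return the constant
   sum of u_i * size(i) over all dimensions. */
long array_prepare(const Bounds *dims, size_t k, long elem_size, long *strides) {
    long stride = elem_size;           /* size(k) = size(T) */
    long constant = 0;
    for (size_t i = k; i-- > 0; ) {
        strides[i] = stride;
        constant += dims[i].lower * stride;
        stride *= dims[i].upper - dims[i].lower + 1;   /* size(i-1) */
    }
    return constant;
}

/* "Runtime": only the sum of R_i * size(i) depends on the index values. */
long array_offset(const long *idx, const long *strides, size_t k, long constant) {
    long sum = 0;
    for (size_t i = 0; i < k; i++) {
        sum += idx[i] * strides[i];
    }
    return sum - constant;
}
```

For the declaration array[-5..5, 1..9] with 4-byte components, the strides are 36 and 4 and the constant is -176, so a[-5,1] gets offset 0 and a[5,9] gets offset 392, i.e. 98 components times 4 bytes.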

(71)


Array Access

Remarks:

Computation of array indices offers great potential for optimizations.

For translation of dynamic arrays, addressing has to be generalized appropriately. (cf. Wilhelm/Maurer, Sect. 2.6.2)

(72)

Translation of Records

Translation of records is similar to translation of arrays:

Determine size and memory layout

Compute addresses for selection of record components and pointer dereferencing

Translation of record operations, e.g. assignments to record components

Recommended Reading: Wilhelm, Maurer, Section 2.6.2

(73)

Implementation of Dynamic Objects

Dynamic objects = dynamically allocated variables and objects in the sense of OO programming

Dynamic objects are stored on the heap:

number/size of dynamic objects is not known at compile time; objects are created at runtime

dynamic objects often have a lifetime that does not permit stack-like handling

Memory representation and addressing of components is similar to static records.

(74)

Implementation of Dynamic Objects (2)

Example:

#include <stdlib.h>

typedef struct listelem {
  int head;
  struct listelem* tail;
}* list;

#define listelemSIZE sizeof(struct listelem)

/* Create a new list element with head i in front of the list l;
   the element is allocated on the heap. */
list append( int i, list l ) {
  list lvar = (list) calloc(1, listelemSIZE);
  if (lvar == NULL) return NULL;   /* allocation failed */
  lvar->head = i;
  lvar->tail = l;
  return lvar;
}
...

(75)

Dynamic Memory Management

Dynamic memory management

is handled by runtime environment

can be supported by compiler

can partially be handled by user program

Runtime environment provides operations for dynamic memory management:

for the programmer, e.g. in C: malloc, calloc, realloc, free

for the compiler, as in Pascal, Java, Ada

In some languages, e.g. Java, the programmer cannot deallocate memory explicitly; the runtime environment reclaims it by garbage collection.

(76)

Dynamic Memory Management (2)

General problem: Provide memory blocks of different sizes from a linear memory and reuse memory after it has been freed

Simple memory management by a linear list of free memory areas

Structure of a free memory area of variable length:

[Figure: free memory area with the labels header, size, user data and freelist]

(77)

Dynamic Memory Management (3)

List of free memory areas:

[Figure: memory with free and used areas (free, used, used, free, used); labels: header, size, user data, freelist]

Procedure to allocate and deallocate memory:

Allocate memory
  - Search for a memory area B of appropriate size
  - Update references:
    If the area has exactly the required size, remove it from the list.
    Else update the header of the area, create a header for the rest of the free memory and add this rest to the list in place of the old area.

(78)

Dynamic Memory Management (4)

- Return a pointer to the memory cell after the header (the size information has to be kept).
- If no memory area of the required size is found, new memory has to be requested from the OS.

Free memory

- Find the header of the memory area to be freed via the pointer to this area.
- If the previous or next memory area is free, join the areas.
- Add the resulting memory area to the list.
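The allocate and free steps can be sketched in C roughly as follows. This is a minimal first-fit sketch under stated assumptions, not the lecture's implementation: the names `Header`, `arena`, `mem_init`, `mem_alloc`, `mem_free`, and `mem_free_areas` are hypothetical, the linear memory is a fixed static array, the free list is kept sorted by address so that neighbours can be joined, and requesting more memory from the OS is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Header stored in front of every memory area (free or used). */
typedef struct Header {
    size_t size;          /* bytes of user data that follow this header */
    struct Header *next;  /* next free area; only meaningful while free */
} Header;

#define ARENA_HEADERS 64
static Header arena[ARENA_HEADERS];  /* the linear memory (hypothetical size) */
static Header *freelist = NULL;      /* free areas, sorted by address */

static unsigned char *area_end(Header *h) {
    return (unsigned char *)(h + 1) + h->size;
}

void mem_init(void) {
    freelist = arena;
    freelist->size = sizeof arena - sizeof(Header);
    freelist->next = NULL;
}

void *mem_alloc(size_t n) {
    /* round up to a multiple of the header size so headers stay aligned */
    n = (n + sizeof(Header) - 1) / sizeof(Header) * sizeof(Header);
    if (n == 0) n = sizeof(Header);
    for (Header **link = &freelist; *link != NULL; link = &(*link)->next) {
        Header *b = *link;
        if (b->size < n) continue;           /* too small, keep searching */
        if (b->size > n + sizeof(Header)) {  /* split: rest replaces b in the list */
            Header *rest = (Header *)((unsigned char *)(b + 1) + n);
            rest->size = b->size - n - sizeof(Header);
            rest->next = b->next;
            b->size = n;
            *link = rest;
        } else {                             /* exact fit, or rest too small to split */
            *link = b->next;
        }
        return b + 1;                        /* memory cell after the header */
    }
    return NULL;  /* a full allocator would request memory from the OS here */
}

void mem_free(void *p) {
    if (p == NULL) return;
    Header *b = (Header *)p - 1;             /* header sits directly before the data */
    assert(b >= arena && b < arena + ARENA_HEADERS);
    Header *prev = NULL, *cur = freelist;
    while (cur != NULL && cur < b) { prev = cur; cur = cur->next; }
    b->next = cur;
    if (prev != NULL) prev->next = b; else freelist = b;
    if (cur != NULL && area_end(b) == (unsigned char *)cur) {   /* join with next */
        b->size += sizeof(Header) + cur->size;
        b->next = cur->next;
    }
    if (prev != NULL && area_end(prev) == (unsigned char *)b) { /* join with previous */
        prev->size += sizeof(Header) + b->size;
        prev->next = b->next;
    }
}

/* Number of areas currently on the free list (helper for inspection). */
size_t mem_free_areas(void) {
    size_t k = 0;
    for (Header *h = freelist; h != NULL; h = h->next) k++;
    return k;
}
```

After `mem_init()`, two calls to `mem_alloc(32)` followed by freeing both results should leave a single free area again, because both neighbours are joined in `mem_free`.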


Dynamic Memory Management (5)

Remarks:

• If the program writes beyond its assigned memory area, references or size information can be destroyed, with severe consequences.

• If memory cannot be allocated at byte granularity, alignment restrictions have to be obeyed.

• For practical use, the above principle can be improved by
  - non-linear search
  - search for memory areas of exactly matching size, which reduces fragmentation
  - support for joining memory areas after deallocation
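The alignment remark can be made concrete with a small C helper. This is only a sketch: the name `align_up` is hypothetical, and it assumes the alignment is a power of two, which holds for typical machine alignments.

```c
#include <assert.h>
#include <stddef.h>

/* Round size up to the next multiple of align (align must be a power of two). */
size_t align_up(size_t size, size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    return (size + align - 1) & ~(align - 1);
}
```

For example, a request for 13 bytes with an alignment of 8 would occupy 16 bytes, so that the header of the following memory area starts at an aligned address.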

3.2.3 Translation of Expressions and Statements

Translation of Imperative Language Constructs Translation of Expressions and Statements

Translation of Expressions

Aspects of the translation of expressions:

• Management of intermediate results

• Translation of source language operations
  - no counterpart in the target language
  - addressing
  - context-dependent translation, e.g. a boolean expression in a condition is handled differently from a boolean expression in an assignment

• Treatment of procedure calls (next subsection)

Translation of Statements

Aspects of the translation of statements:

• Translation of compound statements to jumps
• Generation of unique labels
• Treatment of procedure calls (next subsection)
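The first two aspects can be sketched for a while loop. The following C fragment is an illustration under assumptions, not the lecture's translation scheme: `new_label` and `emit_while` are hypothetical names, the condition and body are passed in as already translated text, and `if_false ... goto` and `goto` are made-up target instructions.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

static int label_counter = 0;

/* Return a fresh label number; no number is handed out twice. */
int new_label(void) { return label_counter++; }

/* Write jump-based code for `while (cond) body` into out.
   The target notation (if_false/goto) is hypothetical. */
int emit_while(char *out, size_t cap, const char *cond, const char *body) {
    assert(out != NULL && cond != NULL && body != NULL);
    int l_test = new_label();   /* label in front of the loop test */
    int l_end = new_label();    /* label behind the loop */
    return snprintf(out, cap,
                    "L%d:\n"
                    "  if_false %s goto L%d\n"
                    "%s"
                    "  goto L%d\n"
                    "L%d:\n",
                    l_test, cond, l_end, body, l_test, l_end);
}
```

Because every call draws two new labels from the counter, nested or consecutive loops cannot end up sharing a label.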


Example: Translation of simplified TOYC - fragment

Stmt = VarAssign | ArrAssign | Call | StmtList | If | While
VarAssign ( UsedId uid, Exp rhs )
ArrAssign ( UsedId uid, Exp ixe, Exp rhs )
Call ( UsedId uid, ExpList )
StmtList * Stmt
If ( Exp c, Stmt then, Stmt else )
While ( Exp c, Stmt body )

Exp = ArithmExp | Relation | BoolExp | IntConst | ArrayAccess | VarExp
ArithmExp = Add | Sub
Add, Sub ( Exp left, Exp right )
Relation = Lt | Eq
Lt, Eq ( Exp left, Exp right )
BoolExp = And | Or | Not
And, Or ( Exp left, Exp right )
Not ( Exp e )

Example: Translation of simplified TOYC - fragment (2)

In the first version, we consider a simplified translation where

• TOYC expressions are translated to SIRL expressions
• the boolean expressions “and” (&&), “or” (||), and “not” (!) are omitted
• the context of the expression is not considered

The attributions for

IntConst ( int i )
ArrayAccess ( UsedId uid, Exp e )
UsedId ( Ident id )

are as described in the previous subsection.

For the extended attribution of

VarExp ( UsedId uid )

and of the expressions: see lecture

Discussion and extension

Shortcomings:

• bad code for conditionals
• boolean expressions have not been covered

Approach to overcome the shortcomings:

• context-dependent translation of relational expressions
• translation of non-strict boolean expressions using jumps

Context-dependent translation

We illustrate a translation that uses context information with the translation of non-strict boolean expressions.

More precisely, we translate TOYC to SIRL’ where

SIRL’ = SIRL - {CJUMP(o,e1,e2,t,f)} + {CJUMP(e,label)}

i.e., we use a different conditional jump instruction.

Semantics: (similar to Piglet)

CJUMP(e,label): Jump to label if e evaluates to 0; otherwise continue execution with the next statement.

Context-dependent translation of TOYC

The translation of expressions has to distinguish two contexts:

• In the contexts If, While, and BoolExp: generate a conditional jump (jcx = true: jump context)

• In the contexts VarAssign, ArrAssign, ArithmExp, Relation, ArrayAccess, and ExpList: generate a SIRL expression that returns a value (jcx = false: value context)

where jcx is the inherited attribute denoting the kind of context.

Examples for context-dependent translation

Let’s assume a, b, and c are stored in the temporaries TEMP(a), TEMP(b), and TEMP(c), respectively.

Translate:

a = (a || b) && (c+1);

if( a && ((!b) || c) ) Stmt1 else Stmt2

Result: see lecture
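One possible way to generate jump code for the condition of the second example is sketched below in C. This is not the result shown in the lecture. It is a plain, unoptimized scheme under assumptions: `Exp`, `emit`, and `jump_code` are hypothetical names, only variables and the operators !, && and || are covered, CJUMP(e,label) is used with the semantics given above (jump if e is 0), and the spellings JUMP(label) and LABEL(label) for unconditional jumps and label definitions are assumed.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

typedef enum { VAR, NOT, AND, OR } Kind;
typedef struct Exp {
    Kind kind;
    const char *name;                 /* VAR only */
    const struct Exp *left, *right;   /* NOT uses left only */
} Exp;

static char out[1024];   /* generated code as text */
static size_t used = 0;
static int labels = 0;   /* counter for unique label numbers */

static void emit(const char *fmt, ...) {
    size_t room = sizeof out - used;
    va_list ap;
    va_start(ap, fmt);
    int n = vsnprintf(out + used, room, fmt, ap);
    va_end(ap);
    if (n > 0) used += ((size_t)n < room) ? (size_t)n : room - 1;
}

/* Jump context: emit code that continues at label t if e is nonzero
   and at label f if e is 0. No value is computed for e itself. */
void jump_code(const Exp *e, int t, int f) {
    assert(e != NULL);
    switch (e->kind) {
    case VAR:
        emit("CJUMP(TEMP(%s),L%d)\n", e->name, f);  /* taken if the value is 0 */
        emit("JUMP(L%d)\n", t);
        break;
    case NOT:                       /* swap the two targets */
        jump_code(e->left, f, t);
        break;
    case AND: {                     /* right operand only runs if left is true */
        int mid = labels++;
        jump_code(e->left, mid, f);
        emit("LABEL(L%d)\n", mid);
        jump_code(e->right, t, f);
        break;
    }
    case OR: {                      /* right operand only runs if left is false */
        int mid = labels++;
        jump_code(e->left, t, mid);
        emit("LABEL(L%d)\n", mid);
        jump_code(e->right, t, f);
        break;
    }
    }
}
```

For `a && ((!b) || c)` with a true label and a false label supplied by the enclosing If, the operands b and c are only tested when a is nonzero, which is the non-strict behaviour the jump context is meant to provide. The output contains redundant jumps that a later phase would remove.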


Attribute definitions

[Slides: attribute definitions and the attributions for VarExp, IntConst, ArrayAccess, Not, And, Or, Add, Lt, If, While, VarAssign, and ArrAssign; the slide bodies are not part of this text.]

Recommended Reading:

Wilhelm, Maurer: Sec. 2.4, pp. 12–16

Appel: Sec. 7.2


Translation of Imperative Language Constructs Simplifying the Intermediate Representation

3.2.4 Simplifying the Intermediate Representation


Goals and techniques

Goals:

• Simplification of the IR for later phases
• Translation of one IR language to another one

Techniques:

• Attribute grammars
• Term rewriting:
  - Define rules of how source patterns are replaced by target patterns
  - Apply the rules as long as possible

Example: Source language

Stmt = Move | CJump | Label | StmtList
Move ( Temp Exp )
CJump ( Exp String )
Label ( String )
StmtList * Stmt

Exp = Temp | BinExp | StmtExp
Temp ( String )
BinExp ( Exp Exp )
StmtExp ( StmtList Exp )

Example: Target language

The simpler language has:

• no recursive expressions, simplified jump
• no statement expressions

SStmt = SMove | SCJump | Label | SStmtList
SMove ( Temp SExp )
SCJump ( SExp String )
SExp = Temp | SBinExp
SBinExp ( Temp Temp )

Example: Attributes

Idea:

• Statements have an attribute of type SStmtList
• Expressions have attributes of type
  - SStmtList: for the statements needed for the evaluation
  - SExp: for the expression evaluating the result
  - String: for the generation of unique temporary names

More precisely:

syn SStmt Stmt.code()
syn SStmt Exp.code()
syn SExp Exp.sexp()
syn String Exp.uniqueStr()

Example: Attribute rules

The following slides present the attribute rules for Move, CJump, StmtExp, and BinExp.

The rules for StmtList, Label, and Temp are straightforward.

Notation: We use

• @ as infix operator to append two statement lists
• [e1, ..., en] to construct a list from elements e1, ..., en
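The idea behind these attributes can be sketched for nested BinExp nodes: the statements needed for the evaluation are collected in a list, and a temporary stands for the result. The C fragment below is an illustration under assumptions and not the lecture's attribution: `Exp`, `flatten`, and the generated names t0, t1, ... are hypothetical, StmtExp is left out, and the generated names are assumed not to clash with temporaries of the source program.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef enum { TEMP, BINEXP } Kind;
typedef struct Exp {
    Kind kind;
    const char *name;                 /* TEMP only */
    const struct Exp *left, *right;   /* BINEXP only */
} Exp;

static char code[512];   /* statement list of the target language, as text */
static int fresh = 0;    /* counter for unique temporary names */

/* Append the SMove statements needed to evaluate e to code and store
   the name of the temporary holding the result in result. */
void flatten(const Exp *e, char *result, size_t cap) {
    assert(e != NULL);
    if (e->kind == TEMP) {            /* already simple: no statements needed */
        snprintf(result, cap, "%s", e->name);
        return;
    }
    char l[32], r[32];
    flatten(e->left, l, sizeof l);    /* statements for the left operand first */
    flatten(e->right, r, sizeof r);   /* then those for the right operand */
    snprintf(result, cap, "t%d", fresh++);
    size_t used = strlen(code);
    snprintf(code + used, sizeof code - used,
             "SMove(%s, SBinExp(%s, %s))\n", result, l, r);
}
```

For the source expression BinExp(BinExp(a, b), c) this yields two SMove statements, each with a non-recursive SBinExp over temporaries, which matches the shape required by the target language.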

[Slides: attributions for Move, CJump, StmtExp, and BinExp; the slide bodies are not part of this text.]

Translation of Object-Oriented Language Constructs

3.3 Translation of Object-Oriented Language Constructs

3.3.1 Concepts of Object-Oriented Programming Languages

Translation of Object-Oriented Language Constructs Concepts of Object-Oriented Programming Languages

Concepts of Object-Oriented Programming Languages

We consider a class-based language and use Java as an example.

Important concepts:

• Classes and object creation
• Encapsulation
• Subtyping and inheritance
• Dynamic method binding

Example: Object-Oriented Language Concepts

class Person {
  String name;
  int gebdatum; /* date of birth, in the form YYYYMMDD */

  Person( String n, int gd ) {
    name = n;
    gebdatum = gd;
  }

  /* drucken = print */
  public void drucken() {
    System.out.println("Name:"+ this.name);
    System.out.println("Geb:"+ this.gebdatum);
  }

  /* hat_geburtstag = has birthday: compares month and day */
  boolean hat_geburtstag ( int datum ) {
    return (this.gebdatum%10000) == (datum%10000);
  }
}

Example: Object-Oriented Language Concepts (2)

class Student extends Person {
  int matrikelnr; /* matriculation number */
  int semester;

  Student(String n,int gd,int mnr,int sem) {
    super( n, gd );
    matrikelnr = mnr;
    semester = sem;
  }

  public void drucken() {
    super.drucken();
    System.out.println( "Mnr:"+ matrikelnr);
    System.out.println( "Sem:" + semester);
  }
}

Example: Object-Oriented Language Concepts (3)

class Test {
  public static void main( String[] argv ) {
    int i;
    Person[] pf = new Person[3];
    pf[0] = new Person( "Meyer", 19631007 );
    pf[1] = new Student("Müller",19641223,758475,5);
    pf[2] = new Student("Planck",18580423,3454545,47);
    for( i = 0; i<3; i = i+1 ) {
      pf[i].drucken();
    }
  }
}

The example demonstrates classes, object creation, inheritance (with subtyping and specialization) and dynamic method binding.

Translation of Object-Oriented Language Constructs Translation into Procedural Languages

3.3.2 Translation into Procedural Languages


Translation into Procedural Languages

Translation schemes:

• Classes, class types → record types, pointer types
• Object creation → allocation of dynamic variables/objects
• Methods, constructors → procedures
• Dynamic binding → use of procedure pointers with selection of record components

We illustrate these schemes using the example above. The target language considered is C.
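A rough C sketch of how these schemes could look for Person and Student is given below. It is an illustration under assumptions rather than the translation developed in the lecture: the procedure pointer is stored directly in each record, the names `Person_init`, `new_Person`, `new_Student`, `Person_drucken`, and `Student_drucken` are hypothetical, hat_geburtstag is omitted, and the Java String is simplified to `const char *` instead of a pointer to a String record.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct Person Person;
struct Person {                       /* class Person becomes a record type */
    void (*drucken)(Person *self);    /* procedure pointer for dynamic binding */
    const char *name;
    int gebdatum;
};

typedef struct Student {
    Person base;      /* inherited components first: a Student* is usable as Person* */
    int matrikelnr;
    int semester;
} Student;

/* Methods become procedures with an explicit parameter for this. */
void Person_drucken(Person *self) {
    printf("Name:%s\n", self->name);
    printf("Geb:%d\n", self->gebdatum);
}

void Student_drucken(Person *self) {
    Student *s = (Student *)self;     /* valid because base is the first component */
    Person_drucken(self);             /* corresponds to super.drucken() */
    printf("Mnr:%d\n", s->matrikelnr);
    printf("Sem:%d\n", s->semester);
}

/* Constructors become procedures that initialize an allocated record. */
void Person_init(Person *p, const char *n, int gd) {
    p->drucken = Person_drucken;
    p->name = n;
    p->gebdatum = gd;
}

Person *new_Person(const char *n, int gd) {
    Person *p = calloc(1, sizeof *p); /* object creation: dynamic allocation */
    assert(p != NULL);
    Person_init(p, n, gd);
    return p;
}

Student *new_Student(const char *n, int gd, int mnr, int sem) {
    Student *s = calloc(1, sizeof *s);
    assert(s != NULL);
    Person_init(&s->base, n, gd);         /* corresponds to super(n, gd) */
    s->base.drucken = Student_drucken;    /* overriding replaces the pointer */
    s->matrikelnr = mnr;
    s->semester = sem;
    return s;
}
```

A call such as pf[i].drucken() then becomes `pf[i]->drucken(pf[i])`: the procedure is selected from the record at runtime, so a Student stored in an array of `Person *` still prints its matriculation number and semester.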


Translation of Types and Methods

• Basic data types of Java → basic data types of C, for example:
  - int → int
  - boolean → int (typedef int boolean;)

• Reference types of Java → pointer types of C, for example:
  - String → String*
  - Person → Person*

  where String and Person are record types in C.
