4. Optimization and Code Generation

(1)

Compilers and Language Processing Tools

Summer Term 2013

Arnd Poetzsch-Heffter Annette Bieniusa

Software Technology Group TU Kaiserslautern

© Arnd Poetzsch-Heffter

(2)

Content of Lecture

1. Introduction

2. Syntax and Type Analysis
   2.1 Lexical Analysis
   2.2 Context-Free Syntax Analysis
   2.3 Context-Dependent Analysis (Semantic Analysis)

3. Translation to Intermediate Representation
   3.1 Languages for Intermediate Representation
   3.2 Translation of Imperative Language Constructs
   3.3 Translation of Object-Oriented Language Constructs
   3.4 Translation of Procedures

4. Optimization and Code Generation
   4.1 Assembly and Machine Code
   4.2 Optimization
   4.3 Register Allocation
   4.4 Further Aspects

(3)

Content of Lecture (2)

5. Selected Topics in Compiler Construction
   5.1 Garbage Collection
   5.2 Just-in-time Compilation
   5.3 XML Processing (DOM, SAX, XSLT)


(4)

4. Optimization and Code Generation

(5)

Chapter Outline

4. Optimization and Code Generation
   4.1 Assembly and Machine Languages
   4.2 Optimization
   4.3 Register Allocation
   4.4 Further Aspects


(6)

Learning objectives

• Introduction to assembly and machine languages

• Different optimization techniques

• Different static analysis techniques

• Register allocation

• Further aspects of code generation

(7)

4.1 Assembly and Machine Languages


(8)

Assembly and Machine Languages

Introduction

Assembly languages have the following language constructs:

• Finite sequences of bits of various lengths: byte, halfword, word, ...

• Global memory
  - registers, flags (addressed by name)
  - indexed, mostly word-addressed main memory

• Instructions
  - load, store
  - arithmetic and boolean operations
  - execution control (jumps, procedures)
  - simple statements, not combined ones
  - possibly complex addressing of operands

• Initialization instructions

(9)

The MIPS Assembler

MIPS: Microprocessor without Interlocked Pipeline Stages

• RISC architecture, originally 32-bit (64-bit since 1991)

• developed by John Hennessy (Stanford) starting 1981

• SPIM Simulator


(10)

MIPS Architecture

• Arithmetic-Logic Unit (ALU)

• Floating-Point Unit (FPU)

• 32 registers (incl. stack pointer, frame pointer, global pointer, return address)

• Main memory, 2³⁰ memory words (4 bytes each)

• 5-stage pipeline

(11)

MIPS Architecture

[Figure: data path of the five-stage MIPS pipeline. Stages: IF (Instruction Fetch), ID (Instruction Decode / Register Fetch), EX (Execute / Address Calculation), MEM (Memory Access), WB (Write Back), separated by the pipeline registers IF/ID, ID/EX, EX/MEM, and MEM/WB. Image: Wikipedia]

(12)

Memory Structure

0xFFFFFFFF
              Reserved for OS
0x80000000
0x7FFFFFFF
              Stack Segment      <- $sp
              free
              Heap Segment
              Data Segment
0x10000000
              Text Segment
0x00400000
              Reserved
0x00000000

(13)

Data Types and Literals in MIPS Assembly Language

Data Types

• Instructions are all 32 bits

• byte (8 bits), halfword (2 bytes), word (4 bytes)

• integer (1 word storage)

• single precision floats (1 word storage)

• double precision floats (2 word storage)

Literals

• Integers (e.g. 4, 2, -236, 0x44)

• Floats (e.g. 3.41, -0.323e5)

• Characters in single quotes, e.g. 'b'

• Strings in double quotes, e.g. "Hello World"

(14)

MIPS Registers

No.    Name        P*   Description
0      $zero       -    the constant 0
1      $at         -    assembler temporary (reserved by the assembler)
2-3    $v0, $v1    no   values for function results and expression evaluation
4-7    $a0 - $a3   no   arguments for subroutine calls
8-15   $t0 - $t7   no   temporaries
16-23  $s0 - $s7   yes  saved temporaries
24-25  $t8 - $t9   no   additional temporaries
26-27  $k0, $k1    no   reserved for OS kernel
28     $gp         yes  global pointer
29     $sp         yes  stack pointer
30     $fp         yes  frame pointer
31     $ra         yes  return address

*P: callee must preserve the value

(15)

MIPS Instruction Format

• Instructions are always 32 bit

• Opcode in first 6 bits

• 3 types of instructions: R-, I-, and J-instructions

R-Instructions:  opcode (6) | rs (5) | rt (5) | rd (5) | shamt (5) | funct (6)

I-Instructions:  opcode (6) | rs (5) | rt (5) | immediate (16)

J-Instructions:  opcode (6) | address (26)
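The three formats can be illustrated by packing the fields into a 32-bit word. The following is a minimal sketch, not part of the original slides; the helper names are hypothetical, and the opcode and funct values (add: funct 0x20, addi: opcode 8, j: opcode 2) as well as the register numbers are taken from the standard MIPS32 encoding and the register table above.

```python
def encode_r(opcode, rs, rt, rd, shamt, funct):
    # R-format: 6 + 5 + 5 + 5 + 5 + 6 bits, opcode in the most significant bits
    return (opcode << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct

def encode_i(opcode, rs, rt, immediate):
    # I-format: the immediate is truncated to 16 bits (two's complement)
    return (opcode << 26) | (rs << 21) | (rt << 16) | (immediate & 0xFFFF)

def encode_j(opcode, address):
    # J-format: 26-bit address field
    return (opcode << 26) | (address & 0x3FFFFFF)

T0, T1, T2 = 8, 9, 10                          # register numbers of $t0, $t1, $t2

add_word = encode_r(0, T1, T2, T0, 0, 0x20)    # add  $t0, $t1, $t2
addi_word = encode_i(8, T1, T0, -4)            # addi $t0, $t1, -4
jump_word = encode_j(2, 0x100000)              # j with address field 0x100000

print(f"{add_word:08x} {addi_word:08x} {jump_word:08x}")
```

Note that in the assembly syntax the destination register comes first, whereas in the R-format it is the third register field (rd).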


(16)

MIPS Instructions

In the following, let r1, r2, r3 be registers (e.g. $s1, $t3) and let c be a constant value (e.g. 4, 100, -4).

Arithmetic

Instruction     Syntax             Meaning
add             add  r1, r2, r3    r1 = r2 + r3
subtract        sub  r1, r2, r3    r1 = r2 - r3
add immediate   addi r1, r2, c     r1 = r2 + c
multiply        mul  r1, r2, r3    r1 = r2 * r3 (lower 32 bits of result)
move            move r1, r2        pseudo-instruction for addi r1, r2, 0

(17)

MIPS Instructions (2)

Data Transfer

Instruction      Syntax           Meaning
load word        lw r1, c(r2)     r1 = Memory[r2 + c]
store word       sw r1, c(r2)     Memory[r2 + c] = r1
load immediate   li r1, c         r1 = c
load half        lh r1, c(r2)     r1 = Memory[r2 + c]   (halfword)
store half       sh r1, c(r2)     Memory[r2 + c] = r1   (halfword)
load byte        lb r1, c(r2)     r1 = Memory[r2 + c]   (byte)
store byte       sb r1, c(r2)     Memory[r2 + c] = r1   (byte)

(18)

MIPS Instructions (3)

Logical

Instruction           Syntax             Meaning
and                   and  r1, r2, r3    r1 = r2 & r3
or                    or   r1, r2, r3    r1 = r2 | r3
nor                   nor  r1, r2, r3    r1 = ~(r2 | r3)
and immediate         andi r1, r2, c     r1 = r2 & c
or immediate          ori  r1, r2, c     r1 = r2 | c
shift left logical    sll  r1, r2, c     r1 = r2 << c
shift right logical   srl  r1, r2, c     r1 = r2 >> c

(19)

MIPS Instructions (4)

Conditional Branches

Instruction                  Syntax               Meaning
branch on equal              beq  r1, r2, label   if (r1 == r2) goto label
branch on not equal          bne  r1, r2, label   if (r1 != r2) goto label
set on less than             slt  r1, r2, r3      if (r2 < r3) r1 := 1 else r1 := 0
set on less than immediate   slti r1, r2, c       if (r2 < c) r1 := 1 else r1 := 0

Unconditional Branches

Instruction                  Syntax               Meaning
jump                         j   label            goto label
jump register                jr  r1               goto r1
jump and link                jal label            $ra = PC + 4; goto label

(20)

Subroutine Calls

Subroutine call (jump and link)

jal label # jump and link

• copy the address of the following instruction (PC + 4) to $ra

• jump to label

• Note: a subroutine that itself performs calls must first save $ra on the stack, because jal overwrites it

Subroutine return (jump register)

jr $ra # jump register

• jump to the return address in $ra

(21)

Working with the Stack

Push data on the stack

    sw   $ra, ($sp)       # save return address on stack
    addi $sp, $sp, -4     # decrement stack pointer
    sw   $fp, ($sp)       # save frame pointer on stack
    addi $sp, $sp, -4     # decrement stack pointer

Pop data from the stack

    addi $sp, $sp, 4      # increment stack pointer
    lw   $fp, ($sp)       # pop saved frame pointer
    addi $sp, $sp, 4      # increment stack pointer
    lw   $ra, ($sp)       # pop saved return address

(22)

Addressing in MIPS

• Immediate: Operand is a constant, e.g. 25

• Register: Operand is a register, e.g. $s2

• Base or Displacement Addressing: Operand is a memory location whose address is the sum of a register and a constant, e.g. 8($sp)

• PC relative: Address is the sum of the PC and a constant

• Pseudodirect Addressing: Jump address is formed from the 26 address bits of the instruction combined with the upper bits of the PC

(23)

Syscalls for MARS/SPIM Simulators

How to use System Calls:

• load service number into register $v0

• load argument values, if any, into $a0, $a1, $a2

• issue the call instruction syscall

• retrieve return values, if any

Example:

    li   $v0, 1       # service 1: print integer
    move $a0, $t0     # load value into $a0
    syscall

(24)

List of System Services

Service                        Code in $v0   Arguments
print integer                  1             $a0 = integer to print
print string                   4             $a0 = address of null-terminated string to print
exit (terminate execution)     10
print character                11            $a0 = character to print
exit2 (terminate with value)   17            $a0 = termination result

(25)

MIPS Assembly Program Structure

.data # data declarations follow this line

# ...

.text # instructions follow this line

# ...

main: # indicates the first instruction to execute

# ...


(26)

Data Declarations

Format

<name>: <type> (<initial values> | <allocated space>)

Example

        .data               # data declarations follow
var:    .word 3             # integer variable with initial value 3
array1: .byte 'a','b'       # 2-element character array initialized
                            # with 'a' and 'b'
array2: .space 40           # allocate 40 bytes, uninitialized

(27)

Example: Translation to MIPS

The example illustrates the MIPS assembler and typical translation tasks.

Code quality is not considered.

Source Code in C

    char a[3], b[3];
    int i;
    char res;

    int main() {
        i = 2;
        res = 1;
        while (-1 < i) {
            if (res) {
                res = (a[i] == b[i]);
                i = i - 1;
            } else {
                i = i - 1;
            }
        }
        return res;
    }

(28)

Source Code in C with Labels

    char a[3], b[3];
    int i;
    char res;

    int main() {
    main:       i = 2;
                res = 1;
    loop:       while (-1 < i) {
                    if (res) {
                        res = (a[i] == b[i]);
    after:              i = i - 1;
                    } else {
    elseif:             i = i - 1;
                    }   // afterif:
                }
    exit:       return res;
    }

(29)

Source Code in C with Gotos

    char a[3], b[3];
    int i;
    char res;

    int main() {
                i = 2;
                res = 1;
    loop:       if (!(-1 < i))
                    goto exit;
                if (!res)
                    goto elseif;
                if (a[i] == b[i])
                    goto equal;
                res = 0;
                goto after;
    equal:      res = 1;
    after:      i = i - 1;
                goto afterif;
    elseif:     i = i - 1;
    afterif:    goto loop;
    exit:       return res;
    }

(30)

MIPS Program

# sp + 0 : i
# sp + 4 : res
# sp + 5 : base address of a[3]
# sp + 8 : base address of b[3]

main:
    addi $sp, $sp, -12       # make space for the variables
    li   $t1, 2
    sw   $t1, 0($sp)         # i = 2
    li   $t1, 1
    sb   $t1, 4($sp)         # res = 1 (res is at sp + 4)

(31)

MIPS Program (2)

loop:

    lw   $t2, 0($sp)         # load i into $t2
    li   $t3, -1             # load -1 into $t3
    slt  $t0, $t3, $t2       # -1 < i ?
    beq  $t0, $zero, exit    # if not -1 < i goto exit
    lb   $t1, 4($sp)         # load res from stack into $t1
    beq  $t1, $zero, elseif  # if res == 0 goto elseif
    addi $t4, $sp, 5         # base address of array a
    add  $t4, $t4, $t2       # add offset / array index
    lb   $t0, 0($t4)         # load a[i]
    addi $t4, $sp, 8         # base address of array b
    add  $t4, $t4, $t2       # add offset / array index
    lb   $t1, 0($t4)         # load b[i]
    beq  $t0, $t1, equal     # if a[i] == b[i] goto equal
    sb   $zero, 4($sp)       # set res to 0
    j    after

(32)

MIPS Program (3)

equal:

    addi $t3, $zero, 1       # $t3 = 1
    sb   $t3, 4($sp)         # res = $t3
after:
    subi $t2, $t2, 1         # i = i-1
    sw   $t2, 0($sp)         # store i at sp + 0
    j    afterif             # goto end of if statement
elseif:
    subi $t2, $t2, 1         # i = i-1
    sw   $t2, 0($sp)         # store i at sp + 0
afterif:
    j    loop                # return to loop
exit:
    lb   $a0, 4($sp)         # terminate with exit code res (res is a single byte)
    addi $sp, $sp, 12        # reset stack pointer
    li   $v0, 17             # service 17: exit2
    syscall

(33)

Translation to MIPS

Remarks:

The example illustrates typical translation tasks:

• Translation of data types, memory management, addressing

• Translation of expressions, management of intermediate results, mapping of operations of the source language to operations of the target language

• Translation of statements by implementation with jumps

• A simple systematic translation yields code of poor quality

(34)

MIPS Abstract Syntax

Prog * Instruction

Instruction = ADD  (Register reg0, Register reg1, Register reg2)
            | ADDI (Register reg0, Register reg1, Const const0)
            | BEQ  (Register reg0, Register reg1, Label label0)
            | SLT  (Register reg0, Register reg1, Register reg2)
            | SLTI (Register reg0, Register reg1, Const const0)
            | J    (Label label0)
            | JR   (Register reg0)
            | JAL  (Label label0)
            ...

Const ( Integer value )
Label ( Integer labelId )

Register = ...
         | KReg | GP () | SP () | FP () | RA ()
VReg = V0 () | V1 ()
AReg = A0 () | A1 () | A2 () | A3 ()
...

(35)

4.2 Optimization


(36)

Optimization

Optimization refers to improving the code with the following goals:

• Runtime behavior

• Memory consumption

• Size of code

• Energy consumption

(37)

Optimization (2)

We distinguish the following kinds of optimizations:

• machine-independent optimizations

• machine-dependent optimizations (exploit properties of a particular real machine)

and

• local optimizations

• intra-procedural optimizations

• inter-procedural/global optimizations


(38)

Optimization

Remark on Optimization

Appel (Chap. 17, p 350):

"In fact, there can never be a complete list [of optimizations]. "

"Computability theory shows that it will always be possible to invent new optimizing transformations."

(39)

4.2.1 Classical Optimization Techniques


(40)


Constant propagation

If the value of a variable is constant, the variable can be replaced with the constant.

(41)

Constant folding

Evaluate all expressions with constants as operands at compile time.

Iteration of Constant Folding and Propagation:
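How folding and propagation feed each other can be sketched for a single basic block. The sketch below is illustrative only: the encoding of three-address instructions as `(dest, op, x, y)` tuples, the `"copy"` operation, and the function name are assumptions made here, not notation from the lecture.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def fold_and_propagate(block):
    """One forward pass over a basic block of (dest, op, x, y) tuples."""
    consts = {}      # variables currently known to hold a constant
    result = []
    for dest, op, x, y in block:
        x = consts.get(x, x)     # propagation: replace known variables
        y = consts.get(y, y)
        if op == "copy" and isinstance(x, int):
            consts[dest] = x
        elif op in OPS and isinstance(x, int) and isinstance(y, int):
            x, op, y = OPS[op](x, y), "copy", None    # folding
            consts[dest] = x
        else:
            consts.pop(dest, None)   # dest is no longer a known constant
        result.append((dest, op, x, y))
    return result

block = [
    ("t1", "copy", 4, None),     # t1 := 4
    ("t2", "*", "t1", 2),        # t2 := t1 * 2
    ("t3", "+", "t2", "x"),      # t3 := t2 + x
    ("t4", "-", "t2", "t1"),     # t4 := t2 - t1
]
optimized = fold_and_propagate(block)
for instruction in optimized:
    print(instruction)
```

Propagating t1 makes `t1 * 2` foldable, and the folded value of t2 in turn makes `t2 - t1` foldable; only `t2 + x` keeps a variable operand.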


(42)

Non-local constant optimization

For each program position, the possible values for each variable are required. If the set of possible values is infinite, it has to be abstracted appropriately.

(43)

Copy propagation

Eliminate all copies of variables, i.e., if there exist several variables x,y,z at a program position, that are known to have the same value, all uses of y and z are replaced by x.
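A minimal sketch of copy propagation within one basic block, assuming a hypothetical encoding of instructions as `(dest, op, x, y)` tuples where `"copy"` stands for `dest := x`. The sketch is deliberately conservative: once a variable is redefined, every fact mentioning it is dropped.

```python
def copy_propagate(block):
    """Replace uses of copied variables by the variable they were copied from."""
    alias = {}       # alias[y] == x means: y currently holds the same value as x
    result = []
    for dest, op, x, y in block:
        x = alias.get(x, x)
        y = alias.get(y, y)
        # dest is redefined here: forget every fact that mentions it
        alias = {k: v for k, v in alias.items() if k != dest and v != dest}
        if op == "copy" and isinstance(x, str) and x != dest:
            alias[dest] = x
        result.append((dest, op, x, y))
    return result

block = [
    ("y", "copy", "x", None),    # y := x
    ("z", "copy", "y", None),    # z := y
    ("t", "+", "y", "z"),        # t := y + z
    ("x", "copy", 1, None),      # x := 1   (y and z keep the old value of x)
    ("u", "+", "z", 1),          # u := z + 1
]
propagated = copy_propagate(block)
```

After the pass, `t := y + z` reads `t := x + x`, while the use of z after the redefinition of x is left untouched.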


(44)

Copy propagation (2)

This can also be done at join points of control flow or for loops:

For each program point, the information which variables have the same value is required.

(45)

Common subexpression elimination

If an expression or a statement contains the same partial expression several times, the goal is to evaluate this subexpression only once.


(46)

Common subexpression elimination (2)

Optimization of a basic block is done after transformation to SSA and construction of a DAG:
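The effect of sharing nodes in a DAG can be approximated by a table of already computed expressions. This is a sketch under stated assumptions, not the lecture's algorithm: instructions are hypothetical `(dest, op, x, y)` tuples in SSA form (each destination is assigned once), and a repeated computation is replaced by a copy of the earlier result.

```python
COMMUTATIVE = ("+", "*")

def eliminate_common_subexpressions(block):
    """Local CSE on SSA-form tuples: reuse the result of an identical earlier expression."""
    available = {}   # (op, x, y) -> variable that already holds this value
    replaced = {}    # variable -> earlier variable holding the same value
    result = []
    for dest, op, x, y in block:
        x = replaced.get(x, x)
        y = replaced.get(y, y)
        if op in COMMUTATIVE and str(y) < str(x):
            x, y = y, x                      # canonical operand order
        key = (op, x, y)
        if op != "copy" and key in available:
            op, x, y = "copy", available[key], None
            replaced[dest] = x
        else:
            available[key] = dest
        result.append((dest, op, x, y))
    return result

block = [
    ("t1", "+", "a", "b"),       # t1 := a + b
    ("t2", "+", "b", "a"),       # t2 := b + a   (same value as t1)
    ("t3", "*", "t1", "c"),      # t3 := t1 * c
    ("t4", "*", "t2", "c"),      # t4 := t2 * c  (same value as t3)
]
shared = eliminate_common_subexpressions(block)
```

The remaining copies can afterwards be removed by copy propagation and dead code elimination.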

(47)

Common subexpression elimination (3)

Remarks:

• The elimination of repeated computations is often done before transformation to 3AC, but can also be reasonable following other transformations.

• The DAG representation of expressions is also used as intermediate language by some authors.


(48)

Algebraic optimizations

Algebraic laws can be applied in order to be able to use other optimizations. For example, use associativity and commutativity of addition:

Caution: For finite data types, common algebraic laws are not valid in general.
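Two small experiments illustrate the caution. They are a sketch added for illustration: `add32` is a hypothetical helper that models signed 32-bit addition with wrap-around, and the floating-point values are ordinary IEEE 754 doubles.

```python
def add32(x, y):
    """Addition on signed 32-bit integers with two's-complement wrap-around."""
    s = (x + y) & 0xFFFFFFFF
    return s - 0x100000000 if s >= 0x80000000 else s

INT_MAX = 2**31 - 1

# The law "x + 1 > x" holds for mathematical integers, but not for 32-bit integers.
wrapped = add32(INT_MAX, 1)

# Associativity of addition does not hold for floating-point numbers.
float_left = (0.1 + 0.2) + 0.3
float_right = 0.1 + (0.2 + 0.3)

print(wrapped, float_left == float_right)
```

An optimizer may therefore only reorder or simplify such expressions if the language semantics permits the changed result.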

(49)

Strength reduction

Replace expensive operations by more efficient operations (partially machine-dependent).

For example, y := 2 * x can be replaced by

y := x + x

or by

y := x << 1

(50)

Inline expansion of procedure calls

Replace call to non-recursive procedure by its body with appropriate substitution of parameters.

Note: This reduces execution time, but increases code size.

(51)

Inline expansion of procedure calls (2)

Remarks:

• Expansion is in general more than text replacement:


(52)

Inline expansion of procedure calls (3)

• In OO programs with relatively short methods, expansion is an important optimization technique. However, it requires precise information about the target object.

• A refinement of inline expansion is the specialization of procedures/functions if some of the actual parameters are known. This technique can also be applied to recursive procedures/functions.

(53)

Dead code elimination

Remove code that is not reached during execution or that has no influence on execution.

In one of the above examples, constant folding and propagation produced the following code:

The assignments to t3 and t4 can be removed, provided that t3 and t4 are no longer used after the basic block (i.e., they are not live).
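The role of liveness can be sketched with a backward pass over one basic block. The instruction encoding as `(dest, op, x, y)` tuples, the example block, and the function name are assumptions made for this illustration; the sketch also assumes that instructions have no side effects other than the assignment.

```python
def eliminate_dead_code(block, live_out):
    """Keep an assignment only if its target is live directly after it."""
    live = set(live_out)
    kept = []
    for dest, op, x, y in reversed(block):
        if dest in live:
            live.discard(dest)                                   # dest is defined here
            live.update(v for v in (x, y) if isinstance(v, str)) # operands become live
            kept.append((dest, op, x, y))
    kept.reverse()
    return kept

block = [
    ("t3", "copy", 8, None),     # t3 := 8
    ("t4", "copy", 4, None),     # t4 := 4
    ("r", "+", 8, "x"),          # r := 8 + x
]
only_r_live = eliminate_dead_code(block, live_out={"r"})
t3_also_live = eliminate_dead_code(block, live_out={"r", "t3"})
```

With only r live after the block, both constant assignments disappear; if t3 is still needed later, its assignment has to stay.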


(54)

Dead code elimination (2)

A typical example of unreachable, and thus dead, code that can be eliminated:

(55)

Dead code elimination (3)

Remarks:

• Dead code is often caused by optimizations.

• Another source of dead code is program modification.

• In the first case, liveness information is the prerequisite for dead code elimination.

(56)

Code motion

Move commands over branching points in the control flow graph such that they end up in basic blocks that are less often executed.

We consider two cases:

• Move commands into succeeding or preceding branches

• Move code out of loops

Optimization of loops is very profitable, because code inside loops is typically executed more often than code outside of loops.

(57)

Move code over branching points

If a sequential computation branches, the branches are less often executed than the sequence.


(58)

Move code over branching points (2)

A prerequisite for this optimization is that the defined variable is used in only one branch.

Moving a command over a preceding join point can be advisable if the command can then be eliminated from one of the branches by optimization.

(59)

Partial redundancy elimination

Definition (Partial Redundancy)

An assignment is redundant at a program position s if it has already been executed on all paths to s.

An expression e is redundant at s if the value of e has already been calculated on all paths to s.

An assignment/expression is partially redundant at s if it is redundant with respect to some execution paths leading to s.

(60)

Partial redundancy elimination (2)

Example:

(61)

Partial redundancy elimination (3)

Elimination of partial redundancy:


(62)

Partial redundancy elimination (4)

Remarks:

• PRE can be seen as a combination and extension of common subexpression elimination and code motion.

• Extension: Elimination of partial redundancy according to estimated probability for execution of specific paths.

(63)

Code motion from loops

Idea: Computations in loops whose operands are not changed inside the loop should be done outside the loop.

This requires that t1 is not live at the end of the top-most block on the left side.

(64)

Optimization of loop variables

Variables and expressions that are not changed during the execution of a loop are called loop invariant.

Loops often have variables that are increased/decreased systematically in each loop execution, e.g., for-loops.

Often, a loop variable depends on another loop variable, e.g., a relative address depends on the loop counter variable.

(65)

Optimization of loop variables (2)

Definition (Loop Variables)

A variable i is called an explicit loop variable of a loop S if there is exactly one definition of i in S, and it has the form i := i + c, where c is loop invariant.

A variable k is called a derived loop variable of a loop S if there is exactly one definition of k in S, and it has the form k := j * c or k := j + d, where j is a loop variable and c and d are loop invariant.

(66)

Induction variable analysis

Compute derived loop variables inductively, i.e., instead of computing them from the value of the loop variable, compute them from their value in the previous loop execution.

Note: For optimization of derived loop variables, the dependencies between variable definitions have to be precisely understood.
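As a small illustration, consider a derived loop variable k = base + i * 4, such as the address of the i-th word of an array. The two hypothetical functions below compute the same sequence of values; the second one replaces the multiplication per iteration by an addition to the previous value.

```python
def addresses_direct(base, n):
    """k is recomputed from the loop counter i in every iteration."""
    out = []
    for i in range(n):
        k = base + i * 4
        out.append(k)
    return out

def addresses_inductive(base, n):
    """k is computed from its own value in the previous iteration."""
    out = []
    k = base                 # value of base + i * 4 for i = 0
    for _ in range(n):
        out.append(k)
        k = k + 4            # (i + 1) * 4 == i * 4 + 4
    return out

direct = addresses_direct(0x10000000, 5)
inductive = addresses_inductive(0x10000000, 5)
```

The transformation is only valid because the step of i (here 1) and the factor 4 are loop invariant.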

(67)

Loop unrolling

If the number of loop executions is known statically or properties about the number of loop executions (e.g., always an even number) can be inferred, the loop body can be copied several times to save comparisons and jumps.

This requires that ix is dead at the end of the fragment.

Note the static computation of ix's values in the unrolled loop.
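A minimal sketch of partial unrolling by a factor of two, assuming it is known (for example from static analysis) that the number of loop executions is even. The functions are hypothetical examples, not the fragment from the slide.

```python
def total(xs):
    s = 0
    i = 0
    while i < len(xs):           # one comparison and one jump per element
        s += xs[i]
        i += 1
    return s

def total_unrolled(xs):
    """Requires len(xs) to be even."""
    s = 0
    i = 0
    while i < len(xs):           # one comparison and one jump per two elements
        s += xs[i]
        s += xs[i + 1]           # index i + 1 is computed statically from i
        i += 2
    return s

data = [3, 1, 4, 1, 5, 9]
plain = total(data)
unrolled = total_unrolled(data)
```

Without the evenness assumption, the unrolled version would need an extra check or a separate loop for the remaining element.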


(68)

Loop unrolling (2)

Remarks:

• Partial loop unrolling aims at obtaining larger basic blocks in loops to have more optimization options.

• Loop unrolling is in particular important for parallel processor architectures and pipelined processing (machine-dependent).

(69)

Optimization for other language classes

The discussed optimizations aim at imperative languages. For optimizing programs of other language classes, special techniques have been developed.

For example:

• Object-oriented languages: Optimization of dynamic binding (type analysis)

• Non-strict functional languages: Optimization of lazy function calls (strictness analysis)

• Logic programming languages: Optimization of unification


(70)


4.2.2 Potential of Optimizations

(71)

Potential of optimizations - Example

Consider the procedure skprod for the evaluation of the optimization techniques. It demonstrates some of the techniques above and the potential for improvement that optimizations can achieve; we also sketch how to evaluate it. (The block names head, body, and exit are used here to render the control flow graph as text.)

skprod:  res := 0
         ix  := 0
head:    t0  := lng-1
         if ix <= t0          (true: body, false: exit)
body:    t1  := i1+ix
         tx  := t1*4
         ta  := a+tx
         t2  := *ta
         t1  := i2+ix
         tx  := t1*4
         tb  := b+tx
         t3  := *tb
         t1  := t2*t3
         res := res+t1
         ix  := ix+1          (continue at head)
exit:    return res

Evaluation:

Number of steps depending on lng:

2 + 2 + 13*lng + 1 = 13*lng + 5

lng = 100: 1305
lng = 1000: 13005

(72)

Potential of optimizations - Example (2)

Move the computation of the loop invariant t0 out of the loop:

skprod:  res := 0
         ix  := 0
         t0  := lng-1
head:    if ix <= t0          (true: body, false: exit)
body:    t1  := i1+ix
         tx  := t1*4
         ta  := a+tx
         t2  := *ta
         t1  := i2+ix
         tx  := t1*4
         tb  := b+tx
         t3  := *tb
         t1  := t2*t3
         res := res+t1
         ix  := ix+1          (continue at head)
exit:    return res

Evaluation: 3 + 1 + 12*lng + 1 = 12*lng + 5 (lng = 100: 1205, lng = 1000: 12005)

(73)

Potential of optimizations - Example (3)

Optimization of loop variables (1): Initially, there are no derived loop variables, because t1 and tx have several definitions. Transformation to SSA for t1 and tx turns t11, tx1, ta, t12, tx2, and tb into derived loop variables:

skprod:  res := 0
         ix  := 0
         t0  := lng-1
head:    if ix <= t0          (true: body, false: exit)
body:    t11 := i1+ix
         tx1 := t11*4
         ta  := a+tx1
         t2  := *ta
         t12 := i2+ix
         tx2 := t12*4
         tb  := b+tx2
         t3  := *tb
         t13 := t2*t3
         res := res+t13
         ix  := ix+1          (continue at head)
exit:    return res

(74)

Potential of optimizations - Example (4)

Optimization of loop variables (2): Initialization and inductive definition of the loop variables:

skprod:  res := 0
         ix  := 0
         t0  := lng-1
         t11 := i1-1
         tx1 := t11*4
         ta  := a+tx1
         t12 := i2-1
         tx2 := t12*4
         tb  := b+tx2
head:    if ix <= t0          (true: body, false: exit)
body:    t11 := t11+1
         tx1 := tx1+4
         ta  := ta+4
         t2  := *ta
         t12 := t12+1
         tx2 := tx2+4
         tb  := tb+4
         t3  := *tb
         t13 := t2*t3
         res := res+t13
         ix  := ix+1          (continue at head)
exit:    return res

(75)

Potential of optimizations - Example (5)

Dead code elimination: The assignments to t11, tx1, t12, and tx2 inside the loop are dead code, because they do not influence the result.

skprod:  res := 0
         ix  := 0
         t0  := lng-1
         t11 := i1-1
         tx1 := t11*4
         ta  := a+tx1
         t12 := i2-1
         tx2 := t12*4
         tb  := b+tx2
head:    if ix <= t0          (true: body, false: exit)
body:    ta  := ta+4
         t2  := *ta
         tb  := tb+4
         t3  := *tb
         t13 := t2*t3
         res := res+t13
         ix  := ix+1          (continue at head)
exit:    return res

Evaluation: 9 + 1 + 8*lng + 1 = 8*lng + 11 (lng = 100: 811, lng = 1000: 8011)

(76)

Potential of optimizations - Example (6)

Algebraic optimizations: Use the invariant ta = 4*(i1-1+ix) + a to replace the comparison ix <= t0 by ta <= 4*(i1-1+t0) + a:

skprod:  res := 0
         ix  := 0
         t0  := lng-1
         t11 := i1-1
         tx1 := t11*4
         ta  := a+tx1
         t12 := i2-1
         tx2 := t12*4
         tb  := b+tx2
         t4  := t11+t0
         t5  := 4*t4
         t6  := t5+a
head:    if ta <= t6          (true: body, false: exit)
body:    ta  := ta+4
         t2  := *ta
         tb  := tb+4
         t3  := *tb
         t13 := t2*t3
         res := res+t13
         ix  := ix+1          (continue at head)
exit:    return res

(77)

Potential of optimizations - Example (7)

Dead code elimination: Due to the transformation of the loop condition, the assignments to ix have become dead code and can be eliminated:

skprod:  res := 0
         t0  := lng-1
         t11 := i1-1
         tx1 := t11*4
         ta  := a+tx1
         t12 := i2-1
         tx2 := t12*4
         tb  := b+tx2
         t4  := t11+t0
         t5  := 4*t4
         t6  := t5+a
head:    if ta <= t6          (true: body, false: exit)
body:    ta  := ta+4
         t2  := *ta
         tb  := tb+4
         t3  := *tb
         t13 := t2*t3
         res := res+t13       (continue at head)
exit:    return res

Evaluation: 11 + 1 + 7*lng + 1 = 7*lng + 13 (lng = 100: 713, lng = 1000: 7013)

(78)

Potential of optimizations - Example (8)

Remarks:

• Reduction of execution steps by almost half, where the most significant reductions are achieved by loop optimization.

• Combination of optimization techniques is important. Determining the ordering of optimizations is in general difficult.

• We have only considered optimizations by example. The difficulty is to find algorithms and heuristics for detecting optimization potential automatically and for executing the optimizing transformations.
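The first and the last version of skprod can be compared directly. The following sketch transcribes both control flow graphs into Python under the assumption that memory is modeled as a hypothetical dictionary `mem` from byte addresses to word values, so that `*ta` becomes `mem[ta]`; overflow of address arithmetic is ignored.

```python
def skprod_original(mem, a, i1, b, i2, lng):
    res = 0
    ix = 0
    while ix <= lng - 1:                 # t0 := lng-1; if ix <= t0
        t2 = mem[a + (i1 + ix) * 4]      # t1, tx, ta, t2
        t3 = mem[b + (i2 + ix) * 4]      # t1, tx, tb, t3
        res = res + t2 * t3
        ix = ix + 1
    return res

def skprod_optimized(mem, a, i1, b, i2, lng):
    res = 0
    t0 = lng - 1
    t11 = i1 - 1
    ta = a + t11 * 4
    tb = b + (i2 - 1) * 4
    t6 = 4 * (t11 + t0) + a              # loop bound expressed as an address
    while ta <= t6:
        ta = ta + 4
        tb = tb + 4
        res = res + mem[ta] * mem[tb]
    return res

# Two arrays of three words each, stored at byte addresses 100 and 200.
mem = {100: 1, 104: 2, 108: 3, 200: 4, 204: 5, 208: 6}
before = skprod_original(mem, 100, 0, 200, 0, 3)
after = skprod_optimized(mem, 100, 0, 200, 0, 3)
```

Both versions compute the same scalar product; the optimized loop body contains no multiplication for address computation and no loop counter.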

(79)

4.2.3 Data flow analysis


(80)


Data flow analysis: Introduction

For optimizations, data flow information is required that can be obtained by data flow analysis.

Goal: Explain the basic concepts of data flow analysis by example.

Outline:

• Liveness analysis (Typical example of data flow analysis)

• Data flow equations

• Important analysis classes

Each analysis has an exact specification of the information it provides.

(81)

Control flow graphs

Data flow analyses are usually defined based on a representation of a program or procedure as a control flow graph, CFG for short:

• nodes are individual program statements or basic blocks

• an edge from n to n′ represents a potential control transfer from (the end of) n to (the beginning of) n′

Out-edges from node n lead to successor nodes, succ(n).

In-edges to node n come from predecessor nodes, pred(n).

Data flow information is mostly computed for (CF-)positions before and after nodes.

(82)

Liveness analysis

A temporary or variable is live at a position of a CFG if it holds a value that may be needed in the future. More precisely:

Definition (Liveness Analysis)

Let P be a program. A variable v is live at a CF-position S in P if there is an execution path π from S to a use of v such that there is no definition of v on π.

The liveness analysis determines for all positions S in P which variables are live at S.

(83)

Liveness analysis (2)

Remarks:

• The definition of liveness of variables is static/syntactic. (In contrast, dead code was defined dynamically/semantically.)

• The result of the liveness analysis for a program P can be represented as a function live mapping positions in P to bit vectors, where a bit vector contains an entry for each variable in P. Let i be the index of a variable v in P; then it holds that:

live(S)[i] = 1 iff v is live at position S

(84)

Liveness analysis (3)

Idea:

• In a procedure-local analysis, exactly the global variables are live at the end of the exit block of the procedure.

• If the live variables out(n) after a node n are known, the live variables in(n) before n are computed by:

in(n) = gen(n) ∪ (out(n) \ kill(n))

where
  - gen(n) is the set of variables v such that v is used in n without a prior definition of v in n
  - kill(n) is the set of variables that are defined in n

(85)

Liveness analysis (4)

As the set in(n) is computed from out(n), we have a backward analysis.

For n not the exit block of the procedure, out(n) is obtained by

out(n) = ⋃ in(n_i) for all successors n_i of n

Thus, for a program without loops, in(n) and out(n) are defined for all nodes n. Otherwise, we obtain a system of recursive equations.

(86)

Liveness analysis - Example

Question: How do we compute out(B2)?

(87)

Data flow equations

Theory:

• There is always a solution for equations of the considered form.

• There is always a smallest solution, which is obtained by an iteration starting from empty in and out sets.

Note: The equations may have several solutions.

(88)

Ambiguity of solutions - Example

B0: a := a      (successors: B0 and B1)

B1: b := 7

out(B0) = in(B0) ∪ in(B1)
out(B1) = { }

in(B0) = gen(B0) ∪ (out(B0) \ kill(B0))
       = {a} ∪ out(B0)

in(B1) = gen(B1) ∪ (out(B1) \ kill(B1))
       = { }

Thus, out(B0) = in(B0), and hence in(B0) = {a} ∪ in(B0).

Possible solutions: in(B0) = {a} or in(B0) = {a, b}

(89)

Computation of smallest fixpoint

foreach n
    gen(n)  := ...
    kill(n) := ...
    in(n)   := ∅
    out(n)  := ∅
    if n is exit node then out(n) := ...

repeat
    foreach n
        in′(n)  := in(n)
        out′(n) := out(n)
        in(n)   := gen(n) ∪ (out(n) \ kill(n))
        out(n)  := ⋃ { in(s) | s ∈ succ(n) }
until ∀n. in′(n) = in(n) ∧ out′(n) = out(n)
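The iteration can be transcribed almost literally. The sketch below is one possible rendering, not the lecture's implementation: the CFG is given as a hypothetical dictionary `succ` from node names to successor lists, `gen` and `kill` are dictionaries of sets, and the optional `exit_live` holds the variables that are live after exit nodes. It is applied to the two-block example with the ambiguous solutions.

```python
def liveness(succ, gen, kill, exit_live=None):
    """Smallest solution of the liveness equations by fixpoint iteration."""
    exit_live = exit_live or {}
    live_in = {n: set() for n in succ}
    live_out = {n: set(exit_live.get(n, ())) for n in succ}
    changed = True
    while changed:                       # repeat ... until nothing changes
        changed = False
        for n in succ:
            # in(n) = gen(n) ∪ (out(n) \ kill(n))
            new_in = gen[n] | (live_out[n] - kill[n])
            # out(n) = union of in(s) over all successors s
            new_out = set(exit_live.get(n, ()))
            for s in succ[n]:
                new_out |= live_in[s]
            if new_in != live_in[n] or new_out != live_out[n]:
                live_in[n], live_out[n] = new_in, new_out
                changed = True
    return live_in, live_out

# B0: a := a (with edges to B0 and B1), B1: b := 7
succ = {"B0": ["B0", "B1"], "B1": []}
gen = {"B0": {"a"}, "B1": set()}
kill = {"B0": {"a"}, "B1": {"b"}}
live_in, live_out = liveness(succ, gen, kill)
```

Starting from empty sets, the iteration reaches in(B0) = {a}, the smaller of the two possible solutions; b is never added because nothing forces it into a set.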


(90)

Complexity

Let N be the size of the input program.

• ≤ N nodes in the CFG
  ⇒ ≤ N variables
  ⇒ ≤ N elements per in/out set
  ⇒ O(N) time per set union

• The for loop performs a constant number of set operations per node
  ⇒ O(N²) time for the for loop

• Each iteration of the repeat loop can only add to each set, and a set can contain at most every variable
  ⇒ the sizes of all in and out sets sum to at most 2N², which bounds the number of iterations of the repeat loop
  ⇒ worst-case complexity of O(N⁴)

• A suitable ordering of the nodes can cut the repeat loop down to 2-3 iterations
  ⇒ O(N) or O(N²) in practice

(91)

Further analyses and classes of analyses

Many data flow analyses can be described as bit vector problems:

• Reaching definitions: Which definitions reach a positionS?

• Available expressions for elimination of repeated computations

• Very busy expressions: Which expression is needed for all subsequent computations?

The corresponding analyses can be treated analogously to liveness analysis, but differ in

• the definition of the data flow information

• the definition of gen and kill

• the direction of the analysis and the equations

(92)

Further analyses and classes of analyses (2)

For backward analyses, the data flow information before a node n is obtained from the information after n:

in(n) = gen(n) ∪ (out(n) \ kill(n))

Analyses can be distinguished according to whether they consider the union or the intersection of the successor information:

out(n) = ⋃ { in(n_i) | n_i ∈ succ(n) }

or

out(n) = ⋂ { in(n_i) | n_i ∈ succ(n) }

(93)

Further analyses and classes of analyses (3)

For forward analyses, the dependency is the other way round:

out(n) = gen(n) ∪ (in(n) \ kill(n))

with

in(n) = ⋃ { out(n_i) | n_i ∈ pred(n) }

or

in(n) = ⋂ { out(n_i) | n_i ∈ pred(n) }

(94)

Further analyses and classes of analyses (4)

Examples for each class of analysis:

            union                   intersection
forward     reaching definitions    available expressions
backward    live variables          very busy expressions
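As an instance of the forward class with union, reaching definitions can be computed with the same kind of iteration as liveness. This is a sketch under assumptions: the CFG is given as a hypothetical dictionary `pred` of predecessor lists, definitions are named d1, d2, and gen(n) and kill(n) are supplied by hand for a small loop.

```python
def reaching_definitions(pred, gen, kill):
    """Forward analysis with union over the predecessors."""
    rd_in = {n: set() for n in pred}
    rd_out = {n: set() for n in pred}
    changed = True
    while changed:
        changed = False
        for n in pred:
            # in(n) = union of out(p) over all predecessors p
            new_in = set()
            for p in pred[n]:
                new_in |= rd_out[p]
            # out(n) = gen(n) ∪ (in(n) \ kill(n))
            new_out = gen[n] | (new_in - kill[n])
            if new_in != rd_in[n] or new_out != rd_out[n]:
                rd_in[n], rd_out[n] = new_in, new_out
                changed = True
    return rd_in, rd_out

# B0: d1: x := 1     B1: d2: x := x + 1 (loops back to itself)     B2: uses x
pred = {"B0": [], "B1": ["B0", "B1"], "B2": ["B1"]}
gen = {"B0": {"d1"}, "B1": {"d2"}, "B2": set()}
kill = {"B0": {"d2"}, "B1": {"d1"}, "B2": set()}
rd_in, rd_out = reaching_definitions(pred, gen, kill)
```

At the entry of B1 both definitions of x may arrive (d1 from B0, d2 via the back edge), whereas only d2 reaches B2. For an intersection analysis such as available expressions, the sets would instead be initialized with all candidates and combined by intersection.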

(95)

Further analyses and classes of analyses (5)

For bit vector problems, data flow information consists of subsets of finite sets.

For other analyses, the collected information is more complex, e.g., for constant propagation, we consider mappings from variables to values.

For interprocedural analyses, complexity increases because the flow graph is not static.

The formal basis for the development and correctness of optimizations is provided by the theory of abstract interpretation.

(96)

Literature

Recommended reading:

• Flemming Nielson, Hanne R. Nielson, Chris Hankin: Principles of Program Analysis (Springer-Verlag, corrected 2nd printing, 2005)

(97)

4.2.4 Non-Local Program Analysis


(98)


Non-local program analysis

We use a points-to analysis to demonstrate:

• interprocedural aspects: The analysis crosses the borders of single procedures.

• constraints: Program analysis very often involves solving or refining constraints.

• complex analysis results: The analysis result cannot be represented locally for a statement.

• analysis as abstraction: The result of the analysis is an abstraction of all possible program executions.

(99)

Points-to analysis

Analysis for programs with pointers and for object-oriented programs

Goal: Compute which references to which records/objects a variable can hold.

Applications of Analysis Results:

Basis for optimizations

• Alias information (e.g., important for code motion)
  - Can p.f = x cause changes to an object referenced by q?
  - Can z = p.f read information that is written by p.f = x?

• Call graph construction

• Resolution of virtual method calls

• Escape analysis

(100)

Alias information

Examples of using points-to analysis information

A. Use of alias information:

    (1) p.f = x;
    (2) y = q.f;
    (3) q.f = z;

If p == q, statement (2) can be simplified:

    (1) p.f = x;
    (2) y = x;
    (3) q.f = z;

If p != q, the first statement can be exchanged with the other two.

(101)

Elimination of dynamic binding

B. Elimination of dynamic binding:

    class A {
        void m( ... ) { ... }
    }

    class B extends A {
        void m( ... ) { ... }
    }

    ...
    A p;
    p = new B();
    p.m(...)    // call of B::m

(102)

Escape analysis

C. Escape analysis:

    R m( A p ) {
        B q;
        q = new B();    // can be stored on the stack
        q.f = p;
        q.g = p.n();
        return q.g;
    }

(103)

Points-to analysis for Java

Simplifications and assumptions about underlying language

• Complete program is known.

• Only assignments and method calls of the following form are used:

  - Direct assignment: l = r
  - Write to instance variables: l.f = r
  - Read of instance variables: l = r.f
  - Object creation: l = new C()
  - Simple method call: l = r0.m(r1, ...)

• Expressions without side effects

• Compound statements

(104)

Points-to analysis for Java (2)

Analysis type

• Flow-insensitive: The control flow of the program has no influence on the analysis result. The states of the variables at different program points are combined.

• Context-insensitive: Method calls at different program points are not distinguished.

(105)

Points-to analysis for Java (3)

Points-to graph as abstraction

The result of the analysis is a so-called points-to graph having

• abstract variables and abstract objects as nodes

• edges represent that an abstract variable may have a reference to an abstract object

Abstract variables V represent sets of concrete variables at runtime.

Abstract objects O represent sets of concrete objects at runtime.

An edge between V and O means that in a certain program state, a concrete variable in V may reference an object in O.


(106)

Points-to graph - Example

Example (points-to graph):

    class Y { ... }

    class X {
        Y f;
        void set( Y r ) { this.f = r; }
        static void main() {
            X p = new X();    // s1 "creates" o1
            Y q = new Y();    // s2 "creates" o2
            p.set(q);
        }
    }

Edges of the points-to graph:

    p    → o1
    this → o1
    q    → o2
    r    → o2
    o1.f → o2
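A flow-insensitive and context-insensitive analysis for the statement forms above can be sketched as a fixpoint iteration over all statements. The tuple encoding of statements and the function name are assumptions made for this illustration; the method call p.set(q) is modeled by the parameter-passing assignments this = p and r = q.

```python
def points_to(statements):
    """Iterate over all statements, in any order, until no edge is added."""
    var_pts = {}      # abstract variable -> set of abstract objects
    field_pts = {}    # (abstract object, field) -> set of abstract objects

    def add(table, key, objs):
        target = table.setdefault(key, set())
        before = len(target)
        target |= objs
        return len(target) != before

    changed = True
    while changed:
        changed = False
        for stmt in statements:
            kind = stmt[0]
            if kind == "new":                    # l = new C(), creating obj
                _, l, obj = stmt
                changed |= add(var_pts, l, {obj})
            elif kind == "assign":               # l = r
                _, l, r = stmt
                changed |= add(var_pts, l, var_pts.get(r, set()))
            elif kind == "store":                # l.f = r
                _, l, f, r = stmt
                for o in list(var_pts.get(l, ())):
                    changed |= add(field_pts, (o, f), var_pts.get(r, set()))
            elif kind == "load":                 # l = r.f
                _, l, r, f = stmt
                for o in list(var_pts.get(r, ())):
                    changed |= add(var_pts, l, field_pts.get((o, f), set()))
    return var_pts, field_pts

program = [
    ("new", "p", "o1"),              # X p = new X();
    ("new", "q", "o2"),              # Y q = new Y();
    ("assign", "this", "p"),         # p.set(q): receiver
    ("assign", "r", "q"),            # p.set(q): argument
    ("store", "this", "f", "r"),     # this.f = r;
]
var_pts, field_pts = points_to(program)
```

For the example program, the computed sets correspond to the edges of the points-to graph: p and this refer to o1, q and r refer to o2, and field f of o1 refers to o2.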
