Register Allocation
Efficient code has to use the available registers of the target machine as much as possible: accessing registers is much faster than
accessing memory (the same holds for cache).
Two Aspects:
• Register Allocation: Determine which variables are implemented by registers at which positions.
• Register Assignment: Determine which register implements which variable at which positions.
In the following, register allocation refers to both aspects.
Ina Schaefer Selected Aspects of Compilers 100
Register Allocation (2)
Goals of Register Allocation
1. Generate code that requires as few registers as possible.
2. Avoid unnecessary memory accesses, i.e., not only temporaries, but also program variables are implemented by registers.
3. Allocate registers such that they can be used as much as
possible, i.e., registers should not be used for variables that are only rarely accessed.
4. Obey programmer’s requirements.
Register Allocation (3)
Outline
• Algorithm interleaving code generation and register allocation for nested expressions (cf. Goal 1)
• Algorithm for procedure-local register allocation (cf. Goals 2 and 3)
• Combination and other aspects
Evaluation Ordering with Minimal Registers
The algorithm by Sethi and Ullman is an example of an integrated approach to register allocation and code generation.
(cf. Wilhelm, Maurer, Sect. 12.4.1, p. 584 ff)
Input:
An assignment with a nested expression on the right-hand side:
   Assign ( Var, Exp )
   Exp = BinExp | Var
   BinExp ( Exp, Op, Exp )
   Var ( Ident )
28.06.2007 © A. Poetzsch-Heffter, TU Kaiserslautern 346
Evaluation Ordering with Minimal Registers (2)
Output:
Machine or intermediate language code with assigned registers.
We consider two-address code, i.e., code with at most one memory access per instruction. The machine has r registers, denoted by R0, ..., Rr−1.
The instructions have the following forms:
   Ri := M[V]
   M[V] := Ri
   Ri := Ri op M[V]
   Ri := Ri op Rj
Example: Code Generation w/Register Allocation
Consider f := (a + b) − (c − (d + e))
Assume that there are two registers R0 and R1 available for the translation.
Result of direct translation:
   R0 := M[a]
   R0 := R0 + M[b]
   R1 := M[d]
   R1 := R1 + M[e]
   M[t1] := R1
   R1 := M[c]
   R1 := R1 - M[t1]
   R0 := R0 - R1
   M[f] := R0
Example: Code Generation w/Register Allocation (2)
Result of the Sethi-Ullman algorithm:
   R0 := M[c]
   R1 := M[d]
   R1 := R1 + M[e]
   R0 := R0 - R1
   R1 := M[a]
   R1 := R1 + M[b]
   R1 := R1 - R0
   M[f] := R1
This is more efficient: it needs one instruction fewer and does not have to store an intermediate result in memory.
Sethi-Ullman Algorithm
Goal: Minimize the number of registers and the number of temporaries.
Idea: Generate code for the subexpression requiring more registers first.
Procedure:
• Define a function regbed (register requirement) that computes the number of registers needed for an expression
• Generate code for an expression E = BinExp(L,OP,R) by a case distinction on regbed(L) and regbed(R)
Sethi-Ullman Algorithm (2)
We use the following notations:
• v_reg(E): the set of available registers for the translation of E
• v_tmp(E): the set of addresses where values can be stored temporarily when translating E
• cell(E): register/memory cell where the result of E is stored
• vr = |v_reg(E)|: the number of available registers
Sethi-Ullman Algorithm (3)
We distinguish the following cases:
1. regbed(L) < vr
2. regbed(L) ≥ vr and regbed(R) < vr
3. regbed(L) ≥ vr and regbed(R) ≥ vr
Sethi-Ullman Algorithm (4)
Case 1: regbed(L) < vr
• Generate code for R using v_reg(E) and v_tmp(E) with result in cell(R)
• Generate code for L using v_reg(E)\{ cell(R) } and v_tmp(E) with result in cell(L)
• Generate code for the operation cell(L) := cell(L) OP cell(R)
• Set cell(E) = cell(L)
Sethi-Ullman Algorithm (5)
Case 2: regbed(L) ≥ vr and regbed(R) < vr
• Generate code for L using v_reg(E) and v_tmp(E) with result in cell(L)
• Generate code for R using v_reg(E)\{ cell(L) } and v_tmp(E) with result in cell(R)
• Generate code for the operation cell(L) := cell(L) OP cell(R)
• Set cell(E) = cell(L)
Sethi-Ullman Algorithm (6)
Case 3: regbed(L) ≥ vr and regbed(R) ≥ vr
• Generate code for R using v_reg(E) and v_tmp(E) with result in cell(R)
• Generate code M[first(v_tmp(E))] := cell(R)
• Generate code for L using v_reg(E) and rest(v_tmp(E)) with result in cell(L)
• Generate code for the operation cell(L) := cell(L) OP M[first(v_tmp(E))]
• Set cell(E) = cell(L)
Sethi-Ullman Algorithm (7)
Function regbed in MAX notation (can be realized by S-attribution):
   ATT regbed( Exp@ E ) Nat:
      IF Assign@<_,Var@ E>     : 0
      |  BinExp@< Var@ E,_,_>  : 1
      |  BinExp@<_,_,Var@ E >  : 0
      |  BinExp@< L,_, R > E   : IF regbed(L) = regbed(R)
                                 THEN regbed(L) + 1
                                 ELSE max( regbed(L), regbed(R) )
      ELSE nil // case does not occur
(In ML, the definition of regbed would be somewhat more involved, because the context of Var expressions cannot be taken into account directly.)
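The attribute regbed can also be sketched as an ordinary recursive function. The following Python version is only an illustration under assumptions: the classes Var and BinExp and the flag left_operand are hypothetical names that stand in for the MAX patterns, and the flag encodes the context of a Var (left operand: 1, right operand or right-hand side of the assignment: 0).

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class BinExp:
    left: "Exp"
    op: str
    right: "Exp"


Exp = Union[Var, BinExp]


def regbed(e: Exp, left_operand: bool = False) -> int:
    """Register requirement of e (assumed encoding of the MAX attribute)."""
    if isinstance(e, Var):
        # A left operand has to be loaded into a register first; a right
        # operand can be read from memory by "Ri := Ri op M[V]".
        return 1 if left_operand else 0
    left = regbed(e.left, True)
    right = regbed(e.right, False)
    # Equal requirements: one extra register holds the first result.
    return left + 1 if left == right else max(left, right)


def var(name: str) -> Var:
    return Var(name)


# f := (a + b) - (c - (d + e))
e1 = BinExp(BinExp(var("a"), "+", var("b")), "-",
            BinExp(var("c"), "-", BinExp(var("d"), "+", var("e"))))
# f := ((a + b) - (c + d)) * (a - (d + e))
e2 = BinExp(BinExp(BinExp(var("a"), "+", var("b")), "-",
                   BinExp(var("c"), "+", var("d"))),
            "*",
            BinExp(var("a"), "-", BinExp(var("d"), "+", var("e"))))

print(regbed(e1), regbed(e2))
```

The first expression needs two registers, which is why it can be translated without a temporary on a machine with R0 and R1; the second needs three, so with two registers one intermediate result has to be stored.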
Example: Sethi-Ullman Algorithm
Consider f := ((a + b) - (c + d)) * (a - (d + e))
Attributes: v_reg | v_tmp, regbed, cell
[Figure: attributed syntax tree of the assignment. Two registers (1, 2) and one temporary address (T) are available; the root BinExp (*) has regbed 3 and result cell 1 and is translated according to case 3.]
Example: Sethi-Ullman Algorithm (3)
To formalize the algorithm, we represent the sets of available registers and of addresses for storing temporaries as lists, where
• the list RL of registers is non-empty
• the list AL of addresses is long enough
• the result cell is always a register, namely the first one in RL, i.e., fst(RL)
• the function exchange swaps the first two elements of a list, fst returns the first element of a list, and rst returns the tail of a list
Example: Sethi-Ullman Algorithm (4)
Remarks:
• The algorithm generates two-address code (2AC) that is optimal with respect to the number of instructions and the number of temporaries, provided the expression has no common subexpressions.
• The algorithm illustrates the mutual dependency between code generation and register allocation.
• In a procedural implementation, the register and address lists can be realized by a global stack.
Example: Sethi-Ullman Algorithm (5)
In the following, the function expcode for code generation is given in MAX notation (functional).
Note: The applications of the functions exchange, fst and expcode satisfy their respective preconditions: length(RL) > 1 for exchange, and length(RL) > 0 for fst and expcode.
Example: Sethi-Ullman Algorithm (6)
   FCT expcode( Exp@ E, RegList RL, AdrList AL ) CodeList: // pre: length(RL)>0
      IF Var@<ID> E:
            [ fst(RL) := M[adr(ID)] ]
      |  BinExp@< L,OP,Var@<ID> > E:
            expcode(L,RL,AL)
            ++ [ fst(RL) := fst(RL) OP M[adr(ID)] ]
      |  BinExp@< L,OP,R > E:
            LET vr == length( RL ) :
            IF regbed(L) < vr :
                  expcode(R,exchange(RL),AL)
                  ++ expcode(L,rst(exchange(RL)),AL)
                  ++ [ fst(RL) := fst(RL) OP fst(rst(RL)) ]
            |  regbed(L)>=vr AND regbed(R)<vr :
                  expcode(L,RL,AL)
                  ++ expcode(R,rst(RL),AL)
                  ++ [ fst(RL) := fst(RL) OP fst(rst(RL)) ]
            |  regbed(L)>=vr AND regbed(R)>=vr :
                  expcode(R,RL,AL)
                  ++ [ M[ fst(AL) ] := fst(RL) ]
                  ++ expcode(L,RL,rst(AL))
                  ++ [ fst(RL) := fst(RL) OP M[fst(AL)] ]
            ELSE nil
      ELSE []
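The function expcode can be transcribed almost line by line into a runnable sketch. The Python version below is an illustration only: the classes Var and BinExp, the string format of the instructions, and the register and address names are assumptions, and Python lists stand in for RL and AL (rl[0] plays the role of fst(RL), rl[1:] the role of rst(RL)).

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class BinExp:
    left: "Exp"
    op: str
    right: "Exp"


Exp = Union[Var, BinExp]


def regbed(e: Exp, left_operand: bool = False) -> int:
    """Register requirement; a Var counts 1 only as a left operand."""
    if isinstance(e, Var):
        return 1 if left_operand else 0
    l, r = regbed(e.left, True), regbed(e.right)
    return l + 1 if l == r else max(l, r)


def exchange(xs: List[str]) -> List[str]:
    """Swap the first two elements (precondition: len(xs) > 1)."""
    return [xs[1], xs[0]] + xs[2:]


def expcode(e: Exp, rl: List[str], al: List[str]) -> List[str]:
    """Two-address code for e with the result in rl[0] (precondition: rl non-empty)."""
    if isinstance(e, Var):
        return [f"{rl[0]} := M[{e.name}]"]
    l, op, r = e.left, e.op, e.right
    if isinstance(r, Var):
        # The right operand is read directly from memory.
        return expcode(l, rl, al) + [f"{rl[0]} := {rl[0]} {op} M[{r.name}]"]
    vr = len(rl)
    if regbed(l, True) < vr:
        # Case 1: R first (result in rl[1]), then L without that register.
        return (expcode(r, exchange(rl), al)
                + expcode(l, exchange(rl)[1:], al)
                + [f"{rl[0]} := {rl[0]} {op} {rl[1]}"])
    if regbed(r) < vr:
        # Case 2: L first (result in rl[0]), then R in the remaining registers.
        return (expcode(l, rl, al)
                + expcode(r, rl[1:], al)
                + [f"{rl[0]} := {rl[0]} {op} {rl[1]}"])
    # Case 3: both sides need all registers, so the value of R is stored.
    return (expcode(r, rl, al)
            + [f"M[{al[0]}] := {rl[0]}"]
            + expcode(l, rl, al[1:])
            + [f"{rl[0]} := {rl[0]} {op} M[{al[0]}]"])


# f := (a + b) - (c - (d + e)) with two registers
e = BinExp(BinExp(Var("a"), "+", Var("b")), "-",
           BinExp(Var("c"), "-", BinExp(Var("d"), "+", Var("e"))))
code = expcode(e, ["R0", "R1"], ["t1"]) + ["M[f] := R0"]
for line in code:
    print(line)
```

For this expression the sketch emits eight instructions including the final store and never writes to the temporary t1. The order of instructions and the register names differ from the listing on the earlier slide, since several equally short translations exist.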
Register Allocation by Graph Coloring
Register allocation by graph coloring is a procedure (with many variants) for allocating registers beyond the scope of single expressions and basic blocks.
Register Allocation for 3AC
• Input: 3AC in SSA form with temporary variables
• Output: Structurally the same 3AC in SSA form with
  - registers instead of temporary variables
  - additional instructions for storing intermediate results on the stack, if applicable
Register Allocation by Graph Coloring (2)
Remarks:
• The SSA representation is not necessary, but it simplifies the formulation of the algorithm (e.g., Wilhelm/Maurer, Sect. 12.5, do not use SSA).
• It is no restriction that only temporary variables are implemented by registers: we assume that program variables are assigned to temporary variables as well, where appropriate.
Life Range and Interference Graph
Definition (Life Range)
The life range of a temporary variable is the set of program positions at which it is live.
Definition (Interference)
Two temporary variables interfere if their life ranges have a non-empty intersection.
Definition (Interference Graph)
Let P be a program part in 3AC/SSA. The interference graph of P is an undirected graph G = (N,E), where
• N is the set of temporary variables
• E contains an edge (n1,n2) iff n1 and n2 interfere.
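The three definitions translate directly into a few lines of code. The following Python sketch assumes that life ranges are already known and given as sets of program positions; the function name interference_graph and the sample ranges are hypothetical.

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def interference_graph(life_range: Dict[str, Set[int]]) -> Set[Tuple[str, str]]:
    """Edges (n1, n2) with n1 < n2 for temporaries whose life ranges overlap."""
    return {
        (n1, n2)
        for n1, n2 in combinations(sorted(life_range), 2)
        if life_range[n1] & life_range[n2]  # non-empty intersection
    }


# Hypothetical life ranges, given as sets of program positions.
ranges = {"t0": {0, 1, 2}, "t1": {2, 3}, "t2": {4, 5}}
edges = interference_graph(ranges)
print(edges)
```

Here only t0 and t1 interfere, because position 2 belongs to both life ranges; t2 could share a register with either of them.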
Register Allocation by Graph Coloring
Goal: Map the temporary variables onto the available registers.
Idea: Translate the problem into graph coloring (NP-complete). Color the interference graph such that
• neighboring nodes have different colors
• no more colors are used than registers are available
Register Allocation by Graph Coloring (2)
General Procedure: When trying to color the graph, there are two cases:
• If a coloring is found, terminate.
• If some nodes could not be colored,
  - choose a non-colored node k
  - modify the 3AC program such that the value of k is stored in memory and loaded again only where it is used
  - try to find a coloring for the modified program
Termination: The procedure terminates because storing values in memory shortens the life ranges and thus reduces the interferences. In practice, two or three iterations are sufficient.
Register Allocation by Graph Coloring (3)
Coloring Procedure: Let rn be the number of available registers, i.e., at most rn colors may be used for coloring.
The coloring procedure consists of two steps:
• (a) Simplify by Marking
• (b) Coloring
Simplify by Marking
Iteratively remove nodes with fewer than rn neighbors from the graph and push them onto the stack.
Case 1: The simplification steps lead to an empty graph.
Continue with coloring.
Case 2: The graph contains only nodes with rn or more neighbors. Choose a suitable node as a candidate for being stored temporarily, mark it, push it onto the stack, and continue with simplification.
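The simplification step can be sketched as a loop over a shrinking copy of the graph. In the Python sketch below, the graph is an adjacency dictionary, and the choice of a "suitable" node in case 2 is an assumed heuristic (the node with the most remaining neighbors); the slides leave this choice open.

```python
from typing import Dict, List, Set, Tuple

Graph = Dict[str, Set[str]]


def simplify(graph: Graph, rn: int) -> Tuple[List[str], Set[str]]:
    """Return the stack (in removal order) and the set of marked nodes."""
    g = {n: set(adj) for n, adj in graph.items()}  # work on a copy
    stack: List[str] = []
    marked: Set[str] = set()
    while g:
        low = [n for n in sorted(g) if len(g[n]) < rn]
        if low:
            node = low[0]
        else:
            # Case 2: every remaining node has rn or more neighbors.
            # Assumed heuristic: mark the node with the most neighbors.
            node = max(sorted(g), key=lambda n: len(g[n]))
            marked.add(node)
        stack.append(node)
        for m in g.pop(node):  # remove the node and its edges
            g[m].discard(node)
    return stack, marked


# Complete graph on four nodes with rn = 3: every node has 3 neighbors,
# so one node has to be marked before simplification can proceed.
k4 = {n: {m for m in "abcd" if m != n} for n in "abcd"}
stack, marked = simplify(k4, 3)
print(stack, marked)
```

After the marked node is removed, the remaining triangle consists of nodes with two neighbors each, so the rest of the graph simplifies without further marking.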
Coloring
The nodes are popped from the stack one by one and, if possible, colored and put back into the graph.
Let k be the node taken from the stack.
Case 1: k is not marked. Thus, it has fewer than rn neighbors, and k can be colored with a color not used by any of its neighbors.
Case 2: k is marked.
• a) The rn or more neighbors use at most rn−1 different colors.
Then, color k with one of the remaining colors.
• b) All rn colors occur in the neighborhood. Leave k uncolored.
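The coloring step can be sketched as follows. The Python function below is an illustration under assumptions: colors are the integers 0 to rn−1, the stack lists nodes in removal order (so it is processed from the back), and marked and unmarked nodes are handled by the same code, since an unmarked node always finds a free color.

```python
from typing import Dict, List, Optional, Set

Graph = Dict[str, Set[str]]


def select(graph: Graph, stack: List[str], rn: int) -> Dict[str, Optional[int]]:
    """Color nodes in reverse removal order; None means the node stays uncolored."""
    color: Dict[str, Optional[int]] = {}
    for k in reversed(stack):
        # Colors of neighbors that are already back in the graph.
        used = {color[n] for n in graph[k] if color.get(n) is not None}
        free = [c for c in range(rn) if c not in used]
        color[k] = free[0] if free else None
    return color


# Cycle a-b-c-d-a with rn = 2: every node has 2 neighbors, so the first
# removed node (a) would be marked; it can still be colored (case 2a).
cycle = {"a": {"b", "d"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c", "a"}}
print(select(cycle, ["a", "b", "d", "c"], 2))

# Complete graph on four nodes with rn = 3: the marked node a sees all
# three colors in its neighborhood and stays uncolored (case 2b).
k4 = {n: {m for m in "abcd" if m != n} for n in "abcd"}
print(select(k4, ["a", "b", "c", "d"], 3))
```

The cycle shows why marked nodes are still tried during coloring: being marked only means that a free color is not guaranteed, not that none exists.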
Example - Graph Coloring
For simplicity, we only consider one basic block.
In the beginning, t0 and t2 are live.
   t1 := a + t0
   t3 := t2 - 1
   t4 := t1 * t3
   t5 := b + t0
   t6 := c + t0
   t7 := d + t4
   t8 := t5 + 8
   t9 := t8
   t2 := t6 + 4
   t0 := t7
In the end, t0, t2, t9 are live.
Example - Graph Coloring (2)
Interference graph: [Figure: interference graph over the nodes t0, ..., t9]
Assumption: 4 available registers
Simplification: Remove (in this order) t1, t3, t2, t9, t0, t5, t4, t7, t8, t6
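The graph for this example can be recomputed from the basic block. The Python sketch below rests on assumptions: the block is encoded by hand as pairs of defined and used temporaries (a, b, c, d and constants are left out because only temporaries become nodes), and two temporaries are connected whenever they are live at the same position. All function names are hypothetical. The sketch then checks that the removal order given above is admissible for 4 registers.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

# (defined temporary, used temporaries) per instruction of the basic block
BLOCK: List[Tuple[str, List[str]]] = [
    ("t1", ["t0"]),        # t1 := a + t0
    ("t3", ["t2"]),        # t3 := t2 - 1
    ("t4", ["t1", "t3"]),  # t4 := t1 * t3
    ("t5", ["t0"]),        # t5 := b + t0
    ("t6", ["t0"]),        # t6 := c + t0
    ("t7", ["t4"]),        # t7 := d + t4
    ("t8", ["t5"]),        # t8 := t5 + 8
    ("t9", ["t8"]),        # t9 := t8
    ("t2", ["t6"]),        # t2 := t6 + 4
    ("t0", ["t7"]),        # t0 := t7
]
LIVE_AT_END = {"t0", "t2", "t9"}


def live_sets(block, live_at_end) -> List[Set[str]]:
    """Backward pass; the last entry is the live set at block entry."""
    live = set(live_at_end)
    sets = [set(live)]
    for target, uses in reversed(block):
        live = (live - {target}) | set(uses)
        sets.append(set(live))
    return sets


def interference(sets: List[Set[str]]) -> Dict[str, Set[str]]:
    """Connect all temporaries that are live at the same position."""
    graph: Dict[str, Set[str]] = {}
    for live in sets:
        for n in live:
            graph.setdefault(n, set())
        for n1, n2 in combinations(live, 2):
            graph[n1].add(n2)
            graph[n2].add(n1)
    return graph


def removable_in_order(graph, order, rn: int) -> bool:
    """True if each node has fewer than rn remaining neighbors when removed."""
    remaining = set(graph)
    for n in order:
        if len(graph[n] & remaining) >= rn:
            return False
        remaining.discard(n)
    return True


sets = live_sets(BLOCK, LIVE_AT_END)
graph = interference(sets)
SLIDE_ORDER = ["t1", "t3", "t2", "t9", "t0", "t5", "t4", "t7", "t8", "t6"]
print(sorted(sets[-1]))
print(removable_in_order(graph, SLIDE_ORDER, 4))
```

Under these assumptions the live set at block entry is {t0, t2}, as stated in the example, and every node in the given order has fewer than 4 remaining neighbors at the time it is removed, so no node has to be marked.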
Example - Graph Coloring (3)
Possible coloring (removal order t1, t3, t2, t9, t0, t5, t4, t7, t8, t6):
[Figure: colored interference graph over the nodes t0, ..., t9]
Example - Graph Coloring (4)
Remarks:
There are several extensions of the procedure:
• Elimination of move instructions
• Specific heuristics for simplification (What is a suitable node?)
• Consideration of pre-colored nodes
Recommended Reading:
• Appel, Sect. 11.1 – 11.3, pp. 238-251
Further Aspects of Register Allocation
The algorithms introduced above address subproblems. In practice, register allocation has to deal with further aspects:
• Interaction with other compiler phases (in particular optimization and code generation)
• Relation between temporaries and registers
• Source/Intermediate/Target Language
• Number of uses (Is a variable used inside an inner loop?)
Further Aspects of Register Allocation (2)
Possible global procedure:
• Allocate registers for standard tasks (registers for stack and argument pointers, base registers)
• Decide which variables and parameters should be stored in registers
• Estimate the usage frequency of temporaries (occurrences in inner loops, distribution of accesses over the life range)
• Use this estimate together with the heuristics of the register allocation algorithm
• If applicable, optimize again
Code Generation
Code generation can be split into four independent machine-dependent tasks:
• Memory allocation
• Instruction selection and addressing
• Instruction scheduling
• Code optimization
Memory Allocation
Modern machines have the following memory hierarchy:
• Registers
• Primary Cache (Instruction Cache, Data Cache)
• Secondary Cache
• Main Memory (Page/Segment Addressing)
Unlike registers, the cache is controlled by the hardware.
Efficient usage of the cache means in particular aligning data objects and instructions to the boundaries of cache blocks (cf. Appel, Chap. 21). The same holds for main memory.
Instruction Selection
Instruction selection aims at the best possible translation of expressions and basic blocks using the instruction set of the machine, for instance,
• using complex addressing modes
• considering the sizes of constants or the locality of jumps
Instruction selection is often formulated as a tree pattern matching problem with costs (cf. Wilhelm/Maurer, Chap. 11).
Instruction Scheduling
Modern machines allow processor-local parallel processing (pipelining, superscalar execution, VLIW).
To exploit this parallelism, the code has to satisfy additional requirements that must be taken into account during code generation.
(see Appel, Chap. 20; Wilhelm/Maurer, Sect. 12.6)
Code Optimization
Optimizations of the assembler or machine code may allow an additional increase in program efficiency.
(see Wilhelm/Maurer, Sect. 6.9)