Compiler and Language Processing Tools
Summer Term 2009 Introduction
Dr.-Ing. Ina Schaefer
Software Technology Group TU Kaiserslautern
Ina Schaefer Compilers 1
Introduction
Outline
1. Introduction
   • Overview and Application Domains
   • Tasks of Language-Processing Tools
   • Examples
2. Language Processing
   • Terminology and Requirements
   • Compiler Architecture
3. Compiler Construction
Introduction Overview and Application Domains
Language Processing Tools
• Processing of Source Texts in Source Languages
• Analysis of Source Texts
• Translation to Target Languages
Language Processing Tools (2)
Typical Source Languages
• Programming Languages: C, C++, C#, Java, ML, Smalltalk, Prolog, Script Languages (JavaScript), bash
• Languages for Configuration Management: make, ant
• Application and Tool-Specific Languages: Excel, JFlex, CUP
• Specification Languages: Z, CASL, Isabelle/HOL
• Formatting and Data Description Languages: LaTeX, HTML, XML
• Design and Architecture Description Languages: UML, SDL, VHDL, Verilog
Language Processing Tools (3)
Typical Target Languages
• Assembly and Machine Languages
• Programming Languages
• Data and Layout Description Languages
• Languages for Printer Control
Language Processing Tools (4)
Language Implementation Tasks
• Tool Support for Language Processing
• Integration into Existing Systems
• Connection to Other Systems
Application Domains
• Programming Environments
  - Context-sensitive Editors, Class Browsers
  - Graphical Programming Tools
  - Pre-Processors
  - Compilers
  - Interpreters
  - Debuggers
  - Run-time Environments (loading, linking, execution, memory management)
Application Domains (2)
• Generation of Programs from Design Documents (UML)
• Program Comprehension, Re-engineering
• Design and Implementation of Application-specific Languages
  - Robot Control
  - Simulation Tools
  - Spread Sheets, Active Documents
• Web Technology
  - Analysis of Web Sites
  - Active Websites (with integrated functionality)
  - Abstract Platforms, e.g. JVM, .NET
  - Optimization of Caching
Related Fields
• Formal Languages, Language Specification and Design
• Programming and Specification Languages
• Programming, Software Engineering, Software Generation, Software Architecture
• System Software, Computer Architecture
Introduction Tasks of Language-Processing Tools
Tasks of Language-Processing Tools
• Analyser: Source Code → Analysis Results
• Translation: Source Code → Target Code
• Interpreter: Source Code + Input Data → Output Data
Analysis, Translation and Interpretation are often combined.
Tasks of Language-Processing Tools (2)
1. Translation
  - Compiler implements Analysis and Translation
  - OS and Real Machine implement Interpretation
Pros:
  - Most efficient solution
  - One interpreter for all programming languages
  - Prerequisite for other solutions
Tasks of Language-Processing Tools (3)
2. Direct Interpretation
  - Interpreter implements all tasks.
  - Examples: JavaScript, Command Line Languages (bash)
  - Pros: No translation necessary (but analysis at run-time)
Tasks of Language-Processing Tools (4)
3. Abstract and Virtual Machines
  - Compiler implements Analysis and Translation to Abstract Machine Code
  - Abstract Machine works as Interpreter
  - Examples: Java/JVM, C#/.NET
Pros:
  - Platform independent (portability, mobile code)
  - Self-modifying programs possible
4. Other Combinations
Introduction Examples
Example: Analysis
17.04.2007 © A. Poetzsch-Heffter, TU Kaiserslautern 8
package b1_1
;
class Weltklasse extends Superklasse implement BesteBohnen {Qualifikation studieren ( Arbeit schweiss) { return new
Qualifikation ();}}
javac analyser (also reads Superklasse.class, Qualifikation.class, Arbeit.class, BesteBohnen.class, ...)
Output:
b1_1/Weltklasse.java:4: '{' expected.
extends Superklasse
                   ^
1 error
Example: Translation
package b1_1;
class Weltklasse extends Superklasse implements BesteBohnen {
    Qualifikation studieren(Arbeit schweiss) {
        return new Qualifikation();
    }
}
javac (also reads Superklasse.class, Qualifikation.class, Arbeit.class, BesteBohnen.class, ...)
Compiled from Weltklasse.java
class b1_1/Weltklasse extends ... implements ... {
    b1_1/Weltklasse();
    b1_1.Qualifikation studieren(...);
}
Method b1_1/Weltklasse()
...
Method b1_1.Qualifikation studieren(...)
...
Example: Translation (2)
Result of Translation
Compiled from Weltklasse.java
class b1_1/Weltklasse extends b1_1.Superklasse implements b1_1.BesteBohnen {
    b1_1/Weltklasse();
    b1_1.Qualifikation studieren(b1_1.Arbeit);
}
Method b1_1/Weltklasse()
   0 aload_0
   1 invokespecial #6 <Method b1_1.Superklasse()>
   4 return
Method b1_1.Qualifikation studieren(b1_1.Arbeit)
   0 new #2 <Class b1_1.Qualifikation>
   3 dup
   4 invokespecial #5 <Method b1_1.Qualifikation()>
   7 areturn
Example 2: Translation
#include <stdio.h>

int main() {
    printf("Willkommen zur Vorlesung!");
    return 0;
}
gcc
        .file "hello_world.c"
        .version "01.01"
gcc2_compiled.:
.section .rodata
.LC0:
        .string "Willkommen zur Vorlesung!"
.text
        .align 16
.globl main
        .type main,@function
main:
        pushl %ebp
        movl %esp,%ebp
        subl $8,%esp
        ...
Example 2: Translation (2)
Result of Translation
        .file "hello_world.c"
        .version "01.01"
gcc2_compiled.:
.section .rodata
.LC0:
        .string "Willkommen zur Vorlesung!"
.text
        .align 16
.globl main
        .type main,@function
main:
        pushl %ebp
        movl %esp,%ebp
        subl $8,%esp
        addl $-12,%esp
        pushl $.LC0
        call printf
        addl $16,%esp
        xorl %eax,%eax
        jmp .L2
        .p2align 4,,7
.L2:
        movl %ebp,%esp
        popl %ebp
        ret
.Lfe1:
        .size main,.Lfe1-main
        .ident "GCC: (GNU) 2.95.2 19991024 (release)"
Example 3: Translation
groovy.tex (104 bytes):
\documentclass{article}
\begin{document}
\vspace*{7cm}
\centerline{\Huge\bf It's groovy}
\end{document}
  → latex →
groovy.dvi (207 bytes, binary):
...
  → dvips →
groovy.ps (7136 bytes):
%!PS-Adobe-2.0
%%Creator: dvips(k) 5.86 ...
%%Title: groovy.dvi ...
Example: Interpretation
.class File (excerpt):
...
14 iload_1
15 iload_2
16 idiv
17 istore_3
...
Java Virtual Machine (JVM): interprets the .class File, reading Input Data and producing Output Data
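To make the interpretation step concrete, here is a minimal sketch of a stack-based interpreter for the four instructions in the excerpt. It is not the JVM's implementation: the function name `run`, the `(opcode, operand)` encoding and the dictionary of local variables are assumptions made for illustration only.

```python
def run(code, local_vars):
    """Interpret a list of (opcode, operand) pairs on an operand stack."""
    stack = []
    for op, arg in code:
        if op == "iload":        # push local variable number `arg`
            stack.append(local_vars[arg])
        elif op == "istore":     # pop the top of the stack into local variable `arg`
            local_vars[arg] = stack.pop()
        elif op == "idiv":       # pop divisor and dividend, push the integer quotient
            divisor = stack.pop()
            dividend = stack.pop()
            if divisor == 0:
                raise ZeroDivisionError("division by zero")
            # idiv truncates toward zero, whereas Python's // rounds down
            quotient = abs(dividend) // abs(divisor)
            if (dividend < 0) != (divisor < 0):
                quotient = -quotient
            stack.append(quotient)
        else:
            raise ValueError(f"unknown opcode: {op}")
    return local_vars


# iload_1; iload_2; idiv; istore_3
fragment = [("iload", 1), ("iload", 2), ("idiv", None), ("istore", 3)]
result = run(fragment, {1: 17, 2: 5, 3: 0})
print(result[3])
```

The input data are the initial values of locals 1 and 2; the output data is the value left in local 3.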
Example: Combined Technique
Java Implementation with Just-In-time (JIT) Compiler
Java Source Code Unit → javac (Analyzer, Translator) → Java Byte Code (.class file) → JVM with JIT Translator → Machine Code → Real Machine / Hardware
Input Data are consumed and Output Data are produced while the program runs.
(JIT = Just in time)
Language Processing Terminology and Requirements
Language Processing: The Translation Task
Source Code → Translator → Error Message or Target Code
• Translator (in a broader sense): Analysis, Optimization and Translation
• Source Code: Input (String) for Translator in Syntax of Source Language (SL)
• Target Code: Output (String) of Translator in Syntax of Target Language (TL)
Phases of Language Processing
• Analysis of Input:
  - Program Text
  - Specification
  - Diagrams
• Dependent on Goal of Implementation:
  - Transformation (XSLT, Refactoring)
  - Pretty Printing, Formatting
  - Semantic Analysis (Program Comprehension)
  - Optimization
  - (Actual) Translation
Compile Time vs. Run-time
• Compile Time: while the Compiler/Translator is running
Static: all information/aspects known at compile time, e.g.
  - Type Checks
  - Evaluation of Constant Expressions
  - Relative Addresses
• Run Time: while the Compiled Program is running
Dynamic: all information that is not statically known, e.g.
  - Allocation of Dynamic Arrays
  - Bounds Check of Arrays
  - Dynamic Binding of Methods
  - Memory Management of Recursive Procedures
For dynamic aspects that cannot be handled at compile time, the compiler generates code that handles these aspects at run time.
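The static/dynamic split can be illustrated with constant folding, sketched here on Python's own syntax trees (Python 3.9 or later for `ast.unparse`). The names `ConstantFolder` and `fold` are made up for this example, and only integer `+`, `-` and `*` are folded; a production compiler would do considerably more.

```python
import ast
import operator

# Operators this sketch is willing to evaluate at "compile time".
FOLDABLE = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}


class ConstantFolder(ast.NodeTransformer):
    def visit_BinOp(self, node):
        self.generic_visit(node)  # fold the operands first (bottom-up)
        op = FOLDABLE.get(type(node.op))
        if (
            op is not None
            and isinstance(node.left, ast.Constant)
            and isinstance(node.right, ast.Constant)
            and isinstance(node.left.value, int)
            and isinstance(node.right.value, int)
        ):
            # Both operands are statically known: replace the node by its value.
            folded = ast.Constant(op(node.left.value, node.right.value))
            return ast.copy_location(folded, node)
        return node  # something is only known at run time: keep the operation


def fold(source):
    """Parse an expression, fold constant subexpressions, return the new source."""
    tree = ConstantFolder().visit(ast.parse(source, mode="eval"))
    return ast.unparse(ast.fix_missing_locations(tree))


print(fold("60 * 60 * 24"))
print(fold("a[i] + 2 * 3"))
```

In the second expression the product is evaluated statically, while the access `a[i]`, including its bounds check, stays in the code and is carried out at run time.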
What is a good compiler?
Requirements for Translators
• Error Handling (Static/Dynamic)
• Efficient Target Code
• Choice: Fast Translation with Slow Code vs. Slow Translation with Fast Code
• Semantically Correct Translation
Semantically Correct Translation
Intuitive Definition: Compiled Program behaves according to Language Definition of Source Language.
Formal Definition:
• semSL: SL_Program × SL_Data → SL_Data
• semTL: TL_Program × TL_Data → TL_Data
• compile: SL_Program → TL_Program
• code: SL_Data → TL_Data
• decode: TL_Data → SL_Data
Semantic Correctness:
semSL(P,D) = decode(semTL(compile(P), code(D)))
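The correctness equation can be tried out on a toy language. In the sketch below, everything is an assumption chosen for illustration: SL programs are expression trees over a single input `x`, TL programs are lists of stack instructions, and TL data is a 1-tuple. The slide's `code` is called `encode` here to avoid a clash with the instruction list. Checking a few inputs is a test, not a proof of correctness.

```python
def sem_sl(program, data):
    """Source semantics: evaluate an expression tree with the input bound to 'x'."""
    if program == "x":
        return data
    if isinstance(program, int):
        return program
    op, left, right = program
    lhs, rhs = sem_sl(left, data), sem_sl(right, data)
    return lhs + rhs if op == "+" else lhs * rhs


def compile_sl(program):
    """compile: translate an expression tree into postfix stack code."""
    if program == "x":
        return [("load_input",)]
    if isinstance(program, int):
        return [("push", program)]
    op, left, right = program
    return compile_sl(left) + compile_sl(right) + [("add",) if op == "+" else ("mul",)]


def sem_tl(code, data):
    """Target semantics: run stack code; `data` is a 1-tuple holding the input."""
    stack = []
    for instr in code:
        if instr[0] == "load_input":
            stack.append(data[0])
        elif instr[0] == "push":
            stack.append(instr[1])
        else:
            right, left = stack.pop(), stack.pop()
            stack.append(left + right if instr[0] == "add" else left * right)
    return (stack.pop(),)


def encode(d):
    """code: SL_Data -> TL_Data"""
    return (d,)


def decode(t):
    """decode: TL_Data -> SL_Data"""
    return t[0]


def is_correct_on(program, data):
    """Check semSL(P, D) = decode(semTL(compile(P), code(D))) for one P and D."""
    return sem_sl(program, data) == decode(sem_tl(compile_sl(program), encode(data)))


P = ("+", ("*", "x", 3), 4)  # the SL program x * 3 + 4
checks = [is_correct_on(P, d) for d in range(-5, 6)]
print(all(checks))
```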
Language Processing Compiler Architecture
Compiler Architecture
Analysis:
Source Code as String → Scanner → Token Stream → Parser → Syntax Tree → Name and Type Analysis → Decorated Syntax Tree (Close to SL)
Synthesis:
Decorated Syntax Tree → Translator → Intermediate Language → Code Generator → Target Code as String
Optimization steps: Attribution & Optimization (shown twice in the figure), Peep Hole Optimization
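As a rough illustration of how the phases hand data to each other, the sketch below chains a scanner, a recursive-descent parser and a code generator for arithmetic expressions. The grammar, the tuple-shaped syntax tree and the stack-code mnemonics are assumptions for this example; name and type analysis and all optimization steps are left out.

```python
import re

TOKEN = re.compile(r"\s*(\d+|[+*()])")


def scan(source):
    """Scanner: source code as string -> token stream."""
    tokens, pos = [], 0
    source = source.rstrip()
    while pos < len(source):
        match = TOKEN.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse(tokens):
    """Parser: token stream -> syntax tree.

    Grammar: expr := term ('+' term)*, term := factor ('*' factor)*,
             factor := number | '(' expr ')'
    """
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = peek()
        pos += 1
        return token

    def expr():
        node = term()
        while peek() == "+":
            advance()
            node = ("+", node, term())
        return node

    def term():
        node = factor()
        while peek() == "*":
            advance()
            node = ("*", node, factor())
        return node

    def factor():
        token = advance()
        if token == "(":
            node = expr()
            if advance() != ")":
                raise SyntaxError("')' expected")
            return node
        if token is not None and token.isdigit():
            return ("num", int(token))
        raise SyntaxError(f"unexpected token: {token!r}")

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token: {peek()!r}")
    return tree


def generate(tree):
    """Code generator: syntax tree -> target code for a stack machine."""
    if tree[0] == "num":
        return [f"push {tree[1]}"]
    op, left, right = tree
    return generate(left) + generate(right) + ["add" if op == "+" else "mul"]


target = generate(parse(scan("2 * (3 + 4)")))
print("\n".join(target))
```

Each function corresponds to one phase, and the value passed between two functions corresponds to one of the intermediate representations named in the figure.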
Properties of Compiler Architectures
• Phases represent Concepts.
• Phases can be interleaved.
• Concrete Layout of Phases depends on Source Language, Target Language and Concrete Design Decisions.
• Phase vs. Pass (Phase can comprise more than one pass.)
• Separate Translation of Program Parts (Interface information must be accessible.)
• Combination with other Architecture Decisions: Common Intermediate Language
Common Intermediate Language
Source Language 1, Source Language 2, ..., Source Language n → Intermediate Language → Target Language 1, Target Language 2, ..., Target Language m
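One way to read the figure, stated here as an observation that the slide itself does not spell out: with n source languages and m target languages, translating every pair directly needs one translator per pair, whereas a common intermediate language needs only one front end per source language and one back end per target language.

```latex
\[
\underbrace{n \cdot m}_{\text{direct translators}}
\qquad \text{vs.} \qquad
\underbrace{n + m}_{\text{front ends and back ends}}
\]
```

For example, with n = 4 and m = 3 this is 12 translators against 7 components.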
Dimensions of Compiler Construction
• Programming Languages
  - Sequential Procedural, Imperative, OO-Languages
  - Functional, Logical Languages
  - Parallel Languages/Language Constructs
• Target Languages/Machines
  - Code for Abstract Machines
  - Assembler
  - Machine Languages (CISC, RISC, ...)
  - Multi Processor Architectures
  - Memory Hierarchy
• Translation Tasks: Analysis, Optimization, Synthesis
• Construction Techniques and Tools: Bootstrapping, Generators
• Portability, Specification, Correctness
Compiler Construction
Compiler Construction Techniques
1. Stepwise Construction
  - Construction with compiler for different language
  - Construction with compiler for different machine
  - Bootstrapping
2. Compiler-Compilers: Tools for Compiler Generation
  - Scanner Generators (regular expressions)
  - Parser Generators (context-free grammars)
  - Attribute Evaluation Generators (attribute grammar)
  - Code Generator Generators (machine specification)
  - Interpreter Generators (semantics of language)
  - Other Phase-specific Tools
3. Special Programming Techniques
  - General Technique: syntax-driven
  - Special Technique: recursive descent
Stepwise Construction
Programming typically depends on an existing compiler for the implementation language. For compiler construction, this cannot be assumed in general.
Source, target and implementation languages of compilers can be denoted in T-diagrams.
A T-diagram labelled QS, ZS and PS denotes a compiler from source language QS to target language ZS that is written in implementation language PS.
Construction with compiler for different language
• Construct: SL Compiler for machine language MS, running on MS
• Suppose: C Compiler exists on platform with machine language MS
• Method: write the SL Compiler in C, then translate it with the existing C Compiler
[T-diagrams: SL-to-MS compiler written in C (to be developed); C-to-MS compiler written in MS (existing); SL-to-MS compiler written in MS (obtained by translation)]
Construction with compiler for different machine
• Construct: C Compiler for M1 in M1
• Suppose: C Compiler exists for M2 in M2
• Method: Construct Cross Compiler first
First Step
[T-diagrams: the C Compiler for M1, written in C, is translated by the existing C Compiler for M2 running on M2. The result is a Cross Compiler: a C Compiler for M1 that runs on M2.]
Construction with compiler for different machine (2)
Second Step
[T-diagrams: the C Compiler for M1, written in C, is translated again, this time by the Cross Compiler running on M2. The result is the C Compiler for M1 that runs on M1.]
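The two steps can be checked mechanically by modelling each T-diagram as a triple. This is only a bookkeeping sketch: the class `Compiler` and the function `compile_with` are names invented here, and the sketch assumes that the C Compiler for M1 is available as C source text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Compiler:
    source: str  # language it translates from
    target: str  # language it translates to
    impl: str    # language it is written in


def compile_with(program, tool, host):
    """Translate the compiler `program` using `tool`, which runs on machine `host`."""
    if tool.impl != host:
        raise ValueError(f"{tool} cannot run on {host}")
    if program.impl != tool.source:
        raise ValueError(f"{tool} cannot read programs written in {program.impl}")
    # Same source and target languages, but now expressed in the tool's target language.
    return Compiler(program.source, program.target, tool.target)


existing = Compiler("C", "M2", "M2")    # supposed: C Compiler for M2 in M2
handwritten = Compiler("C", "M1", "C")  # to be written: C Compiler for M1, in C

cross = compile_with(handwritten, existing, host="M2")  # first step
native = compile_with(handwritten, cross, host="M2")    # second step
print(cross)
print(native)
```

Both steps run on M2; only the final result, `native`, is executable on M1.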
Bootstrapping
• Construct: QS Compiler for MS in MS
• Suppose: no compiler exists yet
• Method:
1. Construct partial languages QSi of QS such that QS0 ⊂ QS1 ⊂ QS2 ⊂ ... ⊂ QS
2. Implement QS0 Compiler for MS in MS
3. Implement QSi+1 Compiler for MS in QSi
4. Create QSi+1 Compiler for MS in MS
Bootstrapping (2)
[T-diagrams: the QS0 Compiler for MS in MS is written manually. Each QSi+1 Compiler for MS is written in QSi (by extension) and then translated with the QSi Compiler (by translation), until the QS Compiler for MS in MS is obtained.]
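The bootstrapping method can be traced with a small simulation. The sketch only tracks which compiler exists at each stage; `Compiler`, `translate` and `bootstrap` are illustrative names, and the level list is an arbitrary example chain.

```python
from typing import NamedTuple


class Compiler(NamedTuple):
    source: str  # language it translates from
    target: str  # language it translates to
    impl: str    # language it is written in


def translate(program, tool):
    """Run `tool` on the source text of `program`; the result is written in tool.target."""
    if program.impl != tool.source:
        raise ValueError(f"{tool} cannot read programs written in {program.impl}")
    return Compiler(program.source, program.target, tool.target)


def bootstrap(levels):
    """levels = ['QS0', 'QS1', ..., 'QS']; returns every compiler obtained in MS."""
    current = Compiler(levels[0], "MS", "MS")    # step 2: written manually in MS
    history = [current]
    for lower, higher in zip(levels, levels[1:]):
        written = Compiler(higher, "MS", lower)  # step 3: QSi+1 Compiler written in QSi
        current = translate(written, current)    # step 4: translated by the QSi Compiler
        history.append(current)
    return history


history = bootstrap(["QS0", "QS1", "QS2", "QS"])
for compiler in history:
    print(compiler)
```

Only the first compiler in the chain has to be written in machine language by hand; every later one is written in the previous partial language.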
Recommended Reading
Wilhelm, Maurer:
• Chap. 1, Introduction (pp. 1–5)
• Chap. 6, Structure of Compilers (pp. 225–238)
Appel:
• Chap. 1, Introduction (pp. 3–14)