Compilers and Language Processing Tools
Summer Term 2011
Prof. Dr. Arnd Poetzsch-Heffter
Software Technology Group TU Kaiserslautern
© Prof. Dr. Arnd Poetzsch-Heffter
Content of Lecture
1. Introduction
2. Syntax and Type Analysis
   2.1 Lexical Analysis
   2.2 Context-Free Syntax Analysis
   2.3 Context-Dependent Syntax Analysis
3. Translation to Target Language
   3.1 Translation of Imperative Language Constructs
   3.2 Translation of Object-Oriented Language Constructs
4. Selected Aspects of Compilers
   4.1 Intermediate Languages
   4.2 Optimization
   4.3 Data Flow Analysis
   4.4 Register Allocation
   4.5 Code Generation
5. Garbage Collection
6. XML Processing (DOM, SAX, XSLT)
2.2 Context-Free Syntax Analysis
Section outline
1. Specification of parsers
2. Implementation of parsers
   2.1 Top-down syntax analysis
       - Recursive descent
       - LL(k) parsing theory
       - LL parser generation
   2.2 Bottom-up syntax analysis
       - Principles of LR parsing
       - LR parsing theory
       - SLR, LALR, LR(k) parsing
       - LALR parser generation
3. Error handling
4. Concrete and abstract syntax
Task of context-free syntax analysis
• Check if the token stream (from the scanner) matches the context-free syntax of the language
  - if erroneous: error handling
  - if correct: construct syntax tree
[Figure: token stream → parser → abstract / concrete syntax tree]
Task of context-free syntax analysis (2)
Remarks:
• Parsing can be interleaved with other actions that process the program (e.g. attribute evaluation).
• The syntax tree controls translation. We distinguish
  - the concrete syntax tree, corresponding to the context-free grammar
  - the abstract syntax tree, providing a more compact representation tailored to subsequent phases
2.2.1 Specification of Parsers
Specification of parsers
Two general specification techniques:
• Syntax diagrams
• Context-free grammars (often in extended form)
Context-Free Grammars
Definition
Let
• N and T be two alphabets with N ∩ T = ∅
• Π be a finite subset of N × (N ∪ T)*
• S ∈ N
Then Γ = (N, T, Π, S) is a context-free grammar (CFG) where
• N is the set of nonterminals
• T is the set of terminals
• Π is the set of production rules
• S is the start symbol (axiom)
Context-Free Grammars (2)
Notations:
• A, B, C, ... denote nonterminals
• a, b, c, ... denote terminals
• x, y, z, ... denote strings of terminals, i.e. x ∈ T*
• α, β, γ, ψ, φ, σ, τ are strings of terminals and nonterminals, i.e. α ∈ (N ∪ T)*
Productions are denoted by A → α.
The notation A → α | β | γ | ... is an abbreviation for A → α, A → β, A → γ, ...
Derivation
Let Γ = (N, T, Π, S) be a CFG:
• ψ is directly derivable from φ in Γ, and φ directly produces ψ, written as φ ⇒ ψ, if there are σ, τ with σAτ = φ and σατ = ψ and A → α ∈ Π.
• ψ is derivable from φ in Γ, written as φ ⇒* ψ, if there exist φ0, ..., φn with φ = φ0 and ψ = φn and φi ⇒ φi+1 for all i ∈ {0, ..., n−1}.
• φ0, ..., φn is called a derivation of ψ from φ.
• ⇒* is the reflexive, transitive closure of ⇒.
Derivation (2)
• A derivation φ0, ..., φn is a leftmost (rightmost) derivation if in every derivation step φi ⇒ φi+1 the leftmost (rightmost) nonterminal in φi is replaced.
• Leftmost and rightmost derivation steps are denoted by φ ⇒lm ψ and φ ⇒rm ψ, respectively.
• The tree representation of a derivation is a syntax tree.
• L(Γ) = {z ∈ T* | S ⇒* z} is the language generated by Γ.
• x ∈ L(Γ) is a sentence of Γ (German: Satz).
• φ ∈ (N ∪ T)* with S ⇒* φ is a sentential form of Γ (German: Satzform).
Derivation (3)
Remarks:
• Each derivation corresponds to exactly one syntax tree. Conversely, for each syntax tree there can be several derivations.
• For “syntax tree”, the term “derivation tree” is also used.
• For each language, there can be several generating grammars, i.e., the mapping L: Grammar → Language is in general not injective.
Ambiguity in Grammars
• A sentence is unambiguous if it has exactly one syntax tree. A sentence is ambiguous if it has more than one syntax tree.
• For each syntax tree, there exists exactly one leftmost derivation and exactly one rightmost derivation.
• Thus: A sentence is unambiguous iff it has exactly one leftmost (rightmost) derivation.
• A grammar is ambiguous if its language contains an ambiguous sentence.
• For programming languages, unambiguous grammars are essential, as the semantics and the translation are defined by the syntactic structure.
Ambiguity in Grammars (2)
Example 1: Grammar Γ0 for expressions:
• S → E
• E → E + E
• E → E * E
• E → ( E )
• E → ID
Consider the input string
(av + av) * bv + cv + dv
resulting in the following input for the context-free analysis:
( ID + ID ) * ID + ID + ID
Ambiguity in Grammars (3)
Syntax tree for ( ID + ID ) * ID + ID + ID
[Figure: one possible syntax tree for this input according to Γ0]
• The syntax tree shown does not match the conventional rules of arithmetic.
• There are several syntax trees according to Γ0 for this input, hence Γ0 is ambiguous.
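The claim that Γ0 admits several syntax trees for one input can be checked by counting them. The following Python sketch is only an illustration: the function name `count_trees` and the encoding of the token stream as a list of strings are assumptions, not part of the lecture material. It counts the syntax trees for the nonterminal E of Γ0 by trying every production on every sub-range of the input.

```python
from functools import lru_cache


def count_trees(tokens):
    """Count the syntax trees of `tokens` for nonterminal E of the grammar Γ0."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of syntax trees deriving tokens[i:j] from E.
        total = 0
        if j - i == 1 and tokens[i] == "ID":                      # E -> ID
            total += 1
        if j - i >= 3 and tokens[i] == "(" and tokens[j - 1] == ")":
            total += count(i + 1, j - 1)                          # E -> ( E )
        for k in range(i + 1, j - 1):                             # E -> E + E | E * E
            if tokens[k] in ("+", "*"):
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))


small = count_trees("ID + ID * ID".split())
large = count_trees("( ID + ID ) * ID + ID + ID".split())
```

Since S → E is the only production for S, the count for E equals the count for S. A value greater than one for any sentence shows that the grammar is ambiguous.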
Ambiguity in Grammars (4)
Example 2: Ambiguity in if-then-else construct
if B1 then if B2 then A := 9 else A := 7
First derivation
[Figure: first syntax tree over the token stream IF ID THEN IF ID THEN ID EQ CO ELSE ID EQ CO, built from the nodes IFTHENELSE, IFTHEN, ANW (statement) and ZW (assignment)]
Ambiguity in Grammars (5)
Second derivation
[Figure: second syntax tree over the same token stream, with the IFTHENELSE and IFTHEN nodes nested the other way round]
Ambiguity as Grammar Property
Ambiguity is a grammar property. The grammar for expressions Γ0 is an example of an ambiguous grammar.
Γ0:
• S → E
• E → E + E
• E → E * E
• E → ( E )
• E → ID
[Figure: two different syntax trees for ID + ID * ID according to Γ0]
Ambiguity as Grammar Property (2)
But there exists an unambiguous grammar for the same language:
Γ1:
• S → E
• E → T + E | T
• T → F * T | F
• F → ( E ) | ID
[Figure: syntax tree for ( ID + ID ) * ID + ID according to Γ1]
Ambiguity as Grammar Property (3)
Remark:
• A context-free language for which every grammar is ambiguous is called inherently ambiguous.
• There are inherently ambiguous CFLs.
Literature
Recommended reading:
• Wilhelm, Maurer: Chapter 8, pp. 271-283 (Syntactic Analysis)
• Appel: Chapter 3, pp. 40-47
2.2.2 Implementation of Parsers
Implementation of parsers
Overview
• Top-down parsing
  - Recursive descent
  - LL parsing
  - LL parser generation
• Bottom-up parsing
  - LR parsing
  - LALR, SLR, LR(k) parsing
  - LALR parser generation
Methods for context-free analysis
• Manually developed, grammar-specific implementation (error-prone, inflexible)
• Backtracking (simple, but inefficient)
• Cocke-Younger-Kasami algorithm (1967):
  - for all CFGs in Chomsky normal form
  - based on the idea of dynamic programming
  - time complexity O(n³) (however, linear complexity is desired)
• Top-down methods: from axiom to word/token stream
• Bottom-up methods: from word/token stream to axiom
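To make the dynamic-programming idea behind the Cocke-Younger-Kasami algorithm concrete, here is a minimal Python sketch. The grammar is a hypothetical example in Chomsky normal form for the language a^n b^n (n ≥ 1); it is not taken from the lecture, and the names `cyk`, `UNARY` and `BINARY` are assumptions. The three nested loops over length, start position and split point give the O(n³) running time.

```python
# Hypothetical CNF grammar for { a^n b^n | n >= 1 }:
#   S -> A B | A C,  C -> S B,  A -> a,  B -> b
UNARY = [("A", "a"), ("B", "b")]
BINARY = [("S", ("A", "B")), ("S", ("A", "C")), ("C", ("S", "B"))]


def cyk(word, unary=UNARY, binary=BINARY, start="S"):
    """Return True iff `word` is derivable from `start` (grammar in CNF)."""
    n = len(word)
    if n == 0:
        return False  # this sketch does not handle the empty word
    # table[i][l] = set of nonterminals deriving word[i:i+l]
    table = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, terminal in enumerate(word):
        table[i][1] = {lhs for lhs, t in unary if t == terminal}
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):
                left = table[i][split]
                right = table[i + split][length - split]
                for lhs, (b, c) in binary:
                    if b in left and c in right:
                        table[i][length].add(lhs)
    return start in table[0][n]
```

For example, `cyk("aabb")` succeeds because the table entry for the whole word ends up containing S, while `cyk("aab")` fails.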
Example: Top-down analysis
Top-down analysis yields a leftmost derivation.
Example derivation with Γ1:
S ⇒
E ⇒
T + E ⇒
F * T + E ⇒
( E ) * T + E ⇒
( T + E ) * T + E ⇒
( F + E ) * T + E ⇒
( ID + E ) * T + E ⇒
( ID + T ) * T + E ⇒
( ID + F ) * T + E ⇒
( ID + ID ) * T + E ⇒
( ID + ID ) * F + E ⇒
( ID + ID ) * ID + E ⇒
( ID + ID ) * ID + T ⇒
( ID + ID ) * ID + F ⇒
( ID + ID ) * ID + ID
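A leftmost derivation like the one for ( ID + ID ) * ID + ID can be replayed mechanically: in each step the leftmost nonterminal is replaced by one of its alternatives. The Python sketch below assumes an encoding of Γ1 as a dictionary and selects alternatives by index; both the encoding and the function names are illustrative assumptions.

```python
# Γ1 encoded as: nonterminal -> list of alternatives (each a list of symbols).
GAMMA1 = {
    "S": [["E"]],
    "E": [["T", "+", "E"], ["T"]],
    "T": [["F", "*", "T"], ["F"]],
    "F": [["(", "E", ")"], ["ID"]],
}


def leftmost_step(form, alternative, grammar=GAMMA1):
    """Replace the leftmost nonterminal of `form` by the chosen alternative."""
    for pos, symbol in enumerate(form):
        if symbol in grammar:  # first nonterminal from the left
            return form[:pos] + grammar[symbol][alternative] + form[pos + 1:]
    raise ValueError("sentential form contains no nonterminal")


def leftmost_derivation(choices, start="S", grammar=GAMMA1):
    """Return the list of sentential forms produced by the given choices."""
    forms = [[start]]
    for alternative in choices:
        forms.append(leftmost_step(forms[-1], alternative, grammar))
    return forms


# Five steps use the first alternative, the remaining ten the second one.
derivation = leftmost_derivation([0] * 5 + [1] * 10)
for form in derivation:
    print(" ".join(form))
```

A top-down parser has to find exactly this sequence of choices by itself, which is why look ahead into the remaining input is needed.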
Example: Bottom-up analysis
Bottom-up analysis yields a rightmost derivation, constructed in reverse.
Example derivation with Γ1:
( ID + ID ) * ID + ID ⇐
( F + ID ) * ID + ID ⇐
( T + ID ) * ID + ID ⇐
( T + F ) * ID + ID ⇐
( T + T ) * ID + ID ⇐
( T + E ) * ID + ID ⇐
( E ) * ID + ID ⇐
F * ID + ID ⇐
F * F + ID ⇐
F * T + ID ⇐
T + ID ⇐
T + F ⇐
T + T ⇐
T + E ⇐
E ⇐
S
Context-free analysis with linear complexity
• Restrictions on the grammar (not every CFG has a linear parser)
• Use of push-down automata or systems of recursive procedures
• Use of look ahead into the remaining input in order to select the next production rule to be applied
Syntax analysis methods and parser generators
• Basic knowledge of syntax analysis is essential for use of parser generators.
• Parser generators are not always applicable.
• Often, error handling has to be done manually.
• The methods underlying parser generation are a good example of a generic technique (and a highlight of computer science!).
2.2.2.1 Top-down syntax analysis
Top-down syntax analysis
Learning objectives
• Understand the general principle of top-down syntax analysis
• Be able to implement recursive descent parsing (by example)
• Know expressiveness and limitations of top-down parsing
• Understand the basic concepts of LL(k) parsing
Recursive descent parsing
Basic idea
• Each nonterminal A is associated with a procedure. This procedure accepts a partial sentence derived from A.
• The procedure implements a finite automaton constructed from the productions with A as left-hand side. This automaton is called the item automaton of A.
• The recursiveness of the grammar is mapped to mutually recursive procedures, such that the call stack of higher programming languages is used for handling the recursion.
Construction of recursive descent parser
Let Γ′1 be a CFG accepting w# iff w ∈ L(Γ1), i.e., # is used as a special character denoting the end of the input.
Γ′1:
• S → E #
• E → T + E | T
• T → F * T | F
• F → ( E ) | ID
Construct the item automaton for each nonterminal.
Item automata
(States are sets of items; the dot marks the part of the right-hand side that has already been analyzed.)
S → E #:
{[S → .E#]} --E--> {[S → E.#]} --#--> {[S → E#.]}
E → T + E | T:
{[E → .T+E], [E → .T]} --T--> {[E → T.+E], [E → T.]} --+--> {[E → T+.E]} --E--> {[E → T+E.]}
T → F * T | F:
{[T → .F*T], [T → .F]} --F--> {[T → F.*T], [T → F.]} --*--> {[T → F*.T]} --T--> {[T → F*T.]}
Item automata (2)
F → ( E ) | ID:
{[F → .(E)], [F → .ID]} --(--> {[F → (.E)]} --E--> {[F → (E.)]} --)--> {[F → (E).]}
{[F → .(E)], [F → .ID]} --ID--> {[F → ID.]}
Recursive descent parsing procedures
• The recursive procedures are constructed from the item automata.
• The input is a token stream terminated by #.
• The variable currToken contains one token of look ahead, i.e., the first symbol of the input rest.
Recursive descent parsing procedures (2)
Production: S → E #
void S() {
  E();
  if( currToken == '#' ) {
    accept();
  } else {
    error();
  }
}
Recursive descent parsing procedures (3)
Production: E → T + E | T
void E() {
  T();
  if( currToken == '+' ) {
    readToken();
    E();
  }
}
Production: T → F * T | F
void T() {
  F();
  if( currToken == '*' ) {
    readToken();
    T();
  }
}
Recursive descent parsing procedures (4)
Production: F → ( E ) | ID
void F() {
  if( currToken == '(' ) {
    readToken();
    E();
    if( currToken == ')' ) {
      readToken();
    } else error();
  } else if( currToken == ID ) {
    readToken();
  } else error();
}
Recursive descent parsing procedures (5)
Remarks:
• Recursive descent
  - is relatively easy to implement
  - can easily be combined with other tasks (see following example)
  - is a typical example of syntax-directed methods (see also following example)
• The example uses one token of look ahead.
• Error handling is not considered.
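For experimentation, the parsing procedures for S, E, T and F can be transcribed into a runnable form. The following Python sketch follows the same structure with one method per nonterminal and one token of look ahead. The class name `Recognizer`, the exception `ParseError` (used where the procedures call error()) and the helper `accepts` are assumptions made for this illustration. Tokens are plain strings, and the end marker # is appended automatically.

```python
class ParseError(Exception):
    """Raised where the procedures in the notes call error()."""


class Recognizer:
    def __init__(self, tokens):
        # The end marker '#' is appended here; the tokens must not contain it.
        self.tokens = list(tokens) + ["#"]
        self.pos = 0

    @property
    def curr_token(self):
        return self.tokens[self.pos]

    def read_token(self):
        self.pos += 1

    def S(self):                       # S -> E #
        self.E()
        if self.curr_token != "#":
            raise ParseError("expected end of input")

    def E(self):                       # E -> T + E | T
        self.T()
        if self.curr_token == "+":
            self.read_token()
            self.E()

    def T(self):                       # T -> F * T | F
        self.F()
        if self.curr_token == "*":
            self.read_token()
            self.T()

    def F(self):                       # F -> ( E ) | ID
        if self.curr_token == "(":
            self.read_token()
            self.E()
            if self.curr_token != ")":
                raise ParseError("expected ')'")
            self.read_token()
        elif self.curr_token == "ID":
            self.read_token()
        else:
            raise ParseError("expected '(' or ID")


def accepts(tokens):
    """Return True iff the token list is a sentence of Γ1."""
    try:
        Recognizer(tokens).S()
        return True
    except ParseError:
        return False
```

Using an exception instead of an error() procedure is a design choice of this sketch: it stops the analysis at the first error, which is all a pure recognizer needs.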
Recursive descent and evaluation
Example: Interpreter for expressions using recursive descent
int env(Ident); // Ident -> int
// local variables imr store intermediate results
int S() {
  int imr = E();
  if( currToken == '#' ) {
    return imr;
  } else {
    error();
    return err_result;
  }
}
Recursive descent and evaluation (2)
int E() {
  int imr = T();
  if( currToken == '+' ) {
    readToken();
    return imr + E();
  }
  return imr;  // alternative E → T
}
int T() {
  int imr = F();
  if( currToken == '*' ) {
    readToken();
    return imr * T();
  }
  return imr;  // alternative T → F
}
Recursive descent and evaluation (3)
int F() {
  int imr;
  if( currToken == '(' ) {
    readToken();
    imr = E();
    if( currToken == ')' ) {
      readToken();
      return imr;
    } else {
      error();
      return err_result;
    }
  } else if( currToken == ID ) {
    readToken();
    return env(code(ID));
  } else {
    error();
    return err_result;
  }
}
Recursive descent and evaluation (4)
• Extending the parser with actions/computations is easy to implement, but it mixes conceptually different phases/tasks and makes programs hard to maintain.
• Question: For which grammars does the recursive descent technique work?
→ LL(k) parsing theory
LL parsing
• Basis for top-down syntax analysis
• First “L” refers to reading input from left to right
• Second “L” refers to search for leftmost derivations
LL(k) grammars
Definition (LL(k) grammar)
Let Γ = (N, T, Π, S) be a CFG and k ∈ ℕ.
Γ is an LL(k) grammar if for any two leftmost derivations
S ⇒*lm uAα ⇒lm uβα ⇒*lm ux and
S ⇒*lm uAα ⇒lm uγα ⇒*lm uy
the following holds:
if prefix(k, x) = prefix(k, y), then β = γ,
where prefix(k, x) yields the longest prefix of x with length ≤ k.
LL(k) grammars (2)
Remarks:
• A grammar is an LL(k) grammar if, in every leftmost derivation, the correct production for the next derivation step can be determined with k tokens of look ahead.
• A language Lk ⊆ Σ* is LL(k) if there exists an LL(k) grammar Γ with L(Γ) = Lk.
• The definition of LL(k) grammars provides no method to test if a grammar has the LL(k) property.
Non LL(k) grammars
Example 1: Grammar with left recursion Γ2:
• S → E #
• E → E + T | T
• T → T * F | F
• F → ( E ) | ID
Elimination of left recursion:
Replace productions of the form A → Aα | β, where β does not start with A, by A → βA′ and A′ → αA′ | ε.
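The replacement rule for left-recursive productions can be written down as a small transformation on the alternatives of one nonterminal. The Python sketch below covers only immediate left recursion, as in the rule stated here. The function name, the list-of-symbols encoding and the use of the empty list for ε are assumptions of this illustration.

```python
def eliminate_immediate_left_recursion(nonterminal, alternatives):
    """Rewrite A -> A α | β into A -> β A' and A' -> α A' | ε.

    `alternatives` is a list of right-hand sides, each a list of symbols;
    the empty list stands for ε.
    """
    primed = nonterminal + "'"
    # α parts: alternatives that start with the nonterminal itself
    recursive = [rhs[1:] for rhs in alternatives if rhs and rhs[0] == nonterminal]
    # β parts: all other alternatives
    others = [rhs for rhs in alternatives if not rhs or rhs[0] != nonterminal]
    if not recursive:
        return {nonterminal: alternatives}  # nothing to do
    return {
        nonterminal: [beta + [primed] for beta in others],
        primed: [alpha + [primed] for alpha in recursive] + [[]],
    }


new_e = eliminate_immediate_left_recursion("E", [["E", "+", "T"], ["T"]])
new_t = eliminate_immediate_left_recursion("T", [["T", "*", "F"], ["F"]])
```

Applied to the productions for E and T of Γ2, this yields E → T E′, E′ → + T E′ | ε and T → F T′, T′ → * F T′ | ε.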
Non LL(k) grammars (2)
Elimination of left recursion: From Γ2 we obtain Γ3.
Γ2:
• S → E #
• E → E + T | T
• T → T * F | F
• F → ( E ) | ID
Γ3:
• S → E #
• E → T E′
• E′ → + T E′ | ε
• T → F T′
• T′ → * F T′ | ε
• F → ( E ) | ID
Non LL(k) grammars (3)
Example 2: Grammar Γ4 with unlimited look ahead
• STM → VAR := VAR | ID ( IDLIST )
• VAR → ID | ID ( IDLIST )
• IDLIST → ID | ID , IDLIST
Γ4 is not an LL(k) grammar for any k.
(Proof: cf. Wilhelm, Maurer, Example 8.3.4, p. 319)
Transformation to LL(2) grammar Γ′4:
• STM → ASS_CALL | ID := VAR
• ASS_CALL → ID ( IDLIST ) ASS_CALL_REST
• ASS_CALL_REST → := VAR | ε
Non LL(k) grammars (4)
Remarks:
• The transformed grammars accept the same language, but generate other syntax trees:
  - From a theoretical point of view, this is acceptable.
  - From a programming language implementation perspective, this is in general not acceptable.
• There are languages L for which no LL(k) grammar Γ exists that generates the language, i.e. L(Γ) = L. (Example: grammar Γ5)
Non LL(k) grammars (5)
Example 3:
For the following grammar Γ5, there is no k such that Γ5 is an LL(k) grammar.
• S → A | B
• A → a A b | 0
• B → a B b b | 1
Remark:
For L(Γ5), there exists no LL(k) grammar.
Proof (that Γ5 is not LL(k)).
Let k be arbitrary, but fixed.
Choose two derivations according to the LL(k) definition and show that, despite equal prefixes of length k, β and γ are not equal:
S ⇒*lm S ⇒lm A ⇒*lm a^k 0 b^k
S ⇒*lm S ⇒lm B ⇒*lm a^k 1 b^(2k)
Then prefix(k, a^k 0 b^k) = a^k = prefix(k, a^k 1 b^(2k)), but β = A ≠ B = γ.
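The two words used in the argument for Γ5 can be generated and compared directly. The short Python sketch below (the helper names are assumptions) builds a^k 0 b^k and a^k 1 b^(2k) and confirms that their prefixes of length k coincide, so k tokens of look ahead cannot distinguish S → A from S → B.

```python
def prefix(k, word):
    """Longest prefix of `word` with length <= k."""
    return word[:k]


def word_via_a(n):
    """The word a^n 0 b^n, derived via S -> A."""
    return "a" * n + "0" + "b" * n


def word_via_b(n):
    """The word a^n 1 b^(2n), derived via S -> B."""
    return "a" * n + "1" + "b" * (2 * n)


# For every tested k the look ahead is identical although the words differ.
indistinguishable = all(
    prefix(k, word_via_a(k)) == prefix(k, word_via_b(k)) == "a" * k
    for k in range(1, 20)
)
```

With k + 1 symbols of look ahead these two particular words could be told apart, but then the same construction with a^(k+1) applies again.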
FIRST and FOLLOW sets
Definition
Let Γ = (N, T, Π, S) be a CFG, k ∈ ℕ;
T≤k = {u ∈ T* | length(u) ≤ k}
denotes the set of all terminal strings of length at most k.
We define:
• FIRSTk: (N ∪ T)* → P(T≤k)
  FIRSTk(α) = {prefix(k, u) | α ⇒* u}
  where prefix(n, u) = u for all u with length(u) ≤ n.
• FOLLOWk: (N ∪ T)* → P(T≤k)
  FOLLOWk(α) = {w | S ⇒* βαγ ∧ w ∈ FIRSTk(γ)}
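For k = 1, both sets can be computed for single nonterminals by a simple fixpoint iteration. The Python sketch below does this for the grammar Γ3 obtained by eliminating left recursion. The dictionary encoding, the use of the empty string for ε and all function names are assumptions of this illustration. It uses the standard iterative formulation rather than the definition itself, and it only computes FOLLOW1 for nonterminals.

```python
EPS = ""  # stands for the empty word ε

GAMMA3 = {
    "S": [["E", "#"]],
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["ID"]],
}


def first_of_string(symbols, first):
    """FIRST1 of a symbol string, given FIRST1 of each nonterminal."""
    result = set()
    for sym in symbols:
        sym_first = first[sym] if sym in first else {sym}  # terminal: itself
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)  # every symbol can derive ε (or the string is empty)
    return result


def compute_first(grammar):
    first = {a: set() for a in grammar}
    changed = True
    while changed:
        changed = False
        for a, alternatives in grammar.items():
            for rhs in alternatives:
                new = first_of_string(rhs, first)
                if not new <= first[a]:
                    first[a] |= new
                    changed = True
    return first


def compute_follow(grammar, first, start="S"):
    follow = {a: set() for a in grammar}
    follow[start].add(EPS)  # S =>* S, and nothing follows S there
    changed = True
    while changed:
        changed = False
        for a, alternatives in grammar.items():
            for rhs in alternatives:
                for i, sym in enumerate(rhs):
                    if sym not in grammar:
                        continue
                    rest = first_of_string(rhs[i + 1:], first)
                    new = rest - {EPS}
                    if EPS in rest:
                        new |= follow[a]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


FIRST1 = compute_first(GAMMA3)
FOLLOW1 = compute_follow(GAMMA3, FIRST1)
```

Both loops terminate because the sets only grow and are bounded by the finite set of terminals plus ε.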
FIRST and FOLLOW sets in parse trees
[Figure: syntax tree with root S and inner node X; FIRSTk(X) marks the beginning of the terminal string derived from X, FOLLOWk(X) the terminals immediately following it]
Characterization of LL(1) grammars
Definition (reduced CFG)
A CFG Γ = (N, T, Π, S) is reduced if each nonterminal occurs in a derivation starting from S and each nonterminal derives at least one word.
Lemma
A reduced CFG is LL(1) iff for any two distinct productions A → β and A → γ the following holds:
(FIRST1(β) ⊕1 FOLLOW1(A)) ∩ (FIRST1(γ) ⊕1 FOLLOW1(A)) = ∅
where L1 ⊕1 L2 = {prefix(1, vw) | v ∈ L1, w ∈ L2}
Remark: FIRST and FOLLOW sets are computable, so this criterion can be checked automatically.
Example: FIRSTk and FOLLOWk
Check that the modified expression grammar Γ3 is LL(1).
• S → E #
• E → T E′
• E′ → + T E′ | ε
• T → F T′
• T′ → * F T′ | ε
• F → ( E ) | ID
Compute FIRST1 and FOLLOW1 for each nonterminal.
Example: FIRSTk and FOLLOWk (2)
• F → ( E ) | ID:
  (FIRST1(( E )) ⊕1 FOLLOW1(F)) ∩ (FIRST1(ID) ⊕1 FOLLOW1(F))
  = ({(} ⊕1 FOLLOW1(F)) ∩ ({ID} ⊕1 FOLLOW1(F))
  = {(} ∩ {ID}
  = ∅
• E′ → + T E′ | ε:
  (FIRST1(+ T E′) ⊕1 FOLLOW1(E′)) ∩ (FIRST1(ε) ⊕1 FOLLOW1(E′))
  = ({+} ⊕1 FOLLOW1(E′)) ∩ ({ε} ⊕1 FOLLOW1(E′))
  = {+} ∩ {#, )}
  = ∅
• T′ → * F T′ | ε:
  (FIRST1(* F T′) ⊕1 FOLLOW1(T′)) ∩ (FIRST1(ε) ⊕1 FOLLOW1(T′))
  = ({*} ⊕1 FOLLOW1(T′)) ∩ ({ε} ⊕1 FOLLOW1(T′))
  = {*} ∩ {+, #, )}
  = ∅
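The disjointness checks for the alternatives of F, E′ and T′ can be repeated mechanically once the FIRST1 and FOLLOW1 sets are known. The Python sketch below implements the operator ⊕1 and the pairwise check. Words are encoded as tuples of tokens and ε as the empty tuple; this encoding and the helper names are assumptions. The FOLLOW1 sets for E′ and T′ are taken from the calculation, FOLLOW1(F) = {*, +, #, )} was computed separately for this illustration, and the last check uses the left-recursive productions E → E + T | T as a counterexample.

```python
EPS = ()  # the empty word ε, encoded as the empty tuple


def oplus1(l1, l2):
    """L1 ⊕1 L2 = {prefix(1, vw) | v in L1, w in L2}."""
    return {(v + w)[:1] for v in l1 for w in l2}


def ll1_condition(first_sets, follow):
    """Check that FIRST1(rhs) ⊕1 FOLLOW1(A) is pairwise disjoint
    over all alternatives of one nonterminal A."""
    extended = [oplus1(first, follow) for first in first_sets]
    return all(
        extended[i].isdisjoint(extended[j])
        for i in range(len(extended))
        for j in range(i + 1, len(extended))
    )


def words(*tokens):
    """Set of one-token words."""
    return {(t,) for t in tokens}


# E' -> + T E' | ε   with FOLLOW1(E') = {#, )}
e_prime_ok = ll1_condition([words("+"), {EPS}], words("#", ")"))
# T' -> * F T' | ε   with FOLLOW1(T') = {+, #, )}
t_prime_ok = ll1_condition([words("*"), {EPS}], words("+", "#", ")"))
# F -> ( E ) | ID
f_ok = ll1_condition([words("("), words("ID")], words("*", "+", "#", ")"))
# Left-recursive E -> E + T | T: both alternatives start with ( or ID.
left_recursive_ok = ll1_condition(
    [words("(", "ID"), words("(", "ID")], words("+", "#", ")")
)
```

The first three checks succeed, while the left-recursive variant fails, which matches the observation that left-recursive grammars are not LL(1).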
Proof of LL characterization lemma
• Direction from left to right:
Γ is LL(1) implies FIRST-FOLLOW disjointness.
Proof by contradiction:
(“FIRST-FOLLOW intersection non-empty” implies “not LL(1)”)
Let A → β and A → γ be two distinct productions of Γ (β ≠ γ) such that the FIRST-FOLLOW intersection is non-empty.
Case distinction. We consider three cases:
Case 1: β ⇒* ε and γ ⇒* ε
In this case, the LL(1) property does not hold for A → β, A → γ.
Proof of LL characterization lemma (2)
Case 2: β ⇏* ε
Then there is a z with length(z) = 1 and
z ∈ ((FIRST1(β) ⊕1 FOLLOW1(A)) ∩ (FIRST1(γ) ⊕1 FOLLOW1(A)))
Because Γ is reduced, there are two derivations:
S ⇒* ψAα ⇒ ψβα ⇒* ψzx
S ⇒* ψAα ⇒ ψγα ⇒* ψzy
and there is a u such that ψ ⇒* u, i.e., there are leftmost derivations
S ⇒*lm uAα ⇒lm uβα ⇒*lm uzx
S ⇒*lm uAα ⇒lm uγα ⇒*lm uzy
But prefix(1, zx) = z = prefix(1, zy) contradicts the LL(1) property of Γ.
Case 3: γ ⇏* ε: similar to Case 2.
Proof of LL characterization lemma (3)
• Direction from right to left:
FIRST-FOLLOW disjointness implies Γ is LL(1).
Proof:
Consider any two derivations with β ≠ γ:
S ⇒*lm uAα ⇒lm uβα ⇒*lm ux
S ⇒*lm uAα ⇒lm uγα ⇒*lm uy
that is, prefix(1, x) ∈ (FIRST1(β) ⊕1 FOLLOW1(A)) and prefix(1, y) ∈ (FIRST1(γ) ⊕1 FOLLOW1(A)). Because of FIRST-FOLLOW disjointness, prefix(1, x) ≠ prefix(1, y). Hence equal prefixes are only possible for β = γ, which is the LL(1) property.
Parser generation for LL(k) languages
[Figure: grammar → LL(k) parser generator → table for push-down automaton / parser program, or error: grammar is not LL(k)]
Parser generation for LL(k) languages (2)
Remarks:
• Use of push-down automata with look ahead
• Select production from tables
• Advantages over bottom-up techniques in error analysis and error handling
Example system: ANTLR (http://www.antlr.org/)
Recommended reading for top-down analysis:
• Wilhelm, Maurer: Chapter 8, Sections 8.3.1 to 8.3.4, pp. 312-329
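The combination of a push-down automaton with table-driven production selection can be illustrated with a small LL(1) parser for Γ3. In the Python sketch below the table was filled in by hand from the FIRST1 and FOLLOW1 sets of Γ3; a parser generator would compute it. The names `TABLE`, `_fill` and `ll1_parse` are assumptions, and the sketch only recognizes input without building a syntax tree.

```python
NONTERMINALS = {"S", "E", "E'", "T", "T'", "F"}
TABLE = {}  # (nonterminal, look-ahead token) -> right-hand side


def _fill(nonterminal, lookaheads, rhs):
    for token in lookaheads:
        TABLE[(nonterminal, token)] = rhs


_fill("S", ["(", "ID"], ["E", "#"])
_fill("E", ["(", "ID"], ["T", "E'"])
_fill("E'", ["+"], ["+", "T", "E'"])
_fill("E'", ["#", ")"], [])              # E' -> ε
_fill("T", ["(", "ID"], ["F", "T'"])
_fill("T'", ["*"], ["*", "F", "T'"])
_fill("T'", ["+", "#", ")"], [])         # T' -> ε
_fill("F", ["("], ["(", "E", ")"])
_fill("F", ["ID"], ["ID"])


def ll1_parse(tokens):
    """Return True iff `tokens` (without end marker) is accepted."""
    remaining = list(tokens) + ["#"]
    stack = ["S"]
    pos = 0
    while stack:
        top = stack.pop()
        look = remaining[pos] if pos < len(remaining) else None
        if top in NONTERMINALS:
            rhs = TABLE.get((top, look))
            if rhs is None:
                return False               # no table entry: syntax error
            stack.extend(reversed(rhs))    # leftmost symbol ends up on top
        elif top == look:
            pos += 1                       # terminal matches: consume it
        else:
            return False
    return pos == len(remaining)
```

Each missing table entry corresponds to a precise error position, which hints at why error handling tends to be easier in top-down parsers.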
2.2.2.2 Bottom-up syntax analysis
Bottom-up syntax analysis
Learning objectives:
• General principles of bottom-up syntax analysis
• LR(k) analysis
• Resolving conflicts in parser generation
• Connection between CFGs and push-down automata
Basic ideas: bottom-up syntax analysis
• Bottom-up analysis is more powerful than top-down analysis, since a production is chosen only after its complete right-hand side has been analyzed, whereas in top-down analysis the production is selected up front.
• LR: read input from left (L) and search for rightmost derivations (R)
Principles of LR parsing
1. Reduce from sentence to axiom according to the productions of Γ.
2. Reduction yields sentential forms αx with α ∈ (N ∪ T)* and x ∈ T*, where x is the input rest.
3. α has to be a prefix of a right sentential form of Γ. Such prefixes are called viable prefixes. This prefix property has to hold invariantly during LR parsing to avoid dead ends.
4. Reductions are always made at the leftmost possible position.
More precisely:
Viable prefix
Definition
Let S ⇒*rm βAu ⇒rm βαu be a rightmost derivation in Γ, so that βαu is a right sentential form of Γ.
Then α is called a handle or redex of the right sentential form βαu. Each prefix of βα is a viable prefix of Γ.
Regularity of viable prefixes
Theorem
The language of viable prefixes of a grammarΓis regular.
Proof.
Cf. Wilhelm, Maurer, Thm. 8.4.1 and Corollary 8.4.2.1 (pp. 361, 362).
The essential proof steps are illustrated in the following by the construction of the LR-DFA(Γ).
Examples: towards LR parsing
• Consider Γ1:
  - S → aCD
  - C → b
  - D → a | b
Analysis of aba can lead to a dead end (cf. lecture).
Considering viable prefixes can avoid this.
Examples: towards LR parsing (2)
• Consider Γ2:
  - S → E #
  - E → a | ( E ) | E E
Analysis of ((a))(a)# (cf. lecture)
A stack can manage the prefixes already read.
Examples: towards LR parsing (3)
• Consider Γ3:
  - S → E #
  - E → E + T | T
  - T → ID
Analysis of ID + ID + ID # (cf. lecture)
LR parsing: shift and reduce actions
Schematic syntax tree for input xay with α ∈ (N ∪ T)*, a ∈ T, x, y ∈ T* and start symbol S:
[Figure: schematic syntax tree with root S over the input x a y; the read pointer marks the current input position and α is the part already analyzed]
LR parsing: shift and reduce actions (2)
Shift step:
[Figure: the read pointer moves past the terminal a, so the analyzed part α becomes αa]
Reduce step:
[Figure: the right-hand side of a production at the end of the analyzed part α is replaced by its left-hand side; the read pointer does not move]
LR parsing: shift and reduce actions (3)
Problems:
• Ensure that after every reduction the resulting prefix is still a viable prefix.
• When to shift? When to reduce? Which production to use?
Solution:
For each grammar Γ, construct the automaton LR-DFA(Γ) (also called the LR(0) automaton), which describes the viable prefixes.
Construction of LR-DFA
Let Γ = (T, N, Π, S) be a CFG.
• For each nonterminal A ∈ N, construct the item automaton.
• Build the union of the item automata: the start state is the start state of the item automaton for S; the final states are the final states of the item automata.
• Add ε-transitions from each state that has the dot in front of a nonterminal A to the start state of the item automaton of A.
Theorem
The automaton obtained from LR-DFA(Γ) by declaring all states to be final states accepts exactly the language of viable prefixes of Γ.
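The construction can be sketched in Python. This is only an illustration under stated assumptions: instead of building the item automata and their union explicitly, it uses the common closure/goto formulation, which follows the ε-transitions on the fly and yields the deterministic automaton directly. The grammar is S → E #, E → E + T | T, T → ID; the names `closure`, `goto`, `build_lr_dfa` and `is_viable_prefix` are chosen for this sketch and do not come from the lecture.

```python
from collections import deque

# Grammar: S -> E #, E -> E + T | T, T -> ID (nonterminals are the dict keys)
GRAMMAR = {
    "S": [("E", "#")],
    "E": [("E", "+", "T"), ("T",)],
    "T": [("ID",)],
}
START = "S"

def closure(items):
    """Follow the epsilon-transitions: if the dot is in front of a
    nonterminal A, add the initial items [A -> .rhs] of A."""
    result = set(items)
    work = list(result)
    while work:
        _, rhs, dot = work.pop()
        if dot < len(rhs) and rhs[dot] in GRAMMAR:
            nonterminal = rhs[dot]
            for production in GRAMMAR[nonterminal]:
                item = (nonterminal, production, 0)
                if item not in result:
                    result.add(item)
                    work.append(item)
    return frozenset(result)

def goto(state, symbol):
    """Move the dot over `symbol` in every item that allows it."""
    moved = {(lhs, rhs, dot + 1) for (lhs, rhs, dot) in state
             if dot < len(rhs) and rhs[dot] == symbol}
    return closure(moved)

def build_lr_dfa():
    """Return the list of item sets and the transition table."""
    start = closure({(START, rhs, 0) for rhs in GRAMMAR[START]})
    states, index, transitions = [start], {start: 0}, {}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        symbols = sorted({rhs[dot] for (_, rhs, dot) in state if dot < len(rhs)})
        for symbol in symbols:
            target = goto(state, symbol)
            if target not in index:
                index[target] = len(states)
                states.append(target)
                queue.append(target)
            transitions[(index[state], symbol)] = index[target]
    return states, transitions

def is_viable_prefix(symbols, transitions):
    """All states count as final: a word is accepted iff no transition is missing."""
    state = 0
    for symbol in symbols:
        if (state, symbol) not in transitions:
            return False
        state = transitions[(state, symbol)]
    return True

STATES, TRANSITIONS = build_lr_dfa()
```

For this grammar the sketch produces seven item sets. The error state is not represented explicitly; it corresponds to a missing entry in `TRANSITIONS`.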
Example: Construction of LR-DFA
Γ3: S → E#, E → E+T | T, T → ID
Item automata for Γ3:
[S → .E#] --E--> [S → E.#] --#--> [S → E#.]
[E → .E+T] --E--> [E → E.+T] --+--> [E → E+.T] --T--> [E → E+T.]
[E → .T] --T--> [E → T.]
[T → .ID] --ID--> [T → ID.]
ε-transitions lead from every item with the dot in front of E to [E → .E+T] and [E → .T], and from every item with the dot in front of T to [T → .ID].
Example: Construction of LR-DFA (2)
Power set construction:
[Figure: LR-DFA for Γ3 with the states q0 to q7; the arrows marked as error transitions lead to the error state.]
Item sets of the states and their transitions:
• {[S → .E#], [E → .E+T], [E → .T], [T → .ID]} (start state): E to {[S → E.#], [E → E.+T]}, T to {[E → T.]}, ID to {[T → ID.]}
• {[S → E.#], [E → E.+T]}: # to {[S → E#.]}, + to {[E → E+.T], [T → .ID]}
• {[E → E+.T], [T → .ID]}: T to {[E → E+T.]}, ID to {[T → ID.]}
• {[S → E#.]}, {[E → E+T.]}, {[E → T.]}, {[T → ID.]}: final states without outgoing transitions
• All other transitions are error transitions.
Viable prefixes of maximal length: E#, T, ID, E+ID, E+T
Example: Construction of LR-DFA (3)
Remarks:
• In the example, each final state contains exactly one completely read production. In general, this is not the case.
• If a final state contains more than one completely read production, we have a reduce/reduce conflict.
• If a final state contains a completely read production and an incompletely read production with a terminal after the dot, we have a shift/reduce conflict.
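The two conflict conditions can be checked mechanically for a single state. The following Python sketch assumes that an item is a tuple `(lhs, rhs, dot)`; the function name `find_conflicts` and the two conflicting example states are hypothetical and do not come from the lecture's grammar.

```python
def find_conflicts(state, terminals):
    """Classify the LR(0) conflicts of one state (a set of items)."""
    # Completely read productions: the dot is at the end of the right-hand side.
    complete = [item for item in state if item[2] == len(item[1])]
    # Incompletely read productions with a terminal directly after the dot.
    shiftable = [item for item in state
                 if item[2] < len(item[1]) and item[1][item[2]] in terminals]
    conflicts = []
    if len(complete) > 1:
        conflicts.append("reduce/reduce")
    if complete and shiftable:
        conflicts.append("shift/reduce")
    return conflicts

# A state of the example grammar: no conflict.
ok_state = {("S", ("E", "#"), 1), ("E", ("E", "+", "T"), 1)}

# Hypothetical states, made up to trigger each kind of conflict.
shift_reduce_state = {("E", ("T",), 1), ("T", ("T", "*", "F"), 1)}
reduce_reduce_state = {("A", ("x",), 1), ("B", ("x",), 1)}

print(find_conflicts(ok_state, {"#", "+", "ID"}))
print(find_conflicts(shift_reduce_state, {"*"}))
print(find_conflicts(reduce_reduce_state, {"x"}))
```

A grammar whose automaton has no state with a non-empty result can be analysed without lookahead.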
Analysis with LR-DFA
Analysis of ID + ID + ID # with the LR-DFA (the viable prefix read by the automaton stands to the left of |):
ID | + ID + ID #   ⇐
T | + ID + ID #    ⇐
E + ID | + ID #    ⇐
E + T | + ID #     ⇐
E + ID | #         ⇐
E + T | #          ⇐
E # |              ⇐
S
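The analysis above can be reproduced with a small Python sketch that uses nothing but the automaton. It rests on assumptions: the transition table `GOTO` and the table `REDUCE` are written out by hand for the example grammar, the state numbers 0 to 6 are chosen for this sketch and need not match q0 to q7 in the figure, and the function name `analyse` is made up. After every reduction the automaton starts again in its start state and re-reads the sentential form.

```python
# Hand-written LR-DFA for S -> E #, E -> E + T | T, T -> ID.
# State numbers are an assumption of this sketch.
GOTO = {
    (0, "E"): 1, (0, "T"): 2, (0, "ID"): 3,
    (1, "#"): 4, (1, "+"): 5,
    (5, "T"): 6, (5, "ID"): 3,
}
# Final states: left-hand side and length of the completely read production.
REDUCE = {2: ("E", 1), 3: ("T", 1), 4: ("S", 2), 6: ("E", 3)}

def analyse(tokens):
    """Return the list of sentential forms from the input down to S."""
    form = list(tokens)
    forms = [" ".join(form)]
    while form != ["S"]:
        state, pos = 0, 0
        # Re-read the sentential form from the beginning until a final state is reached.
        while state not in REDUCE:
            if pos == len(form) or (state, form[pos]) not in GOTO:
                raise SyntaxError("no viable prefix in: " + " ".join(form))
            state = GOTO[(state, form[pos])]
            pos += 1
        lhs, length = REDUCE[state]
        # Replace the right-hand side that ends at the read position.
        form[pos - length:pos] = [lhs]
        forms.append(" ".join(form))
    return forms

for sentential_form in analyse("ID + ID + ID #".split()):
    print(sentential_form)
```

The inner loop starts at position 0 in every round, which is the repeated work that a stack of states avoids.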
Analysis with LR-DFA (2)
Note:
• Each sentential form consists of a viable prefix followed by the remaining input.
• If only the LR-DFA is used for the analysis, the sentential form has to be re-read from the beginning after each reduction.
Therefore: use a pushdown automaton for the analysis.
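A minimal sketch of such a pushdown analysis in Python, under the following assumptions: the stack holds the LR-DFA states passed so far, the tables `GOTO` and `REDUCE` are written by hand for the grammar S → E #, E → E + T | T, T → ID, the state numbers are chosen for this sketch, and the function name `parse` is made up. It is not the parser construction of the lecture, only an illustration of why the stack removes the need to re-read the input.

```python
# Hand-written LR-DFA tables; state numbers are an assumption of this sketch.
GOTO = {
    (0, "E"): 1, (0, "T"): 2, (0, "ID"): 3,
    (1, "#"): 4, (1, "+"): 5,
    (5, "T"): 6, (5, "ID"): 3,
}
# Final states: left-hand side and length of the completely read production.
REDUCE = {2: ("E", 1), 3: ("T", 1), 4: ("S", 2), 6: ("E", 3)}

def parse(tokens):
    """Return (accepted, list of left-hand sides in reduction order)."""
    stack = [0]          # states of the LR-DFA, start state at the bottom
    pos = 0              # read pointer into the token list
    reductions = []
    while True:
        state = stack[-1]
        if state in REDUCE:
            # Reduce: pop one state per symbol of the right-hand side.
            lhs, length = REDUCE[state]
            del stack[-length:]
            reductions.append(lhs)
            if lhs == "S":
                return pos == len(tokens), reductions
            target = GOTO.get((stack[-1], lhs))
            if target is None:
                return False, reductions
            stack.append(target)
        else:
            # Shift: consume the next token and push the successor state.
            if pos == len(tokens):
                return False, reductions
            target = GOTO.get((state, tokens[pos]))
            if target is None:
                return False, reductions
            stack.append(target)
            pos += 1

accepted, reductions = parse("ID + ID + ID #".split())
print(accepted, reductions)
```

After a reduction the state below the removed right-hand side is already on the stack, so the analysis continues from there instead of from the start state.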