2.2. Context-Free Syntax Analysis


Summer Term 2011

Prof. Dr. Arnd Poetzsch-Heffter

Software Technology Group TU Kaiserslautern

© Prof. Dr. Arnd Poetzsch-Heffter

Content of Lecture

1. Introduction
2. Syntax and Type Analysis
   2.1 Lexical Analysis
   2.2 Context-Free Syntax Analysis
   2.3 Context-Dependent Syntax Analysis
3. Translation to Target Language
   3.1 Translation of Imperative Language Constructs
   3.2 Translation of Object-Oriented Language Constructs
4. Selected Aspects of Compilers
   4.1 Intermediate Languages
   4.2 Optimization
   4.3 Data Flow Analysis
   4.4 Register Allocation
   4.5 Code Generation
5. Garbage Collection


2.2. Context-Free Syntax Analysis


Context-Free Syntax Analysis Introduction

Section outline

1. Specification of parsers
2. Implementation of parsers
   2.1 Top-down syntax analysis
       - Recursive descent
       - LL(k) parsing theory
       - LL parser generation
   2.2 Bottom-up syntax analysis
       - Principles of LR parsing
       - LR parsing theory
       - SLR, LALR, LR(k) parsing
       - LALR parser generation
3. Error handling


Task of context-free syntax analysis

Check whether the token stream (from the scanner) matches the context-free syntax of the language:

- if erroneous: error handling
- if correct: construct syntax tree

[Figure: Token Stream → Parser → Abstract / Concrete Syntax Tree]


Task of context-free syntax analysis (2)

Remarks:

Parsing can be interleaved with other actions processing the program (e.g. attributation).

Syntax tree controls translation. We distinguish:

- Concrete syntax tree, corresponding to the context-free grammar
- Abstract syntax tree, providing a more compact representation tailored to subsequent phases


2.2.1 Specification of Parsers


Context-Free Syntax Analysis Specification of Parsers

Specification of parsers

Two general specification techniques:

Syntax diagrams

Context-free grammars (often in extended form)


Context-Free Grammars

Definition

Let

N and T be two alphabets with N ∩ T = ∅

Π a finite subset of N × (N ∪ T)*

S ∈ N

Then Γ = (N, T, Π, S) is a context-free grammar (CFG) where

N is the set of nonterminals

T is the set of terminals

Π is the set of production rules

S is the start symbol (axiom)


Context-Free Grammars (2)

Notations:

A, B, C, . . . denote nonterminals

a, b, c, . . . denote terminals

x, y, z, . . . denote strings of terminals, i.e. x ∈ T*

α, β, γ, ψ, φ, σ, τ are strings of terminals and nonterminals, i.e. α ∈ (N ∪ T)*

Productions are denoted by A → α.

The notation A → α | β | γ | . . . is an abbreviation for A → α, A → β, A → γ, . . .


Derivation

Let Γ = (N,T,Π,S) be a CFG:

ψ is directly derivable from φ in Γ (and φ directly produces ψ), written as φ ⇒ ψ, if there are σ, τ with

σAτ = φ and σατ = ψ and A → α ∈ Π

ψ is derivable from φ in Γ, written as φ ⇒* ψ, if there exist φ0, . . . , φn with φ = φ0 and ψ = φn and φi ⇒ φi+1 for all i ∈ {0, . . . , n−1}.

φ0, . . . , φn is called a derivation of ψ from φ.

⇒* is the reflexive, transitive closure of ⇒.
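
The two relations can be made concrete with a small sketch. It assumes a grammar encoded as a Python dict from nonterminals to lists of right-hand sides (an encoding chosen here for illustration, not taken from the lecture) and uses the expression grammar Γ1 that appears later in this section. The names `direct_derivations` and `derivable` are hypothetical.

```python
# Grammar Γ1 of the lecture; the dict encoding is an assumption of this sketch.
GAMMA1 = {
    "S": [["E"]],
    "E": [["T", "+", "E"], ["T"]],
    "T": [["F", "*", "T"], ["F"]],
    "F": [["(", "E", ")"], ["ID"]],
}


def direct_derivations(phi, grammar):
    """Return all psi with phi => psi (one production applied at one position)."""
    result = []
    for i, symbol in enumerate(phi):
        for alpha in grammar.get(symbol, []):
            # phi = sigma A tau  becomes  psi = sigma alpha tau
            result.append(phi[:i] + alpha + phi[i + 1:])
    return result


def derivable(phi, psi, grammar, max_steps):
    """Check phi =>* psi by breadth-first search over at most max_steps steps.

    Assumes a grammar without ε-productions, so sentential forms never
    shrink and forms longer than psi can be discarded.
    """
    frontier = [phi]
    seen = {tuple(phi)}
    for _ in range(max_steps + 1):
        if any(form == psi for form in frontier):
            return True
        next_frontier = []
        for form in frontier:
            for successor in direct_derivations(form, grammar):
                if len(successor) <= len(psi) and tuple(successor) not in seen:
                    seen.add(tuple(successor))
                    next_frontier.append(successor)
        frontier = next_frontier
    return False
```

For example, `derivable(["S"], ["ID", "+", "ID"], GAMMA1, 10)` searches for a derivation S ⇒* ID + ID. The step bound is needed because the search is a plain enumeration, not a parsing algorithm.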


Derivation (2)

A derivation φ0, . . . , φn is a leftmost (rightmost) derivation if in every derivation step φi ⇒ φi+1 the leftmost (rightmost) nonterminal in φi is replaced.

Leftmost and rightmost derivation steps are denoted by φ ⇒lm ψ and φ ⇒rm ψ, respectively.

The tree representation of a derivation is a syntax tree.

L(Γ) = {z ∈ T* | S ⇒* z} is the language generated by Γ.

x ∈ L(Γ) is a sentence of Γ (German: Satz).

φ ∈ (N ∪ T)* with S ⇒* φ is a sentential form of Γ (German: Satzform).


Derivation (3)

Remarks:

Each derivation corresponds to exactly one syntax tree. Conversely, for each syntax tree, there can be several derivations.

For “syntax tree”, the term “derivation tree” is also used.

For each language, there can be several generating grammars, i.e., the mapping L: Grammar → Language is in general not injective.


Ambiguity in Grammars

A sentence is unambiguous if it has exactly one syntax tree. A sentence is ambiguous if it has more than one syntax tree.

For each syntax tree, there exists exactly one leftmost derivation and exactly one rightmost derivation.

Thus: A sentence is unambiguous iff it has exactly one leftmost (rightmost) derivation.

A grammar is ambiguous if it generates an ambiguous sentence.

For programming languages, unambiguous grammars are essential, as the semantics and the translation are defined by the syntactic structure.


Ambiguity in Grammars (2)

Example 1: Grammar Γ0 for expressions:

S → E
E → E + E
E → E ∗ E
E → (E)
E → ID

Consider the input string

(av + av) ∗ bv + cv + dv

resulting in the following input for the context-free analysis:

(ID + ID) ∗ ID + ID + ID


Ambiguity in Grammars (3)

Syntax tree for (ID + ID) ∗ ID + ID + ID

[Figure: one syntax tree of Γ0 for (ID + ID) ∗ ID + ID + ID]

The syntax tree does not match the conventional rules of arithmetic.

There are several syntax trees according to Γ0 for this input, hence Γ0 is ambiguous.


Ambiguity in Grammars (4)

Example 2: Ambiguity in if-then-else construct

if B1 then if B2 then A:= 9 else A:= 7

First derivation

[Figure: first syntax tree for the token sequence IF ID THEN IF ID THEN ID EQ CO ELSE ID EQ CO, with nodes labeled IFTHENELSE, IFTHEN, ANW (statement) and ZW (assignment)]


Ambiguity in Grammars (5)

Second derivation

[Figure: second syntax tree for the same token sequence, in which the ELSE branch is attached to the other IF]


Ambiguity as Grammar Property

Ambiguity is a grammar property. The grammar for expressions Γ0 is an example of an ambiguous grammar.

Γ0:

S → E
E → E + E
E → E ∗ E
E → (E)
E → ID

[Figure: two different syntax trees of Γ0 for ID + ID ∗ ID]
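
To see the ambiguity of Γ0 numerically, the following sketch counts the syntax trees of a token sequence. It hard-codes the productions E → E + E | E ∗ E | (E) | ID (S → E adds exactly one root node and does not change the count). The function name `count_syntax_trees` and the token spelling (`"ID"`, `"+"`, `"*"`, `"("`, `")"`) are assumptions made for illustration.

```python
from functools import lru_cache


def count_syntax_trees(tokens):
    """Count the syntax trees of Γ0 for the given token list."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def trees(i, j):
        # Number of ways to derive tokens[i:j] from the nonterminal E.
        if i >= j:
            return 0
        count = 0
        if j - i == 1 and tokens[i] == "ID":       # E → ID
            count += 1
        if tokens[i] == "(" and tokens[j - 1] == ")":  # E → (E)
            count += trees(i + 1, j - 1)
        for k in range(i + 1, j - 1):              # E → E + E | E ∗ E
            if tokens[k] in ("+", "*"):
                count += trees(i, k) * trees(k + 1, j)
        return count

    return trees(0, len(tokens))
```

A result greater than 1 means the sentence is ambiguous under Γ0. For ID + ID ∗ ID the function returns 2, one tree per choice of the operator applied last.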


Ambiguity as Grammar Property (2)

But there exists an unambiguous grammar for the same language:

Γ1:

S → E
E → T + E | T
T → F ∗ T | F
F → (E) | ID

[Figure: syntax tree of Γ1 for (ID + ID) ∗ ID + ID]


Ambiguity as Grammar Property (3)

Remark:

A context-free language for which every grammar is ambiguous is called inherently ambiguous.

There are inherently ambiguous CFLs.


Literature

Recommended reading:

Wilhelm, Maurer: Chapter 8, pp. 271 - 283 (Syntactic Analysis)

Appel: Chapter 3, pp. 40-47


2.2.2 Implementation of Parsers


Context-Free Syntax Analysis Implementation of Parsers

Implementation of parsers

Overview

Top-down parsing

- Recursive descent
- LL parsing
- LL parser generation

Bottom-up parsing

- LR parsing
- LALR, SLR, LR(k) parsing
- LALR parser generation


Methods for context-free analysis

Manually developed, grammar-specific implementation (error-prone, inflexible)

Backtracking (simple, but inefficient)

Cocke-Younger-Kasami algorithm (1967):

- for all CFGs in Chomsky normal form
- based on the idea of dynamic programming
- time complexity O(n³) (however, linear complexity is desired)

Top-down methods: from axiom to word/token stream

Bottom-up methods: from word/token stream to axiom
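
As an illustration of the dynamic-programming idea behind the Cocke-Younger-Kasami algorithm mentioned above, here is a minimal sketch. The grammar is not from the lecture: it is a hypothetical Chomsky-normal-form grammar for {aⁿbⁿ | n ≥ 1}, and the rule encoding (`UNIT_RULES`, `PAIR_RULES`) is an assumption of this sketch.

```python
# Hypothetical CNF grammar for { a^n b^n | n >= 1 }:
#   S → A B | A X,  X → S B,  A → a,  B → b
UNIT_RULES = [("A", "a"), ("B", "b")]                            # A → a
PAIR_RULES = [("S", "A", "B"), ("S", "A", "X"), ("X", "S", "B")]  # A → B C


def cyk(word, unit_rules, pair_rules, start):
    """Return True iff `word` is derivable from `start` (grammar in CNF)."""
    n = len(word)
    if n == 0:
        return False
    # table[i][l] = set of nonterminals deriving word[i:i+l]
    table = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, symbol in enumerate(word):
        table[i][1] = {lhs for (lhs, terminal) in unit_rules if terminal == symbol}
    for length in range(2, n + 1):            # substring length
        for i in range(n - length + 1):       # substring start
            for split in range(1, length):    # length of the left part
                left = table[i][split]
                right = table[i + split][length - split]
                for (lhs, b, c) in pair_rules:
                    if b in left and c in right:
                        table[i][length].add(lhs)
    return start in table[0][n]
```

The three nested loops over length, start position and split point give the cubic running time; the table entries for shorter substrings are reused instead of recomputed.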


Example: Top-down analysis

Top-down analysis leads to leftmost derivation.

Example derivation with Γ1:

S =>
E =>
T + E =>
F * T + E =>
( E ) * T + E =>
( T + E ) * T + E =>
( F + E ) * T + E =>
( ID + E ) * T + E =>
( ID + T ) * T + E =>
( ID + F ) * T + E =>
( ID + ID ) * T + E =>
( ID + ID ) * F + E =>
( ID + ID ) * ID + E =>
( ID + ID ) * ID + T =>
( ID + ID ) * ID + F =>
( ID + ID ) * ID + ID


Example: Bottom-up analysis

Bottom-up analysis leads to rightmost derivation.

Example derivation with Γ1:

( ID + ID ) * ID + ID <=
( F + ID ) * ID + ID <=
( T + ID ) * ID + ID <=
( T + F ) * ID + ID <=
( T + T ) * ID + ID <=
( T + E ) * ID + ID <=
( E ) * ID + ID <=
F * ID + ID <=
F * F + ID <=
F * T + ID <=
T + ID <=
T + F <=
T + T <=
T + E <=
E <=
S


Context-free analysis with linear complexity

Restrictions on grammar (not every CFG has a linear parser)

Use of push-down automata or systems of recursive procedures

Usage of look ahead to remaining input in order to select next production rule to be applied


Syntax analysis methods and parser generators

Basic knowledge of syntax analysis is essential for use of parser generators.

Parser generators are not always applicable.

Often, error handling has to be done manually.

The methods underlying parser generation are a good example of a generic technique (and a highlight of computer science!).


2.2.2.1 Top-down syntax analysis


Top-down syntax analysis

Learning objectives

Understand the general principle of top-down syntax analysis

Be able to implement recursive descent parsing (by example)

Know expressiveness and limitations of top-down parsing

Understand the basic concepts of LL(k) parsing


Recursive descent parsing

Basic idea

Each nonterminal A is associated with a procedure. This procedure accepts a partial sentence derived from A.

The procedure implements a finite automaton constructed from the productions with A as left-hand side. This automaton is called the item automaton of A.

The recursion in the grammar is mapped to mutually recursive procedures, so that the call stack of the high-level programming language handles the recursion.


Construction of recursive descent parser

Let Γ1′ be a CFG accepting w# iff w ∈ L(Γ1), i.e., # is used as a special character denoting the end of the input.

Γ1′:

S → E#
E → T + E | T
T → F ∗ T | F
F → (E) | ID

Construct item automaton for each nonterminal.


Item automata

S → E#

[S → .E#] --E--> [S → E.#] --#--> [S → E#.]

E → T + E | T

[E → .T+E] --T--> [E → T.+E] --+--> [E → T+.E] --E--> [E → T+E.]
[E → .T] --T--> [E → T.]

T → F ∗ T | F

[T → .F*T] --F--> [T → F.*T] --*--> [T → F*.T] --T--> [T → F*T.]
[T → .F] --F--> [T → F.]


Item automata (2)

F → (E) | ID

[F → .(E)] --(--> [F → (.E)] --E--> [F → (E.)] --)--> [F → (E).]
[F → .ID] --ID--> [F → ID.]


Recursive descent parsing procedures

The recursive procedures are constructed from the item automata.

The input is a token stream terminated by #.

The variable currToken contains one token look ahead, i.e., the first symbol of the input rest.


Recursive descent parsing procedures (2)

Production: S → E#

void S() {
    E();
    if (currToken == '#') {
        accept();
    } else {
        error();
    }
}


Recursive descent parsing procedures (3)

Production: E → T + E | T

void E() {
    T();
    if (currToken == '+') {
        readToken();
        E();
    }
}

Production: T → F ∗ T | F

void T() {
    F();
    if (currToken == '*') {
        readToken();
        T();
    }
}


Recursive descent parsing procedures (4)

Production: F → (E) | ID

void F() {
    if (currToken == '(') {
        readToken();
        E();
        if (currToken == ')') {
            readToken();
        } else error();
    } else if (currToken == ID) {
        readToken();
    } else error();
}


Recursive descent parsing procedures (5)

Remarks:

Recursive descent

- is relatively easy to implement
- can easily be combined with other tasks (see following example)
- is a typical example of syntax-directed methods (see also following example)

The example uses one token of look ahead.

Error handling is not considered.


Recursive descent and evaluation

Example: Interpreter for expressions using recursive descent

int env(Ident); // Ident -> int

// local variables imr store intermediate results
int S() {
    int imr = E();
    if (currToken == '#') {
        return imr;
    } else {
        error();
        return err_result;
    }
}


Recursive descent and evaluation (2)

int E() {
    int imr = T();
    if (currToken == '+') {
        readToken();
        return imr + E();
    }
    return imr;  // alternative E → T
}

int T() {
    int imr = F();
    if (currToken == '*') {
        readToken();
        return imr * T();
    }
    return imr;  // alternative T → F
}


Recursive descent and evaluation (3)

int F() {
    int imr;
    if (currToken == '(') {
        readToken();
        imr = E();
        if (currToken == ')') {
            readToken();
            return imr;
        } else {
            error();
            return err_result;
        }
    } else if (currToken == ID) {
        readToken();
        return env(code(ID));
    } else {
        error();
        return err_result;
    }
}
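
The interpreter above is schematic: the scanner, `env` and `error` are left open. The following runnable sketch fills these gaps under explicit assumptions: tokens are plain strings, identifiers are exactly the keys of the dict `env`, the end marker `#` is appended automatically, and errors are reported with a hypothetical `ParseError` exception instead of `err_result`.

```python
class ParseError(Exception):
    """Raised where the lecture code calls error()."""


def evaluate(tokens, env):
    """Evaluate an expression of Γ1 by recursive descent.

    `tokens` is a list of strings, `env` maps identifier names to ints
    (both representations are assumptions of this sketch).
    """
    tokens = list(tokens) + ["#"]
    pos = 0

    def curr_token():
        return tokens[pos]

    def read_token():
        nonlocal pos
        pos += 1

    def S():                      # S → E#
        imr = E()
        if curr_token() != "#":
            raise ParseError("end of input expected")
        return imr

    def E():                      # E → T + E | T
        imr = T()
        if curr_token() == "+":
            read_token()
            return imr + E()
        return imr

    def T():                      # T → F * T | F
        imr = F()
        if curr_token() == "*":
            read_token()
            return imr * T()
        return imr

    def F():                      # F → (E) | ID
        if curr_token() == "(":
            read_token()
            imr = E()
            if curr_token() != ")":
                raise ParseError("')' expected")
            read_token()
            return imr
        if curr_token() in env:
            imr = env[curr_token()]   # look up the value before consuming the token
            read_token()
            return imr
        raise ParseError("'(' or identifier expected")

    return S()
```

With `env = {"av": 1, "bv": 2, "cv": 3, "dv": 4}`, the lecture's input (av + av) ∗ bv + cv + dv evaluates to 11.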


Recursive descent and evaluation (4)

Extending a parser with actions/computations is easy to implement, but it mixes conceptually different phases/tasks and makes programs hard to maintain.

Question: For which grammars does the recursive descent technique work?

→ LL(k) parsing theory


LL parsing

Basis for top-down syntax analysis

First “L” refers to reading input from left to right

Second “L” refers to search for leftmost derivations


LL(k) grammars

Definition (LL(k) grammar)

Let Γ = (N, T, Π, S) be a CFG and k ∈ ℕ.

Γ is an LL(k) grammar if for any two leftmost derivations

S ⇒*lm uAα ⇒lm uβα ⇒*lm ux

and

S ⇒*lm uAα ⇒lm uγα ⇒*lm uy

the following holds:

if prefix(k, x) = prefix(k, y), then β = γ

where prefix(k, x) yields the longest prefix of x with length ≤ k.


LL(k) grammars (2)

Remarks:

A grammar is an LL(k) grammar if, in every step of a leftmost derivation, k tokens of look ahead suffice to find the correct production for the next derivation step.

A language Lk ⊆ Σ* is LL(k) if there exists an LL(k) grammar Γ with L(Γ) = Lk.

The definition of LL(k) grammars provides no method to test if a grammar has the LL(k) property.


Non LL(k) grammars

Example 1: Grammar with left recursion

Γ2:

S → E#
E → E + T | T
T → T ∗ F | F
F → (E) | ID

Elimination of left recursion:

Replace productions of the form A → Aα | β, where β does not start with A, by A → βA′ and A′ → αA′ | ε.


Non LL(k) grammars (2)

Elimination of left recursion: From Γ2 we obtain Γ3.

Γ2:

S → E#
E → E + T | T
T → T ∗ F | F
F → (E) | ID

Γ3:

S → E#
E → T E′
E′ → + T E′ | ε
T → F T′
T′ → ∗ F T′ | ε
F → (E) | ID
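
The replacement rule A → Aα | β ⟶ A → βA′, A′ → αA′ | ε can be sketched as a small function. It handles only immediate left recursion, represents right-hand sides as tuples with the empty tuple for ε, and assumes that the fresh name `A'` does not already occur in the grammar. All names are chosen for this sketch.

```python
EPSILON = ()  # empty right-hand side


def eliminate_immediate_left_recursion(grammar):
    """Replace A → Aα | β by A → βA', A' → αA' | ε for every nonterminal A."""
    result = {}
    for nonterminal, alternatives in grammar.items():
        # alternatives of the form A → Aα (α is stored without the leading A)
        recursive = [rhs[1:] for rhs in alternatives if rhs[:1] == (nonterminal,)]
        # alternatives of the form A → β where β does not start with A
        others = [rhs for rhs in alternatives if rhs[:1] != (nonterminal,)]
        if not recursive:
            result[nonterminal] = list(alternatives)
            continue
        fresh = nonterminal + "'"  # assumed not to clash with existing names
        result[nonterminal] = [beta + (fresh,) for beta in others]
        result[fresh] = [alpha + (fresh,) for alpha in recursive] + [EPSILON]
    return result


GAMMA2 = {
    "S": [("E", "#")],
    "E": [("E", "+", "T"), ("T",)],
    "T": [("T", "*", "F"), ("F",)],
    "F": [("(", "E", ")"), ("ID",)],
}
```

Applied to `GAMMA2`, the function yields the productions of Γ3 with `E'` and `T'` in the roles of E′ and T′.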


Non LL(k) grammars (3)

Example 2: Grammar Γ4 with unlimited look ahead

STM → VAR := VAR | ID(IDLIST)

VAR → ID | ID(IDLIST)

IDLIST → ID | ID,IDLIST

Γ4 is not an LL(k) grammar for any k.

(Proof: cf. Wilhelm, Maurer, Example 8.3.4, p. 319)

Transformation to LL(2) grammar Γ4′:

STM → ASS_CALL | ID := VAR
ASS_CALL → ID(IDLIST) ASS_CALL_REST
ASS_CALL_REST → := VAR | ε


Non LL(k) grammars (4)

Remarks:

The transformed grammars accept the same language, but generate other syntax trees:

- From a theoretical point of view, this is acceptable.
- From a programming language implementation perspective, this is in general not acceptable.

There are languages L for which no LL(k) grammar Γ exists that generates the language, i.e. L(Γ) = L. (Example: grammar Γ5)


Non LL(k) grammars (5)

Example 3:

For the following grammar Γ5, there is no k such that Γ5 is an LL(k) grammar.

S → A | B
A → aAb | 0
B → aBbb | 1

Remark:

For L(Γ5), there exists no LL(k) grammar.

Proof.

Let k be arbitrary, but fixed.

Choose two derivations according to the LL(k) definition and show that, despite equal prefixes of length k, β and γ are not equal:

S ⇒*lm S ⇒lm A ⇒*lm a^k 0 b^k
S ⇒*lm S ⇒lm B ⇒*lm a^k 1 b^2k

Here u = ε, β = A and γ = B, and both derived words have the prefix a^k of length k, although β ≠ γ.


FIRST and FOLLOW sets

Definition

Let Γ = (N, T, Π, S) be a CFG, k ∈ ℕ;

T≤k = {u ∈ T* | length(u) ≤ k} denotes the set of all terminal strings of length at most k. We define:

FIRSTk : (N ∪ T)* → P(T≤k)

FIRSTk(α) = {prefix(k, u) | α ⇒* u}

where prefix(n, u) = u for all u with length(u) ≤ n.

FOLLOWk : (N ∪ T)* → P(T≤k)

FOLLOWk(α) = {w | S ⇒* βαγ ∧ w ∈ FIRSTk(γ)}
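
For k = 1, both sets can be computed for the nonterminals by a standard fixed-point iteration. The sketch below is one way to do this and is not taken from the lecture. It assumes right-hand sides encoded as tuples, ε encoded as the empty string `EPS`, and, following the definition above, puts ε into FOLLOW1 of the start symbol (nothing follows it). The grammar used is Γ3, with `E'` and `T'` for E′ and T′.

```python
EPS = ""  # represents ε

GAMMA3 = {
    "S": [("E", "#")],
    "E": [("T", "E'")],
    "E'": [("+", "T", "E'"), ()],
    "T": [("F", "T'")],
    "T'": [("*", "F", "T'"), ()],
    "F": [("(", "E", ")"), ("ID",)],
}


def first_of_string(symbols, first):
    """FIRST1 of a symbol string, given FIRST1 of every nonterminal."""
    result = set()
    for symbol in symbols:
        symbol_first = first[symbol] if symbol in first else {symbol}
        result |= symbol_first - {EPS}
        if EPS not in symbol_first:
            return result
    result.add(EPS)  # every symbol can derive ε (or the string is empty)
    return result


def first_sets(grammar):
    first = {nonterminal: set() for nonterminal in grammar}
    changed = True
    while changed:  # iterate until no set grows any more
        changed = False
        for lhs, alternatives in grammar.items():
            for rhs in alternatives:
                new = first_of_string(rhs, first)
                if not new <= first[lhs]:
                    first[lhs] |= new
                    changed = True
    return first


def follow_sets(grammar, first, start):
    follow = {nonterminal: set() for nonterminal in grammar}
    follow[start].add(EPS)
    changed = True
    while changed:
        changed = False
        for lhs, alternatives in grammar.items():
            for rhs in alternatives:
                for i, symbol in enumerate(rhs):
                    if symbol not in grammar:
                        continue  # FOLLOW1 is only collected for nonterminals
                    rest = first_of_string(rhs[i + 1:], first)
                    new = rest - {EPS}
                    if EPS in rest:  # the rest can vanish: inherit FOLLOW1(lhs)
                        new |= follow[lhs]
                    if not new <= follow[symbol]:
                        follow[symbol] |= new
                        changed = True
    return follow
```

Usage: `first = first_sets(GAMMA3)` followed by `follow = follow_sets(GAMMA3, first, "S")`.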


FIRST and FOLLOW sets in parse trees

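[Figure: syntax tree with root S and an inner node X; FIRSTk(X) marks the beginning of the terminal string derived from X, FOLLOWk(X) marks the terminal string directly after it]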


Characterization of LL(1) grammars

Definition (reduced CFG)

A CFG Γ = (N,T,Π,S) is reduced if each nonterminal occurs in a derivation and each nonterminal derives at least one word.

Lemma

A reduced CFG is LL(1) iff for any two distinct productions A → β and A → γ the following holds:

(FIRST1(β) ⊕1 FOLLOW1(A)) ∩ (FIRST1(γ) ⊕1 FOLLOW1(A)) = ∅

where L1 ⊕1 L2 = {prefix(1, vw) | v ∈ L1, w ∈ L2}

Remark: FIRST and FOLLOW sets are computable, so this criterion can be checked automatically.
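
A sketch of such an automatic check is given below. It takes the FIRST1 and FOLLOW1 sets of the nonterminals as input; the sets shown for the grammar Γ3 (obtained from Γ2 by eliminating left recursion) were worked out by hand for this sketch. ε is encoded as the empty string, right-hand sides as tuples, and `E'`, `T'` stand for E′, T′; all names are assumptions of this sketch.

```python
EPS = ""  # represents ε

GAMMA3 = {
    "S": [("E", "#")],
    "E": [("T", "E'")],
    "E'": [("+", "T", "E'"), ()],
    "T": [("F", "T'")],
    "T'": [("*", "F", "T'"), ()],
    "F": [("(", "E", ")"), ("ID",)],
}
# FIRST1 / FOLLOW1 of the nonterminals of Γ3 (computed by hand).
FIRST = {
    "S": {"(", "ID"}, "E": {"(", "ID"}, "T": {"(", "ID"}, "F": {"(", "ID"},
    "E'": {"+", EPS}, "T'": {"*", EPS},
}
FOLLOW = {
    "S": {EPS}, "E": {"#", ")"}, "E'": {"#", ")"},
    "T": {"+", "#", ")"}, "T'": {"+", "#", ")"},
    "F": {"*", "+", "#", ")"},
}


def concat1(left, right):
    """L1 ⊕1 L2 = {prefix(1, vw) | v ∈ L1, w ∈ L2} for words of length ≤ 1."""
    return {v if v != EPS else w for v in left for w in right}


def first1_of(rhs, first):
    """FIRST1 of a right-hand side; terminals stand for themselves."""
    result = {EPS}
    for symbol in rhs:
        result = concat1(result, first.get(symbol, {symbol}))
    return result


def is_ll1(grammar, first, follow):
    """Check the FIRST-FOLLOW disjointness criterion of the lemma."""
    for lhs, alternatives in grammar.items():
        lookahead = [concat1(first1_of(rhs, first), follow[lhs])
                     for rhs in alternatives]
        for i in range(len(lookahead)):
            for j in range(i + 1, len(lookahead)):
                if lookahead[i] & lookahead[j]:
                    return False
    return True
```

`is_ll1(GAMMA3, FIRST, FOLLOW)` confirms the criterion for Γ3, whereas the left-recursive grammar Γ2 fails it because both alternatives of E start with the same FIRST1 set.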


Example: FIRSTk and FOLLOWk

Check that the modified expression grammar Γ3 is LL(1).

S → E#
E → T E′
E′ → + T E′ | ε
T → F T′
T′ → ∗ F T′ | ε
F → (E) | ID

Compute FIRST1 and FOLLOW1 for each nonterminal.

Example: FIRSTk and FOLLOWk (2)

F → (E) | ID:

(FIRST1((E)) ⊕1 FOLLOW1(F)) ∩ (FIRST1(ID) ⊕1 FOLLOW1(F))
= ({(} ⊕1 FOLLOW1(F)) ∩ ({ID} ⊕1 FOLLOW1(F))
= {(} ∩ {ID}
= ∅

E′ → + T E′ | ε:

(FIRST1(+TE′) ⊕1 FOLLOW1(E′)) ∩ (FIRST1(ε) ⊕1 FOLLOW1(E′))
= ({+} ⊕1 FOLLOW1(E′)) ∩ ({ε} ⊕1 FOLLOW1(E′))
= {+} ∩ {#, )}
= ∅

T′ → ∗ F T′ | ε:

(FIRST1(∗FT′) ⊕1 FOLLOW1(T′)) ∩ (FIRST1(ε) ⊕1 FOLLOW1(T′))
= ({∗} ⊕1 FOLLOW1(T′)) ∩ ({ε} ⊕1 FOLLOW1(T′))
= {∗} ∩ {+, #, )}
= ∅


Proof of LL characterization lemma

Direction from left to right:

Γ is LL(1) implies FIRST-FOLLOW disjointness.

Proof by contradiction:

("FIRST-FOLLOW intersection non-empty" implies "not LL(1)")

Let A → β and A → γ be two distinct productions of Γ (β ≠ γ) such that the FIRST-FOLLOW intersection is non-empty.

Case distinction. We consider three cases:

Case 1: β ⇒* ε and γ ⇒* ε

In this case, the LL(1) property does not hold for A → β, A → γ.


Proof of LL characterization lemma (2)

Case 2: β ⇏* ε (β does not derive ε)

Then, there is a z with length(z) = 1 and

z ∈ (FIRST1(β) ⊕1 FOLLOW1(A)) ∩ (FIRST1(γ) ⊕1 FOLLOW1(A))

Because Γ is reduced, there are two derivations:

S ⇒* ψAα ⇒ ψβα ⇒* ψzx
S ⇒* ψAα ⇒ ψγα ⇒* ψzy

and there is a u such that ψ ⇒* u, i.e., there are leftmost derivations

S ⇒*lm uAα ⇒lm uβα ⇒*lm uzx
S ⇒*lm uAα ⇒lm uγα ⇒*lm uzy

But prefix(1, zx) = z = prefix(1, zy) although β ≠ γ, which contradicts the LL(1) property of Γ.

Case 3: γ ⇏* ε: similar to Case 2.


Proof of LL characterization lemma (3)

Direction from right to left:

FIRST-FOLLOW disjointness implies Γ is LL(1):

Proof:

Consider any two derivations with β ≠ γ:

S ⇒*lm uAα ⇒lm uβα ⇒*lm ux
S ⇒*lm uAα ⇒lm uγα ⇒*lm uy

that is, prefix(1, x) ∈ (FIRST1(β) ⊕1 FOLLOW1(A)) and prefix(1, y) ∈ (FIRST1(γ) ⊕1 FOLLOW1(A)). Because of FIRST-FOLLOW disjointness, prefix(1, x) ≠ prefix(1, y). Hence prefix(1, x) = prefix(1, y) is only possible if β = γ, which is the LL(1) property.


Parser generation for LL(k) languages

[Figure: Grammar → LL(k) Parser Generator → Table for Push-Down Automaton / Parser Program, or Error: Grammar is not LL(k)]


Parser generation for LL(k) languages (2)

Remarks:

Use of push-down automata with look ahead

Select production from tables

Advantages over bottom-up techniques in error analysis and error handling

Example system: ANTLR (http://www.antlr.org/)

Recommended reading for top-down analysis:

Wilhelm, Maurer: Chapter 8, Sections 8.3.1 to 8.3.4, pp. 312 - 329
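
The remarks on push-down automata with look ahead and on selecting productions from tables can be illustrated by a minimal table-driven LL(1) parser. The table below was written by hand for the grammar Γ3 (with `E'`, `T'` for E′, T′) and is an assumption of this sketch; a generator would derive it from the FIRST1 and FOLLOW1 sets. This is not how ANTLR works internally, only the textbook scheme.

```python
NONTERMINALS = {"S", "E", "E'", "T", "T'", "F"}

# (nonterminal, look-ahead token) -> right-hand side to expand; () is ε.
TABLE = {
    ("S", "("): ("E", "#"), ("S", "ID"): ("E", "#"),
    ("E", "("): ("T", "E'"), ("E", "ID"): ("T", "E'"),
    ("E'", "+"): ("+", "T", "E'"),
    ("E'", ")"): (), ("E'", "#"): (),
    ("T", "("): ("F", "T'"), ("T", "ID"): ("F", "T'"),
    ("T'", "*"): ("*", "F", "T'"),
    ("T'", "+"): (), ("T'", ")"): (), ("T'", "#"): (),
    ("F", "("): ("(", "E", ")"), ("F", "ID"): ("ID",),
}


def ll1_parse(tokens):
    """Parse `tokens` (ending with '#'); return the productions applied.

    The returned list is a leftmost derivation; ValueError signals a
    syntax error.
    """
    stack = ["S"]
    pos = 0
    applied = []
    while stack:
        top = stack.pop()
        look = tokens[pos] if pos < len(tokens) else None
        if top in NONTERMINALS:
            rhs = TABLE.get((top, look))
            if rhs is None:
                raise ValueError(f"no production for {top} with look ahead {look}")
            applied.append((top, rhs))
            stack.extend(reversed(rhs))  # leftmost symbol ends up on top
        elif top == look:
            pos += 1                      # terminal matches: consume it
        else:
            raise ValueError(f"expected {top}, found {look}")
    if pos != len(tokens):
        raise ValueError("input continues after end marker")
    return applied
```

The explicit stack plays the role that the call stack plays in recursive descent; an empty table entry is exactly the place where error handling would hook in.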


2.2.2.2 Bottom-up syntax analysis


Bottom-up syntax analysis

Learning objectives:

General principles of bottom-up syntax analysis

LR(k) analysis

Resolving conflicts in parser generation

Connection between CFGs and push-down automata


Basic ideas: bottom-up syntax analysis

Bottom-up analysis is more powerful than top-down analysis, since the production is chosen only after its complete right-hand side has been analyzed, whereas in top-down analysis the production has to be selected up front.

LR: read input from left (L) and search for rightmost derivations (R)


Principles of LR parsing

1. Reduce from sentence to axiom according to the productions of Γ.
2. Reduction yields sentential forms αx with α ∈ (N ∪ T)* and x ∈ T*, where x is the input rest.
3. α has to be a prefix of a right sentential form of Γ. Such prefixes are called viable prefixes. This prefix property has to hold invariantly during LR parsing to avoid dead ends.
4. Reductions are always made at the leftmost possible position.

More precisely:


Viable prefix

Definition

Let S ⇒*rm βAu ⇒rm βαu be a rightmost derivation in Γ, i.e. βαu is a right sentential form of Γ.

Then α is called a handle or redex of the right sentential form βαu. Each prefix of βα is a viable prefix of Γ.


Regularity of viable prefixes

Theorem

The language of viable prefixes of a grammar Γ is regular.

Proof.

Cf. Wilhelm, Maurer, Thm. 8.4.1 and Corollary 8.4.2.1 (pp. 361, 362).

Essential proof steps are illustrated in the following by the construction of the LR-DFA(Γ).


Examples: towards LR parsing

Consider Γ1

- S → aCD
- C → b
- D → a | b

Analysis of aba can lead to a dead end (cf. lecture).

Considering viable prefixes can avoid this.


Examples: towards LR parsing (2)

Consider Γ2

- S → E#
- E → a | (E) | EE

Analysis of ((a))(a)# (cf. lecture)

Stack can manage prefixes already read.


Examples: towards LR parsing (3)

Consider Γ3

- S → E#
- E → E + T | T
- T → ID

Analysis of ID + ID + ID # (cf. lecture)


LR parsing: shift and reduce actions

Schematic syntax tree for input xay with α ∈ (N ∪ T)*, a ∈ T, x, y ∈ T* and start symbol S:

[Figure: schematic syntax tree with root S over the input x a y; the read pointer marks the position in the input up to which the viable prefix α has been built]


LR parsing: shift and reduce actions (2)

Shift step:

[Figure: the read pointer moves over the next terminal a; the viable prefix α is extended to αa]

Reduce step:

[Figure: with α = βγ, the handle γ at the end of the viable prefix is replaced by a nonterminal A with A → γ; the read pointer stays in place]


LR parsing: shift and reduce actions (3)

Problems:

Make sure that all reductions guarantee that the resulting prefix remains a viable prefix.

When to shift? When to reduce? Which production to use?

Solution:

For each grammar Γ construct LR-DFA(Γ) automaton (also called LR(0) automaton), that describes the viable prefixes.


Construction of LR-DFA

Let Γ = (N, T, Π, S) be a CFG.

- For each nonterminal A ∈ N, construct the item automaton.
- Build the union of the item automata: the start state is the start state of the item automaton for S, the final states are the final states of the item automata.
- Add ε-transitions from each state which contains the dot in front of a nonterminal A to the start state of the item automaton of A.

Theorem

The automaton obtained from LR-DFA(Γ) by declaring all states to be final states accepts exactly the language of viable prefixes of Γ.
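
The deterministic automaton can also be computed directly with the usual closure/goto formulation, which combines the union of item automata with the power set construction. The sketch below does this for the grammar S → E#, E → E + T | T, T → ID. Items are triples (left-hand side, right-hand side, dot position); this encoding and all function names are assumptions of this sketch, and the error state is left implicit (a missing transition).

```python
GRAMMAR = {
    "S": [("E", "#")],
    "E": [("E", "+", "T"), ("T",)],
    "T": [("ID",)],
}


def closure(items, grammar):
    """Add the start items of every nonterminal that follows a dot."""
    items = set(items)
    work = list(items)
    while work:
        _, rhs, dot = work.pop()
        if dot < len(rhs) and rhs[dot] in grammar:
            for alternative in grammar[rhs[dot]]:
                item = (rhs[dot], alternative, 0)
                if item not in items:
                    items.add(item)
                    work.append(item)
    return frozenset(items)


def goto(state, symbol, grammar):
    """Move the dot over `symbol` in all items that allow it."""
    moved = {(lhs, rhs, dot + 1) for (lhs, rhs, dot) in state
             if dot < len(rhs) and rhs[dot] == symbol}
    return closure(moved, grammar)


def lr0_automaton(grammar, start):
    """Return (states, transitions); states[0] is the start state."""
    states = [closure({(start, rhs, 0) for rhs in grammar[start]}, grammar)]
    transitions = {}
    work = [states[0]]
    while work:
        state = work.pop()
        symbols = {rhs[dot] for (_, rhs, dot) in state if dot < len(rhs)}
        for symbol in sorted(symbols):
            target = goto(state, symbol, grammar)
            if target not in states:
                states.append(target)
                work.append(target)
            transitions[(states.index(state), symbol)] = states.index(target)
    return states, transitions


def is_viable_prefix(symbols, transitions):
    """All states are final: a string is accepted iff the run does not get stuck."""
    state = 0
    for symbol in symbols:
        if (state, symbol) not in transitions:
            return False
        state = transitions[(state, symbol)]
    return True


def conflicts(state, grammar):
    """Report reduce/reduce and shift/reduce conflicts of one state."""
    complete = [item for item in state if item[2] == len(item[1])]
    shifts = [item for item in state
              if item[2] < len(item[1]) and item[1][item[2]] not in grammar]
    found = []
    if len(complete) > 1:
        found.append("reduce/reduce")
    if complete and shifts:
        found.append("shift/reduce")
    return found
```

For this grammar the construction yields seven states (plus the implicit error state), and `is_viable_prefix(["E", "+", "T"], transitions)` holds while `["T", "+"]` is rejected, since T has to be reduced to E before + can be read.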


Example: Construction of LR-DFA

Γ3: S → E#, E → E + T | T, T → ID

Item automata of Γ3:

[S → .E#] --E--> [S → E.#] --#--> [S → E#.]
[E → .E+T] --E--> [E → E.+T] --+--> [E → E+.T] --T--> [E → E+T.]
[E → .T] --T--> [E → T.]
[T → .ID] --ID--> [T → ID.]

[Figure: union of the item automata, with additional transitions from items with the dot in front of a nonterminal to the start items of that nonterminal]

Making this automaton deterministic yields the following automaton:


Example: Construction of LR-DFA (2)

Power set construction:

[Figure: LR-DFA for Γ3 with states q0 to q7. The item sets are {[S → .E#], [E → .E+T], [E → .T], [T → .ID]}, {[S → E.#], [E → E.+T]}, {[S → E#.]}, {[E → E+.T], [T → .ID]}, {[E → E+T.]}, {[E → T.]}, {[T → ID.]}, plus an error state. Edges are labeled E, #, +, T, ID; all remaining edges are error transitions.]

Viable prefixes of maximal length: E #, T, ID, E + ID, E + T


Example: Construction of LR-DFA (3)

Remarks:

In the example, each final state contains exactly one completely read production. In general, this is not the case.

If a final state contains more than one completely read production, we have a reduce/reduce conflict.

If a final state contains a completely read production and an incompletely read production with a terminal after the dot, we have a shift/reduce conflict.
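The two conflict conditions can be checked mechanically on a single state. The following is a minimal sketch under assumptions: a state is a set of items (A, rhs, dot position), and the function name conflicts as well as the example item sets are hypothetical. The second example state is taken from a different, commonly used expression grammar with a * operator, not from Γ3.

```python
def conflicts(state, terminals):
    """Return the kinds of conflict present in one LR-DFA state."""
    # Completely read productions: the dot is at the end of the right-hand side.
    complete = [(lhs, rhs) for (lhs, rhs, dot) in state if dot == len(rhs)]
    # Incompletely read productions with a terminal directly after the dot.
    shiftable = [(lhs, rhs) for (lhs, rhs, dot) in state
                 if dot < len(rhs) and rhs[dot] in terminals]
    found = []
    if len(complete) > 1:
        found.append("reduce/reduce")
    if complete and shiftable:
        found.append("shift/reduce")
    return found

# State {[S → E.#], [E → E.+T]} of Γ3: no completely read production, no conflict.
gamma3_state = {("S", ("E", "#"), 1), ("E", ("E", "+", "T"), 1)}

# Hypothetical state {[E → T.], [T → T.*F]}: shift/reduce conflict.
other_state = {("E", ("T",), 1), ("T", ("T", "*", "F"), 1)}

print(conflicts(gamma3_state, {"+", "#", "ID"}))
print(conflicts(other_state, {"*", "ID"}))
```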


Analysis with LR-DFA

Analysis of ID + ID + ID # with the LR-DFA (the bar | separates the viable prefix from the input rest):

ID | + ID + ID # ⇐
T | + ID + ID # ⇐
E + ID | + ID # ⇐
E + T | + ID # ⇐
E + ID | # ⇐
E + T | # ⇐
E # | ⇐
S



Analysis with LR-DFA (2)

Note:

The sentential forms always consist of a viable prefix and the input rest.

If only the LR-DFA is used for the analysis, the sentential form has to be read from the beginning after each reduction.

Thus: Use pushdown automaton for analysis.
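The re-reading cost can be made visible with a small sketch that uses only the LR-DFA of Γ3. The state numbering, the tables EDGES and REDUCE, and the function analyse are assumptions made for illustration; they do not follow the numbering q0 to q7 of the figure.

```python
# LR-DFA of Γ3 with an assumed numbering; missing edges are error transitions.
EDGES = {(0, "E"): 1, (0, "T"): 2, (0, "ID"): 3,
         (1, "#"): 4, (1, "+"): 5,
         (5, "T"): 6, (5, "ID"): 3}
# Final states: left-hand side and right-hand-side length of the completed production.
REDUCE = {2: ("E", 1), 3: ("T", 1), 4: ("S", 2), 6: ("E", 3)}

def analyse(tokens):
    """Reduce the sentential form step by step, rescanning it from the start each time."""
    form = list(tokens)
    trace = [" ".join(form)]
    while form != ["S"]:
        state, read = 0, 0
        # Run the DFA from the beginning until a final state is reached.
        while state not in REDUCE:
            if read == len(form) or (state, form[read]) not in EDGES:
                raise SyntaxError("no viable prefix in: " + " ".join(form))
            state = EDGES[(state, form[read])]
            read += 1
        lhs, length = REDUCE[state]
        # Replace the right-hand side at the end of the viable prefix by the left-hand side.
        form[read - length:read] = [lhs]
        trace.append(" ".join(form))
    return trace

for sentential_form in analyse("ID + ID + ID #".split()):
    print(sentential_form)
```

Each iteration of the outer loop restarts in state 0, which is exactly the repeated work that a stack of states avoids.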


LR pushdown automaton

Definition

Let Γ = (N,T,Π,S) be a CFG. The LR-DFA pushdown automaton for Γ contains:

a finite set of states Q (the states of the LR-DFA(Γ))

a set of actions Act = {shift, accept, error} ∪ red(Π), where red(Π) contains an action reduce(A → α) for each production A → α ∈ Π

an action table at : Q → Act

a successor table succ : P × (N ∪ T) → Q with P = {q ∈ Q | at(q) = shift}


LR pushdown automaton (2)

Remarks:

The LR-DFA pushdown automaton is a variant of pushdown automata particularly designed for LR parsing.

States encode the read left context.

If there are no conflicts, the action table can be directly constructed from the LR-DFA:

- accept: final state of the item automaton of the start symbol
- reduce: all other final states
- error: error state
- shift: all other states
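The four rules can be turned into a small table builder. This is a sketch under assumptions: the item sets of Γ3 are written out by hand with an assumed state numbering (7 is the error state, modeled as an empty item set), and the names STATES, action and AT are hypothetical. Any state that mixes a completely read production with other items is reported as a conflict, which is the condition under which the table at : Q → Act would not be well defined.

```python
# Item sets of the LR-DFA of Γ3; an item is (lhs, rhs, dot position).
STATES = {
    0: {("S", ("E", "#"), 0), ("E", ("E", "+", "T"), 0),
        ("E", ("T",), 0), ("T", ("ID",), 0)},
    1: {("S", ("E", "#"), 1), ("E", ("E", "+", "T"), 1)},
    2: {("E", ("T",), 1)},
    3: {("T", ("ID",), 1)},
    4: {("S", ("E", "#"), 2)},
    5: {("E", ("E", "+", "T"), 2), ("T", ("ID",), 0)},
    6: {("E", ("E", "+", "T"), 3)},
    7: set(),  # error state (assumption: represented by the empty item set)
}

def action(items, start="S"):
    """Derive the action of one state from its items."""
    if not items:
        return ("error",)
    complete = [(lhs, rhs) for (lhs, rhs, dot) in items if dot == len(rhs)]
    if not complete:
        return ("shift",)
    if len(items) > 1:
        raise ValueError("conflict: the state is not a pure reduce state")
    lhs, rhs = complete[0]
    if lhs == start:
        return ("accept",)
    return ("reduce", lhs, rhs)

AT = {q: action(items) for q, items in STATES.items()}

for q in sorted(AT):
    print(q, AT[q])
```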


Execution of Pushdown Automaton

Configuration: a pair (stack, inr) ∈ Q* × T*, where the variable stack denotes the sequence of states and the variable inr denotes the input rest

Start configuration: (q0, input), where q0 is the start state of the LR-DFA

Interpretation Procedure:

(stack, inr) := (q0, input);
do {
  step(stack, inr);
} while( at(top(stack)) != accept && at(top(stack)) != error );
if( at(top(stack)) == error ) return error;

with


Execution of Pushdown Automaton (2)

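void step( var StateSeq stack, var TokenSeq inr ) {
  State tk := top(stack);
  switch( at(tk) ) {
    case shift:
      // push the successor state for the next token and consume the token
      stack := push( succ(tk, top(inr)), stack );
      inr := tail(inr);
      break;
    case reduce A -> a:
      // pop one state per symbol of the right-hand side, then follow the A-edge
      stack := mpop( length(a), stack );
      stack := push( succ(top(stack), A), stack );
      break;
  }
}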
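The interpretation procedure and step can be tried out with the following Python sketch for Γ3. It is a sketch under assumptions, not the lecture's code: the state numbering and the tables AT and SUCC are written out by hand (7 is the error state), reduce actions are stored as pairs of left-hand side and right-hand-side length, and a missing successor entry or exhausted input leads to the error state.

```python
# Action table: "shift", "accept", "error", or (lhs, rhs length) for a reduce action.
AT = {0: "shift", 1: "shift", 5: "shift",
      2: ("E", 1), 3: ("T", 1), 6: ("E", 3),
      4: "accept", 7: "error"}
# Successor table for Γ3; entries that are missing lead to the error state.
SUCC = {(0, "E"): 1, (0, "T"): 2, (0, "ID"): 3,
        (1, "#"): 4, (1, "+"): 5,
        (5, "T"): 6, (5, "ID"): 3}
ERROR = 7

def step(stack, inr):
    """Perform one shift or reduce step on the configuration (stack, inr)."""
    tk = stack[-1]
    act = AT[tk]
    if act == "shift":
        symbol = inr[0] if inr else None  # None: the input is exhausted
        stack.append(SUCC.get((tk, symbol), ERROR))
        inr = inr[1:]
    elif isinstance(act, tuple):
        lhs, length = act
        del stack[-length:]  # mpop: one state per right-hand-side symbol
        stack.append(SUCC.get((stack[-1], lhs), ERROR))
    return stack, inr

def parse(tokens):
    """Run the automaton from the start configuration; True means accept."""
    stack, inr = [0], list(tokens)
    while True:
        stack, inr = step(stack, inr)
        if AT[stack[-1]] in ("accept", "error"):
            return AT[stack[-1]] == "accept"

print(parse("ID + ID + ID #".split()))
print(parse("ID ID #".split()))
```

In contrast to the analysis with the LR-DFA alone, the state on top of the stack after a reduction already encodes the left context, so nothing has to be read again.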
