• Keine Ergebnisse gefunden

Lex File Generation

Im Dokument Formal Semantics for SDL (Seite 154-160)

Part 5: RSDL Reference Implementation

5.3 Implementation of the Syntax Representations

5.3.2 Common Parts of the Syntax

5.3.3.2 Lex File Generation

The lex file generation involves several steps, namely lex macro generation for all lexical rules, keyword rules generation and finally lex token rules generation for all tokens as defined by the procedure from the previous chapter. The lex file generation starts with the generation of a predefined part.

%view ast2l, lexdefs, kwdefs, upper, lower;

Spec( syn, * )

-> [ast2l: "%{\n" "#include <stdio.h>\n#include <string.h>\n\n"

"extern int yflineno;\n"

"{ yylval.yt_AS0_rule=AS0_TOKEN(mkcasestring(yytext)); }\n"

"#endif DEBUG\n\n"

"#define YY_INPUT(buf,result,max_size) \\\n"

" { int c=preyylex(); \\\n"

" result = (c==EOF)?YY_NULL:(buf[0]=c, 1); \\\n"

" }\n\n"

"%}\n\n"

The next step is the actual generation. It starts with a cross reference generation.

syn:xref syn:lexdefs "\n%%\n\n{NOTE}\t;\n" syn:kwdefs syn

The footer is again predefined.

"{SPACE}\t;\n"

".\t{ yyerror(\"invalid character\"); }\n"

"\n%%\n\n"

The generation of lex definitions from the lexis BNF is straightforward. It is merely another output format for the lexis BNF, namely regular expressions.

Conssyntax(h,t) -> [lexdefs: t h ];

Rule(*,name,e=Consexpression( *, Nilexpression() )) -> [lexdefs: name:ucname "\t" e "\n" ];

Rule(*,name,e=Consexpression(Consserial(*, Nilserial()),*)) -> [lexdefs: name:ucname "\t" e "\n" ];

Rule(*,name,e=Consexpression(Consserial(Terminal(*),Consserial(Terminal(*),*)),*)) -> [lexdefs: name:ucname "\t" e "\n" ];

Rule(*,name,Consexpression(Consserial(Terminal(t),Nilserial()),Nilexpression())) -> [lexdefs: name:ucname "\t[" t:upper "]\n" ];

Token(*) -> [lexdefs: ];

Rule(*, name, e)

-> [lexdefs: name:ucname "\t" e "\n" ];

Consexpression( head, tail ) -> [lexdefs: tail "|" head ];

Consexpression( head, Nilexpression()) -> [lexdefs: head ];

AnyAtom( a )

-> [lexdefs: a "*" ];

Terminal(name) -> [lexdefs: name ];

Nonterminal(name)

-> [lexdefs: "{" name:ucname "}" ];

SetAtom(a)

->[lexdefs: "set-error(" a ")" ];

NonZeroAtom(a) ->[lexdefs: a "+" ];

ZeroOneExpression(a) ->[lexdefs: a "?" ];

SubExpression(e)

->[lexdefs: "(" e ")" ];

The next step is the generation of lex rules for the lexis BNF keywords. For every keyword a rule with the lower case and the upper case variant is generated.

/* Handling of the keywords */

Conssyntax( head, Conssyntax(Token(*), *) ) -> [kwdefs: head ];

Conssyntax( *, tail) -> [kwdefs: tail ];

Rule( *, *, Consexpression( *, Nilexpression() )),

Rule(*,*,Consexpression(Consserial(Terminal(*),Nilserial()),Nilexpression())), Rule( *, *, * )

-> [kwdefs: ];

Rule( *, *, expr=Consexpression( Consserial( *, Nilserial() ), * )), Rule( *, *, expr=Consexpression( Consserial( Terminal(*), * ), * )) -> [kwdefs: expr ];

Consexpression( head, tail ) -> [kwdefs: tail head ];

Consserial( head, Nilserial() ) -> [kwdefs: head ];

c=Consserial( Terminal(t), * )

-> [kwdefs: { if(islower(t->name[0])) }

${ c:lower "|" c:upper "\t{ return token(" c:upper "); }\n" $} ];

Token( * ), AnyAtom( * ), SetAtom( * ), NonZeroAtom( * ), ZeroOneExpression( * ), SubExpression( * ), Terminal(*)

->[kwdefs: ];

Nonterminal( name ), PrefixedNT( *, name )

->[kwdefs: ${ { symbol s=NT(name); } (symbol)s $} ];

s=NT( * )

->[kwdefs: s->refersto ];

The generation of the lex rules follows the same algorithm as the token generation. The difference is in the generated string. For every token tok we generate a line {TOK} { return token(TOK); }.

Conssyntax( head, Conssyntax(Token(*), *) ) -> [ast2l: head ];

Conssyntax( *, tail) -> [ast2l: tail ];

Rule(*,name,Consexpression(Consserial(Terminal(*),Nilserial()),Nilexpression())) -> [ast2l: "{" name:ucname "}" "\t{ return token(*yytext); }\n" ];

Rule( *, *, expr=Consexpression( Consserial( *, Nilserial() ), * )) -> [ast2l: expr ];

Rule( *, *, expr=Consexpression( Consserial( Terminal(*), * ), * )) -> [ast2l: expr ];

Rule( *, name, * )

-> [ast2l: "{" name:ucname "}" "\t{ return token(L_" name:cname "); }\n" ];

Consexpression( head, tail ) -> [ast2l: tail head ];

Consserial( head, Nilserial() ) -> [ast2l: head ];

Consserial( Terminal(*), * ) -> [ast2l: ];

Token( * ), AnyAtom( * ), SetAtom( * ), NonZeroAtom( * ), ZeroOneExpression( * ), SubExpression( * ) ->[ast2l: ];

Nonterminal( name ), PrefixedNT( *, name )

->[ast2l: ${ { symbol s=NT(name); } (symbol)s $} ];

Terminal(t)

->[ast2l: "error(" t ")\n" ];

s=NT( * )

->[ast2l: s->refersto ];

Finally, we inspect the generated lex file.

First, there are predefined things: declaration of external functions used and debugging support.

%{

#include <stdio.h>

#include <string.h>

extern int yflineno;

extern int preyylex();

extern int yyparse();

extern void yyerror();

#ifdef DEBUG

#define token(x) (int) #x

#else

#include "k.h"

#include "rsdl-cs.h"

#define token(x) x

#define YY_USER_ACTION { yylval.yt_AS0_rule=AS0_TOKEN(mkcasestring(yytext)); }

#endif DEBUG

Please note the declaration of the macro YY_USER_ACTION, which is called whenever a token is analysed. The code given here generates in this case always an AS0 token with an embedded case string containing the token text.

The next step is to define the connection to the pre-lexical part. This is simply done calling the function preyylex() as generated by lex from the prelexic-file.

#define YY_INPUT(buf,result,max_size) \

{ int c=preyylex(); \

result = (c==EOF)?YY_NULL:(buf[0]=c, 1); \

}

%}

This concludes the predefined part. The next part is formed by the definition of the macros for all of the BNF rules of the lexis BNF. Please note, that there is a rule for each of the BNF rules regardless if they are used or not. Examples for entities that are not used further are LEXICAL_UNIT and KEYWORD. The first of these is not used because it is just a container for declaring the lexical units and the second one because keywords have a special status and are tokens each. Please note that some of the lines have been too long - they have been cut

using \. They are indented by two spaces on the next line. For brevity, some lines are omitted (indicated by ...).

This concludes the first part of the lex file. The next part contains the declaration what the lexical units are and which value to return. The first element is the lexical deletion of <note>. Whenever note is analysed, it is skipped. Please note that there is another rule for <note> further down stating that the token <note> has to be returned. However, the first rule takes precedence and no token <note> will ever reach the parser. The flex tool will generate a warning that the second <note> rule is not reachable.

{NOTE} ;

active|ACTIVE { return token(ACTIVE); }

and|AND { return token(AND); }

...

xor|XOR { return token(XOR); }

{NAME} { return token(L_name); }

{CHARACTER_STRING} { return token(L_character_string); }

{NOTE} { return token(L_note); }

{CONCATENATION_SIGN} { return token(L_concatenation_sign); }

{GREATER_THAN_OR_EQUALS_SIGN} { return token(L_greater_than_or_equals_sign); }

...

{GREATER_THAN_SIGN} { return token(*yytext); }

{SPACE} ;

. { yyerror("invalid character"); }

%%

This concludes the second part of the lex file. Please note the last rule saying that any other character not covered by the rules above is regarded to be illegal. The yyerror routine is defined within prelexic, because this is the only place where line numbers are known. The input to the generated lexer does already contain only spaces instead of special symbols.

The last part contains some more debugging support.

#ifdef DEBUG

int main() {

char *p;

printf("checking lexis\n");

while((p=(char*)yylex()))

printf("%-20.20s on line %4d is <%s>\n", p, yflineno, yytext);

return 0;

}

#endif DEBUG

5.3.4 Concrete Syntax

From the concrete syntax BNF several parts have to be generated. The first and most important part is the yacc file that resembles the concrete syntax. Yacc will then check the grammar for shift reduce conflicts and generate a parser. The second part is the generation of the abstract syntax level 0 (AS0). This is the interface to the static semantic part, which starts with the AS0 and describes the transformation to the AS1. There is also a description of the AS0 within the formal semantics part, which has to match the description generated from the concrete syntax BNF. The last part to be generated is the kimwitu representation of the AS0.

For all of the three parts to be generated some grammar corrections have to be made. Therefore a first step is inserted to implement the common grammar changes. This is achieved with the generation of a so-called GST file. Starting from that file, the yacc output is generated. The problem in the yacc generation is that for generating a valid yacc file some more grammar changes have to be done. However, for the insertion of syntax tree generation actions as provided by the kimwitu representation of the AS0 the original GST structure has to be known. Please find an overview of the concrete syntax implementation in Figure 12 below.

rsdl-cs-extr.cs cs

front gen

one rsdl-cs-extr.one rsdl-cs-extr.ast

rsdl-cs.cs cs

front gen

one

genyacc rsdl-cs.y rsdl-cs.one rsdl-cs.ast

diff rsdl-lex.tok

gen gst rsdl-cs.gst

gen

k rsdl-as0.k

gen kst rsdl-cs.kst

gentxt rsdl-as0.as0

Figure 12: Structure of the Concrete Syntax Implementation

These dependencies are represented by the following make statements.

${RSDL_C}.cs: ${INPUTS}/${RSDL_C}.txt; ln -f $< $@

${RSDL_O}.cs: ${INPUTS}/${RSDL_O}.txt; ln -f $< $@

%.tok: ${INPUTS}/%.tok; ln -f $< $@

%.ast: %.cs cs2ast${EXE} ${RSDL_L}.tok

( cat $< ${RSDL_L}.tok | ./cs2ast${EXE} > $@ ) || (rm $@; exit 1)

%.gst: %.ast ast2gst${EXE}; ./ast2gst${EXE} < $< > $@ || (rm $@; exit 1)

# gst is a general syntax tree for yacc generation and kimwitu generation

%.kst: %.gst gst2kst${EXE}; ./gst2kst${EXE} < $< > $@ || (rm $@; exit 1)

# kst is a syntax tree for kimwitu generation

%.one: %.ast ast2one${EXE}; (./ast2one${EXE} < $< | sort > $@) || (rm $@; exit 1)

%.y: %.gst gst2y${EXE}; ./gst2y${EXE} < $< > $@ || (rm $@; exit 1)

${RSDL_0}.k: ${RSDL_C}.kst ast2k${EXE}; ./ast2k${EXE} < $< > $@ || (rm $@; exit 1)

${RSDL_0}.as0: ${RSDL_C}.kst kst2txt${EXE}

./kst2txt${EXE} < $< | \

sed -e "s/::/###(::)/" -e "s/=/###(=)/" -e "s/###/::=/" > $@ || (rm $@; exit 1)

cs.diff: ${RSDL_C}.one ${RSDL_O}.one; diff $^ > $@

There are some tricky details of the transformation. We will not show all the details of the transformation but only two of the more special parts.

The first detail is the insertion of additional rules. Additional rules are inserted using two steps. First, two new kimwitu constructors are introduced (lines 1-2) to trigger the insertion of a new rule (lines 14-20). The insertion itself is done using an auxiliary C-function inserting the new rule into a temporary rule list (lines 6-12). This temporary rule list is inserted into the global rule list when the rewriting reaches the outermost Spec node (lines 32-33).

1. atom: MakeRule( casestring atom );

2. atom: AtomAndRule( atom rule );

3. %{ KC_REWRITE

4. static syntax addRules=0;

5. %}

6. atom storeRule(a,r) atom a; rule r;

7. { if(!addRules) addRules=Nilsyntax(); addRules= Conssyntax(r,addRules); return a; }

8. spec insertRules(s,sy,t) spec s; syntax sy; symtab t;

9. { syntax loc=(addRules)?concat_syntax(addRules,sy):sy; addRules=Nilsyntax();

10. if(loc==sy) return s;

11. fprintf(stderr,"\nadding all rules\n"); return Spec(loc,t);

12. }

13. /* divide up "a b | ..." and "... | a b" into "..." and "a b" */

14. Consexpression(o=Consserial(*,Consserial(*,*)), r=Consexpression(*,*))

15. -> <r_cst2gst: Consexpression(Consserial(MakeRule(NewAS0Name(ser2atom(o)), ser2atom(o)),

16. Nilserial()), r) >;

17. Consexpression(r=*, Consexpression(o=Consserial(*, Consserial(*,*)), Nilexpression()))

18. -> <r_cst2gst: Consexpression(r, Consexpression(Consserial(MakeRule(

19. NewAS0Name(ser2atom(o)), ser2atom(o)), Nilserial()), Nilexpression())) >;

20. MakeRule(n,SubExpression(e))

21. -> <: AtomAndRule(Nonterminal(n), Rule(Unknown(), n,e)) >;

22. MakeRule(n,x)

23. -> <: AtomAndRule(Nonterminal(n),

24. Rule(Unknown(),n,Consexpression(Consserial(x,Nilserial()),

25. Nilexpression()))) >;

26. MakeRule(*, SubExpression(Consexpression(Consserial(g=GAtom(*,*), *), *)))

27. -> <: g >;

28. AtomAndRule(a,r)

29. -> <: storeRule(a,r) >;

30. s=Spec(sy,t)

31. -> <: insertRules(s,sy,t) >;

The second detail is the generation of new names. The problem here is that we would like the name generation to be expressed using unparse-rules, but to call it using a C-function. This is accomplished using a special print function strprint as shown below.

%{ KC_REWRITE char buffer[2000] ;

void strprint_f(const char *s, uview_enum v) { strcat(buffer,s); }

casestring NewName(atom a) { buffer[0]=0;

a->unparse(strprint,gen_name);

return mkcasestring(buffer);

}

Im Dokument Formal Semantics for SDL (Seite 154-160)