LRSTAR Parser Generator

A.M.D.G.

DFA Options

Type "dfa" with no arguments and you get this usage summary:

DFA 24.0.000 64b Copyright Paul B Mann.
|
|   DFA LEXER GENERATOR
|
|   dfa <grammar> [/<option>...]
|
|   OPTION  DEFAULT  DESCRIPTION
|   crr         1    Conflict report for Reduce-Reduce
|   csr         1    Conflict report for Shift-Reduce
|   d           0    Debug lexer activated
|   g           0    Grammar listing
|   ko          0    Keywords only (no identifiers).
|   m           0    Minimize lexer-table size
|   st          0    State machine for conflicts report
|   sto         0    State machine optimized
|   v           2    Verbose mode (0,1,2)
|   w           0    Print warnings on screen
|_

DFA Grammars

DFA reads a grammar rather than regular expressions. A grammar is more powerful than regular expressions, and easier to read. Have you ever tried to specify a multi-line comment with a regular expression?

DFA reads two files. In the Calc project, the first is Calc.lex, which lists all the tokens of your language. LRSTAR generates it when it reads the Calc.grm file. The second is Calc.lgr, in which you specify the rules that build the tokens and the character-set definitions.

Here is the Calc.lex generated by LRSTAR.

$Goal    -> $Token $End
$Token   -> <eof>              1
         -> <identifier>       2
         -> <integer>          3
         -> 'else'            20
         -> 'endif'           16
         -> 'if'              15
         -> 'program'         10
         -> 'then'            19
         -> '!='               5
         -> '('               17
         -> ')'               18
         -> '*'                8
         -> '+'                6
         -> '-'                7
         -> '/'                9
         -> ';'               14
         -> '='               13
         -> '=='               4
         -> '{'               11
         -> '}'               12
         ;
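Some of these tokens are prefixes of others, such as '=' and '==', so a lexer has to decide which one to report. The usual convention is the longest match; this page does not say how DFA resolves it, so treat that as an assumption. The Python sketch applies longest match to the operator tokens and returns the token numbers from the Calc.lex listing. OPERATORS and match_operator are hypothetical names.

```python
from typing import Optional, Tuple

# Operator token numbers copied from Calc.lex.
OPERATORS = {
    "!=": 5, "(": 17, ")": 18, "*": 8, "+": 6, "-": 7,
    "/": 9, ";": 14, "=": 13, "==": 4, "{": 11, "}": 12,
}


def match_operator(text: str, pos: int) -> Optional[Tuple[int, int]]:
    """Return (token_number, length) of the longest operator at pos, or None."""
    best = None
    for lexeme, number in OPERATORS.items():
        if text.startswith(lexeme, pos):
            # Keep the longer candidate, so "==" wins over "=".
            if best is None or len(lexeme) > best[1]:
                best = (number, len(lexeme))
    return best
```

For example, at the "==" in "a == b" this returns (4, 2) rather than (13, 1), and a lone "!" returns None because it is not a token in Calc.lex.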

Here is the Calc.lgr written by a user.

<eof>             -> \z
<identifier>      -> letter (letter|digit)*
<integer>         -> digit+
{whitespace}      -> ( \t | \n | \r | ' ' )+
{commentline}     -> '/' '/' neol*
{commentblock}    -> '/' '*' na* '*'+ (nans na* '*'+)* '/'

letter            = 'a'..'z' | 'A'..'Z' | '_'
digit             = '0'..'9'
any               = 0..127 - \z       // any character except EOF
na                = any - '*'         // not asterisk
nans              = any - '*' - '/'   // not asterisk not slash
neol              = any - \n          // not end of line

\t                =  9                // tab
\n                = 10                // newline
\r                = 13                // return
\z                = 26                // end of file
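The {commentblock} rule is exactly the case that is awkward with regular expressions. To show what it describes, here is a hand-written Python state machine equivalent to '/' '*' na* '*'+ (nans na* '*'+)* '/'. It is an illustrative sketch, not the code DFA generates; match_comment_block is a hypothetical name. As in the any set, characters above 127 and \z (26) are rejected inside the comment.

```python
def match_comment_block(text: str, pos: int) -> int:
    """Return the length of a /* ... */ comment starting at pos, or 0 if none."""
    # States: 0 = expect '/', 1 = expect '*',
    #         2 = inside the body (na*), 3 = after one or more '*' ('*'+).
    state = 0
    i = pos
    while i < len(text):
        c = text[i]
        if state == 0:
            if c != "/":
                return 0
            state = 1
        elif state == 1:
            if c != "*":
                return 0
            state = 2
        else:
            if ord(c) > 127 or ord(c) == 26:   # not in 'any'
                return 0
            if state == 2:
                state = 3 if c == "*" else 2   # na stays in the body
            elif c == "/":                     # state 3: '*'+ followed by '/'
                return i - pos + 1
            else:                              # another '*', or nans restarts the body
                state = 3 if c == "*" else 2
        i += 1
    return 0  # input ended before the closing */
```

Because '*'+ must be followed by either nans or '/', the comment always ends at the first "*/"; for instance "/***/" is a complete comment of length 5.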

Notice the {whitespace}, {commentline} and {commentblock} symbols. They are ignored symbols: they are recognized, but NOT transmitted to the parser.
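In a hand-written equivalent, an ignored symbol is matched like any other rule and then simply not returned. The minimal Python sketch covers only <identifier>, the keywords, <integer>, {whitespace} and {commentline} from the Calc example, using the token numbers from Calc.lex. It assumes that the end of the Python string stands in for \z and that a keyword spelling wins over <identifier>; tokens and KEYWORDS are hypothetical names.

```python
import string
from typing import Iterator, Tuple

# Token numbers from Calc.lex.
EOF, IDENTIFIER, INTEGER = 1, 2, 3
KEYWORDS = {"else": 20, "endif": 16, "if": 15, "program": 10, "then": 19}

LETTER = set(string.ascii_letters + "_")   # letter = 'a'..'z' | 'A'..'Z' | '_'
DIGIT = set(string.digits)                 # digit  = '0'..'9'
WHITESPACE = set("\t\n\r ")                # characters of {whitespace}


def tokens(text: str) -> Iterator[Tuple[int, str]]:
    """Yield (token_number, lexeme); ignored symbols are consumed, never yielded."""
    i = 0
    while i < len(text):
        c = text[i]
        if c in WHITESPACE:                 # {whitespace}: skip
            i += 1
        elif text.startswith("//", i):      # {commentline}: skip up to the newline
            end = text.find("\n", i)
            i = len(text) if end == -1 else end
        elif c in LETTER:                   # <identifier>, or a keyword (assumed to win)
            j = i + 1
            while j < len(text) and (text[j] in LETTER or text[j] in DIGIT):
                j += 1
            word = text[i:j]
            yield KEYWORDS.get(word, IDENTIFIER), word
            i = j
        elif c in DIGIT:                    # <integer>
            j = i + 1
            while j < len(text) and text[j] in DIGIT:
                j += 1
            yield INTEGER, text[i:j]
            i = j
        else:                               # operators and {commentblock} omitted here
            raise ValueError(f"unexpected character {c!r} at offset {i}")
    yield EOF, ""                           # end of string treated as <eof>
```

Given "if x1 then 42 // note\nendif", it yields if (15), x1 (2), then (19), 42 (3), endif (16) and finally <eof> (1); the spaces, newline and comment never reach the caller.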

Notice that = is used to define a character set, whereas -> is used only for rules.
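Read this way, a character-set definition is ordinary set arithmetic on character codes: | is union, - is difference, and 'a'..'z' (or 0..127) is an inclusive range. The Python sketch mirrors the Calc.lgr sets under that reading. char_range is a hypothetical helper, and any is renamed any_ so that it does not shadow Python's built-in any.

```python
from typing import Set

TAB, NEWLINE, RETURN, EOF_CHAR = 9, 10, 13, 26   # \t \n \r \z


def char_range(lo: str, hi: str) -> Set[int]:
    """'lo'..'hi': every character code from lo to hi, inclusive."""
    return set(range(ord(lo), ord(hi) + 1))


letter = char_range("a", "z") | char_range("A", "Z") | {ord("_")}
digit = char_range("0", "9")
any_ = set(range(0, 128)) - {EOF_CHAR}   # any  = 0..127 - \z
na = any_ - {ord("*")}                   # not asterisk
nans = any_ - {ord("*"), ord("/")}       # not asterisk, not slash
neol = any_ - {NEWLINE}                  # not end of line
```

So letter has 53 members, any has 127 (all of 0..127 except 26), and na still contains '/', which is why a slash inside a block comment does not end it.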

DFA-Only Grammars

If you are using DFA alone, without LRSTAR, you may specify everything in the Calc.lgr file, as follows. The defined constants EOFILE through RBRACE will then be available to your code, whether a hand-written parser or anything else.

$Goal    -> $Token $End
$Token   -> <eof>             EOFILE
         -> <identifier>      IDENTIFIER
         -> <integer>         INTEGER
         -> 'else'            ELSE
         -> 'endif'           ENDIF
         -> 'if'              IF
         -> 'program'         PROGRAM
         -> 'then'            THEN
         -> '!='              NOTEQ
         -> '('               LPAREN
         -> ')'               RPAREN
         -> '*'               MUL
         -> '+'               PLUS
         -> '-'               MINUS
         -> '/'               DIV
         -> ';'               SEMI
         -> '='               EQ
         -> '=='              EQS
         -> '{'               LBRACE
         -> '}'               RBRACE

<eof>             -> \z
<identifier>      -> letter (letter|digit)*
<integer>         -> digit+
{whitespace}      -> ( \t | \n | \r | ' ' )+
{commentline}     -> '/' '/' neol*
{commentblock}    -> '/' '*' na* '*'+ (nans na* '*'+)* '/'

letter            = 'a'..'z' | 'A'..'Z' | '_'
digit             = '0'..'9'
any               = 0..127 - \z       // any character except EOF
na                = any - '*'         // not asterisk
nans              = any - '*' - '/'   // not asterisk not slash
neol              = any - \n          // not end of line

\t                =  9                // tab
\n                = 10                // newline
\r                = 13                // return
\z                = 26                // end of file
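A hand-written consumer can then branch on the constant names instead of raw numbers. The Python sketch assumes the constants can be represented as integers; Tok is a hypothetical IntEnum whose values are placeholders, because this page does not say which values DFA assigns or in what form it emits them. The hypothetical balanced function checks that parentheses and braces nest correctly in a token stream.

```python
from enum import IntEnum
from typing import Iterable

# Hypothetical stand-in for the constants DFA generates. Only the names come
# from the grammar above; the values here are placeholders (1, 2, ...).
Tok = IntEnum(
    "Tok",
    "EOFILE IDENTIFIER INTEGER ELSE ENDIF IF PROGRAM THEN NOTEQ LPAREN "
    "RPAREN MUL PLUS MINUS DIV SEMI EQ EQS LBRACE RBRACE",
)


def balanced(kinds: Iterable[int]) -> bool:
    """Check that ( ) and { } nest correctly, reading tokens up to EOFILE."""
    opener_for = {Tok.RPAREN: Tok.LPAREN, Tok.RBRACE: Tok.LBRACE}
    stack = []
    for kind in kinds:
        if kind in (Tok.LPAREN, Tok.LBRACE):
            stack.append(kind)
        elif kind in opener_for:
            # A closer must match the most recent unmatched opener.
            if not stack or stack.pop() != opener_for[kind]:
                return False
        elif kind == Tok.EOFILE:
            break
    return not stack
```

For instance, LBRACE LPAREN IDENTIFIER RPAREN RBRACE EOFILE is balanced, while LPAREN RBRACE EOFILE is not.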