LRSTAR: LR(*) parser generator for C++ A.M.D.G.

DFA Options

Type "dfa" with no arguments and you get this usage summary:

DFA 20.0.001 32b Copyright LRTEC.
|
|   DFA LEXER GENERATOR
|
|   dfa <grammar> [/<option>...]
|
|   OPTION  DEFAULT  DESCRIPTION
|   d          0     Debug option for the lexer
|   ci         0     Case insensitivity (1,2)  
|   st         0     State-machine listing
|   v          2     Verbose mode (1,2,3)
|   w          0     Print warnings on screen
|_

DFA Grammars

DFA reads a grammar, which is more powerful and more readable than regular expressions. Have you ever tried to specify a multi-line comment with regular expressions?

DFA reads two files. In the Calc project, the first is Calc.lex, which lists all the tokens of your language. LRSTAR generates it when it reads the Calc.grm file. The second is Calc.lgr, in which you specify the rules for forming the tokens and the character-set definitions.

Here is the Calc.lex generated by LRSTAR.

$Goal    -> $Token $End
$Token   -> <eof>              1
         -> <identifier>       2
         -> <integer>          3
         -> 'else'            20
         -> 'endif'           16
         -> 'if'              15
         -> 'program'         10
         -> 'then'            19
         -> '!='               5
         -> '('               17
         -> ')'               18
         -> '*'                8
         -> '+'                6
         -> '-'                7
         -> '/'                9
         -> ';'               14
         -> '='               13
         -> '=='               4
         -> '{'               11
         -> '}'               12
         ;

Here is the Calc.lgr written by a user.

<eof>             -> \z
<identifier>      -> letter (letter|digit)*
<integer>         -> digit+
{whitespace}      -> ( \t | \n | \r | ' ' )+
{commentline}     -> '/' '/' neol*
{commentblock}    -> '/' '*' na* '*'+ (nans na* '*'+)* '/'

letter            = 'a'..'z' | 'A'..'Z' | '_'
digit             = '0'..'9'
any               = 0..127 - \z       // any character except EOF
na                = any - '*'         // not asterisk
nans              = any - '*' - '/'   // not asterisk not slash
neol              = any - \n          // not end of line

\t                =  9                // tab
\n                = 10                // newline
\r                = 13                // return
\z                = 26                // end of file

Notice that {whitespace}, {commentline} and {commentblock} are written in braces rather than angle brackets. They are ignored symbols: the lexer recognizes them but does NOT transmit them to the parser.
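To make the three ignored rules concrete, here is a minimal hand-written C++ sketch of what they match. It is not the code DFA generates; the function names (isAny, matchWhitespace, matchCommentLine, matchCommentBlock, skipIgnored) are hypothetical and exist only for this illustration. Each matcher returns the position just past the match, or the starting position when the rule does not apply.

```cpp
#include <cstddef>
#include <string>

// any = 0..127 - \z   (every 7-bit character except code 26)
bool isAny(char c) {
    unsigned char u = static_cast<unsigned char>(c);
    return u <= 127 && u != 26;
}

// {whitespace} -> ( \t | \n | \r | ' ' )+
std::size_t matchWhitespace(const std::string& s, std::size_t pos) {
    std::size_t i = pos;
    while (i < s.size() &&
           (s[i] == '\t' || s[i] == '\n' || s[i] == '\r' || s[i] == ' ')) ++i;
    return i;
}

// {commentline} -> '/' '/' neol*      where neol = any - \n
std::size_t matchCommentLine(const std::string& s, std::size_t pos) {
    if (pos + 1 >= s.size() || s[pos] != '/' || s[pos + 1] != '/') return pos;
    std::size_t i = pos + 2;
    while (i < s.size() && isAny(s[i]) && s[i] != '\n') ++i;
    return i;
}

// {commentblock} -> '/' '*' na* '*'+ (nans na* '*'+)* '/'
// The rule ends at the first '/' that directly follows a run of '*'.
std::size_t matchCommentBlock(const std::string& s, std::size_t pos) {
    if (pos + 1 >= s.size() || s[pos] != '/' || s[pos + 1] != '*') return pos;
    std::size_t i = pos + 2;
    bool afterStar = false;              // previous character was '*'
    while (i < s.size() && isAny(s[i])) {
        char c = s[i++];
        if (afterStar && c == '/') return i;
        afterStar = (c == '*');
    }
    return pos;                          // unterminated comment: no match
}

// Consume ignored symbols repeatedly; nothing is handed to the parser.
std::size_t skipIgnored(const std::string& s, std::size_t pos) {
    for (;;) {
        std::size_t next = matchWhitespace(s, pos);
        if (next == pos) next = matchCommentLine(s, pos);
        if (next == pos) next = matchCommentBlock(s, pos);
        if (next == pos) return pos;     // no ignored symbol starts here
        pos = next;
    }
}
```

Note that "/*/" is not a complete comment under this rule: the '/' after the opening is absorbed by na*, and a '*' must still appear before the closing '/'.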

Notice also that the = indicator defines a character set, whereas the -> indicator is used only for rules.
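The character-set definitions can be read as plain set arithmetic over the codes 0..127. The following C++ sketch mirrors them with std::bitset; it only illustrates the meaning of the ranges, the | union and the - difference, and is not how DFA represents sets internally. The names CharSet, range, single and minus are assumptions made for this example.

```cpp
#include <bitset>
#include <cstddef>

using CharSet = std::bitset<128>;        // one bit per character code 0..127

// Inclusive range lo..hi, as in 'a'..'z' or 0..127.
CharSet range(int lo, int hi) {
    CharSet s;
    for (int c = lo; c <= hi; ++c) s.set(static_cast<std::size_t>(c));
    return s;
}

// A set holding a single character.
CharSet single(int c) { return range(c, c); }

// Set difference, written '-' in the grammar.
CharSet minus(const CharSet& a, const CharSet& b) { return a & ~b; }

const int NL    = 10;                    // \n
const int EOFCH = 26;                    // \z

const CharSet letter = range('a', 'z') | range('A', 'Z') | single('_');
const CharSet digit  = range('0', '9');
const CharSet any    = minus(range(0, 127), single(EOFCH));
const CharSet na     = minus(any, single('*'));
const CharSet nans   = minus(minus(any, single('*')), single('/'));
const CharSet neol   = minus(any, single(NL));
```

Because every set is derived from any, the end-of-file code 26 is excluded from na, nans and neol as well, so a comment rule cannot run past the end of the input.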


DFA-Only Grammars

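If you are using DFA only, without LRSTAR, there is no generated Calc.lex, so you need to specify everything in the Calc.lgr file: the token list, the token rules and the character-set definitions, as follows. In the token list, a named constant takes the place of each token number. The defined constants EOFILE through RBRACE will be available to use in your own code, such as a hand-written parser.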

$Goal    -> $Token $End
$Token   -> <eof>             EOFILE
         -> <identifier>      IDENTIFIER
         -> <integer>         INTEGER
         -> 'else'            ELSE
         -> 'endif'           ENDIF
         -> 'if'              IF
         -> 'program'         PROGRAM
         -> 'then'            THEN
         -> '!='              NOTEQ
         -> '('               LPAREN
         -> ')'               RPAREN
         -> '*'               MUL
         -> '+'               PLUS
         -> '-'               MINUS
         -> '/'               DIV
         -> ';'               SEMI
         -> '='               EQ
         -> '=='              EQS
         -> '{'               LBRACE
         -> '}'               RBRACE
         ;

<eof>             -> \z
<identifier>      -> letter (letter|digit)*
<integer>         -> digit+
{whitespace}      -> ( \t | \n | \r | ' ' )+
{commentline}     -> '/' '/' neol*
{commentblock}    -> '/' '*' na* '*'+ (nans na* '*'+)* '/'

letter            = 'a'..'z' | 'A'..'Z' | '_'
digit             = '0'..'9'
any               = 0..127 - \z       // any character except EOF
na                = any - '*'         // not asterisk
nans              = any - '*' - '/'   // not asterisk not slash
neol              = any - \n          // not end of line

\t                =  9                // tab
\n                = 10                // newline
\r                = 13                // return
\z                = 26                // end of file

(c) Copyright LRTEC 2020.  All rights reserved.