LRSTAR Parser Generator

A.M.D.G.

About Feedback Installation
and Setup LRSTAR DFA Papers Release
Notes Contact,
Support

BASIC ADVANCED EXPERT

LRSTAR Options

Type "lrstar" and you get this:

LRSTAR 24.0.000 64b Copyright Paul B Mann.
|
|   LR(*) PARSER GENERATOR
|																
|   lrstar <grammar> [/<option>...]
|
|   OPTION  DEFAULT  DESCRIPTION
|
|   crr        0     Conflict report for Reduce-Reduce
|   csr        0     Conflict report for Shift-Reduce
|   fsa        0     Force shift actions in conflicts
|
|   g          1     Grammar listing
|   gh         0     Grammar HTML format output
|
|   st         0     States listing
|   ci         0     Case-insensitive input
|
|   pt         1     Parser-table creation
|   o          0     Optimize parser-tables
|   m          0     Minimize parser-tables
|
|   k          1     k-lookaheads for LR(*) parsing
|
|   ast        1     AST activated in parser
|   exp        1     Expecting-list activated
|   ta         1     Terminal-actions activated
|   na         1     Node-actions activated
|
|   d          0     Debug parser activated.
|   dt         0     Debug trace activated.
|
|   v          2     Verbose listing (0,1,2)
|   w          0     Warnings listed on screen
|
|   wa         0     Warning, arguments -> nonterminals
|   wc         0     Warning, constants not used
|   wk         0     Warning, keywords not declared
|   wm         0     Warning, mixed case keywords
|   wu         0     Warning, upper case keywords
|_

LRSTAR Grammars

LRSTAR reads a grammar. A grammar is a "list of rules" specifying (1) the pattern of an input file, or (2) the syntax of a language. LRSTAR reads pure grammar notation, mostly rules, action operators and function names. By using AST operators, LRSTAR parsers can construct an abstract-syntax tree (AST). Simple, concise and powerful.

Lexical Tokens

<identifier>, <integer>, <string>, <eof>.
These are the variable symbols of your language. Tokens are provided by the lexer. Here is a simple grammar:

<error>       => error();
<identifier>  => lookup();  // Call symbol-table function.
<integer>     => lookup();  // Call symbol-table function.

{ '+'  '-' }  <<  // Left associative, lowest priority.     
{ '*'  '/' }  <<  // Left associative, highest priority.

Start   -> Program+ <eof>                        *> goal_		

Program -> 'program' <identifier> '{' Stmt* '}'  *> program_(2)
            
Stmt    -> Target '=' Exp ';'                    *> store_~			
           
Target  -> <identifier>                          *> target_(1)		
            
Exp     -> <integer>                             *> integer_(1)			
        -> <identifier>                          *> identifier_(1)		
        -> Exp '+' Exp                           *> add_				
        -> Exp '-' Exp                           *> sub_				
        -> Exp '*' Exp                           *> mul_				
        -> Exp '/' Exp                           *> div_				
        -> '(' Exp ')'

Lexical Tokens As Defined Constants

IDENTIFIER, INTEGER, STRING, EOF.
Some people are in the habit of using defined constants for lexical symbols. It originated with "lex" and "flex" which both do a "return(IDENTIFIER)" to the parser. DFA returns the actual token number to the LRSTAR parser, so there's no need for using defined constants in LRSTAR grammars, in most cases.

However, you may have a YACC grammar which uses defined constants in the rules. In this case, you will need to put the defined constants at the top of your grammar (easier than changing the rules). These defined constants will also be available to use in your source code. Here is an example:

ERROR       <error>        => error();
IDENTIFIER  <identifier>   => lookup();  // Call symbol table function.
INTEGER     <integer>      => lookup();  // Call symbol table function.
EOF         <eof>;     
PLUS        '+';
MINUS       '-';
MUL         '*';
DIV         '/';
LBRACE      '{';
RBRACE      '}';
EQUALS      '=';
SEMICOLON   ';';
LPAREN      '(';
RPAREN      ')';
PROGRAM     'program';

Parser-Defined Terminals

{typename}, {headsymbol}, {function_name}, {array_name}.
These are also variable symbols of your language, called "semantic symbols". These are usually provided by the lexer as <identifier>s, but changed to the more meaningful semantic symbols, when appropriate. A special action is necessary to "upgrade" the input by using the "^" operator. Here is an example. Notice that LRSTAR accepts the YACC style notation (: | ;) as well as the arrow (->).

<identifier>  => lookup();  // Call symbol-table function.

Goal        : Declaration* <eof>
            ;
Declaration :         TypeSpec+ Identifier ';'					
            | typedef TypeSpec+ Typename   ';'	 
            ;
Identifier  : <identifier>				
            ;
Typename    : <identifier>^{typename} 
            ;
TypeSpec    : char							
            | int												
            | short							
            | {typename}
            ;

Two Special Symbols

<eof> End-of-file symbol.
<error> Error symbol.

<eof>
This can only be used in one place in the rules of the grammar, at the end of the start rule, for example:

Goal -> CompilationUnit <eof>

<error>
This symbol is here for completeness. It can be used for custom parsing algorithms, but none are provided, yet. If one needs LR(*) parsing, the usage of the <error> symbol becomes more complicated. It's here in case you want to write your own custom parser.

EBNF Notation

Basic operators:       Extended operators:
+ One or more.         / Separator.
* Zero or more.
? Optional.
( Group start.         [ Optional start.
) Group end.           ] Optional end.

Here are some useful examples:

Goal -> ExternalDef* <eof>         // Zero or more ExternalDef			   	                     
      
PrimaryExp
   -> <string>+                     // One or more <string>
   -> <identifier>														 

Default -> (default ':' Stmt*)?     // Optional group

PostfixExp
      -> PrimaryExp
      -> {function} '(' [Args] ')'  // Optional Args

Args  -> <identifier>/','+          // One or more separated by ','s

Three Action Operators

=> Call a Token Action.
+> Make a Node in the AST.
*> Make a Node and call a Node Action.

Here is a simple grammar, with action operators:

<error>       => error();
<identifier>  => lookup();  // Call symbol table function.
<integer>     => lookup();  // Call symbol table function.

{ '+'  '-' }  <<  // Left associative, lowest priority.     
{ '*'  '/' }  <<  // Left associative, highest priority.

Goal    -> Program+ <eof>                        *> goal_		

Program -> 'program' <identifier> '{' Stmt* '}'  *> program_(2)
            
Stmt    -> Target '=' Exp ';'                    *> store_~			
           
Target  -> <identifier>                          *> target_(1)		
            
Exp     -> <integer>                             *> integer_(1)			
        -> <identifier>                          *> identifier_(1)		
        -> Exp '+' Exp                           *> add_				
        -> Exp '-' Exp                           *> sub_				
        -> Exp '*' Exp                           *> mul_				
        -> Exp '/' Exp                           *> div_				
        -> '(' Exp ')'

Reverse Operator

~ Reverse the child nodes.

In the AST, some subtrees work much better when the order is "reversed" from the original input order. In the following case, we want to evaluate the Exp before we store the result in the Target, so we use the ~ after the Node Action name. See the Calc project.

Stmt  -> Target '=' Exp ';'        *> store_~