LRSTAR: LR(*) parser generator for C++ A.M.D.G.
About Feedback Downloads,
Setup
LRSTAR DFA Release
Notes
Contact
BASIC ADVANCED EXPERT

LRSTAR Options

Type "lrstar" and you get this:

LRSTAR 20.0.001 32b Copyright LRTEC.
|
|   LR(*) PARSER GENERATOR
|																
|   lrstar <grammar> [/<option>...]
|
|   OPTION  DEFAULT  DESCRIPTION
|   ast        1     AST construction in parser
|   d          0     Debug option for the parser
|   k          1     k-lookaheads for LR(*) parsing
|   m          0     Minimize parser-table size
|   o          0     Optimize parser speed 
|   st         0     State-machine listing 
|   v          2     Verbose mode (1,2,3)
|   w          0     Print warnings on screen
|_

LRSTAR Grammars

LRSTAR reads a grammar. A grammar is a "list of rules" specifying (1) the pattern of an input file, or (2) the syntax of a language. LRSTAR reads pure grammar notation, mostly rules, action operators and function names. By using AST operators, LRSTAR parsers can construct an abstract-syntax tree (AST). Simple, concise and powerful.


Lexical Tokens

<identifier>, <integer>, <string>, <eof>.
These are the variable symbols of your language. Tokens are provided by the lexer. Here is a simple grammar, with lexical tokens in pink:

<error>       => error();
<identifier>  => lookup();  // Call symbol-table function.
<integer>     => lookup();  // Call symbol-table function.

{ '+'  '-' }  <<  // Left associative, lowest priority.     
{ '*'  '/' }  <<  // Left associative, highest priority.

Start   -> Program+ <eof>                        *> goal_		

Program -> 'program' <identifier> '{' Stmt* '}'  *> program_(2)
            
Stmt    -> Target '=' Exp ';'                    *> store_~			
           
Target  -> <identifier>                          *> target_(1)		
            
Exp     -> <integer>                             *> integer_(1)			
        -> <identifier>                          *> identifier_(1)		
        -> Exp '+' Exp                           *> add_				
        -> Exp '-' Exp                           *> sub_				
        -> Exp '*' Exp                           *> mul_				
        -> Exp '/' Exp                           *> div_				
        -> '(' Exp ')'  

Lexical Tokens As Defined Constants

IDENTIFIER, INTEGER, STRING, EOF.
Some people are in the habit of using defined constants for lexical symbols. It originated with "lex" and "flex" which both do a "return(IDENTIFIER)" to the parser. DFA returns the actual token number to the LRSTAR parser, so there's no need for using defined constants in LRSTAR grammars, in most cases.

However, you may have a YACC grammar which uses defined constants in the rules. In this case, you will need to put the defined constants at the top of your grammar (easier than changing the rules). These defined constants will also be available to use in your source code. Here is an example:

ERROR       <error>        => error();
IDENTIFIER  <identifier>   => lookup();  // Call symbol table function.
INTEGER     <integer>      => lookup();  // Call symbol table function.
EOF         <eof>;     
PLUS        '+';
MINUS       '-';
MUL         '*';
DIV         '/';
LBRACE      '{';
RBRACE      '}';
EQUALS      '=';
SEMICOLON   ';';
LPAREN      '(';
RPAREN      ')';
PROGRAM     'program';

Parser-Defined Terminals

{typename}, {headsymbol}, {function_name}, {array_name}.
These are also variable symbols of your language, called "semantic symbols". These are usually provided by the lexer as <identifier>s, but changed to the more meaningful semantic symbols, when appropriate. A Parse Action is necessary to "upgrade" the input token. Here is an example. Notice that LRSTAR accepts the YACC style notation (: | ;) as well as the arrow (->).

<identifier>  => lookup();  // Call symbol-table function.

Goal        : Declaration* <eof>
            ;
Declaration :         TypeSpec+ Identifier ';'					
            | typedef TypeSpec+ Typename   ';'	 
            ;
Identifier  : <identifier>				
            ;
Typename    : <identifier>   => typedef_(1,{typename}) 
            ;
TypeSpec    : char							
            | int												
            | short							
            | {typename}
            ;

Two Special Symbols

<eof>      End-of-file symbol.
<error>    Error symbol.

<eof>
This can only be used in one place in the rules of the grammar, at the end of the start rule, for example:

Start -> CompilationUnit <eof>

<error>
This symbol is here for completeness. It can be used for custom parsing algorithms, but none are provided, yet. If one needs LR(*) parsing, the usage of the <error> symbol becomes more complicated. It's here in case you want to write your own custom parser.


EBNF Notation

Basic operators:       Extended operators:
+ One or more.         / Separator.
* Zero or more.
? Optional.
( Group start.         [ Optional start.
) Group end.           ] Optional end.

Here are some useful examples:

Start -> ExternalDef* <eof>         // Zero or more ExternalDef			   	                     
      
PrimaryExp
   -> <string>+                     // One or more <string>
   -> <identifier>														 

Default -> (default ':' Stmt*)?     // Optional group

PostfixExp
      -> PrimaryExp
      -> {function} '(' [Args] ')'  // Optional Args

Args  -> <identifier>/','+          // One or more separated by ','s

Five Action Operators

=> Call a Token Action or a Parse Action.
+> Make a Node in the AST.
*> Make a Node and call a Node Action.
=+> Call a Parser action, make a Node in the AST.
=*> Call a Parser action, make a Node and call a Node Action.

Here's where they fit in to the parser system:

(1) Get token         => Call a Token Action
(2) Recognize a rule  => Call a Parse Action
                      +> Make a Node in the AST
(3) Traverse the AST  *> Call a Node Action

Here is a simple grammar, with action operators in pink:

<error>       => error();
<identifier>  => lookup();  // Call symbol table function.
<integer>     => lookup();  // Call symbol table function.

{ '+'  '-' }  <<  // Left associative, lowest priority.     
{ '*'  '/' }  <<  // Left associative, highest priority.

Start   -> Program+ <eof>                        *> goal_		

Program -> 'program' <identifier> '{' Stmt* '}'  *> program_(2)
            
Stmt    -> Target '=' Exp ';'                    *> store_~			
           
Target  -> <identifier>                          *> target_(1)		
            
Exp     -> <integer>                             *> integer_(1)			
        -> <identifier>                          *> identifier_(1)		
        -> Exp '+' Exp                           *> add_				
        -> Exp '-' Exp                           *> sub_				
        -> Exp '*' Exp                           *> mul_				
        -> Exp '/' Exp                           *> div_				
        -> '(' Exp ')'  

Reverse Operator

~  Reverse the child nodes.

In the AST, some subtrees work much better when the order is "reversed" from the original input order. In the following case, we want to evaluate the Exp before we store the result in the Target, so we use the ~ after the Node Action name. See the Calc project.

Stmt  -> Target '=' Exp ';'        *> store_~			

Three Argument Arrays

arga[], argx[], argy[].
These are available as follows:

Token Actions: arga[]

Grammar:
   <alpha>  => headsym ({headsymbol});

Function:
int   LRSTAR_TokenActions::headsym (int& t) 
{
      int original_t = t;
      char* p = token.start-1;
      while (*p == ' ' || *p == '\t') p--;
      if (*p == '\n') // First symbol on this line?
      {
         t = arga[t]; // Set to {headsymbol}
      }
      return lookup (original_t);
}

Parse Actions: 1st argument = argx[], 2nd argument = argy[]

Grammar:
   TypedefIdentifier -> <identifier>  => typedef_ (1,{typename})

Function:
int   Typedef_ParseActions::typedef_ (int p) 
{
      int x = argx[p];
      int sti = PS[x].sti;        // Get sti from parse stack.
      symbol[sti].term = argy[p]; // Set term to {typename}.
      return 1; 
}

(c) Copyright LRTEC 2020.  All rights reserved.