Chapter 6: The Generated Parser Class' Members
Bisonc++ generates a C++ class, rather than a function like Bison.
Furthermore, Bisonc++'s class is a plain C++ class and not a fairly
complex macro-based class like the one generated by Bison++. The C++ class
generated by Bisonc++ does not need virtual members: its essential member,
parse(), is generated from the grammar specification, so the software
engineer will hardly ever feel the need to override that function. All but
a few of the remaining predefined
members have very clear definitions and meanings as well, making it unlikely
that they should ever require overriding. Members like lex() and/or error()
will probably need dedicated definitions for different parsers generated by
Bisonc++. However, defining these support members is a natural extension of
defining the grammar and can be done in parallel with it, so in practice no
virtual members are required. Not defining virtual members simplifies the
parser's class organization, and calls to non-virtual members are a trifle
faster than they would be if these member functions were virtual.
In this chapter all available members and features of the generated parser
class are discussed. Having read this chapter you should be able to use the
generated parser class in your program (using its public members) and to use
its facilities in the actions defined for the various production rules and/or
use these facilities in additional class members that you might have defined
yourself.
In the sequel the class' public members are first discussed, to be followed by
the class' private members. While constructing the grammar the private members
are all available for use in the rules' actions. Furthermore, any member (and
not just the rules' action blocks) may generate errors (thus initiating error
recovery procedures) and may flag the (un)successful parsing of the
information given to the parser (thus terminating the parsing function
parse()).
6.1: Public Members and Types
The following public members can be used by users of the parser classes
generated by bisonc++ (the parser class name prefix, e.g. Parser::, is
silently implied):
- LTYPE:
The parser's location type (user-definable). Available only when
either %lsp-needed, %ltype or %locationstruct has been
declared.
- STYPE:
The parser's stack-type (user-definable), defaults to int.
- Tokens:
The enumeration type of all the symbolic tokens defined in the
grammar file (i.e., bisonc++'s input file). The scanner should be
prepared to return these symbolic tokens. Note that, since the
symbolic tokens are defined in the parser's class and not in the
scanner's class, the lexical scanner must prefix the parser's
class name to the symbolic token names when they are
returned. E.g., return Parser::IDENT should be used rather
than return IDENT.
- int parse():
The parser's parsing member function. It returns 0 when parsing has
completed successfully, 1 if errors were encountered while parsing
the input.
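As an illustration of how these public members are typically used, consider the following minimal sketch. It uses a hand-written stand-in for a generated class: the token names IDENT and NUMBER, the value 257, the token-vector constructor and the helper scannerReturnsIdent() are assumptions made for this example, and the body of parse() is a placeholder for the function that bisonc++ would generate from a grammar.

```cpp
#include <cstddef>
#include <vector>

// Stand-in for a class generated by bisonc++. Only the shape of the
// public interface follows the text; all bodies are placeholders.
class Parser
{
    public:
        typedef int STYPE;          // the default semantic value type

        enum Tokens                 // 257 is an assumed starting value
        {
            IDENT = 257,
            NUMBER
        };

        explicit Parser(std::vector<int> const &tokens)
        :
            d_tokens(tokens),
            d_next(0)
        {}

        // Returns 0 on success, 1 if errors were encountered.
        // Placeholder grammar: any sequence of IDENT tokens is accepted.
        int parse()
        {
            for (int token = lex(); token > 0; token = lex())
            {
                if (token != IDENT)
                    return 1;
            }
            return 0;
        }

    private:
        int lex()                   // returns 0 at end of input
        {
            return d_next < d_tokens.size() ? d_tokens[d_next++] : 0;
        }

        std::vector<int> d_tokens;
        std::size_t d_next;
};

// Code outside of the parser class (e.g., a lexical scanner) must
// qualify the symbolic token names with the parser's class name.
inline int scannerReturnsIdent()
{
    return Parser::IDENT;           // not: return IDENT;
}
```

A program would normally construct the parser object, call parse() once, and use the returned value as its own exit status.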
6.2: Private Enumerations and Types
The following enumerations and types can be used by members of parser
classes generated by bisonc++. When prefixed by Base:: they are actually
protected members inherited from the parser's base class.
- Base::ErrorRecovery:
This enumeration defines two values:
DEFAULT_RECOVERY_MODE,
UNEXPECTED_TOKEN
DEFAULT_RECOVERY_MODE consists of terminating the parsing
process. UNEXPECTED_TOKEN activates the recovery procedure
whenever an error is encountered. The recovery procedure consists of
looking for the first state on the state-stack having an
error-production, and then skipping subsequent tokens until (in that
state) a token is retrieved which may follow the error terminal
token in that production rule. If this error recovery procedure fails
(i.e., if no acceptable token is ever encountered) error recovery
falls back to the default recovery mode, terminating the parsing
process.
- Base::Return:
This enumeration also defines two values:
PARSE_ACCEPT = 0,
PARSE_ABORT = 1
The parse() member function will return one of these values.
6.3: Private Member Functions
The following private members can be used by members of parser classes
generated by bisonc++. When prefixed by Base:: they are actually protected
members inherited from the parser's base class.
- Base::ParserBase():
The default base-class constructor. Can be ignored in practical
situations.
- void Base::ABORT() const throw(Return):
This member can be called from any member function (called from any of
the parser's action blocks) to indicate a failure while parsing thus
terminating the parsing function with an error value 1. Note that this
offers a marked extension and improvement of the macro YYABORT
defined by bison++ in that YYABORT could not be called from
outside of the parsing member function.
- void Base::ACCEPT() const throw(Return):
This member can be called from any member function (called from any of
the parser's action blocks) to indicate successful parsing and thus
terminating the parsing function. Note that this offers a marked
extension and improvement of the macro YYACCEPT defined by
bison++ in that YYACCEPT could not be called from outside of
the parsing member function.
- void Base::clearin():
This member replaces bison(++)'s macro yyclearin and causes
bisonc++ to request another token from its lex() member,
even if the current token has not yet been processed.
- bool Base::debug() const:
This member returns the current value of the debug variable. See
setDebug() below.
- void Base::ERROR() const throw(ErrorRecovery):
This member can be called from any member function (called from any of
the parser's action blocks) to generate an error, and thus initiate
the parser's error recovery code. Note that this offers a marked
extension and improvement of the macro YYERROR defined by
bison++ in that YYERROR could not be called from outside of
the parsing member function.
- void error(char const *msg):
This member may be redefined in the parser class. Its default (inline)
implementation is to write a simple message to the standard error
stream. It is called when a syntactical error is encountered.
- void errorRecovery():
Used internally by the parsing function. Not to be called otherwise.
- void executeAction():
Used internally by the parsing function. Not to be called otherwise.
- int lex():
This member is called by the parse() member to
obtain the next lexical token. By default it is not implemented, but
the %scanner directive (see section 5.6.15) may be used to
pre-implement a standard interface to a lexical analyzer. See section
6.3.1 for further details about the lex() private member
function.
- int lookup():
Used internally by the parsing function. Not to be called otherwise.
- void nextToken():
Used internally by the parsing function. Not to be called otherwise.
- void Base::pop():
Used internally by the parsing function. Not to be called otherwise.
- void print():
This member can be redefined in the parser class to print information
about the parser's state. It is called by the parser immediately after
retrieving a token from lex(). As it is a member function it has
access to all the parser's members, in particular d_token, the
current token value and d_loc, the current token location
information (if %lsp-needed, %ltype or %locationstruct has
been specified).
- void Base::push():
Used internally by the parsing function. Not to be called otherwise.
- void Base::reduce():
Used internally by the parsing function. Not to be called otherwise.
- void Base::setDebug(bool mode):
This member can be used to activate or deactivate the debug-code
compiled into the parsing function. It is available, but has no
effect, if no debug code has been compiled into the parsing
function. When debugging code has been compiled into the parsing
function, it is active by default, but it may be suppressed by calling
setDebug(false).
- void Base::top():
Used internally by the parsing function. Not to be called otherwise.
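The signatures of ABORT() and ACCEPT() show that they terminate parse() by throwing a Return value. The following sketch illustrates how that mechanism lets any member function, not just the parsing function itself, end the parse. It is not the code generated by bisonc++: the classes are hand-written stand-ins, the member handle() and the string-based input are assumptions, and the exception specifications are left out so that the example also compiles with current C++ standards. Judging from its signature, ERROR() uses the same technique with an ErrorRecovery value.

```cpp
#include <string>

// Stand-in for the generated base class.
class ParserBase
{
    protected:
        enum Return
        {
            PARSE_ACCEPT = 0,
            PARSE_ABORT = 1
        };

        void ABORT() const          // ends parse() with value 1
        {
            throw PARSE_ABORT;
        }

        void ACCEPT() const         // ends parse() with value 0
        {
            throw PARSE_ACCEPT;
        }
};

class Parser: public ParserBase
{
    std::string d_input;

    public:
        explicit Parser(std::string const &input)
        :
            d_input(input)
        {}

        int parse()
        {
            try
            {
                // each call stands in for executing an action block
                for (std::string::size_type idx = 0;
                        idx != d_input.size(); ++idx)
                    handle(d_input[idx]);

                return PARSE_ACCEPT;
            }
            catch (Return value)    // thrown by ABORT() or ACCEPT()
            {
                return value;
            }
        }

    private:
        // An ordinary member, called from an "action block"
        void handle(char ch) const
        {
            if (ch == '!')
                ABORT();
            if (ch == '.')
                ACCEPT();
        }
};
```

Because the exception propagates through any number of intermediate calls, the nesting depth of the member calling ABORT() or ACCEPT() does not matter.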
6.3.1: `lex()': the Lexical Analyzer interface
The int lex() private member function is called by the
parse() member to obtain the next lexical token. By default it is not
implemented, but the %scanner directive (see section 5.6.15) may be used to
pre-implement a standard interface to a lexical analyzer.
The lex() member function interfaces to the lexical scanner, and it is
expected to return the next token produced by the lexical scanner. This token
may either be a plain character or it may be one of the symbolic tokens
defined in the Parser::Tokens enumeration. Any zero or negative token
value is interpreted as `end of input', causing parse() to return.
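The three kinds of return values (a plain character, a symbolic token, and an end-of-input value) can be seen in the following minimal sketch of a hand-written lex() member. The token NUMBER, its value 257, the data members d_in and d_value and the accessor value() are assumptions made for this example. In a generated class lex() is private; it is public here only so that it can be called directly.

```cpp
#include <cctype>
#include <istream>
#include <sstream>

class Parser
{
    public:
        enum Tokens
        {
            NUMBER = 257            // assumed value for this sketch
        };

        explicit Parser(std::istream &in)
        :
            d_in(in),
            d_value(0)
        {}

        int lex()
        {
            char ch;
            while (d_in.get(ch))
            {
                if (ch == ' ' || ch == '\t')
                    continue;               // skip blanks

                unsigned char uch = static_cast<unsigned char>(ch);

                if (std::isdigit(uch))
                {
                    d_in.putback(ch);       // re-read the full number
                    d_in >> d_value;
                    return NUMBER;          // symbolic token
                }

                return uch;                 // plain character token
            }
            return 0;                       // end of input
        }

        int value() const                   // value of the last NUMBER
        {
            return d_value;
        }

    private:
        std::istream &d_in;
        int d_value;
};
```

Given the input `12 + 3` this member returns NUMBER, '+', NUMBER and finally 0.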
The lex() member function may be implemented in various ways:
- lex() may itself implement a lexical analyzer (a
scanner). This may actually be a useful option when the input offered to
the program using bisonc++'s parser class is not overly complex. This approach was
used when implementing the earlier examples (see sections 4.1.3 and
4.4.4).
- lex() may call an external function, or a member function of a class
implementing a lexical scanner, and return the information offered by that
function. When using a class, an object of that class could also be
defined as an additional data member of the parser (see the next
alternative). This approach can be followed when generating a lexical scanner
with a scanner generator like lex(1) or flex(1). The
latter program allows its users to generate a scanner class.
- Since flex(1) is fairly often used to generate a
scanner (class), a standard interface for this situation is available with
bisonc++: when the scanner option or directive (see section 5.6.15) is used,
it is assumed that a class Scanner is available, and that the parser should
have a data member Scanner d_scanner. Furthermore, it is assumed that the
parser's lex() member merely has to return d_scanner.yylex().
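The last alternative can be pictured with the following minimal sketch. The Scanner class shown here is a hand-written stand-in, not a flex-generated class: it merely returns the characters of a string one at a time. The member countTokens() is also an assumption, standing in for the loop by which parse() keeps requesting tokens.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for a scanner class offering yylex()
class Scanner
{
    std::string d_text;
    std::size_t d_pos;

    public:
        explicit Scanner(std::string const &text = "")
        :
            d_text(text),
            d_pos(0)
        {}

        int yylex()                 // next character, 0 at end of input
        {
            return d_pos < d_text.size() ?
                        static_cast<unsigned char>(d_text[d_pos++])
                    :
                        0;
        }
};

class Parser
{
    Scanner d_scanner;              // the data member named in the text

    public:
        explicit Parser(std::string const &text)
        :
            d_scanner(text)
        {}

        int countTokens()           // stands in for parse()
        {
            int count = 0;
            while (lex() > 0)
                ++count;
            return count;
        }

    private:
        int lex()                   // merely forwards to the scanner
        {
            return d_scanner.yylex();
        }
};
```

Keeping lex() this thin means that the scanner can be replaced without touching any other member of the parser class.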
6.4: Private Data Members
The following private members can be used by members of parser classes
generated by bisonc++. All data members are actually protected
members inherited from the parser's base class.
6.5: Types and Variables in the Anonymous Namespace
In the file defining the parse() function (see section
5.6.16.4) the following types and variables are defined in the
anonymous namespace. These are mentioned here for the sake of completeness,
and are not normally accessible to other parts of the parser.
- ReservedTokens:
This enumeration defines some token values used internally by the
parsing functions. They are:
_UNDETERMINED_ = -2,
_EOF_ = -1,
_error_ = 256,
These tokens are used by the parser to determine whether another token
should be requested from the lexical scanner, and to handle
error-conditions.
- SR (Shift-Reduce Info):
This struct provides the shift/reduce information for the various
grammatical states. SR values are collected in arrays, one array
per grammatical state. These arrays, named s_<nr>, where <nr> is a
state number, are defined in the anonymous namespace as well. The SR
elements consist of two unions, defining fields that are applicable to,
respectively, the first, the intermediate and the last array elements:
the first element of each array consists of (1st field) a StateType
and (2nd field) the index of the last array element;
intermediate elements consist of (1st field) a symbol value and (2nd
field) (if negative) the production rule number reducing to the
indicated symbol value or (if positive) the next state when the symbol
given in the 1st field is the current token;
the last element of each array consists of (1st field) a placeholder for
the current token and (2nd field) the (negative) rule number to reduce
to by default or the (positive) number of an error-state to go to when
an erroneous token has been retrieved. If the 2nd field is zero, no
error or default action has been defined for the state, and
error-recovery is attempted.
- StateType:
This enumeration defines the type of the various grammar-states. They
are:
NORMAL,
HAS_ERROR_ITEM,
IS_ERROR_STATE,
HAS_ERROR_ITEM is used for a state having at least one
error-production. IS_ERROR_STATE is used for a state from which
error recovery is attempted: while in such a state, tokens are
retrieved until the parser sees a token from which parsing may
continue. All other states are NORMAL states.
- PI (Production Info):
This struct provides information about production rules. It has two
fields: d_nonTerm is the identification number of the production's
non-terminal, and d_size is the number of elements of the
production rule.
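The layout of the SR arrays described above can be pictured with the following simplified sketch. It is not the generated code: the two unions are replaced by two plain int fields (named first and second here), the table s_0 is invented for the example, and the function action() is a hypothetical linear search that shows how the three kinds of array elements could be interpreted.

```cpp
namespace
{
    enum StateType
    {
        NORMAL,
        HAS_ERROR_ITEM,
        IS_ERROR_STATE
    };

    // Simplified stand-in: plain ints instead of the two unions
    struct SR
    {
        int first;
        int second;
    };

    // Invented table for one state
    SR const s_0[] =
    {
        { NORMAL, 3 },  // first element: state type, index of last element
        { '+',    4 },  // on '+': continue in state 4
        { ';',   -2 },  // on ';': reduce by rule 2
        { 0,     -1 },  // last element: placeholder, default: reduce by rule 1
    };

    // Returns the 2nd field that applies to `token' in the given state:
    // positive: next state, negative: rule to reduce by, zero: no action.
    int action(SR const *state, int token)
    {
        int last = state[0].second;

        for (int idx = 1; idx != last; ++idx)
        {
            if (state[idx].first == token)
                return state[idx].second;
        }

        return state[last].second;  // default or error action
    }
}
```

With this table a '+' token selects state 4, a ';' token selects a reduction by rule 2, and any other token falls through to the default reduction by rule 1.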
6.5.1: Special Features for Actions
Here is an overview of special syntactical constructions that may be used
inside action blocks:
- $$:
This acts like a variable that contains the semantic value for
the grouping made by the current rule. See section 5.5.3.
- $n:
This acts like a variable that contains the semantic value for
the n-th component of the current rule. See section 5.5.3.
- $<typealt>$:
This is like $$, but it specifies alternative typealt
in the union specified by the %union directive. See sections
5.5.1 and 5.5.2.
- $<typealt>n:
This is like $n, but it specifies alternative typealt
in the union specified by the %union directive. See sections
5.5.1 and 5.5.2.
- @n:
This acts like a structure variable containing information on the
line numbers and column numbers of the n-th component of the current rule. The
default structure is defined like this (see section 5.6.12):
    struct LTYPE
    {
        int timestamp;
        int first_line;
        int first_column;
        int last_line;
        int last_column;
        char *text;
    };
Thus, to get the starting line number of the third component, you would
use @3.first_line.
In order for the members of this structure to contain valid information,
you must make sure the lexical scanner supplies this information about each
token. If you need only certain fields, then the lexical scanner
only has to provide those fields.
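As an illustration of what supplying this information could involve, the following sketch computes the line and column fields for one token. The helper locate() is hypothetical, and the conventions it uses (columns start at 1 after a newline, and last_line/last_column refer to the token's final character) are assumptions made for this example, not rules stated by bisonc++. The timestamp and text fields are not needed here and are simply left at zero.

```cpp
#include <string>

struct LTYPE                    // the default structure shown above
{
    int timestamp;
    int first_line;
    int first_column;
    int last_line;
    int last_column;
    char *text;
};

// Hypothetical scanner helper: returns the location of a token starting
// at (line, column) and consisting of the characters in `matched'.
inline LTYPE locate(int line, int column, std::string const &matched)
{
    LTYPE loc = { 0, line, column, line, column, 0 };

    int curLine = line;
    int curColumn = column;

    for (std::string::size_type idx = 0; idx != matched.size(); ++idx)
    {
        loc.last_line = curLine;        // position of this character
        loc.last_column = curColumn;

        if (matched[idx] == '\n')       // advance to the next position
        {
            ++curLine;
            curColumn = 1;
        }
        else
            ++curColumn;
    }

    return loc;
}
```

A scanner that only needs line numbers could drop the column bookkeeping altogether.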
Be advised that using this structure, or a corresponding custom-defined one
(see sections 5.6.13 and 5.6.14), may slow down the parsing process noticeably.