.he 'ECMP 432'-%-'Compiler Construction'
.bp
.ce
Use of the Tables Generated by Yacc
.sp 1
Yacc, Yet Another Compiler Compiler, generates a set of tables to be
used in constructing an LALR(1) parser, a compact variant of LR(1).
This note explains the care and feeding of these tables.  Algorithms
are given for accessing the tables explicitly.
.sp 1
The tables generated by yacc fall into three categories: action tables,
reduction tables, and goto tables.  The array names associated with each category are
listed below:
.in +5
actions - yypact, yyact
.br
reductions - yyr1, yyr2
.br
gotos - yypgo, yygo
.in -5
The tables and their use will be discussed separately.
.sp 1
Actions: When parsing, the tables are used to find the appropriate
action to take from a given state, dependent on the next token found
in the input stream.  The array yypact is indexed by the value state+1
to find the start of a list of entries in the yyact array to search for an action.
The search of the list goes as follows:  each integer value found in the
array is either the concatenation of a 1 and a token to compare to the input
symbol, or the concatenation of an action and a piece of information to
be used in carrying out that action.
To search the list one compares the input symbol against the token in each
comparison entry of the yyact array until a match is found; the action and
its information are then found in the entry immediately following.
Each list is terminated by an "always match" action, so that a search will
always be successful.  The convention used by yacc to store both an action
(or the marker 1) and a token or piece of information in a single integer is,
.sp 1
.ce 2
action or 1 = yyact[entry] >> 12
token to compare or info = yyact[entry] & 4095
.sp 1
The action values found in the array are listed below; the value 1 is not
an action but marks an entry holding a token to compare.
.sp 1
.nf
.in +5
0	error
2	shift
3	reduce
4	accept
.in -5
.fi
.sp 1
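The packing convention and the action codes can be sketched in C.
This is a minimal illustration, not code taken from yacc: the names
pack_entry, entry_action, entry_info and the ACT_ constants are hypothetical,
and entries are assumed to be nonnegative integers.
.sp 1
.nf
```c
/* Sketch only: these names are illustrative, not part of yacc's output. */
enum {
	ACT_ERROR = 0,
	ACT_TEST = 1,	/* not an action: marks a token comparison entry */
	ACT_SHIFT = 2,
	ACT_REDUCE = 3,
	ACT_ACCEPT = 4
};

/* Pack a high field (action code, or 1) with a 12-bit low field. */
static int pack_entry(int high, int low)
{
	return (high << 12) | (low & 4095);
}

/* High field: the action code, or 1 for a token comparison entry. */
static int entry_action(int entry)
{
	return entry >> 12;
}

/* Low field: the token to compare, or the information for the action. */
static int entry_info(int entry)
{
	return entry & 4095;
}
```
.fi
With these helpers, an entry with action 2 (shift) and information 7 is
stored as 8199, and a comparison entry for token 257 is stored as 4353.
.sp 1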
Thus an algorithm to interrogate the tables might look like,
.sp 2
.nf
readaction(state, inputsym, type)
integer state, inputsym;
pointer type;
begin
	integer entry, action, token, plist;

	plist := yypact[state+1];	/* assume states start at 0 */

	while true do
	begin
		action := yyact[plist] >> 12; /* right shift 12 places */
		token := yyact[plist] & 4095; /* bitwise and */
		if(action neq 1) break;	/* an action entry ends the search */
		if(token neq inputsym) plist := plist + 1; /* no match: skip paired action */
		plist := plist + 1;
	end

	*type := action;
	return(token);
end
.fi
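.sp 1
The same search can be written in C.  The sketch below is an assumption-laden
illustration rather than yacc's own code: the small yypact and yyact arrays are
made up for the example (real ones come from yacc's output), and slot 0 of
yypact is left unused because the list pointer is taken from yypact[state+1].
.sp 1
.nf
```c
/* Hypothetical tables for illustration only.                         */
/* State 0: token 257 -> shift (info 3), token 258 -> reduce (info 2), */
/*          anything else -> error.  State 1: always accept.          */
static const int yypact[] = { 0 /* unused slot */, 0, 5 };
static const int yyact[] = {
	4096 + 257, 8192 + 3,	/* compare 257, then shift, info 3 */
	4096 + 258, 12288 + 2,	/* compare 258, then reduce, info 2 */
	0,			/* always-match entry: error */
	16384			/* always-match entry: accept */
};

/* Returns the information field; stores the action code through type. */
static int readaction(int state, int inputsym, int *type)
{
	int plist = yypact[state + 1];	/* states are assumed to start at 0 */
	int action, token;

	for (;;) {
		action = yyact[plist] >> 12;
		token = yyact[plist] & 4095;
		if (action != 1)
			break;		/* an action entry ends the search */
		if (token != inputsym)
			plist++;	/* no match: skip the paired action too */
		plist++;
	}
	*type = action;
	return token;
}
```
.fi
With these made-up tables, a call for state 0 and input symbol 258 stores
3 (reduce) through type and returns 2.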
.sp 1
Reductions: When the action value returned by the readaction() procedure
indicates that a reduction is to be carried out, the reduction tables are used.
These tables are indexed directly by the production number returned by
readaction().  (Remember that readaction() returns two values: one is the
action to be taken, the other is the production number to be used when
a 'reduce' action is to be taken.)  The yyr1 table gives the internal value
of the nonterminal to be placed on the stack when reducing, and the yyr2
array gives the number of items to be popped off the stack for the production.
Thus an algorithm to access these tables would be,
.sp 2
.nf
readprod(prodnum, nonterm)
integer prodnum;
pointer nonterm;
begin

	*nonterm := yyr1[prodnum];
	return(yyr2[prodnum]);

end
.fi
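.sp 1
A C rendering of the lookup is equally short.  In this sketch the contents
of yyr1 and yyr2 are invented for a two-production example, and slot 0 is
assumed to be unused; neither detail comes from real yacc output.
.sp 1
.nf
```c
/* Hypothetical reduction tables; the index is the production number. */
/* Production 1: E -> E + T  (nonterminal 1, 3 items to pop)          */
/* Production 2: E -> T      (nonterminal 1, 1 item to pop)           */
static const int yyr1[] = { 0, 1, 1 };
static const int yyr2[] = { 0, 3, 1 };

/* Returns the pop count; stores the nonterminal through nonterm. */
static int readprod(int prodnum, int *nonterm)
{
	*nonterm = yyr1[prodnum];
	return yyr2[prodnum];
}
```
.fi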
.sp 1
Gotos: The goto tables are organized much like the action tables.  First a
pointer into yygo is calculated by indexing into yypgo with a nonterminal
value returned by readprod().
Then the yygo table is searched, starting at the entry
pointed to by yypgo[nonterminal], until a match on the current state is found
or a -1 is encountered; the -1 matches any state and so ends every list.
The value in the yygo array immediately following the entry being used for the
comparison contains the state to go to next.  Thus an algorithm to compute
the goto for a given state and nonterminal might be,
.sp 2
.nf
readgoto(state, nonterminal)
integer state, nonterminal;
begin
	integer plist;

	plist := yypgo[nonterminal];

	while true do
	begin
		if(state eql yygo[plist] or yygo[plist] eql -1)
			return( yygo[plist+1] );
		else
			plist := plist + 2;
	end
end
.fi
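.sp 1
The goto search, and the way a reduction might use it, can be sketched in C.
Everything specific here is assumed for illustration: the table contents are
made up, and reduce() with its array-based state stack is a hypothetical
helper, not part of yacc.
.sp 1
.nf
```c
/* Hypothetical goto tables for illustration only.              */
/* Nonterminal 1: from state 0 go to 2, from state 4 go to 7,   */
/* and from any other state (the -1 entry) go to 9.             */
static const int yypgo[] = { 0 /* unused slot */, 0 };
static const int yygo[] = { 0, 2, 4, 7, -1, 9 };

static int readgoto(int state, int nonterminal)
{
	int plist = yypgo[nonterminal];

	/* Entries come in (state, next state) pairs; -1 matches any state. */
	while (yygo[plist] != state && yygo[plist] != -1)
		plist += 2;
	return yygo[plist + 1];
}

/* Reduce step on a state stack: pop npop states, then push the goto. */
static int reduce(int *stack, int top, int npop, int nonterminal)
{
	top -= npop;	/* discard one state per right-hand-side item */
	stack[top + 1] = readgoto(stack[top], nonterminal);
	return top + 1;	/* index of the new top of stack */
}
```
.fi
Here npop and nonterminal stand for the two values that readprod() yields
for the production being reduced.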
.sp 1
One must consider the benefits and drawbacks of using the yacc generated
tables.  First, if any changes should be made in the grammar (god forbid),
then having direct access to the tables makes updating your parser a
simple matter.  Second, the tables and the algorithms used to access them
have been found to be very efficient.  On the bad side, the tables are not
easily modified to hold error recovery information
directly (this will require external arrays to be accessed
upon encountering an error), and porting the arrays to another
system that shares no compatible medium for transportation means the user
must type them in by hand.  For the latter problem, we know the tables
must be constructed, no matter what format they take, so the format used
should be something that can easily be verified for accuracy; integer numbers
do not lend themselves easily to this.
