CHAPTER 30


				REPORTING DATATYPE ERRORS



	It is possible, in fact probable, that the user of a language will
	commit errors.  The user might forget to declare a variable, call
	a function that hasn't been defined, or pass parameters of the wrong
	type to existing functions.  We need to report such errors and point
	out where in the program specification each error lies.

	We introduce here a general method for reporting errors that works
	no matter what syntax the language has.


30.1
An Example Error

	Let's consider some example datatype errors.  Suppose J is an
	~undeclared variable.  The EXPR:

		J + 3 * 4

	makes some sense, but not complete sense.  The "3*4" is meaningful.
	Only the J and the "+" fail to make sense.

	  Consider the following error report for this EXPR:

		    J
		    <EXPR> + <expr>

	  The first line indicates the sub-EXPR that fails to make sense, J.

	  The second line indicates where in the overall EXPR the erroneous J
	  sits.  It appears within a "+" EXPR:

		    <EXPR> + <expr>

	Of the two <EXPR>s in this rule, it is the first one that corresponds
	to the expression J.  That first <EXPR> is ~capitalized because it
	contains the offending J EXPR.  To the right of the "+", the <expr> is
	~not capitalized, meaning that this sub-EXPR (3*4) makes perfect sense.


30.2
Syntax Backtraces

	These two lines represent what is called a ~syntax ~backtrace.  The
	rules involved in this EXPR are:

		    <ID>					->	  <EXPR>		(for J)
	  and
		    <EXPR>	+  <EXPR>			->	  <EXPR>		(for +)

	  The offending J emerges in the first rule.  That EXPR, J, appears in
	  an application of the second rule.

	  Each line in our error report corresponds to one rule application:

		    J			  (the  <ID>  ->	<EXPR>  rule)
		    <EXPR>+<expr>	  (the  <EXPR>+<EXPR>  ->  <EXPR>  rule)

	  Except for the first line, each line has ~one item in upper case.
	  That upper-case part-of-speech corresponds to the entirety of the
	  previous line.

	Notice in our error report that the <EXPR> to the left of the "+"
	is capitalized.  It corresponds to the previous line, the J.  (Notice
	how J appears to the left of "+" in the original expression.)


30.2.1
Another Example

	  Consider now the EXPR:

		    3 + J * 4

	  The error message this time reads as:

		    J
		    <EXPR> * <expr>
		    <expr> + <EXPR>

	Again, the first line notes the error.  "J" doesn't make sense.  The
	remaining lines serve to pinpoint ~where that J is.  The second line
	shows that the J is used in a "*" EXPR, and in fact, that the J lives
	to the left of the "*" (the upper-case <EXPR>).

	Where does this multiply EXPR reside?  The third line tells us that.
	The multiply EXPR appears in a "+" EXPR, and in fact lies to the right
	of the "+".  Subsequent lines would show where this overall EXPR
	resides within the enclosing program specification.

	For example, suppose this EXPR appears in:

		I :=  3 + J * 4 ;

	We introduce one more line at the bottom of the error message to
	represent the enclosing assignment statement:

		<expr> := <EXPR> ;

	This indicates that our overall EXPR resides in an assignment
	statement, and in fact, to the right of the ":=".  The full backtrace
	is now:

		J
		<EXPR> * <expr>
		<expr> + <EXPR>
		<expr> := <EXPR> ;


30.2.2
Beautifying Syntax Backtraces

	The most recent backtrace can be written more conveniently as:

		<expr>  :=  <expr>  +  J  *  <expr>  ;
				       ^

	We've collapsed the lines by replacing each capitalized item with the
	previous line, and marked the offending J with a caret.

	Our earlier example, placed within an assignment statement:

		I := J + 3*4 ;

	has the syntax backtrace:

		J
		<EXPR> + <expr>
		<expr> := <EXPR> ;

	which collapses to:

		<expr>  :=  J  +  <expr>  ;
		            ^
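
	The collapsing rule is mechanical enough to sketch in a few lines of
	Python.  This is a minimal sketch, not the book's implementation: it
	assumes each backtrace line is a whitespace-separated string whose
	single capitalized part-of-speech (such as <EXPR>) stands for the
	previous line.  The helper names beautify and is_capitalized are
	hypothetical.

```python
# Hypothetical sketch: collapse a syntax backtrace into one beautified line.

def is_capitalized(token: str) -> bool:
    """True for an upper-case part-of-speech such as <EXPR> (not <expr>, not "+")."""
    return (token.startswith("<") and token.endswith(">")
            and token == token.upper() and token != token.lower())


def beautify(backtrace: list, sep: str = "  ") -> tuple:
    """Collapse a syntax backtrace into one line plus a caret line.

    backtrace[0] is the offending fragment; every later line holds exactly
    one capitalized part-of-speech that stands for the line before it.
    """
    tokens = backtrace[0].split()
    start, end = 0, len(tokens)  # token span of the offending fragment
    for line in backtrace[1:]:
        outer = line.split()
        hole = next(i for i, t in enumerate(outer) if is_capitalized(t))
        # Substitute everything collapsed so far for the capitalized item.
        tokens = outer[:hole] + tokens + outer[hole + 1:]
        start, end = start + hole, end + hole
    text = sep.join(tokens)
    # Column where the offending fragment begins, and how wide it is.
    col = len(sep.join(tokens[:start])) + (len(sep) if start else 0)
    width = len(sep.join(tokens[start:end]))
    return text, " " * col + "^" * width


text, carets = beautify(["J", "<EXPR> * <expr>", "<expr> + <EXPR>", "<expr> := <EXPR> ;"])
print(text)
print(carets)
```

	Run on the backtrace for I := 3 + J * 4 ; this prints the same two
	lines as the first beautified example of this section, with the caret
	under J.  Because the caret is placed by token span, a first line such
	as "<expr> * <expr>" is underlined across its full width.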


30.2.3
A Deeper Example

	  Consider the backtrace for the program:

		    IF  A < B  THEN  K := I + 1;
					   I := 3 + J * 4 ;
		    ELSE  K:= I - 1;	    FI

	  The offending assignment statement, which uses J, resides now in the
	  THEN-clause.  It is the ~second of the two STATEMENTs there.

	  The backtrace now is:

		    J
		    <EXPR> * <expr>
		    <expr> + <EXPR>
		    <expr> := <EXPR> ;

		    <statement> <STATEMENT>
		    if <expr> then <STATEMENT> else <statement> fi

	The last line indicates that the error is in the THEN-clause (where
	<STATEMENT> is capitalized).  The next line up expands on the
	THEN-clause.  It shows that the THEN-clause consists of two
	statements, the ~second of which contains the error (it is
	capitalized).  The first four lines pinpoint the error within that
	(second) statement.

	  Beautified, this reads as:

		if <expr> then <statement> <expr> := <expr> + J * <expr> ;
		                                              ^
		else  <statement>  fi


30.2.4
Another Kind Of Datatype Error

	  Let's consider another type of datatype error.  Assume now that J ~is
	declared, but that it has been declared to be of a type that makes
	no sense to "+" or "*", like the type CHOICE_OF_PHRASES.  In the EXPR:

		J * 4

	J itself makes perfect sense.  This time, however, it is the
	multiplication that chokes.  It isn't defined to take in a
	  CHOICE_OF_PHRASES.  The syntax backtrace this time is ~not:

		    J
		    <EXPR> * <expr>

	  but instead is:

		    <expr> * <expr>

	Neither of the <expr>s is capitalized, because each one makes sense
	independently.  It's just the combination by "*" that fails to make
	sense.  Here, the block in error (the one generating no phrases) is
	~not a leaf of the tree.

	Continue assuming that J is of a type that makes no sense with "*".
	The STATEMENT:

		I := 3 + J * 4 ;

	now has as its full backtrace:

		<expr> * <expr>
		<expr> + <EXPR>
		<expr> := <EXPR> ;

	Beautified, it appears as:

		<expr>  :=  <expr>  +  <expr>  *  <expr>  ;
				       ^^^^^^^^^^^^^^^^^

	The first line in the syntax backtrace is highlighted, indicating
	that the error occurs in the combination:

		<expr> * <expr>

	(Again, each of those two <expr>s makes sense individually, but their
	combination via "*" makes no sense.)


30.2.5
Summary

	Here is a summary about syntax backtraces used for reporting datatype
	errors:

	   1)	The first line shows the (smallest) program fragment that
		fails to make sense.

	   2)	The second line shows a larger program fragment that contains
		the error.  One of the parts-of-speech in this line is always
		capitalized, showing where the first line resides within the
		larger fragment.

	   3)	Each subsequent line is like the second line.  It shows an even
		larger program fragment that contains the previous line.
		Capitalization once again indicates where the previous line
		fits within this line.

	It is helpful to associate with the first line the ~possible datatypes
	that may be involved.  For example, in our first example, where J
	was undeclared, the first line has ~no associated datatypes, and
	therefore can appear as:

		J	no datatypes

	In our latter example, where J is of type CHOICE_OF_PHRASES, the
	first line might appear as:

		<expr> * <expr>		(CHOICE_OF_PHRASES) , (INT or REAL)

	The first EXPR, which by itself makes sense, can be seen as a
	CHOICE_OF_PHRASES.  The second EXPR, which also makes sense by itself,
	can be seen as an INT or a REAL.  The error is that no pair of types,
	one taken from the left <expr> and one taken from the right <expr>,
	can be combined by "*".
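
	The "no pair of types" test can be sketched as follows.  This is a
	hypothetical Python illustration: the table MULTIPLY_RULES merely
	stands in for whatever "*" rules the datatype grammar actually has,
	and first_line formats the annotated first line in the style shown
	above.

```python
# Hypothetical datatype rules for "*": (left type, right type) -> result type.
MULTIPLY_RULES = {
    ("INT", "INT"): "INT",
    ("INT", "REAL"): "REAL",
    ("REAL", "INT"): "REAL",
    ("REAL", "REAL"): "REAL",
}


def combine(left_types: set, right_types: set, rules: dict) -> set:
    """Every result type obtainable from one left type and one right type."""
    return {rules[(left, right)]
            for left in left_types for right in right_types
            if (left, right) in rules}


def describe(types: set) -> str:
    """Format a set of possible datatypes for the annotated first line."""
    return ("(" + " or ".join(sorted(types)) + ")") if types else "no datatypes"


def first_line(op: str, left_types: set, right_types: set, rules: dict):
    """Return the annotated first backtrace line, or None if `op` makes sense."""
    if combine(left_types, right_types, rules):
        return None  # at least one pair combines, so there is no error here
    return f"<expr> {op} <expr>\t{describe(left_types)} , {describe(right_types)}"


print(first_line("*", {"CHOICE_OF_PHRASES"}, {"INT", "REAL"}, MULTIPLY_RULES))
```

	The error is reported only when the set of combinable results is
	empty; a single workable pair, such as INT with INT, is enough for
	"*" to make sense.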

		BOX:	What is a syntax backtrace?
		BOX:
		BOX:	How does one "beautify" a syntax backtrace?
		BOX:
		BOX:	Does the first line or the last line in a syntax
		BOX:	backtrace represent the largest piece of the program
		BOX:	specification?


30.3
Implementation Of Syntax Backtraces

	Recall that the semantics of the syntax grammar generates phrases in
	the datatype language.  When things don't make sense, no full-spanning
	types (unit-length phrases) come into existence.  For example, the EXPR:

		    3 * 4

	  generates the phrase

		    INT  INT  *

	  which rewrites to a single overall INT because the datatype grammar has
	  a rule for INTeger multiplication.  We thus have a full-spanning type,
	  INT.  The EXPR is known to make sense.
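
	As a rough illustration of "INT INT *" rewriting to INT, here is a
	hypothetical Python sketch.  It is a simplification: it follows a
	single greedy shift-reduce path over a postfix phrase, whereas the
	datatype language described in this book keeps track of all possible
	phrases.  The names RULES and full_spanning_type are illustrative only.

```python
# Hypothetical datatype-grammar rules, written as postfix right-hand
# phrases that each rewrite to a single type.
RULES = {
    ("INT", "INT", "*"): "INT",
    ("INT", "INT", "+"): "INT",
    ("REAL", "REAL", "*"): "REAL",
}


def full_spanning_type(phrase: list, rules: dict):
    """Shift-reduce a postfix datatype phrase; return its single type, or None."""
    stack = []
    for symbol in phrase:
        stack.append(symbol)
        reduced = True
        while reduced:  # reduce as long as the top of the stack matches a rule
            reduced = False
            for rhs, lhs in rules.items():
                n = len(rhs)
                if len(stack) >= n and tuple(stack[-n:]) == rhs:
                    stack[-n:] = [lhs]
                    reduced = True
                    break
    return stack[0] if len(stack) == 1 else None


print(full_spanning_type(["INT", "INT", "*"], RULES))  # the phrase for 3 * 4
```

	If J contributes nothing, the phrase for J + 3 * 4 is missing its
	left operand, and the reduction stops short of a single type.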

	  If J is undeclared, then the EXPR "J" generates no phrases.  Not only
	  does J have no phrases, the EXPR:

		    J + 3 * 4

	also has no full-spanning types.  Figure 30.1(a) shows the semantic
	structure for this EXPR.  Each block in that figure has a YES or NO
	next to it.  YES means there are some full-spanning type(s), and NO
	means there are no full-spanning types.

	The block containing J has no full-spanning types, and the "+" block
	above also has no full-spanning types.  In contrast, each of the "3",
	"4", and "3*4" blocks does have full-spanning types (e.g., INT).

	The syntax backtrace is obtained by looking upward from the offending
	J block toward the top of the tree.  The sequence of rules encountered
	on that trip forms the syntax backtrace.
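
	The upward walk can be sketched as follows, assuming (hypothetically)
	that each block keeps a pointer to the block above it.  The book's own
	implementation, described later in this chapter, does not use parent
	pointers; it carries the lines downward in THE_BACKTRACE instead.
	Capitalization is also left out of this sketch, and the names Block
	and backtrace_upward are invented for it.

```python
# Hypothetical sketch: a semantic tree with parent pointers, walked upward.
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Block:
    """A semantic block, labeled with the lefthand phrase of its rule."""
    text: str
    parent: Optional["Block"] = field(default=None, repr=False)
    children: List["Block"] = field(default_factory=list)

    def add(self, child: "Block") -> "Block":
        child.parent = self
        self.children.append(child)
        return child


def backtrace_upward(block: Optional[Block]) -> List[str]:
    """Collect block labels from the offending block up to the root."""
    lines = []
    while block is not None:
        lines.append(block.text)
        block = block.parent
    return lines


# The EXPR  3 + J * 4  (figure 30.1(b)), all in lower case for now.
root = Block("<expr> + <expr>")
root.add(Block("3"))
times = root.add(Block("<expr> * <expr>"))
j = times.add(Block("J"))
times.add(Block("4"))
print(backtrace_upward(j))
```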


30.3.1
How An Error Is Detected

	  An error is detected in some given semantic block when ~both of the
	  following are true of that block:

	     1)   This semantic block generates ~no full-spanning types, and

	     2)   ~All the blocks immediately below it ~do ~have full-spanning
		    types.

	  The first condition is obvious.  Something is wrong if no full-spanning
	  types are generated by this block.

	  However, the actual error might occur ~lower in the semantic structure.
	  That is, this block might generate no types ~because ~a ~lower ~block
	  ~generated ~no ~phrases.

	The second condition ensures that this block is itself the source of
	the error, since every block immediately below it does generate
	full-spanning types.

	For example, two blocks in figure 30.1(a) have no types.  Only the
	lower NO block has an error (undeclared J).  The upper, "+", block is
	not the source of the error.  It has a NO precisely because the lower
	J block failed to generate any full-spanning types.

	  The other parts of this figure show more examples, identifying the
	  faulty block.
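
	The two conditions translate directly into a predicate over a tree.
	In this hypothetical Python sketch each block simply records its set
	of full-spanning types; in the real system those types come out of
	the datatype grammar rather than being written in by hand.

```python
# Hypothetical sketch of the two error-detection conditions.
from dataclasses import dataclass, field
from typing import List, Set


@dataclass(eq=False)
class Block:
    text: str
    types: Set[str] = field(default_factory=set)          # its full-spanning types
    children: List["Block"] = field(default_factory=list)


def is_error_source(block: Block) -> bool:
    """Condition 1: no full-spanning types.  Condition 2: every child has some."""
    return not block.types and all(child.types for child in block.children)


def faulty_blocks(block: Block) -> List[Block]:
    """Every block in the tree that satisfies both conditions."""
    found = [block] if is_error_source(block) else []
    for child in block.children:
        found.extend(faulty_blocks(child))
    return found


# Figure 30.1(a): J + 3 * 4 with J undeclared.
j = Block("J")
product = Block("<expr> * <expr>", {"INT"}, [Block("3", {"INT"}), Block("4", {"INT"})])
whole = Block("<expr> + <expr>", set(), [j, product])
print([b.text for b in faulty_blocks(whole)])
```

	A leaf such as J has no children, so condition 2 holds vacuously and
	the leaf alone is blamed.  The "+" block fails condition 2 because J
	has no types, exactly as described above.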


30.3.2
An Automatic Modification That Implements Error Reporting

	The following modifications can easily be introduced automatically.
	We are therefore ~not imposing further requirements upon the author of
	the rules' semantics.

	To detect errors, we enclose the original semantic program as follows.
	We maintain a ~list of semantic errors in the global variable
	SEMANTIC_ERRORS:

		OLD_ERRORS:= SEMANTIC_ERRORS ;  "Remember present status of
						 SEMANTIC_ERRORS."

		original program	"e.g.,  <*E1*>; - <*E2*>; - <*B*>; "

		" if any of the sub-blocks generated an error, SEMANTIC_ERRORS
		  will differ from OLD_ERRORS. "

		IF  OLD_ERRORS =:= SEMANTIC_ERRORS  "None of our sub-blocks has
						     errors."
		    &
		    "there are no full-spanning types... "

		    NEVER  P.LEFT =:= LEFT  FOR P $E C;
		    &
		    -DEFINED( FULL_SPANNERS )

		THEN	"We have detected an error at this semantic block.
			 Put our error onto SEMANTIC_ERRORS..."

			SEMANTIC_ERRORS::= $>  THE_BACKTRACE;  "(see below
								 about THE_
								 BACKTRACE)"
		FI

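	This program detects an error according to our two criteria shown
	earlier.  The test OLD_ERRORS =:= SEMANTIC_ERRORS stands in for the
	second criterion: if no block below reported an error, then every
	block below did generate full-spanning types.

	In case of error, it appends THE_BACKTRACE, which is meant to
	identify the error, onto SEMANTIC_ERRORS.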
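
	The remember-and-compare idea can be mimicked in Python, as a
	hypothetical sketch.  Instead of ICL's =:= identity test, it compares
	the length of SEMANTIC_ERRORS before and after running the sub-blocks,
	which is equivalent here only because errors are ever added, never
	removed.  The block's own text stands in for THE_BACKTRACE, which is
	discussed next.

```python
# Hypothetical sketch of the OLD_ERRORS / SEMANTIC_ERRORS check.
from dataclasses import dataclass, field
from typing import List, Set


@dataclass(eq=False)
class Block:
    text: str
    types: Set[str] = field(default_factory=set)          # its full-spanning types
    children: List["Block"] = field(default_factory=list)


SEMANTIC_ERRORS: list = []  # global list of reported errors


def run_semantics(block: Block) -> None:
    old_errors = len(SEMANTIC_ERRORS)   # "remember present status"
    for child in block.children:        # stands in for "the original program"
        run_semantics(child)
    # Nothing below reported an error, yet this block has no full-spanning
    # types: the error originates here.
    if len(SEMANTIC_ERRORS) == old_errors and not block.types:
        SEMANTIC_ERRORS.append(block.text)  # placeholder for THE_BACKTRACE


product = Block("<expr> * <expr>", {"INT"}, [Block("3", {"INT"}), Block("4", {"INT"})])
run_semantics(Block("<expr> + <expr>", set(), [Block("J"), product]))
print(SEMANTIC_ERRORS)
```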


What's In THE_BACKTRACE?

	What would we like THE_BACKTRACE to hold at the moment we detect our
	error?  Consider figure 30.1(a).  If we are at the J block, we want to
	report:

		    J
		    <EXPR> + <expr>

	  It would be nice if THE_BACKTRACE would contain these two lines.

	Now consider figure 30.1(b).  If we are at the J block there, we want
	to report, and hence have THE_BACKTRACE contain, the lines:

		    J
		    <EXPR> * <expr>
		    <expr> + <EXPR>


	  Except for the varying capitalization, the two lines after the J can
	  be seen just by looking ~upward in the semantic structure.  Each
	  semantic block in the figure has associated text, the ~lefthand ~side
	  of the rule that produced that semantic block (e.g., "<EXPR>+<EXPR>").

	We want THE_BACKTRACE to represent those lines, the lines seen by
	looking upward in the structure.  If it does, then at the moment we
	want to report an error, THE_BACKTRACE holds just the sequence of
	lines we want to report.

	  We have each semantic block keep THE_BACKTRACE up to date by enclosing
	  the entire semantic program, including the augmentation shown earlier,
	  within:

		    HOLDING THE_BACKTRACE:= the_special_text <$ THE_BACKTRACE;
		    DO
				as before
		    ENDHOLD

	  This introduces one more line onto THE_BACKTRACE (the_special_text).
	  The new line identifies this semantic block by identifying this syntax
	  rule's lefthand phrase, as shown in the figure.  We use the HOLDING
	so that this insertion of an extra line onto THE_BACKTRACE is undone
	upon leaving this semantic routine.

	Since each step downward in the semantic structure introduces a new
	line onto THE_BACKTRACE, this variable does indeed hold the sequence
	of lines seen by looking back up the semantic structure.  Thus, at
	all times, whether or not there is an error, THE_BACKTRACE holds the
	sequence of lines that nearly identifies the semantic block we are
	currently in (it still lacks the capitalization).
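
	HOLDING behaves much like a dynamically scoped assignment.  A
	hypothetical Python analogue uses a context manager that puts a line
	at the front of a global list and restores the old list on exit.  The
	names holding and visit are illustrative, not part of ICL, and a block
	here is just a (text, children) pair.

```python
# Hypothetical Python analogue of HOLDING ... DO ... ENDHOLD.
from contextlib import contextmanager

THE_BACKTRACE: list = []  # innermost block's line first, as "<$" prepends


@contextmanager
def holding(line: str):
    """Like HOLDING THE_BACKTRACE:= line <$ THE_BACKTRACE; DO ... ENDHOLD."""
    global THE_BACKTRACE
    saved = THE_BACKTRACE
    THE_BACKTRACE = [line] + THE_BACKTRACE
    try:
        yield
    finally:
        THE_BACKTRACE = saved  # the insertion is undone on leaving


def visit(block, seen: dict) -> None:
    """block is (text, children); record THE_BACKTRACE as seen at each block."""
    text, children = block
    with holding(text):
        seen[text] = list(THE_BACKTRACE)
        for child in children:
            visit(child, seen)


tree = ("<expr> + <expr>", [("3", []), ("<expr> * <expr>", [("J", []), ("4", [])])])
seen = {}
visit(tree, seen)
print(seen["J"])
print(THE_BACKTRACE)
```

	Because the saved list is restored in a finally clause, the variable
	is back to empty once the walk finishes, mirroring how each HOLDING
	undoes its own insertion.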

	Following is the EXPR-BOP-EXPR rule as originally specified.  (We
	leave out precedence considerations for simplicity):

		<EXPR:E1>  <BOP:B>  <EXPR:E2>	    ->

			<EXPR:   <*E1*>; - <*E2*>; - <*B*>;    >

	We augment this automatically to perform as though the specification
	were:

		<EXPR:E1>  <BOP:B>  <EXPR:E2>	    ->

		   <EXPR:
			HOLDING  THE_BACKTRACE:= '<expr> <bop> <expr>'  <$
								THE_BACKTRACE;

			DO
			   OLD_ERRORS:= SEMANTIC_ERRORS ;

			   ~<*E1*>; - ~<*E2*>; - ~<*B*>; "The original program"

			   IF  "error..."
				OLD_ERRORS =:= SEMANTIC_ERRORS  &
				NEVER  P.LEFT =:= LEFT  FOR P $E C;  &
				-DEFINED(FULL_SPANNERS)
			   THEN
				SEMANTIC_ERRORS::= $> THE_BACKTRACE;
			   FI
			ENDHOLD
		   >


		BOX:	What conditions must be true for a semantic block
		BOX:	to be considered in error?
		BOX:
		BOX:	What does THE_BACKTRACE hold upon arrival at any
		BOX:	semantic block?
		BOX:
		BOX:	What does SEMANTIC_ERRORS hold upon completion of
		BOX:	semantic processing?



The Capitalization

	Our rendition so far delivers lines that are all lower-case.
	We introduce capitalization by removing the outer HOLDING that
	augments THE_BACKTRACE.  We replace it by several HOLDINGs, where each
	one applies only to the invocation of a single semantic variable.

	That is, after removing the outer HOLDING, let's concentrate on the
	  original program:

		    <*E1*>; - <*E2*>; - <*B*>;

	  We transform this by introducing a HOLDING around each semantic
	  variable's invocation, as follows:

		HOLDING	THE_BACKTRACE:= '<EXPR> <bop> <expr>' <$ THE_BACKTRACE;
		DO
			<*E1*>;
		ENDHOLD
		-
		HOLDING	THE_BACKTRACE:= '<expr> <bop> <EXPR>' <$ THE_BACKTRACE;
		DO
			<*E2*>;
		ENDHOLD
		-
		HOLDING	THE_BACKTRACE:= '<expr> <BOP> <expr>' <$ THE_BACKTRACE;
		DO
			<*B*>;
		ENDHOLD

	By augmenting THE_BACKTRACE differently during the invocation of each
	semantic variable, we introduce the appropriate capitalization.
	For example, during the invocation of E1, the new line placed onto
	THE_BACKTRACE is this rule's lefthand phrase with the first EXPR
	capitalized.  Again, these HOLDINGs are easily generated automatically.
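
	To illustrate how these per-variable lines might be generated
	automatically from a rule, here is a hypothetical Python sketch.  It
	assumes the rule's lefthand side is written with labeled
	parts-of-speech such as <EXPR:E1>, as in the rules shown earlier; the
	function name holding_lines is invented for this illustration.

```python
# Hypothetical generator for the capitalized HOLDING line of each variable.
import re
from typing import Dict

# A labeled part-of-speech in a rule, e.g. <EXPR:E1> -> ("EXPR", "E1").
LABELED = re.compile(r"<(\w+):(\w+)>")


def holding_lines(rule_lefthand: str) -> Dict[str, str]:
    """Map each semantic variable to the backtrace line used while it runs:
    the rule's lefthand phrase with only that variable's part-of-speech
    in upper case."""
    tokens = rule_lefthand.split()
    parsed = [LABELED.fullmatch(t) for t in tokens]
    lines = {}
    for i, m in enumerate(parsed):
        if m is None:
            continue  # a literal such as "+" or ":=" is not a semantic variable
        words = []
        for j, (tok, mj) in enumerate(zip(tokens, parsed)):
            if mj is None:
                words.append(tok)
            else:
                pos = mj.group(1)
                words.append(f"<{pos.upper()}>" if j == i else f"<{pos.lower()}>")
        lines[m.group(2)] = " ".join(words)
    return lines


for var, line in holding_lines("<EXPR:E1> <BOP:B> <EXPR:E2>").items():
    print(var, line)
```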

	With the outer HOLDING removed, this rule is ~not represented on
	THE_BACKTRACE by the time we enter the THEN-clause (that is, when we
	determine that this block is in error).  We therefore modify the
	THEN-clause so as to put our rule onto THE_BACKTRACE ~before adding
	THE_BACKTRACE to SEMANTIC_ERRORS:

		    IF  "error" ...

		    THEN	SEMANTIC_ERRORS::= $>

					  ('<expr> <bop> <expr>'  <$	THE_BACKTRACE);
		    FI
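
	The following hypothetical Python sketch mirrors this final
	arrangement: one capitalized line per child while that child runs,
	and the lower-case line for the rule itself added only in the
	THEN-clause.  The error test compares the length of SEMANTIC_ERRORS
	before and after running the children (an assumption standing in for
	the =:= identity test, valid because errors are only ever added), and
	the full-spanning types are written in by hand.

```python
# Hypothetical end-to-end sketch: capitalized backtraces via per-child lines.
from dataclasses import dataclass, field
from typing import List, Optional, Set


def is_pos(token: str) -> bool:
    """A part-of-speech token such as <expr> (not a literal like "*" or "<")."""
    return len(token) > 2 and token.startswith("<") and token.endswith(">")


@dataclass(eq=False)
class Block:
    lefthand: List[str]                                    # e.g. ["<expr>", "*", "<expr>"]
    children: List["Block"] = field(default_factory=list)  # one per part-of-speech
    types: Set[str] = field(default_factory=set)           # full-spanning types


SEMANTIC_ERRORS: List[List[str]] = []


def line(tokens: List[str], cap: Optional[int] = None) -> str:
    """Render a lefthand phrase, capitalizing only the token at index cap."""
    return " ".join(t.upper() if i == cap else t for i, t in enumerate(tokens))


def run(block: Block, backtrace: List[str]) -> None:
    old_errors = len(SEMANTIC_ERRORS)
    slots = [i for i, t in enumerate(block.lefthand) if is_pos(t)]
    for slot, child in zip(slots, block.children):
        # Per-variable HOLDING: this rule's line, capitalized at the child's slot.
        run(child, [line(block.lefthand, slot)] + backtrace)
    if len(SEMANTIC_ERRORS) == old_errors and not block.types:
        # THEN-clause: put this rule's (lower-case) line on first, then report.
        SEMANTIC_ERRORS.append([line(block.lefthand)] + backtrace)


# 3 + J * 4 with J undeclared (figure 30.1(b)).
product = Block(["<expr>", "*", "<expr>"], [Block(["J"]), Block(["4"], types={"INT"})])
run(Block(["<expr>", "+", "<expr>"], [Block(["3"], types={"INT"}), product]), [])
print(SEMANTIC_ERRORS[0])
```

	The reported list matches the three-line backtrace given earlier for
	figure 30.1(b).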

		BOX:	How is capitalization introduced into THE_BACKTRACE?


Suppressing The Appearance Of Some Parts-Of-Speech

	  The backtraces just implemented produce a line like:

		    <EXPR> <bop> <expr>

	  instead of what we saw earlier:

		    <EXPR> + <expr>

	  We might like BOPs to show up not as <BOP>, but instead as
	  "+".  In the place of <bop>, we want instead the ~lefthand phrase of
	  the particular rule that generated this BOP.

	  For example, a BOP generated by the rule:

		    +		->	  <BOP>

	  should show up as that rule's lefthand phrase, the "+".

	On a ~per part-of-speech basis, we can dictate whether or not that
	part-of-speech is ~admissible in a backtrace line.  If it's not
	admissible, we replace the appearance of the part-of-speech in the
	backtrace line by the lefthand side of the rule that produced this
	part-of-speech.

	  The implementation of this modification is relatively straightforward,
	  like our previous modifications.
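
	One possible shape for that modification, as a hypothetical Python
	sketch: the set ADMISSIBLE is an assumed per-part-of-speech setting,
	and render shows each inadmissible part-of-speech as the lefthand
	phrase of the rule that produced it.  Treating <id> as inadmissible
	in the same way would let the leaf line read "J" rather than "<ID>".

```python
# Hypothetical sketch: render a backtrace line, suppressing inadmissible
# parts-of-speech in favor of the phrase that produced them.
from typing import Dict, Optional, Sequence

# Hypothetical per-part-of-speech setting: only these appear as themselves.
ADMISSIBLE = {"<EXPR>", "<STATEMENT>"}


def is_pos(token: str) -> bool:
    return len(token) > 2 and token.startswith("<") and token.endswith(">")


def render(lefthand: Sequence[str], produced_by: Dict[int, str],
           cap: Optional[int] = None) -> str:
    """Render one backtrace line.

    produced_by maps the index of a part-of-speech in `lefthand` to the
    lefthand phrase of the rule that produced it (e.g. "+" for a BOP).
    """
    out = []
    for i, tok in enumerate(lefthand):
        if is_pos(tok) and tok.upper() not in ADMISSIBLE:
            out.append(produced_by[i])  # e.g. <bop> is shown as "+"
        elif i == cap:
            out.append(tok.upper())
        else:
            out.append(tok)
    return " ".join(out)


print(render(["<expr>", "<bop>", "<expr>"], {1: "+"}, cap=0))
```

	In this sketch an inadmissible part-of-speech is replaced even when it
	is the capitalized one; other policies are possible.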