CHAPTER 14


				ACHIEVING TERMINATION
					  AND
		    AN UPPER BOUND FOR ALL CONTEXT-FREE GRAMMARS



	  Consider two observations about the parser.


14.1
Infinite Execution

	  If the grammar contains a rule like:

		    <A>	->	  <A>

	  or a set of rules like:

		    <A>	->	  <B>
		    <B>	->	  <A>

	  then our parser will never finish execution.	Whenever you pass an <A>
	  to GIVE, GIVE will call the grammar.  The grammar will see that <A> in
	  the IF-statement for the rule:

		    <A>	->	  <B>

	  The matched rule then calls GIVE with <B>.  GIVE calls the grammar
	  again and this time the rule

		    <B>	->	  <A>

	  applies, calling GIVE once more with an <A>.

	  GIVE calls the grammar and the grammar calls GIVE forever.  Neither
	  ever has a chance to return.  GIVE is called recursively with the
	  sequence of parts-of-speech:

		    <A>, <B>, <A>, <B>, <A>, ...


	  One could imagine banning such rules.  However, they come up in the
	  domain of datatypes, e.g.,

		    int	->	  real
		    real	->	  int

	  This particular pair of rules might be objectionable to some people,
	  particularly the rule:

		    real	->	  int

	  where the fractional part of the real is lost.  There are however many
	  such pairs of rules that are very useful (e.g., Section 23.5.3.2).

	  Upward compatibility is often achieved by introducing the pair of
	  rules:

		    old_type	  ->	    new_type
		    new_type	  ->	    old_type

	  The "old_type" is used presently in the program, but newer programs
	  need to use instead the "new_type".  This pair of rules allows this
	  to occur without changing any of the old system.  Any place that the
	  old_type is required, you can supply either the old_type or the
	  new_type.	 Similarly, any place the new_type is required, you can once
	  again supply either the old_type or the new_type.  More about this
	  later.
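
	  To make the looping concrete, here is a minimal sketch in Python
	  (this book's examples are in ICL; the names naive_give and grammar,
	  and the two-field phrase tuples, are hypothetical stand-ins invented
	  only for illustration).  It encodes just the two cyclic coercion
	  rules and has no duplicate check, so the mutual recursion never
	  returns:

		def naive_give(phrase, c):
		    # GIVE with no duplicate suppression: record the phrase,
		    # then let the grammar try every rule against it.
		    c.append(phrase)
		    grammar(phrase, c)

		def grammar(phrase, c):
		    # Only the two cyclic coercion rules  <A> -> <B>  and  <B> -> <A>.
		    pos, left = phrase
		    if pos == "<A>":
		        naive_give(("<B>", left), c)    # rule  <A> -> <B>
		    if pos == "<B>":
		        naive_give(("<A>", left), c)    # rule  <B> -> <A>

		# naive_give(("<A>", None), [])   # uncommenting this recurses forever:
		#                                 # <A>, <B>, <A>, <B>, ... until Python
		#                                 # raises RecursionError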


14.2
Exponential Execution

	  [Figure 14.1:  (a) Three CHOICE_OF_PHRASES, each containing two PHRASE
	  blocks identical in part-of-speech and LEFT.  (b) The four <X> phrases
	  produced by the rule <B> <A> -> <X>.  (c) The eight <Y> phrases
	  produced by the rule <C> <X> -> <Y>.]

	  Even if we were to ban such coercion rules that lead to infinite
	  parser execution, our parser could still consume exponential time and
	  memory, as a function of the length of the input string.	Consider
	  figure 14.1.  It shows three occurrences where there are two identical
	  PHRASE blocks in a CHOICE_OF_PHRASES.  The PHRASE blocks are identical
	  in part-of-speech and in the contents of their LEFT fields.

	  Suppose you have the rule:

		    <B>  <A>	  ->	    <X>

	  This rule will see ~four occurrences of its want-phrase emanating from
	  the rightmost CHOICE_OF_PHRASES.  (There are two <A>s, and to the left
	  of each, there are two <B>s).

	  Figure 14.1(b) shows the result of those four matches.  There are now
	  four <X> phrases.  Now suppose we have another rule:

		    <C>  <X>	  ->	    <Y>

	  This rule's want-phrase has ~eight matches.  Each of the two <C>s
	  can pair with each of the four <X>s.  Figure 14.1(c) shows the eight phrases
	produced by this rule.

	In general, a sequence of N duplicates will give rise to 2^N PHRASE
	blocks.  (We've just seen the cases for N=2 and N=3).
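
	A rough Python model of figure 14.1 may help (the function apply_rule
	and the tuple representation of PHRASE blocks are hypothetical, invented
	only to count the matches; this is not the ICL parser itself):

		# A PHRASE is modeled as a (pos, left) pair; LEFT is the
		# CHOICE_OF_PHRASES (a plain list) to its left.

		def apply_rule(choice, want_left, want_right, give_pos):
		    # Apply a rule  <want_left> <want_right> -> <give_pos>  against
		    # one CHOICE_OF_PHRASES, returning the PHRASE blocks it derives.
		    new = []
		    for pos, left in choice:              # rightmost want-phrase element
		        if pos == want_right:
		            for lpos, lleft in left:      # the element to its left
		                if lpos == want_left:
		                    new.append((give_pos, lleft))
		    return new

		leftmost = []                              # whatever lies left of the <C>s
		c_choice = [("<C>", leftmost)] * 2         # two identical <C> blocks
		b_choice = [("<B>", c_choice)] * 2         # two identical <B> blocks
		a_choice = [("<A>", b_choice)] * 2         # two identical <A> blocks

		xs = apply_rule(a_choice, "<B>", "<A>", "<X>")        # four <X> phrases
		ys = apply_rule(a_choice + xs, "<C>", "<X>", "<Y>")   # eight <Y> phrases
		print(len(xs), len(ys))                    # prints:  4 8

	Adding one more duplicated level would double the count again, giving
	the 2^N growth described above.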

		    BOX:	How can our parser presently get into an infinite loop?
		    BOX:
		    BOX:	How can exponential time be consumed?


14.3
Duplicate PHRASE Blocks Cause Only Redundancies

	  When you call GIVE with a PHRASE block, GIVE's behavior depends on
	  that PHRASE block alone.  GIVE reads no other variables.  (Technically
	  it reads C in the operation:

		    C:= C $> P ;

	  but the contents of C in no way affect GIVE's behavior beyond this one
	  statement).

	  Since GIVE depends only on the PHRASE block, you can predict GIVE's
	  behavior.  If you pass GIVE that same block again, GIVE will behave
	  exactly as before.  In fact, GIVE behaves as before whenever it is
	  given a PHRASE block whose LEFT and POS fields match those of the
	  original.

	  Suppose we find on C a PHRASE block whose LEFT and POS match those
	  of P, the PHRASE block passed to GIVE.  GIVE will ~now behave exactly
	  as it did before, when it received the first one.  GIVE's execution,
	  including its calls to the grammar, will generate exactly the same
	  phrases as it did before.

	If we have GIVE suppress its execution when given a duplicate PHRASE
	block, no phrases will be lost.  Any phrases dependent
	on the PHRASE block were already generated, upon first encounter.  We
	now redefine GIVE to suppress duplicates:

		DEFINE	GIVE( P: PHRASE ):

		    BEGIN  VAR P1=PHRASE;

			IF  FOR P1 $E C;  NEVER  (P1.POS=P.POS) &
						 (P1.LEFT=P.LEFT)

			THEN	C:= C $> P;

				the_grammar(P);

			FI
		    END
		ENDDEFN

	Paraphrased, this reads as:

		DEFINE	GIVE( P: PHRASE ):

			IF  for all P1 in C, ~never is there a P1 which matches
						    P (where P1 and P have
						    identical POS's and LEFT's)

			THEN	do as before

		ENDDEFN

	GIVE now ignores a given PHRASE block if C already contains an
	identical one.
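
	For readers not fluent in ICL, a rough Python rendering of the modified
	GIVE might look as follows (the Phrase class, the module-level list C,
	and the the_grammar callable are hypothetical stand-ins, not part of
	the actual system):

		from dataclasses import dataclass
		from typing import Any, List

		@dataclass
		class Phrase:
		    pos: str         # the part-of-speech
		    left: Any        # the CHOICE_OF_PHRASES to our left

		C: List[Phrase] = []          # stands in for the variable C

		def give(p: Phrase, the_grammar) -> None:
		    # Suppress duplicates: only a PHRASE block whose POS and LEFT are
		    # not already represented on C gets appended and handed to the
		    # grammar, as in the ICL definition above.
		    for p1 in C:
		        if p1.pos == p.pos and p1.left is p.left:
		            return                    # identical block: ignore entirely
		    C.append(p)                       # C := C $> P
		    the_grammar(p)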

	We will return to this modification of GIVE when we consider semantics,
	in Chapter 15.  (The old and the new identical PHRASE blocks might
	differ in semantics. Presently, we keep only the semantics associated
	with the ~first occurrence of a duplicate PHRASE block, as all
	subsequent duplicates are ignored entirely).

		BOX:	Why does GIVE behave identically the second time,
		BOX:	when given a duplicate PHRASE block?


14.4
Termination And Polynomial Upper Bound For Context-Free Grammars

	With the modification to GIVE, our parser will now finish execution
	in finite time for all context-free grammars, including those
	containing cyclic coercion rules, like:

		<A>	->	<B>
		<B>	->	<A>

	In fact, not only can we prove that the parser terminates, but that it
	consumes at most polynomial time and memory, as a function of N, the
	length of the input string.  What we show here is only ~worst ~case
	behavior.  In practice, things rarely behave in the worst case manner.
	(There may be ~local areas of worst-case behavior, but the locality
	keeps the effective N tolerably small).


14.4.1
Any CHOICE_OF_PHRASES Has At Most N*P Elements

	The following is an essential observation that is true only for
	context-free grammars:

		All calls to GIVE from the grammar program pass in a PHRASE
		block whose LEFT field is taken from an ~existing PHRASE block.

		(The LEFT field is taken from the leftmost PHRASE block in the
		want-phrase).

		Thus, the context-free grammar never generates a new
		CHOICE_OF_PHRASES.  All LEFT values are taken from existing
		PHRASE blocks.

	There is only one place where a LEFT field is ~not taken from an
	existing PHRASE block.  While taking user input, we GIVE a PHRASE block
	whose LEFT is the CHOICE_OF_PHRASES built up for the previously input
	character.  That left CHOICE_OF_PHRASES is new.  It does not yet
	come from some PHRASE block's LEFT field.

	  Since the taking of user input creates N CHOICE_OF_PHRASES when given
	  N characters, there are N CHOICE_OF_PHRASES total.	(None are
	  generated by context-free grammars).  Thus, given any PHRASE block,
	  its LEFT field can take on at most N distinct values.

	  Consider the N'th CHOICE_OF_PHRASES, which contains the N'th input
	  character.  How many PHRASE blocks can it contain?	All its
	  PHRASE blocks can have at most N distinct LEFT fields.  However, two
	  PHRASE blocks with identical LEFTs can both exist if they have
	  ~different parts-of-speech.  We suppress PHRASE blocks only if they
	  are identical in both the LEFT and POS fields.

	  If the grammar has only P parts-of-speech throughout its rules, then
	  we know that the maximum number of distinct PHRASE blocks in the
	  CHOICE_OF_PHRASES is

		    N*P

	  That is, each PHRASE block is a pair:

		    (LEFT,POS)

	  where there are N possible LEFTs and P possible parts-of-speech.
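
	  As a worked check of this counting (the sizes below are invented for
	  illustration), with N=4 input characters and P=3 parts-of-speech a
	  CHOICE_OF_PHRASES can hold at most one PHRASE block per distinct
	  (LEFT,POS) pair:

		from itertools import product

		N, P = 4, 3                                    # hypothetical sizes
		lefts = range(N)                               # the N possible LEFT values
		parts_of_speech = ["<A>", "<B>", "<C>"]        # the P parts-of-speech
		pairs = list(product(lefts, parts_of_speech))  # every distinct (LEFT,POS)
		print(len(pairs))                              # prints 12, which is N*P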

		    BOX:	Why does a CHOICE_OF_PHRASES have at most N*P PHRASE
		    BOX:	blocks for context-free grammars?
		    BOX:
		    BOX:	Why is this still true even if the grammar includes
		    BOX:	the pair of rules:
		    BOX:
		    BOX:		  <A>	    ->	<B>
		    BOX:	and
		    BOX:		  <B>	    ->	<A>		    ?


14.4.2
How Many Times Can The Grammar Be Called Upon Taking In The N'th Character?

	Given the N*P upper bound on the number of PHRASEs in a
	CHOICE_OF_PHRASES, we now discover how many times the grammar
	can be called.

	GIVE calls the grammar only when given a PHRASE block that has no
	duplicate in C.  ~There ~are ~at ~most ~N*P ~such ~PHRASE ~blocks.

		Suppose we discover that the grammar is called N*P+1
		times.  Since each call to the grammar corresponds to the
		introduction of another element onto C, C must now contain
		N*P+1 PHRASE blocks.  This is impossible since there can be at
		most N*P elements on C.

	Thus, upon taking in the N'th character, the grammar will be called at
	  most N*P times.

		    BOX:	Why can the grammar be called only N*P times, while
		    BOX:	processing the N'th character?


14.4.3
How Many Times Can The Grammar Call GIVE?

	For each call to the grammar, the grammar can itself call GIVE only a
	certain number of times.  The grammar calls GIVE from within the big
	FOR-quantifiers that represent the rules' want-phrases.  How many
	  iterations can those FOR-quantifiers generate?

	  Consider a rule whose want-phrase is of length L.  That rule is
	  represented by:

		    P(l):= P;
		    IF  P(l).POS = <POS(l)>  THEN

				FOR P(l-1) $E P(l).LEFT;  WITH ... ;
				!!
				FOR P(l-2) $E P(l-1).LEFT;  WITH ... ;
				!!
				...
				!!
				FOR P(1) $E P(2).LEFT;	WITH ... ;
				DO
					  call GIVE
				END
		    FI

	  Let's ignore the WITH-clauses.  They serve only to reduce the number
	  of iterations.  (Parenthesized indices, as in P(l) or POS(1), denote
	  subscripts.)

	How many iterations can be caused by one FOR-quantifier, e.g.:

		FOR P(i-1) $E P(i).LEFT;

	P(i).LEFT is a CHOICE_OF_PHRASES.  Any CHOICE_OF_PHRASES can contain
	at most N*P elements, and so this quantifier causes at most N*P
	iterations.

		(P(i).LEFT is a CHOICE_OF_PHRASES to our left, and so if
		it contains the K'th character, then K < N.  This quantifier
		can actually cause only K*P iterations, which is less than our
		pessimistic N*P).


	  For a rule whose want-phrase has length L, there are L-1 quantifiers
	  nested within one another.	Since each causes at most N*P iterations,
	  the whole grand quantifier can cause

		    (N*P)^(L-1)

	  iterations.

	  Let L be the length of the ~longest want-phrase.  Each rule thus causes
	  ~at ~most (N*P)^(L-1) iterations.	 If there are R rules, then one
	  call to the grammar can cause

		    R * (N*P)^(L-1)

	  iterations.  Each iteration may call GIVE once.

	  We now know the following upon taking in the N'th character:

	   1)	The grammar is called at most N*P times.

	   2)	Each call to the grammar can generate at most

			R*(N*P)^(L-1)

		iterations, or calls to GIVE.

		We also know:

		   a)	Each call to the grammar, ~excluding ~its ~calls ~to
			~GIVE, can consume at most this amount of time (by
			counting the number of iterations).

		   b)	Each call to the grammar can issue at most this many
			calls to GIVE (one per iteration).

	Since the grammar itself can be called at most N*P times, the ~total
	time taken in all calls to the grammar, excluding its calls to GIVE,
	is at most:

		(N*P)  *  R*(N*P)^(L-1)

	This is also the maximum number of times total that GIVE can be called
	from all executions of the grammar.  Simplified, this number is:

		R * (N*P)^L
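
	Plugging in some invented numbers makes these quantities concrete (the
	values of N, P, L, and R below are hypothetical, chosen only for
	illustration):

		N, P, L, R = 50, 20, 3, 100        # input length, parts-of-speech,
		                                   # longest want-phrase, rule count

		calls_to_grammar = N * P                           # at most N*P
		iters_per_call   = R * (N * P) ** (L - 1)          # per grammar call
		calls_to_give    = calls_to_grammar * iters_per_call

		# The product simplifies to R*(N*P)**L, as claimed above.
		assert calls_to_give == R * (N * P) ** L
		print(calls_to_grammar, iters_per_call, calls_to_give)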

		BOX:	How many iterations, or calls to GIVE, can occur upon
		BOX:	calling the grammar only once?
		BOX:
		BOX:	There is an exponential here, but that exponent is
		BOX:	~not N.  What is the exponent?


14.4.4
How Much Time Is Spent In GIVE Excluding Its Call To The Grammar?

	We know how many times GIVE can be called total.  How much time is
	spent within each call to GIVE?  GIVE must look thru C, which can have
	at most N*P elements in it.  The total time taken inside the GIVE
	program, for all calls to GIVE, is thus N*P times the total number of
	calls to GIVE:

		R * (N*P)^(L+1)

	At most this much time is spent within the GIVE program.

	As noted earlier, the total time taken inside the grammar program is
	at most:

		R * (N*P)^L

	The dominating factor between these two, GIVE and the grammar, is the
	time taken inside GIVE:

		(N*P)^(L+1)

	(We omit the R because we will understand that the actual time spent
	is ~proportional to (N*P)^(L+1)).


14.4.5
Time Taken To Process All N Characters

	We now know that the time taken to process the N'th character is at
	  most:

		    (N*P)^(L+1)

	  How long does it take to process all characters, one thru N?  Each one
	  takes this much time, and so the processing of N characters takes N
	  times that amount:

		    N^(L+2)

	  (We've dropped the P because P, the number of parts-of-speech, is
	constant no matter what N is.  We'll make it part of the
	  proportionality factor).

	  This number represents the most time it could take to parse N
	  characters, for any context-free grammar.  L is the length of the
	  grammar's ~longest want-phrase.

		BOX:	Suppose you ~double the number of parts-of-speech
		BOX:	in the grammar, how will ~worst-case parsing time
		BOX:	be affected?


14.4.6
What The Upper Bound Means

	We've seen that the processing of N input characters is proportional
	  to:

		    N^(L+2)

	  This is a polynomial upper bound as a function of N.

	  This applies not only to execution time, but to memory as well.

	  The creation of each block consumes some time, say T.  Given only
	  N^(L+2) execution time, the greatest number of memory blocks that can
	  be created is:

		    N^(L+2) / T

	  T is independent of N, and so the total amount of memory used is
	  proportional to N^(L+2).

	  If you wish, you can reform the grammar, automatically, so that the
	  longest want-phrase has length L=2.  Given a rule whose want-phrase
	  is greater than two, say L:

		    <POS(1)> <POS(2)> ... <POS(L)>	   ->	    <GIVE_POS>

	  you can replace it by ~two rules, which involve a brand new part-of-
	  speech, say <X>:

		    <POS(1)> <POS(2)>	    ->	<X>

		    <X> <POS(3)> <POS(4)> ... <POS(L)>	  ->	    <GIVE_POS>

	  The rule whose want-phrase contains <X> now has length L-1.  This
	  process can be repeated in finite time for all rules.  In the resulting
	  grammar, the longest want-phrase, L, will have length 2.  A grammar
	  rendered this way is said to be in ~Chomsky ~Normal ~Form.  With such a
	  grammar, our parsing time is N^4.
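
	  A sketch of that mechanical transformation in Python (a hypothetical
	  representation: a rule is a (want_phrase, give_pos) pair, with
	  want_phrase a list of parts-of-speech, and the generated <Xi> names
	  stand for the brand new parts-of-speech):

		def shorten(rules):
		    # Split every rule whose want-phrase is longer than two into a
		    # pair of rules, repeating until no want-phrase exceeds length 2.
		    out, work, counter = [], list(rules), 0
		    while work:
		        want, give = work.pop()
		        if len(want) <= 2:
		            out.append((want, give))
		        else:
		            counter += 1
		            x = "<X%d>" % counter                # a brand new part-of-speech
		            out.append((want[:2], x))            # <POS1> <POS2>        -> <X>
		            work.append(([x] + want[2:], give))  # <X> <POS3> ... <POSL> -> give
		    return out

		print(shorten([(["<A>", "<B>", "<C>", "<D>"], "<S>")]))
		# [(['<A>', '<B>'], '<X1>'), (['<X1>', '<C>'], '<X2>'),
		#  (['<X2>', '<D>'], '<S>')]   -- every want-phrase now has length 2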

	  In practice however, rules with long want-phrases typically involve
	  some terminal characters.  The FOR-quantifier for a terminal character
	  causes only one iteration.	While using this parser in a production
	  environment for over ten years, we've never reduced grammars to
	Chomsky Normal Form, and we've had acceptable performance throughout.

	  Still, if you want to introduce more efficiency into this parser, you
	  can reduce the N^4 to N^3, if you forbid general rewrite rules.	 Recall
	  that one of the factors of N came from within GIVE, as it searched C
	  for duplicate PHRASE blocks.  We can remove that factor of N by
	  constructing a one-dimensional array to go along with C.	The array has
	  N entries.  Each entry corresponds to one of the N possible values that
	  can appear in a LEFT field.

	  Each entry in the array contains a list of PHRASE blocks, all of which
	  have the ~same ~value in LEFT.  There are
	  at most P elements in such an entry.  Thus, given
	  a new PHRASE block, we use its LEFT to index into the array, at which
	  point the chosen entry contains at most P list elements.	Those P must
	  be searched, but that time is independent of N.  The factor of N
	  has disappeared by indexing into the array of length N and searching
	  a list of length P, instead of searching a single list of length N*P.
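
	  A hedged sketch of that bookkeeping in Python (a dictionary plays the
	  role of the array of N entries, and the PHRASE blocks are assumed to
	  carry pos and left fields as in the earlier sketch; the names here are
	  hypothetical):

		by_left = {}     # one bucket per distinct LEFT value (at most N buckets)

		def seen_before(p):
		    # True if a PHRASE block with this same LEFT and POS was already
		    # recorded.  Each bucket holds at most P entries, so the scan no
		    # longer depends on N.
		    bucket = by_left.setdefault(id(p.left), [])
		    if p.pos in bucket:
		        return True
		    bucket.append(p.pos)
		    return False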


	  We've experimented with long strings, contrived ones to be sure, that
	  would cause the parser to behave in the worst-case manner.  The N^4
	  part of the polynomial had an insignificant effect on execution time,
	  because its coefficient is very small.  When we observed long execution
	  times, we ~doubled the length of the input string, and yet observed not
	  16 times as much execution time (as would be the case if the N^4 were
	  significant), but only 7 times as much.  Even the N^3 component wasn't
	  completely dominating the execution time.  In short, the N^3 component
	  made things slow long before the N^4 component ever had a chance to
	  contribute to the slow behavior.

	  These experiments used ICL's syntax grammar, which is not in Chomsky
	Normal Form.

	Languages designed to maximize human convenience benefit from what
	can be delivered only at nonlinear cost.  Allowing the
	programmer to specify

		A + B # C

	without parentheses is a nonlinearity.  However, people rarely make
	long expressions that involve nonlinearities.  For example, this
	nonlinearity may be contained in an assignment statement like:

		P:= A + B # C ;

	N is small for the "A+B#C".

	Another, richer example of valuable non-linearity concerns the
	expression:

		X#Y \BOX U#V  \PAINTED GREEN  \ROTATED  90 \DEGREES

							\UNION  B \BLOATED 10

	Here, we are specifying a box that is painted green and rotated by
	90 degrees, along with (\UNION) the box B bloated by 10 units.  The
	grouping parentheses can be known only after syntax processing, when
	the types of X,Y,U,V, and B become known.  This example takes
	advantage of ICL's syntax, which allows functions to be called with an
	~infix notation (\function_name).  ICL's infix notation makes for a
	very readable expression.  If X,Y,U, and V are REALs, and B is a
	box, then the grouping becomes:

		((((X#Y) \BOX (U#V))  \PAINTED GREEN)  \ROTATED (90 \DEGREES))
					\UNION (B \BLOATED 10)

	In the absence of the infix function calling notation, this
	expression would have to be specified in the usual, prefix form:

		UNION( ROTATED( PAINTED( BOX( #(X,Y) , #(U,V) ), GREEN ),

				DEGREES(90) ) ,  BLOATED( B , 10 ) )


	People tend to group their thoughts in a hierarchical fashion, e.g.,
	like sentences in a paragraph, paragraphs in a section, sections in a
	chapter etc.  At each level in the hierarchy, N is usually small,
	perhaps about 7.  (N is rarely, if ever, more than 50).  Thus,
	languages designed specifically for human convenience rarely involve
	large N at any one place.

	People often cannot understand a very long sentence, and the speaker
	may be asked to rephrase it.  For a computer to ask for rephrasing
	would be very natural and acceptable, if the language is convenient
	for humans.  In contrast, if the person had to convolute his or her
	thinking so as to talk in an awkward language, he or she would be
	understandably frustrated at the computer's rejection.

	  Language designed for convenient human use must often involve non-
	  linearities, but the human can express his or her thoughts so briefly
	  that the non-linearities are never significant.

	  When computers talk among themselves, their common language need not
	  be convenient for people.  One can make very simple grammars for the
	  computers to use.  Computers can generate and understand long
	  expressions in such simple grammars in linear time.	 Such a language
	  might be postfix Polish notation, where

		    1 2 +

	  represents what we would write as 1+2.	Similarly, 1+2*3 would be
	  written as:

		    1 2 3 * +

	  These kinds of languages will cause our parser to consume only linear
	  time.
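
	  For contrast, here is a throwaway Python sketch of how cheaply such a
	  machine-oriented postfix language can be processed: one pass over the
	  input and one stack, so the cost is linear in the length of the string.
	  (The tiny evaluator below is hypothetical and handles only + and *):

		def eval_postfix(tokens):
		    stack = []
		    for t in tokens:
		        if t in ("+", "*"):
		            b, a = stack.pop(), stack.pop()
		            stack.append(a + b if t == "+" else a * b)
		        else:
		            stack.append(int(t))       # a number: just push it
		    return stack.pop()

		print(eval_postfix("1 2 +".split()))       # 3,  i.e.  1+2
		print(eval_postfix("1 2 3 * +".split()))   # 7,  i.e.  1+2*3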


		    BOX:	Why is nonlinearity tolerable when dealing with human
		    BOX:	languages?
		    BOX:
		    BOX:	Programming is a human activity.  Should new
		    BOX:	programming languages support briefer and clearer
		    BOX:	expression?
		    BOX:
		    BOX:	Given a programming language that is more human
		    BOX:	oriented than today's languages, why can we then
		BOX:	tolerate nonlinearities?
		BOX:
		BOX:	Which of the three forms of the long programming
		BOX:	expression is easiest for you to read?