CHAPTER 7


				EMBEDDED ASSEMBLY LANGUAGE



	  In our effort to write a compiler, we will need a way for our program,
	  the compiler, to generate machine language programs.  We present
	  here a way to generate assembly language programs ~under ~program
	  ~control.	 The fruits of this endeavor will appear in our sample
	  compiler in Chapter 8.

	  You know that once you have the freedom to program, you can relieve
	  yourself from much work.  We extend that freedom to the ~specification
	  of assembly language programs.

	  Tricks that you think of that help ease the specification task can be
	  ~reused by defining each as a function (as we do in Section 7.7).  A
	  compiler is such a set of tricks along with a special syntax (Chapter
	  8).

	  To render programmable the specification of assembly language programs,
	  there are three ways to proceed.

	  Our compiler's execution could generate a ~text ~file containing an
	assembly language program.  That assembly language file could
	ultimately be turned into machine language by invoking the assembler.
	Invoking the assembler, however, is relatively slow, because the
	assembler naturally has to analyze the text, looking up names, like
	names for our labels, upon each appearence.

	A second way by which our compiler could generate a machine language
	program is to actually write machine language instructions into
	consecutive bins in main memory.  The advantage here is the avoidance
	of invoking the assembler.  We translate non-stop into machine
	language.  The disadvantage of this method is that we, the compiler,
	lose the convenience offered by assembly language's labels.

	  The third is a kind of mix of the first two methods.  Like the second
	  method, let's generate machine language directly into memory, but
	let's also introduce our own concept of labels so that we, the compiler
	  writers, can enjoy the same automation offered by assembly language's
	labels.  This lifts the compiler writer up to at least the same level
	as an assembly language writer, and yet bypasses any subsequent
	assembly step.


7.1
Mimicking Assembly Language By Using Functions

	Let's pursue the third method.

	  We start by defining a small set of functions which mimic what an
	  assembler does.	 We need to invent a function name for each linguistic
	  construct in assembly language.  By providing a ~function notation for
	  assembly language, we can then write programs in our programming
	  language that ~call upon those functions.  That is, under program
	  control, we will be able to generate machine language code, with the
	  same conveniences offered to assembly language programmers.

	  We will use our model programming language, ICL, for all our examples.
	  ICL was used in Chapter 1 to specify semantics.  In Chapter 8 we
	  will implement a compiler via syntax rules.  Their semantics will be
	  specified in ICL, and will call upon the functions we introduce now.

	  (ICL is shown in detail in Part 6).

	  For example, where we wrote

		    LOOP:

	  we now write

		    EQU(LOOP);

	  This latter notation is legal in ICL.  It is a call to the function
	  EQU.  Also, where we use to specify the contents of a bin via:

		    333

	  we will now write the following function call:

		    PUT(333);

	  Where we used to write

		    LOAD  1 , IT

	  we will now write the function call:

		    LOAD(1,IT);


	  Notice that we've captured ~in ~our ~programming ~language all the
	constructs available to the assembly language programmer.  We are
	~embedding assembly language in our programming language ICL.

	Let's see how the following assembly language program will appear when
	  rewritten in terms of our new functions.  Here is the original
	  assembly language source:

		    ONE:	1
		    IT:	2
		    COUNT:	10

		    START:	LOAD	1 , COUNT

		    LOOP:		  LOAD  2 , IT
					  ADD	  2 , 2
					  STORE 2 , IT

					  SUB	  1 , ONE
					  JUMPGT 1, LOOP

		    END:	Next instruction

	  We rewrite this assembly language into the following, which consists
	  only of function calls:

		    EQU(ONE);
				PUT(1);
		    EQU(IT);
				PUT(2);
		    EQU(COUNT);
				PUT(10);

		    EQU(START);
				LOAD(1,COUNT);

		    EQU(LOOP);
					  LOAD(2,IT);
					  ADD(2,2);
					  STORE(2,IT);

					  SUB(1,ONE);
					  JUMPGT(1,LOOP);
		    EQU(END);

	  The execution of this ICL program generates a machine language program.
	  It generates the machine language program that you imagine when you
	  read this example text.

	  We have omitted some surrounding text which must ultimately appear.
	  (See Section 7.5 for the whole specification).

	  We call this new notation the ~embedded assembler language.  It is
	  a valid program, written in our programming language ICL.	 It follows
	  ICL's syntax.  However, its meaning is that of assembly language.  We
	have the semantics of an assembler appearing now ~embedded in ICL's
	  syntax.

	  In general, any language (e.g., assembly language) can be embedded in
	  any programming language.  You simply define functions in that
	  programming language to execute the meaning of each linguistic
	  construct in the original language.  You lose the original language's
	syntax.  The embedded language follows the syntax of the programming
	language, ~and ~hence ~these ~new ~features ~can ~be ~called ~by
	~other ~programs!

	In Section 7.7 we will define new,
	higher-level functions, also called macros, which will call upon the
	basic functions we now define.  This will expose more clearly the
	advantages of embedded assembly language.

	We embark now to provide definitions for the functions referenced
	earlier, e.g., EQU, LOAD, etc.  This effort actually implements
	the embedded assembler language.  This will provide the library of
	functions that will be called from within our compiler.


		BOX:	What are the advantages of embedded assembly language?
		BOX:
		BOX:	What's the advantage of embedded languages in general?


	  The following presents in detail the implementation of these functions.
	  You may wish to gloss over the details.	 The essential set of
	  functions that we'll need for subsequent material appears in the
	summary (Section 7.4.2).


7.2
Assembling One Word  -  The Assembler's PC

	  The first function we implement is PUT.	 We will assume the existence
	  of the following function:

		    PUT_INTO_MEMORY( Data: INT  Address: INT ) ;

	  This function allows us to put any ~data into any ~address.  Repeated
	  use of this function will create the machine language program.

	  The assembler puts data into consecutive memory bins.  Let's declare a
	global variable named PC to hold the address of the ~next bin in memory
	that will receive data:

		VAR   PC = INT ;	" Holds the address of the bin into
					  which the next instruction (or data)
					  will be put. "

	We define now our PUT function that was referenced in the embedded
	assembly language example shown earlier:

		DEFINE	PUT( DATA: INT ):

			PUT_INTO_MEMORY( DATA , PC );
			PC:= PC + 1;

		ENDDEFN

	This new PUT function does not take in an address (as
	PUT_INTO_MEMORY does).  PUT takes in only the data we want to appear
	in the next bin.  PUT supplies the address in PC as the second
	parameter to the PUT_INTO_MEMORY function.

	Notice how the PC is left pointing to the next bin in memory.  That
	PC will be used as the address into which the next PUT will place its
	data.

		BOX:	What does PUT do?
		BOX:
		BOX:	What is the variable PC for?


7.3
Assembling Addresses  -  Labels

	What is a label?  It represents an address.  Unfortunately, ~which
	address might not be known soon enough.  For example, consider the
	following assembly language program:

			JUMP  LOOP
			...
			JUMP  LOOP
			...
		LOOP:	...
			JUMP  LOOP

	Upon seeing the first reference to LOOP, in the first JUMP instruction,
	we don't know yet which address will be represented by LOOP.  We won't
	know that address until the "LOOP:" is seen.  The last
	"JUMP LOOP", however, easily knows the address represented by LOOP.

	The first two references to LOOP are called ~forward ~references.  The
	last reference to LOOP, the easy kind, is called a ~backward
	~reference.

	Although we don't know the value of LOOP upon the first reference, we
	  do know the address of the JUMP instruction, the address in PC.	 It
	  is possible to ~delegate ~to ~the ~future the job
	  of coming back (to PC) and filling in the correct value for LOOP.

	  We begin by declaring a new datatype called LABEL.	A LABEL will
	  hold enough information to support the future task of going back and
	  filling in correct values within each forward reference.

	  A label will contain two fields.	One will hold the LABEL's value, if
	such is known.  That information is known after we've passed the
	  "LOOP:".	All backward references to LOOP occur after LOOP's value is
	known, and so will immediately get LOOP's correct value from this
	  field.

	  We will use the second field in a LABEL to accomodate forward
	  references.  This second field will be a list of addresses.
	  Each address will denote a bin in memory which contains a
	  forward reference to LOOP.	For example, the list will hold the
	  addresses of the first two JUMPs by the time we arrive at "LOOP:".
	  Upon arriving at "LOOP:", which is where LOOP's value becomes known,
	we will go back to each address in LOOP's second field (the first two
	  JUMPs) and finally put in LOOP's correct value.


7.3.1
The Type LABEL

	We declare the datatype LABEL as follows:

	    TYPE   LABEL =  [ VALUE: INT   FORWARD_REFERENCES: REFERENCES ] ;

	This declares LABEL to be a ~record having two fields.  The VALUE field
	in any LABEL is an INTeger, a memory address.  The FORWARD_REFERENCES
	field is represented by the type REFERENCES, which is supposed to be a
	list of addresses.  We declare this other new type as follows:

	    TYPE   REFERENCES  =  { INT }  ;

	A REFERENCES is a list of INTegers.  (The "{}" always
	denotes lists).  Each INTeger in the list is the address
	of a bin that holds a forward reference to the LABEL.

	More about records can be found in Section 21.4.5.2 or 23.4.  More
	about lists appears in Section 21.4.5.1 or 23.3.

	We will always put a zero into the VALUE field initially.  Upon
	encountering the label's definition (e.g., "LOOP:"), we will replace
	  that zero by a non-zero value.  Thus, the appearance of a zero value
	  in the VALUE field indicates that we haven't yet arrived at "LOOP:",
	whereas a non-zero value indicates that we've already passed the
	  "LOOP:".

	  We create a new label by calling the function NEW, defined by:

		    DEFINE	NEW = LABEL :	 [ VALUE: 0 ]	    ENDDEFN

	  Any new label has a zero in its VALUE field (meaning undefined), and
	  the empty set for its FORWARD_REFERENCES field.  (More about
	  function definitions appears in Section 21.4.4.3 or 22.4.3).

	  We now provide two functions:  One will create a new reference to the
	  LABEL, e.g., at any of the "JUMP LOOP"s.  The other will define a
	  LABEL's value, e.g., when we reach the "LOOP:".

		BOX:	What are the two fields in a LABEL?
		BOX:


7.3.2
Inserting A Reference To A LABEL

	The following function puts into the next bin the value of a LABEL.
	For example, the JUMP function will call this function to put in the
	"LOOP" in "JUMP LOOP".  This function acts like PUT, except that it
	takes in not an INT, but a LABEL:

	    DEFINE   PUT_LABEL( L: LABEL ):

		IF  L.VALUE <> 0  THEN  "L's value is defined (non-zero)
						     (We've already passed the 'LOOP:')"

					PUT( L.VALUE );   "We can insert the
							   LABEL's value
									     immediately"

		    ELSE	"L's value is still unknown.  Create a forward
			 reference, and for now, put a zero into the bin"

			@(L).FORWARD_REFERENCES:=  PC <$ L.FORWARD_REFERENCES;

			PUT(0);
		FI
	    ENDDEFN

	If L's value (L.VALUE) is non-zero, meaning that its value has already
	  been defined, then simply PUT that INTeger value.  This case takes
	  care of backward references.

	  If L's value is zero, meaning undefined, we PUT a zero (a wrong value).
	However, we put the address of that bin, PC, into the list of
	FORWARD_REFERENCES.  Thus, the label L accurately represents the need
	to come back and ultimately put the correct value into this bin.

	(The assignment

		@(L).FORWARD_REFERENCES :=  PC  <$  L.FORWARD_REFERENCES ;

	puts the augmented list

		PC  <$  L.FORWARD_REFERENCES

	into L's FORWARD_REFERENCES field.	The "<$" operator forms a new
	  list by appending one element (PC) onto the front of the list
	  (L.FORWARD_REFERENCES).  This new augmented list gets put back into
	  L's FORWARD_REFERENCES list.  Thus, this reference (PC) comes to
	appear on the label's FORWARD_REFERENCES field.	 (All list operations,
	  including this one, appear in Sections 22.1.3 and 23.3).

	  (We use the following notation on the lefthand side of the ":=":

		    @( L ) . FORWARD_REFERENCES

	  The "@" is there to assure that the modification to L's
	FORWARD_REFERENCES is seen from all points of view, including that
	of the ~label ~itself.  More about the "@" appears in Chapter 10 and
	in Section 22.1.11).

		BOX:	How do PUT and PUT_LABEL differ?
		BOX:
		BOX:	How are forward references represented?


7.3.3
Defining A LABEL's Value

	  We now define the other function involving labels, EQU.  EQU defines
	  a label's value.

	Since "EQU(LOOP);" is supposed to mean "LOOP:", we want to set the
	label's value to be the address of the next bin into which new data
	  will be put.  At this moment, the address of that next bin is held in
	  PC.

	  Our EQU function looks something like PUT_LABEL, with an IF-THEN-ELSE
	  that distinguishes between whether the LABEL has a value or not:

		DEFINE  EQU( L: LABEL ):

		  BEGIN  VAR  REF = INT ;

		    IF  L.VALUE <> 0  THEN  HELP;  "The label is defined more than
								once!"

		    ELSE	@(L).VALUE := PC ;	"Define L's value (finally!)"

				" Fix up all forward references, now that we know L's
				  value... "

				FOR  REF  $E  L.FORWARD_REFERENCES;

				  DO	  PUT_INTO_MEMORY( L.VALUE , REF );	    END
		    FI

		  END
		ENDDEFN

	  The body of this function looks like

		    BEGIN  <DECL>
			...
		    END

	  where the <DECL> is

		    VAR  REF = INT ;

	  This creates a new variable, REF, which can hold an INTeger.  (More
	  about the BEGIN...END appears in Section 18.1.2, 21.4.2.2, or 22.2.4).

	  Inside the BEGIN...END, we see a single giant IF-THEN-ELSE (which ends
	  in FI (Section 21.4.2.3 or 22.2.3)).  If L's value is already non-zero,
	then we cry HELP because we are not the first to define that label's
	  value.  Somebody else had already EQU'd this label, and we cannot
	tolerate two definitions for the same label.  (This is a property of
	assembly language).

	In the ELSE clause, where we usually wind up, we define L's VALUE
	  field.  The value for this label is the next bin into which more
	  machine language will be written.	 Thus, if we were to write:

		    EQU(ONE);
				PUT(1);

	  the label ONE will have as its value the address of the bin into which
	  "1" will be put.

	  Having defined L's value, we now go back to fix up all forward
	references.  The statement:

		FOR  REF  $E  L.FORWARD_REFERENCES;

		  DO	...	END

	executes the "..." once for each value taken from the list
	L.FORWARD_REFERENCES.  This "FOR...$E...;" notation, called a
	~quantifier, causes a loop.  It sets REF in each iteration to the
	~next ~value in the list L.FORWARD_REFERENCES.  Thus, it ultimately
	sets REF to each forward reference.

	The "...", between the DO and END, the ~body of the loop:

		PUT_INTO_MEMORY( L.VALUE, REF );

	puts L's value into the bin whose address is in REF.	That bin was
	  noted as a forward reference.  This loop thus updates all forward
	  references to this label.

	  (This "FOR...$E...;" notation appears in Section 22.3.1).


		    BOX:	What does the EQU function do with a label?
		    BOX:
		    BOX:	What is the (briefer) assembly language for EQU?
		    BOX:
		    BOX:	Why might EQU cry HELP?


7.3.4
A Practical Matter

	  The function EQU, which defines a label's value and fixes up all
	forward references, is almost correct.  So far, we've assumed that a
	  label occupies a whole bin in memory.  This contrasts our earlier
	  assumptions.  For example,

		    LOAD  1 , IT

	  should ultimately be

		    LOAD  1 , 333

	  if the label IT's value is 333.  That bin shouldn't contain ~only the
	  333.  Recall that the machine language for

		    LOAD  1 , xxx

	  is
		    102 1 xxx

	  We want the label's value to overwrite only the last three digits, the
	xxx.  Unfortunately, our present functions overwrite all the other
	digits as well.

	It is most appropriate to ~add ~in a label's value rather than to
	  replace the bin's entire contents.  For example, if we translate

		LOAD  1 , IT		(1021xxx)
	into
		LOAD  1 , 0		(1021000)

	then we can fix up this forward reference simply by ~adding IT's
	  value 333.  That addition will not overwrite the first four digits, the
	  "LOAD 1,".

	  We modify our EQU function so as to add in a label's value rather
	than just storing it.  The changes appear in lower case:

	    DEFINE  EQU( L: LABEL ):

	      BEGIN  VAR  REF = INT ;

		IF  L.VALUE <> 0  THEN	HELP;

		ELSE	@(L).VALUE := PC ;

			FOR  REF  $E  L.FORWARD_REFERENCES;

			  DO	put_into_memory( get(ref) + L.VALUE , REF );
									  END
		FI
	      END
	    ENDDEFN

	Instead of putting L.VALUE, we now put

		GET(REF) + L.VALUE

	We assume that the GET function, like our PUT_INTO_MEMORY function, is
	built in.  The GET function takes in an address (REF) and yields the
	present contents of the bin at that address.  Thus, we replace the bin
	at address REF with its present contents ~plus L's value.

	  If we look at the function PUT_LABEL, we see that it always puts the
	  value of the label, or a zero, into the entire bin.	 How do we get the
	  "LOAD 1," into the bin in the first place?

	  We modify PUT_LABEL so as
	  to take in another parameter, called PLUS, which provides the initial
	  contents of the bin.	The modifications appear in lower case:

		DEFINE   PUT_LABEL( L:LABEL  plus:int ):

		    IF  L.VALUE <> 0  THEN  PUT( L.VALUE + plus );

		    ELSE	@(L).FORWARD_REFERENCES:= PC	<$  L.FORWARD_REFERENCES;

				PUT( 0 + plus );
		    FI
		ENDDEFN

	  In both the THEN and ELSE clauses, we add PLUS into the data that we
	  PUT.  The parameter PLUS will typically be a whole instruction, e.g.,

		    LOAD	1,0

	  where the memory operand is always zero.  Adding in the label's
	ultimate value will complete the instruction, filling in its memory
	operand.

		BOX:	What purpose is served by the PLUS parameter in the
		BOX:	PUT_LABEL function?


7.4
Assembling Machine Language Instructions

	Suppose there were no labels.  The instruction

		LOAD  1 , 333

	can be assembled easily.  First, recall that we specify this LOAD
	instruction by actually writing:

		LOAD(1,333);

	Let's define this LOAD function:

		    DEFINE	LOAD( REGISTER: INT  ADDRESS: INT ):

			  PUT(  102*10000 + REGISTER*1000 + ADDRESS  );

		    ENDDEFN

	  The expression inside the PUT, with the multiplications by 10000 and
	  1000, produces the large number that represents LOAD with its operands.
	  Thus,

		    LOAD(1,333);

	  PUTs the number

		    102 1 333


	  Now let's consider labels.  We want to be able to write either of

		ADD( 2 , IT );
	or
		ADD( 2 , 2 );

	The former ADD references a label.  That latter ADD references not a
	label, but a number (register 2).  To make one function, ADD, be able
	to accept either a label or a number, we must declare a new datatype
	which is ~either a label of a number.  We call that type
	ADDRESS_EXPRESSION:

	    TYPE   ADDRESS_EXPRESSION =   EITHER
						ABSOLUTE = INT
						LABEL    = LABEL
					  ENDOR ;

	This declares that an ADDRESS_EXPRESSION is either one of INT and
	LABEL.  Given an ADDRESS_EXPRESSION, it is always known to be in
	either the ABSOLUTE state or the LABEL state.  If it is in the ABSOLUTE
	state, it is an INT.  If it is in the LABEL state, it is a LABEL.
	(Within the EITHER...ENDOR, the states are named to the left of the
	equal signs, and the representation for each state appears on the right
	side of the equal sign).  Section 23.5 shows this kind of datatype,
	called a ~variant.

	We could define ADD now, and begin by writing:

		DEFINE  ADD( REGISTER: INT  MEMORY: ADDRESS_EXPRESSION ):
		...

	This definition will admit in its second parameter either an INT or a
	LABEL.  However, since we're going to want to define similar functions
	  for SUB, LOAD, etc., let's put all the hard work into one function, so
	that each of the functions ADD, SUB, etc. can pass the buck to it.

		BOX:	What does the type ADDRESS_EXPRESSION represent?
		BOX:


7.4.1
The Workhorse Function For Assembling Instructions

	The following function acts just like PUT_LABEL, except that it takes
	in an ADDRESS_EXPRESSION, i.e., ~either a LABEL or an INT:

	    DEFINE   PUT_ADDRESS( X: ADDRESS_EXPRESSION  PLUS: INT ):

		CASE  X  OF

		    ABSOLUTE:	PUT( X + PLUS );	"(X is an INT)"

		    LABEL:	PUT_LABEL( X, PLUS );	"(X is a LABEL)"

		ENDCASE

	    ENDDEFN

	We employ the CASE statement to examine X, the given
	ADDRESS_EXPRESSION.  We use PUT if X is in the ABSOLUTE state (an
	INT), and uses PUT_LABEL if X is in the LABEL state.  (Section 23.5.2
	introduces this CASE construct).

	That is, if X is in the ABSOLUTE state, e.g., the number 2, then we
	put X+PLUS.  As we have seen, PLUS might represent the
	almost completed instruction "ADD 2,0".  Thus, X+PLUS will be the
	desired "ADD 2,2".

	If X is in the LABEL state, e.g., the label IT, then we again try
	to put the value X+PLUS (IT+PLUS).  Unfortunately, X+PLUS
	is not defined because the operator "+" knows nothing about LABELs,
	although "+" is certainly well defined for INTs.  We achieve the
	desired X+PLUS by writing:

		PUT_LABEL( X , PLUS );

	Recall that PUT_LABEL puts into the current bin the label's
	  ultimate value summed with PLUS.	That is, we have put X+PLUS as
	  though X were the INT value ultimately assigned to the label X.

	  Again, to ease the definitions of ADD, LOAD, etc., let's define one
	more intermediate function:

	    DEFINE   PUT_INSTRUCTION( OP_CODE:INT  REGISTER:INT
						   MEMORY:ADDRESS_EXPRESSION ):

		PUT_ADDRESS( MEMORY ,  OP_CODE*10000 + REGISTER*1000 );

	    ENDDEFN

	This convenient function takes in the essence of an instruction:
	an op-code (e.g., 102 for LOAD), the instruction's register operand,
	  and its memory operand.

	  The second parameter we pass to PUT_ADDRESS, the

		    OP_CODE * 10000  +	REGISTER * 1000

	  is PUT_ADDRESS's PLUS parameter.  It represents the whole instruction,
	except that it leaves zero as the instruction's memory operand.
	  The first parameter to PUT_ADDRESS, MEMORY, which is either a LABEL
	  or an INT, will ultimately act as the instruction's memory operand.

	The definitions for our instructions follow:

	   DEFINE   LOAD( REGISTER: INT  MEMORY: ADDRESS_EXPRESSION ):
			PUT_INSTRUCTION( 102, REGISTER, MEMORY );	ENDDEFN

	   DEFINE   STORE( REGISTER: INT  MEMORY: ADDRESS_EXPRESSION ):
			PUT_INSTRUCTION( 103, REGISTER, MEMORY );	ENDDEFN

	   DEFINE   ADD( REGISTER: INT  MEMORY: ADDRESS_EXPRESSION ):
			PUT_INSTRUCTION( 170, REGISTER, MEMORY );	ENDDEFN

	   DEFINE   SUB( REGISTER: INT  MEMORY: ADDRESS_EXPRESSION ):
			PUT_INSTRUCTION( 171, REGISTER, MEMORY );	ENDDEFN

	   DEFINE   JUMPGT( REGISTER: INT  MEMORY: ADDRESS_EXPRESSION ):
			PUT_INSTRUCTION( 123, REGISTER, MEMORY );	ENDDEFN

	   etc.


7.4.2
Summary

	Besides the functions just shown that assemble instructions, we've
	  defined the following functions which we will continue to use:

		    NEW	->	  ~label		"Create a new label"

		    EQU( ~label ) ;			"Define the label's value to
								 be the present PC"

		    PUT_LABEL( ~label ) ;		"Put this label's value into
								 the next bin"

		    PUT( ~int ) ;				"Put the integer into the next
								 bin"

		    PUT_ADDRESS( ~address_expression , ~int ) ;

						    "Put the address expression (either a
						     label or an integer) ~plus the given
						     ~integer into the next bin."

		    PUT_INSTRUCTION( ~int, ~int, ~address_expression );
					  (op-code)
						   (register)
							   (memory operand)

						    "Put into the next bin this complete
						     instruction"


		    BOX:	Is it easy to introduce new instructions into the
		    BOX:	embedded assembly language?
		    BOX:
		    BOX:	What function is the workhorse for assembling
		    BOX:	instructions?


7.5
Using The Embedded Assembly Language

	  The assembly language program

		    ONE:	1
		    IT:	2
		    COUNT:	10

		    START:	LOAD	1 , COUNT

		    LOOP:		  LOAD  2 , IT
					  ADD	  2 , 2
					  STORE 2 , IT

					  SUB	  1 , ONE
					  JUMPGT 1, LOOP

	  can be turned into the embedded assembly language as follows:

		    BEGIN	VAR  ONE,IT,COUNT,START,LOOP = LABEL;

				     "variables to hold labels"

				" Put ~new labels into these LABEL variables... "

				ONE:= NEW;	    IT:= NEW;	  COUNT:= NEW;
				START:=NEW;	    LOOP:=NEW;

				" Now, the actual program... "

				EQU(ONE);
					  PUT(1);
				EQU(IT);
					  PUT(2);
				EQU(COUNT);
					  PUT(10);

				EQU(START);
					  LOAD(1,COUNT);

				EQU(LOOP);
						    LOAD(2,IT);
						    ADD(2,2);
						    STORE(2,IT);

						    SUB(1,ONE);
						    JUMPGT(1,LOOP);

		    END

	  We've enclosed the program within:

		BEGIN	VAR  ONE,IT,COUNT,START,LOOP = LABEL;

			ONE:= NEW;	IT:= NEW;
			COUNT:= NEW;	START:= NEW;
			LOOP:= NEW;

			the actual program
		END

	That surrounding text declares variables of type LABEL so
	that we can use those names as LABELs.  After declaring the variables,
	we assign each a new label.  (We defined the NEW operator earlier).
	Thus, each variable contains a distinct label, all ready to be
	referenced (in instructions) and defined (in EQU).

	We see here one disadvantage of embedded languages.  Where in the
	original language we used names to represent labels, we can continue
	to do so only if we declare variables by those names, and assign into
	each a NEW label.  Those declarations and initializations are extra
	specification.

	This extra writing however has some great advantages.  We will see
	them when we write higher-level functions that call upon the functions
	we've defined here.

	  The execution of this ICL program generates a machine language program.


7.6
Exercises

	  1)	    The following generates a machine language program.  What
		    will be in register 2 after the execution of the generated
		    machine language program?	 How many bins of memory does the
		    generated program occupy?

			 BEGIN   VAR  START,IT = LABEL ;

				START:= NEW;
				IT:= NEW;

				EQU(IT);
					  PUT(25);

				EQU(START);
					  LOAD(2,IT);
					  ADD(2,2);
					  STORE(2,IT);

			 END

	  2)	    Do the same as exercise 1 for this ICL program:

			 BEGIN   VAR  START,IT = LABEL;  I = INT;

				START:= NEW;
				IT:= NEW;

				EQU(IT);
					  PUT(2);

				EQU(START);

					  LOAD(2,IT);

					  FOR I FROM 1 TO 10;
					    DO   ADD(2,2);    END

					  STORE(2,IT);

		    The "FOR I FROM 1 TO 10;" is legal in ICL (Section 22.3.1).
		    It causes 10 iterations, where I takes on the values 1 thru
		    10.  One of the things that it does is to ~increment I.

		    Does that incrementing occur
		    during the execution of this ICL program, or during the
		    execution of the machine language program produced by the
		    this ICL program?  Also, does the STORE from 
		    register 2 into IT occur during the execution of this ICL
		    program, or during the execution of the machine language
		    program produced by this ICL program?

	  3)	    Consider the following function definition:

				DEFINE  MOVE( TO,FROM: ADDRESS_EXPRESSION ):

					  LOAD(1,FROM);
					  STORE(1,TO);

				ENDDEFN

		    If we specify

				MOVE( ONE , TWO );

		    where ONE and TWO are labels, which of these two bins will
		    receive new data?  How many instructions are generated by:

				MOVE( ONE , TWO );
				MOVE( 333 , ONE );

		    What's in bin 333 after the execution of the generated machine
		language program?  What's in bin 333 following the execution
		    of the machine language program generated by:

				LOAD(1,ONE);
				MOVE(ONE,TWO);
				STORE(1,333);

	  4)	    What will be in register 1 after the execution of the program
		    generated by:

			 BEGIN   VAR  START,IT = LABEL;

				START:= NEW;
				IT:= NEW;

				EQU(IT);
					  PUT(25);

				EQU(START);
					  LOAD(1,IT);

				IT:= NEW;

					  LOAD(1,IT);

				EQU(IT);
					  PUT(24);

		    What's odd here is that the same variable, IT, actually holds
		different labels during the execution of this ICL program.

	5)	Let's declare a global variable of type LABEL and assign into
		    it a new label:

				VAR  IT = LABEL;

				IT:= NEW;

		    Let's go on to define the new function:

			DEFINE	ABSOLUTE_VALUE( REGISTER: INT ):

				   JUMPGE( REGISTER, IT );

					LOAD_NEGATIVE( REGISTER, REGISTER );

				   EQU(IT);

			ENDDEFN

		This function generates a machine language program which
		replaces a given register's contents with the mathematical
		    ~absolute ~value of its contents.  (This assumes the existence
		    of a LOAD_NEGATIVE instruction.	 The second operand of any
		    instruction can be a register, as it is here, because in our
		    model, registers are a subset of memory).

		    The following puts the absolute value of register 3's contents
		back into register 3:

			ABSOLUTE_VALUE( 3 );

		The following does the same for register 3 and register 4:

			ABSOLUTE_VALUE( 3 );
			ABSOLUTE_VALUE( 4 );

		This latest program fragment will cause the EQU function to
		cry HELP.  (The former doesn't).  Why?  (Write down the
		    assembly language program generated by this latest program
		    fragment).

		    Redefine ABSOLUTE_VALUE so as to avoid this HELP.	 Do this by
		    inserting a statement before the JUMPGE (an assignment into
		    the variable IT).

		    Try another definition which is safer, by using a local
		    variable within ABSOLUTE_VALUE.	 (Use the BEGIN...END to
		    declare and use the new local variable.  See Section 22.1.6).

	  6)	    How many elements are in the list IT.FORWARD_REFERENCES by the
		    time the following ICL program finished execution?

				IT:= NEW;

					  LOAD(2,IT);
					  ADD(2,2);
					  STORE(2,IT);

				EQU(IT);
					  PUT(2);

		    Answer the same for:

				IT:= NEW;

					  LOAD(2,IT);

					  FOR I FROM 1 TO 10;
					    DO    ADD(2,IT);	  END

				EQU(IT);

					  FOR I FROM 1 TO 9;
					    DO    ADD(2,IT);	  END

7.7
Macros and Conditional Assembly

	  We have seen an assembly language and an embedded assembly language.
	  The embedded assembly language has so far looked clumsier to use than
	  regular assembly language.	The embedded assembly language required
	  extra work, specifying each label name in at least two extra places,
	  in a declaration and in an initializing assignment statement.

	  However, the embedded assembly language is vastly more powerful, in
	  that we can write programs that write machine language programs for
	  us.

	  In practice, assembly languages often come with another language,
	  called the ~macro or ~conditional ~assembly language.  In this higher
	  level language, the user specifies algorithms that will generate
	  machine language instructions.  However, the macro language might not
	  be a full programming language, and hence may restrict the kinds of
	  algorithms you can specify.

	  In embedded assembly language, the programming language itself acts
	  as the macro and conditional assembly language.  For example, let's
	define a function that will let us specify at once a data movement
	between two bins in memory, e.g., so that:

		MOVE( 333 , 444 );

	moves data from bin 444 into bin 333.  This cannot be a single
	instruction, because the machine allows only movement between memory
	and registers.  We define this function, or ~macro, as follows:

		DEFINE	MOVE( TO,FROM: ADDRESS_EXPRESSION ):

			LOAD(1,FROM);
			STORE(1,TO);

		ENDDEFN

	Now we can specify data movements concisely, like:

		MOVE(111,222);
		MOVE(A,B);

	as though they were built-in instructions.  The capability to make
	this definition for MOVE occurs naturally in the programming language.

	To illustrate ~conditional ~assembly, let's reconsider this MOVE
	  function.	 For simplicity, let's require MOVE's two parameters to be
	  only INTegers, and not the more general ADDRESS_EXPRESSION (which
	  could be a label).  We define MOVE again:

		    DEFINE	MOVE( TO,FROM: INT ):

				LOAD(1,FROM);
				STORE(1,TO);

		    ENDDEFN

	  It may be that FROM is a register (an address less than 16).  If
	  such is the case, we can have MOVE generate a more efficient program
	  by making a new definition for MOVE:

		    DEFINE	MOVE( TO,FROM: INT ):

				IF  FROM < 16  THEN	STORE(FROM,TO);

				ELSE	  LOAD(1,FROM);
					  STORE(1,TO);			    FI

		    ENDDEFN

	  Notice that when FROM is a register (an address less than 16), that
	  MOVE issues only one instruction.	 That one STORE is legal only in
	  this case, when FROM is a register.  If FROM isn't a register, MOVE
	resorts to our old method.

	This use of IF statements is called conditional assembly.  Different
	instructions may be issued depending on some condition.

	Following are more examples of macros.  We will eventually introduce
	the operators:

		BEGIN_ASSEMBLY
	and
		END_ASSEMBLY

	which must be executed before and after generating machine language.


		BOX:	Can you write a function that generates more than one
		BOX:	instruction?
		BOX:
		BOX:	Can a function generate code that may depend on
		BOX:	whether or not an operand is in a register?
		BOX:
		BOX:	What is a macro?
		BOX:	What is conditional assembly?


7.7.1
Example 1 - QUADRATIC

	Suppose we want to evaluate expression like

		A*X*X + B*X + C

	To get this computed value into register 1, we might specify:

		QUADRATIC(A,B,C,X);

	We make this notation valid by defining QUADRATIC:

		DEFINE	QUADRATIC( A,B,C,X: ADDRESS_EXPRESSION ):

			LOAD(1,X);	"X"
			MUL(1,1);	"X*X"
			MUL(1,A);	"A*X*X"

			LOAD(2,X);	"X"
			MUL(2,B);	"B*X"

			ADD(1,2);	" A*X*X + B*X "
			ADD(1,C);	" the answer: A*X*X + B*X + C "

		ENDDEFN

	Now we can write

		QUADRATIC(U,V,W,X);

	to get U*X*X+V*X+W into register 1.  Each of U,V,W, and X must be an
	ADDRESS_EXPRESSION, i.e., either a label or an INTeger.

	If the coefficients are in registers 2, 3, and 4, then

		QUADRATIC(2,3,4,X);

	puts into register 1 the evaluation of the quadratic.


7.7.2
Example 2 - MAX

	The following definition will allow us to believe that there is an
	instruction MAX, like ADD, which computes the maximum of its operands.
	That is,

		MAX( 2 , IT );

	will put into register 2 the maximum of register 2's present contents
	  and IT's contents.  Here is the definition:

		DEFINE	MAX( REG:INT  B:ADDRESS_EXPRESSION ):

		   BEGIN   VAR  L = LABEL;

			L:= NEW;	"Create the new label L"

			LOAD(1,B);		"B"
			SUB(1,REG);		"B-REG"
			JUMPLT(1,L);		" Jump if B < REG, (i.e., if
						  REG is aready the largest) "
				LOAD(REG,B);

			EQU(L);

		   END
		ENDDEFN

	This function (macro) creates its own label and puts it into L.
	It then generates machine language which loads B into register 1,
	subtracts REG from it, and finally jumps over the instruction

		LOAD(REG,B);

	if REG already holds the largest number.  As a result, REG will
	wind up holding the largest of the two numbers.

	Our label in L is referenced from the JUMPLT instruction, and is
	defined by the EQU finally.  The label is not the name L, rather, L is
	the variable that holds our label.  We referenced L in the JUMPLT and
	EQU function calls.  Upon exit from this MAX function, the variable L
	disappears, and so our label in L can never be referenced again.

	The overhead of embedded assembly language, the declaration of and
	assignment into the variable L, now begins to pay off.  We've written
	  that overhead only once, but it gets used many times, every time that
	  someone calls MAX.  It will pay off lots more when we write our
	  compiler in Chapter 8.

	  You might think that MAX generates the equivalent of the assembly
	  language program:

		    LOAD	1,B
		    SUB	1,REG
		    JUMPLT	1,L
				LOAD	  REG,B
		 L:

	  If that were literally so, then a second call to MAX should continue
	  and generate:

		    LOAD	1,B
		    SUB	1,REG
		    JUMPLT	1,L
				LOAD	  REG,B
		 L:

	  Taken together, this assembly language program is illegal because the
	  label named L is being defined twice ("L:").

	  In embedded assembly language, L is not a single label.  L is a
	  variable that holds different labels, a different label each time that
	  MAX executes.  Thus the machine language generated by MAX uses two
	  distinct labels upon two occurences of MAX.


		    BOX:	Will "MAX(1,IT)" perform as expected?
		    BOX:
		    BOX:	What limitations must you impose on users of MAX?
		    BOX:
		    BOX:	How many instructions does MAX generate?


7.7.3
Example 3 - Delayed Assembly

	  This and future examples use the type BASIC_PROCESS introduced in
	  Chapter 2.  Recall that by enclosing a program between //...\\, that
	  program's future execution can be passed around as data.  That is, the
	following represents a datum:

		//	LOAD(2,IT);
			ADD(2,2);
			STORE(2,IT);	\\

	This datum can be passed around, e.g., passed into functions.  If a
	function receives this datum in a variable named S, then

		<* S *>;

	causes that program to execute now.  That is, the <*...*> causes
	this short, 3-line program to execute.

	Suppose we want to write embedded assembly language for the following
	high-level program fragment:

		B:= A + 1 ;

	So far, we would write this as:

		LOAD(1,A);
		ADD(1,ONE);
		STORE(1,B);

		EQU(ONE);
			PUT(1);

	That is, to use the constant 1, we've had to allocate a bin called ONE
	  whose contents is the constant 1.	 As this assembly language program
	  stands now, it will actually cause the machine to crash.	We've
	assembled the four bins:

		LOAD	1,A
		ADD	1,ONE
		STORE	1,B
	  ONE:	1

	Following the STORE instruction, the computer will attempt to execute
	the next bin, which contains 1.  This is not a number that represents
	an instruction.

	What we've really wanted to say is something like:

		    LOAD(1,A);
		    ADD(1,ONE);
		    STORE(1,B);

		    and some place much later, please do
								EQU(ONE);
								PUT(1);

	  After the LOAD, ADD, and STORE, we want the next instruction, whatever
	  it may be, to follow the STORE, and not our ONE bin containing 1.

	  We might write the following to get that effect:

		    LOAD(1,A);
		    ADD(1,ONE);
		    STORE(1,B);

		    LATER(	//   EQU(ONE);
				     PUT(1);    \\  );

	  We still specify the EQU(ONE) and PUT(1), but we specify that this
	  be done later, not immediately after the STORE.  Presumably, the
	  EQU and PUT will be executed only after the entire program has
	  finished generating all the machine language code.	We've passed
	into LATER a BASIC_PROCESS, an action which should be invoked later.

	To implement LATER, let's declare a global variable to hold all the
	  BASIC_PROCESSes that are ever passed into LATER:

	     TYPE  ACTIONS =  { BASIC_PROCESS } ; " A set of BASIC_PROCESSs. "

	     VAR  LATER_ACTIONS = ACTIONS ;

	  We've declared a new type, ACTIONS, to represent the plural of
	BASIC_PROCESS.  The variable LATER_ACTIONS is of that plural type
	ACTIONS.

	We define LATER:

		DEFINE	LATER( ACTION: BASIC_PROCESS ):

			LATER_ACTIONS:=   ACTION  <$  LATER_ACTIONS ;

		ENDDEFN

	LATER generates no machine language code, as we expect.  It augments
	the set LATER_ACTIONS, putting the given ACTION onto the front of the
	list (the "<$", Section 22.1.3 or 23.3).

	So far, we've queued up actions to be executed later.	 When is later?
	  Let's introduce two new operators:

		BEGIN_ASSEMBLY;
	and
		END_ASSEMBLY;

	BEGIN_ASSEMBLY must be called prior to generating any machine language
	code.  Similarly, END_ASSEMBLY must be called when we're finished
	  generating machine language code.	 Let's assume that these rules are
	always obeyed.

	We define these two operators:

		DEFINE	BEGIN_ASSEMBLY:

				LATER_ACTIONS:= NIL;

				PC:= an initial address, where the whole
				     program will start ;

		ENDDEFN


		DEFINE	END_ASSEMBLY:

		   BEGIN  VAR  ACTION=BASIC_PROCESS;

			FOR ACTION $E LATER_ACTIONS;

			  DO   <*ACTION*>;   END
		   END

		ENDDEFN

	BEGIN_ASSEMBLY initializes our global variable LATER_ACTIONS to be
	empty (NIL).  END_ASSEMBLY invokes each of the BASIC_PROCESSes (ACTION)
	in the set LATER_ACTIONS.  Thus, "later" is now defined to be when
	END_ASSEMBLY is called.  (Section 7.3.3 shows in more detail this kind
	of loop, which examines each element in a list.  Also, see Section
	22.3.1).

		BOX:	What do BEGIN_ASSEMBLY and END_ASSEMBLY do?
		BOX:
		BOX:	Why are LATER actions always performed later?


	For example, the following:

		BEGIN_ASSEMBLY;

		LOAD(1,A);
		ADD(1,ONE);
		STORE(1,B);

		LATER(    //	EQU(ONE);
				PUT(1);		\\    );

		MUL(1,TWO);
		STORE(1,C);

		LATER(    //	EQU(TWO);
				PUT(2);		\\    );

		END_ASSEMBLY;

	generates the following equivalent assembly language:

		LOAD	1,A
		ADD	1,ONE
		STORE	1,B

		MUL	1,TWO
		STORE	1,C

	   TWO: 2
	   ONE: 1

	(We haven't shown how to end a program's execution.  We would like
	 to end program execution after the last instruction, the STORE into
	 C).

	Strictly speaking, these examples are correct only if the variables ONE
	and TWO are globals.  If ONE were a local variable, as in:

	     BEGIN  VAR ONE=LABEL;

		ONE:= NEW;

		LOAD(1,A);
		ADD(1,ONE);
		STORE(1,B);

		LATER( ... );

	     END

	then our parameter to LATER, which used to be

		//	EQU(ONE);
			PUT(1);		\\

	would have to be specified as:

		//[ONE;]	EQU(ONE);
				PUT(1);		\\

	As discussed in Sections 2.1 and 23.6, we must put all
	local variables, like ONE, to which we want to retain access into the
	square-brackets following the //.  Recall that the "..." in the
	"//...\\" executes in the future.  Local variables exist only in the
	present.  The "[ONE;]" moves ONE's value from the present into the
	  future.

	  (For example, our local variable ONE ceases to exist upon encountering
	  the final END.	Meanwhile, the action we passed to LATER has not been
	  invoked yet.  The action will be invoked only later, when we execute
	  END_ASSEMBLY.  Thus, the "[ONE;]" traps ONE's present value, and
	makes that value available to the future, when the

		EQU(ONE);

	ultimately gets executed).

	We will always use the square-bracket notation from now on.  As we
	shall see, all variables like ONE are typically local variables.
	Our examples so far have been sufficiently incomplete so as to not
	know whether the variables are local or global, and so we've taken
	  the liberty until now to omit the square-bracket part of the //...\\
	  notation.


7.7.4
Example 4 - The ADDRESS_OF Operators

	  Where we specified:

		    LOAD(2,A);
		    ADD(2,ONE);
		    STORE(2,B);

		    LATER(	//[ONE;]   EQU(ONE);
					     PUT(1);	\\  );

	  we might prefer to say the simpler:

		    LOAD(2,A);
		    ADD(2, ADDRESS_OF(1) );
		    STORE(2,B);

	  This latter specification is briefer and clearer than the former.  When
	  we say

		    ADDRESS_OF( a number )

	  we mean the address of a bin containing the given number (1), wherever
	  that bin may be in memory.

	  We define ADDRESS_OF:

		    DEFINE	ADDRESS_OF( NUMBER:INT ) = LABEL:

			 BEGIN  VAR L=LABEL;

				DO	  L:= NEW;

					  LATER(  //[L;NUMBER;]
									  EQU(L);
									  PUT(NUMBER);  \\  );

				GIVE	  L

			 END

		    ENDDEFN

	  This function creates a new label and puts it into L.  It then calls
	  LATER so as to ultimately allocate a bin in memory containing the given
	  NUMBER.  It makes the label in L represent that bin.  This function
	  delivers that label as its result.

	  (The "DO...GIVE..." construct appears in Section 22.1.6.	It
	  represents the value following the ~GIVE.  The statements before it,
	  between the DO and GIVE, are executed prior to the GIVE clause).

	  For example, the specification:

		    ADD( 2 , ADDRESS_OF(1) );

	  translates into the assembly language equivalent:

		    ADD  2,L
		    ...
		L:  1

	  The label yielded by ADDRESS_OF is consumed by ADD's second parameter.
	Of course, L is not the name of a label, but rather a variable that
	holds a label, a new label each time that ADDRESS_OF gets called.

		BOX:	What does ADDRESS_OF(N) represent?
		BOX:
		BOX:	Why does ADDRESS_OF use LATER?


	Let's consider another ADDRESS_OF function.  This new operator takes in
	  not a number, but a BASIC_PROCESS whose invocation generates one or
	  more bins of machine language.  The ADDRESS_OF function yields the
	  address of where that machine language code will ~start.	For example,

		    ADDRESS_OF( //  ADD(1,0);	 \\ )

	  is the label for the bin in memory where the ADD(1,0); ultimately
	  appears.	Similarly,

		    ADDRESS_OF(  // ADD(1,0);
					  MUL(1,0); \\   )

	  is the address (label) of the first bin of the two bins which hold
	  the ADD and MUL.

	  We define the new ADDRESS_OF as follows:

		    DEFINE	ADDRESS_OF( CODE:BASIC_PROCESS ) = LABEL:

			 BEGIN  VAR	 L=LABEL;

				DO	  L:= NEW;

					  LATER(  //[L;CODE;]
									  EQU(L);
									  <*CODE*>;		\\  );

				GIVE	  L

			 END

		    ENDDEFN

	  The BASIC_PROCESS passed to LATER, which will be invoked later,
	  assembles the intended code (<*CODE*>), but first defines L's value to
	be the starting address of that code.  That label is delivered
	immediately as the result of ADDRESS_OF.

	We use this second ADDRESS_OF function in the following example, which
	implements the high-level program fragment:

		IF  A < B  THEN  A:= B;
			   ELSE  B:= A;  FI

	Here is one embedded assembly language for this:

		LOAD(1,A);
		SUB(1,B);
		JUMPLT( 1 ,  ADDRESS_OF(  "the then-clause..."

					  //	MOVE(A,B);
						JUMP(OUT);	\\  ) );

			"else-clause..."

			MOVE(B,A);

		EQU(OUT);

	(This example assumes that the label variables A, B, and OUT are global
	variables that have already been initialized to NEW).


7.7.5
Example 5 - Simulated Indexing

	Consider the following function (macro):

		DEFINE	LOAD_INDIRECT( ADDRESS_REG: INT ):

		   BEGIN  VAR  IT=LABEL;

			IT:= NEW;

				LOAD(1, ADDRESS_REG);
				ADD(1,  ADDRESS_OF( //  LOAD(1,0);  \\ )  );
				STORE(1,IT);

			EQU(IT);
				PUT(0);

		   END
		ENDDEFN

	The machine language program generated by this function loads into
	register 1 the contents of the bin whose address is in ADDRESS_REG.

	This is unusual in that so far, all addresses have been bolted into
	the instructions.  No address was ~computed, e.g., taken from another
	register.

	The machine language program first LOADs the ADDRESS_REG into register
	1.  It then ADDs in the machine language number for the instruction:

		LOAD	1,0

	Thus, if ADDRESS_REG contained the value 333, then after the ADD
	instruction, register 1 holds the machine language instruction:

		LOAD	1,333

	The machine language program finally executes this computed instruction
	by STORing it immediately after the STORE instruction, into location
	IT.  Thus, this computed instruction, which follows the STORE
	instruction itself, executes immediately after the STORE.  It serves
	to load register 1 with the contents of the bin whose address is in
	ADDRESS_REG (e.g., 333).

	The ability to use a computed address instead of a bolted-in constant
	address is very valuable, as we will see in Part 3.  This capability,
	called ~indexing, is provided by nearly all computers.  That is, what
	we've done in four instructions in the LOAD_INDIRECT macro, most
	computers can do it as part of a single instruction.