The Target Language Of Compilation - Machine Language

	  Let's pursue compilation, the most effective method for treating
	programs specified in high-level languages.  Compilation refers to
	the translation from a high-level language into the machine's native
	  language, machine language.

	  To find out what machine language is, we've got to see the computer's
	  ~memory and its ~central ~processing ~unit ~(CPU).

	  Some of the details in the following lean towards a particular
	  computer, just to have a consistent discussion.  Registers are
	  presumed, and their integration into the computer is taken from DEC's
	PDP-10 series computers, perhaps the most fun machine to program.

	This particular model lies about halfway between RISC and CISC
	machines.  RISC machines (Reduced Instruction Set Computers) such as
	MIPS' R3000, Motorola 88000, or the Intel i860, require that all
	  computations be done in registers.  In contrast CISC machines (Complex
	  Instruction Set Computers) like DEC's VAX computers can perform
	computations even in the absence of registers.  Our model requires
	only that ~one of the two operands of a computation be in a register.

	Variations from these assumptions have only a small impact on
	programming in machine language.

Random Access Memory

	The computer's main memory is a set of storage bins, where each bin
	  has a unique address.	 All information resides in those bins.


          Figure 6.1 shows a picture of memory, a big rectangular region divided up into
	  separate bins.	Each bin has a unique address which appears to the left
	  of the bin.  That figure shows that the bin for address 2 contains a 5.
	  Bins addressed 4 and 5 contain the numbers 25 and 36 respectively.

	  This kind of memory is called ~random ~access ~memory, or RAM.
	  You can visit the bins in any order, e.g., you can visit bin 0 and
	  then bin 4 without having to visit the bins between them, bins 1,2, and
	  3.	You can visit any bin at any time.	The order in which you visit
	  bins can be ~random.


	  Main memory connects to the central processing unit (CPU) as shown
	  in figure 6.2.

          The CPU drives the memory.  The CPU specifies the
	  address of a bin to visit, in binary, on the set of ~address wires.
	  (Each wire holds a low voltage or a high voltage, representing binary
	  0 or 1).	The read/write wire carries the CPU's wish to the memory,
	expressing whether to ~read existing data from the bin or to ~write
	new data into the bin.  (Which bin is specified on the address wires).

	The data going to or coming from the memory are held on the ~data
	wires.  If the CPU specifies a ~read, then the memory puts the data
	taken from the addressed bin onto the data wires.  The CPU is then
	free to examine those data.  If the CPU specifies a ~write, then the
	CPU puts the new data onto the data wires.  The memory takes those
	data from the data lines and stuffs them into the addressed bin.


	The CPU follows instructions that are stored in main memory.  Each bin
	holds one instruction.  (On some machines, an instruction can reside
	in half a bin, or in more than one bin).

	At any moment in time, the CPU is executing one instruction.  The
	address of that instruction in main memory can be found in the CPU's
	  PC (Program Counter).

	  Normally, the CPU executes instructions that reside in consecutive
	  bins.  That is, the next instruction that the CPU will execute normally
	  resides in the address PC+1, the bin following the bin containing the
	  current instruction.	Prior to the completion of the current 
	  instruction's execution, the CPU performs

		PC := PC + 1 ;

	so that the PC holds the address of the next instruction.

	The CPU repeatedly executes the following sequence of events:

	   1)	Acquire the next instruction by reading from the memory bin
		whose address is in the PC.

	   2)	Perform PC:=PC+1; (so that the PC normally points to the next
		instruction in memory).

	   3)	Execute the acquired instruction.

	The instruction's execution may cause main memory to be
	  read or written.  Notice that the PC correctly contains the address of
	  the next instruction to execute.

	  A particular class of instructions, called ~jumps, exist to modify the
	  PC.	 The jump instruction puts an entirely new address into the PC.
	  This causes the CPU to follow the instructions starting at that new

The Registers

	  A small piece of memory actually resides in the CPU.  That memory is
	  called the ~registers.  The registers act as the first bins in memory,
	  the bins addressed 0 thru say 15.	 (Any particular machine has some
	  small number of registers).

	  The registers live inside the CPU.  The CPU can read the contents
	  of a register at the very same time that it reads the contents of a bin
	  in main memory!	 This delivers at once ~two numbers, one from a
	  register and one from main memory.  (If both numbers came from
	  main memory, the CPU would have to access main memory ~twice, and would
	  hence have to wait longer to accomodate the second access to main

	  Most operators, like addition, consume two numbers at once, e.g., the
	  "+" in "A+B" consumes the numbers in A and B simultaneously.  We make
	  A and B simultaneously accessible by forcing A to appear in a
	  register, while B can be in another register or in main memory.

Machine Instructions

	  Each instruction, like ADD, includes the specification of two
	  addresses.  Those two addresses denote the two bins whose contents ADD
	  will sum up.  (Some machines allow for a third address, which
	  dictates where the result of the ADD should be stored).

	  We will assume that the first of the two addresses
	  specifies a register (an address less than 16).  This gives
	  simultaneous access to the two numbers that ADD will sum up.

	  We write the ADD instruction as follows:

		    ADD  register_address , memory_address

	  An example follows:

		    ADD  1 , 333

	  This instruction sums up the contents of bin 1 and bin 333, and puts
	  the result back into bin 1 (register 1).

	  All instructions like ADD, which consume two numbers at once, always
	  put their results back into the register, the first address (at least
	  in our model).	For example, if

		    bin 1	contains the number	58
		    bin 333 contains the number	2

	  then upon completion of the ADD instruction, register 1's contents are
	changed, replacing the old 58 with the new 60.  The contents of bin
	333 remain unchanged.

	The instruction

		SUB  5 , 334

	puts into register 5 the ~difference between the numbers in register 5
	and bin 334.

	Other arithmetic and logical instructions are

		MUL  register_address , memory_address		(Multiply)
		DIV  register_address , memory_address		(Divide)

		AND  register_address , memory_address		(Logical AND)
		OR   register_address , memory_address		(Logical OR)
		XOR  register_address , memory_address		(Logical

	They all put the answer back into the register_address.
STORE and LOAD - Instructions That Move Data Between Registers And Main Memory

	The instructions just shown require that at least one of their
	operands resides in a register.  The LOAD and STORE instructions are
	introduced to move data between a register and a bin in main memory.
	Neither instruction performs any arithmetic.  The LOAD instruction
	moves data from memory into a register.  STORE does the opposite,
	moving data from a register into memory.  For example:

		LOAD  1 , 333		moves the data in bin 333 to register
					1.  (The old contents of register 1 are

		STORE  1 , 333		moves the data in register 1 into bin
					333.  (The old contents of bin 333 are

	The reader may notice that to add the contents of bins 222 and 333,
	we must use two instructions:

		LOAD  1 , 222
		ADD   1 , 333

	The LOAD instruction is there to satisfy ADD's requirement that its
	  first operand reside in a register (here register 1).

	  It may appear that our requirement that the first operand always
	  reside in a register in fact doesn't buy us anything.  We've had to
	  insert an extra instruction (LOAD) just to satisfy that requirement.
	  Why not break that rule, and write only

		    ADD  222 , 333	    ?

	  This mythical ADD instruction would have to wait for two memory reads,
	  and would hence take as long as the LOAD followed by the real ADD.

	  LOADs and STOREs can be avoided of course if the desired operand is
	  already in a register.  For example, to compute the expression

		    A * B + C * D

	  we can get by using only two LOADs:

		    LOAD  1 , A	  "Put A*B into register 1"
		    MUL   1 , B

		    LOAD  2 , C	  "Put C*D into register 2"
		    MUL   2 , D

		    ADD   1 , 2	  "Add the two registers"

	  The intermediate results, A*B and C*D, are held in registers (1 and
	  2 resp.), and hence don't need any LOADs to access.  (The text
	appearing within the quotes (") are comments.  They are not part of
	the machine language program).

	By holding intermediate results in registers, or even holding a copy of
	memory data in registers, LOADs and STOREs can be removed, resulting
	in a faster executing program.  Both the human machine language
	programmer or a compiler can produce a more optimal machine language
	program by using registers in such a way as to minimize LOADs and
	STOREs.  The task of using registers intelligently is called the
	~register ~allocation problem.

	1)	Suppose the following memory addresses contain the following

			333:	1
			444:	20
			555:	10

		(The address of the bin appears to the left of the colon, and
		the contents of that bin appear to the right of the colon).
		What will appear in register 1 after each of the following
		machine language programs is executed?

		   a)	LOAD  1 , 444

		   b)	LOAD  1 , 444
			ADD   1 , 333

		   c)	LOAD  1 , 444
			MUL   1 , 555
			ADD   1 , 333

		   d)	LOAD  1 , 444
			MUL   1 , 555
			ADD   1 , 333
			STORE 1 , 444

		   e)	LOAD  1 , 444
			ADD   1 , 333
			LOAD  2 , 555
			SUB   2 , 333
			MUL   1 , 2
			STORE 2 , 444

		What will bin 444 contain after each of the machine language

	2)	Write a machine language program which will put into bin 40
		the sum of the two numbers residing in bins 33 and 34.
Memory For Variables

	To translate the assignment statement:

		C := A + B ;

	we have to allocate a bin for each of the variables A, B, and C.  Let's
	  allocate for those three variables the bins at addresses 222, 333, and
	  444 respectively.

	  We want to perform an ADD.	We must first get one of A and B into a
	  register.	 We get A into a register via the following instruction:

		    LOAD  1 , 222		    "(222 is the address for A)"

	  Now we perform the addition:

		    ADD   1 , 333		    "(333 is the address for B)"

	  The answer is now in register 1.	We complete the specified assignment
	  statement by putting the answer into C:

		    STORE 1 , 444		    "(444 is the address for C)"

	  Taken together, the three instructions that implement

		    C := A + B ;


		    LOAD  1 , 222		    "(222 is A)"
		    ADD   1 , 333		    "(333 is B)"
		    STORE 1 , 444		    "(444 is C)"
Memory For Programs

	  Machine instructions reside in memory, too.  Let's put our program for

		C := A + B ;

	at address 1000:

		1000:	LOAD  1 , 222
		1001:	ADD   1 , 333
		1002:	STORE 1 , 444

	Each line specifies an address (to the left of the colon) and what
	we would like to have in that bin (to the right of the colon).
	We can cause the CPU to execute this program by putting the number
	(the address) 1000 into the CPU's PC.

	  We complete the program by specifying the bins for A, B, and C:

		    222:	0
		    333:	0
		    444:	0

	  Each of A, B and C contains a zero initially.	 Presumably, we could
	  call upon other programs prior to executing our program.	Those other
	  programs might put new values into A, B, and C (bins 222, 333, and
Instructions Are Represented By Numbers

	  We have specified the contents of bins 1000, 1001, and 1002 in terms
	  of easy to read machine instructions.  Actually, each instruction is
	  represented by a single (large) number.	 That single large number
	  represents the instruction (e.g. ADD) along with its two operands'
	addresses (e.g., 1 and 222).

	The computer's manufacturer gives you a number for each instruction.
	  Let's assume that

		LOAD	is represented by the number	102
		STORE	is represented by the number	103

		ADD	is represented by the number	170
		SUB	is represented by the number	171


	The number we put into bin 1000, for example, is a composition of
	three numbers:

		102	for 	LOAD
		1	for	(register) 1
		222	for	(memory address) 222

	We translate

		LOAD  1 , 222

	into the number

		102  1  222

	which appears as follows when written all together:


	Another example follows:

		ADD  1 , 2

	translates to

		170 1 002

	The complete machine language specification for the variables A, B, and
	C and the program for

		C := A + B ;


		222:	0	"(A)"
		333:	0	"(B)"
		444:	0	"(C)"

		1000:	102 1 222
		1001:	170 1 333
		1002:	103 1 444
We've Reached The Bottom

	  At this point, we've reached the bottom.  Everything, including
	programs, are represented by numbers, numbers residing in bins of
	main memory.  This is truly ~machine ~language.  Our easier to read
	notations, using words like LOAD, is actually called ~assembly
	~language.  We will come back to assembly language shortly.

	The CPU executes instructions represented as numbers most quickly.
	All the numbers are actually represented in binary, 1's and 0's, or
	what might be called TRUEs and FALSEs.  The microchips that make up
	the CPU operate in binary.

	In fact, while we call this the lowest level, the microchip designers
	call it the highest level.  There are compilers that translate not
	into machine language, but into full microchip designs.  Because
	microchips have been fabricated primarily in silicon, the process
	of translating a specification into microchip designs is called
	~Silicon ~Compilation.

		BOX:	Can you see how programs can be represented as
		BOX:	sequences of (large) numbers?
		BOX:	Can you imagine how a computer executes such sequences?

	1)	Translate the following into machine language (pure numbers):

			222:	1
			333:	1

			500:	LOAD 1 , 222
			501:	ADD  1 , 1
			502:	ADD  1 , 1
			503	STORE 1, 333

		What number resides in bin 333 after execution of this

	2)	Consider the following program:

			222:	MUL  1 , 1
			333:	1

			500:	LOAD  1 , 333
			501:	LOAD  2 , 222
			502:	STORE 2 , 503
			503:	ADD  1 , 1
			504:	STORE 1 , 333

		Translate it into machine language.  What value will appear
		in 333 upon completion of execution?  What number does register
		2 contain when the instruction at 502 commences execution?
		What is the contents of address 503 upon completion of this
		program's execution?

		    This exercise shows the beginning of compilation.	 Part of this
		    program's execution participates in writing a machine language
		program.  This example modifies itself.

	So far, all our machine language programs contain no jumps.  Execution
	starts at some address, and continues executing consecutive
	instructions until the end of memory is reached, at which time the
	machine crashes.

	The JUMP instruction is very simple.  It merely puts a new address
	into the CPU's PC (Program Counter).  Recall that the CPU repeatedly
	  carries out the following:

		    1)	Acquire the next instruction from the memory bin whose
				address is in the PC.

		    2)	PC := PC + 1 ;  (so to point to the next instruction)

		    3)	Execute the acquired instruction.

	  The execution of the JUMP instruction (part 3 above), e.g.,

		    JUMP  500

	  puts the address 500 into the PC.	 Although the PC was modified in
	  step 2, step 3, the execution of this JUMP, modifies the PC once again.
	  (You might notice that for JUMPs, the second step is entirely
	  unnecessary.  However, the second step is always performed, so that
	  all non-JUMPs lead execution to the next instruction).

	  Consider the program:

		    222:	1

		    500:	LOAD	1 , 222
		    501:	ADD	1 , 1
		    502:	STORE 1 , 222
		    503:	JUMP	500

	  This program doubles the contents of bin 222 each time instructions
	  500 thru 502 are executed.	The JUMP, at bin 503, causes execution to
	  resume at address 500.  This program loops forever.

	  A variation of the JUMP instruction, called conditional JUMPs, jump
	  only some of the time.  They look like:

		    JUMP?  register_address , memory_address

	  The "?" is any of

		    EQ	(Jump if the register's contents are EQual to zero)
		NE	(Jump if Not Equal to zero)
		LE	(Jump if register is Less than or Equal to zero)
		LT	(Jump if Less Than zero)
		GE	(Jump if Greater or Equal)
		GT	(Jump if Greater Than zero)

	Each of these represents a condition.  If the condition is met,
	the jump occurs, otherwise no jump occurs.

	Consider the following program:

		111:	1
		222:	2
		333:	10

		500:	LOAD  1 , 333	"Make register 1 hold the count of 10"

		501:	LOAD  2 , 222	"Double bin 222"
		502:	ADD   2 , 2
		503:	STORE 2 , 222

		504:	SUB   1 , 111	"Subtract 1 from the count"

		505:	JUMPGT 1 , 501	"If the count is still positive, then
					 jump back to address 501. "

	The instructions at 501 thru 503 are executed 10 times.  Each time we
	execute thoses instructions, we then decrement the count held in 
	register 1.  (See instruction bin 504).  After the decrement, we go
	back (to 501) and repeat the whole thing again if the count (held in
	register 1) has not yet reached down to zero.

	When the instruction at 501 executes the first time, register 1
	contains a 10.  (Register 1 was just LOADed with the contents of bin
	333, a 10).  Each subsequent time that instruction at 501 is executed,
	register 1 contains a number that is one smaller than the contents of
	register 1 at during the previous execution.  This decrementing occurs
	when when the instruction at 504 is executed.  Thus, upon reaching 501
	the second time, register 1 contains a 9.  On the third visit, register
	1 contains an 8.  Upon the tenth visit to 501, register 1 contains the
	number 1.  When the instruction at 504 gets executed, it will leave a
	zero in register 1.  Then the JUMPGT instruction (at address 505)
	~doesn't jump.  The contents of register 1 are no longer GT (greater
	  than) zero.  This time, execution continues at address 506 (which we
	  haven't specified yet).

	A clearer way to write this program involves indentation for the
	instructions in the loop:

		111:	1
		222:	2
		333:	10

		500:	LOAD  1 , 333

		501:		LOAD  2 , 222
		502:		ADD   2 , 2
		503:		STORE 2 , 222

		504:		SUB   1 , 111
		505:		JUMPGT 1 , 501

		506:	Next instruction

	1)	Translate this latest program into pure machine language.
		The computer's manufacturer has told us that the instruction
		    JUMPGT is represented by the number 123.

		    Suppose register 1 contains a 2 upon executing the JUMPGT
		    instruction (at bin 505).	 What instruction will execute next?
		    What instruction would be executed next if register 1 contained
		    a zero?

	  2)	    In the following program, how many times will the instruction
		    at 503 be executed?

				111:	  1
				222:	  2
				333:	  11

				500:	  LOAD  1 , 333

				501:		    LOAD  2 , 333

				502:				LOAD	3 , 222
				503:				ADD	3 , 3
				504:				STORE 3 , 222

				505:				SUB	2 , 111
				506:				JUMPGT  2 , 502

				507:		    SUB  1 , 111
				508:		    JUMPGT	1 , 501

				509:	  Next instruction

		    This program is a two-dimensional loop.  The inner loop
		    (502-506) uses register 2 to hold the count.  The outer
		    loop (501, 507, and 508) uses register 1 to hold the count.


	  We've had a good look at machine language.  There are two things
	however that remain to be shown.

	One is called ~indexing.  Any instruction's second operand address,
	  the memory address, can actually be computed.	 The address can be
	  obtained by adding in the contents of a register.  Indexing is
	  introduced in Section 9.5.

	  A related feature we've passed up is the subroutine ~call instruction.
	It is a jump instruction which has an additional side effect:  It puts
	the present contents of the PC into a register.  With this information,
	the called subroutine, when it is done, can ultimately jump back to
	the bin following the call instruction.  Chapter 18 introduces the
	call instruction.

Assembly Language

	Assembly language is one step above machine language.  Assembly
	language and machine language
	correspond one-to-one.  Each line of assembly language produces one
	line of machine language.  (In the presence of ~macros,
	this mapping can be one-to-many.  More about macros shortly).

	Our previous programming examples were written somewhere between
	machine and assembly language.  They were like assembly language in
	that we used easy to read names for instructions.  They were like
	machine language in that we represented all address by numbers, instead
	of more convenient, easier to understand names.

	For example, the last program shown in the Section is
	repeated here:

		111:	1
		222:	2
		333:	10

		500:	LOAD  1 , 333

		501:		LOAD  2 , 222
		502:		ADD   2 , 2
		503:		STORE 2 , 222

		504:		SUB   1 , 111
		505:		JUMPGT 1 , 501

	The following assembly language program does the same thing as
	the machine language program:

		ONE:	1
		IT:	2
		COUNT:	10


		LOOP:		LOAD  2 , IT
				ADD   2 , 2
				STORE 2 , IT

				SUB  1 , ONE
				JUMPGT  1 , LOOP

		END:	Next instruction

	Meaningful names have been used instead of actual addresses.  This
	assembly language program is easier to read.  Those names for addresses
	are called ~labels.

	Assembly language is translated into machine language by a program
	called the ~assembler.  This is indeed a form of compilation (because
	of the translation).  Some people don't think of an assembler as a
	  compiler because the translation is so easy:	It is a one-to-one
	  translation.  Each instruction specified in assembly language turns
	  into one machine language instruction.

Assignment Of Addresses To Labels

	  The assembler actually assigns addresses to our labels, e.g., it
	  assigns addresses to ONE, IT, COUNT, START, LOOP, and END.  This
	  is necessary because machine language understands only addresses,
	  not labels.

	  Each line in the assembly language program goes into a bin in memory.
	  Consecutive lines in the assembly language program go into
	  consecutive bins in memory.	 Each label is assigned the address of
	  the bin into which goes the line containing the "label:".

	  Thus, if the assembler chooses 200 as the address for ONE, the
	  following assignments are made to our labels:

		    ONE	is 200
		    IT	is 201
		    COUNT	is 202
		    START	is 203
		    LOOP	is 204
		    END	is 209

	  Our example program is easy to translate.  Each label is defined before
	  it is referenced.  That is, the line containing "label:" appears
	  before any other line in which the label appears.

	  For example, the label definition "COUNT:" appears in the third line,
	  before the fourth line where COUNT is referenced within the LOAD

	  In general, it is actually possible to reference a label ~before the
	  label is defined.  The following program is equivalent to our first


		    LOOP:		  LOAD  2 , IT
					  ADD	  2 , 2
					  STORE 2 , IT

					  SUB	  1 , ONE
					  JUMPGT 1, LOOP

		    ONE:	1
		    IT:	2
		    COUNT:	10

	  Here, the label COUNT is referenced in the first line, but its
	  definition appears on the last line.  This situation is allowed, and
	  is called a ~forward ~reference.

		    BOX:	What happens when the JUMPGT doesn't jump?  (Example
		BOX:	#3 in Section 7.7 addresses this situation).

Assembly Language's Biggest Win

	  Programs written in assembly language are far easier to modify than
	  programs written in machine language.  Ease of program modification is
	  extremely important, since you will modify the program during
	  debugging, and perhaps later, to change slightly the program's action.

	Suppose we need to introduce a new instruction between the labels START
	and LOOP, e.g., to come up with:

			New Instruction
		LOOP:	LOAD  2 , IT

	This action might be the result of debugging.  Notice how many things
	change because of this modification.  The address assigned to the
	label LOOP must now be one greater than it use to be.  Any appearence
	of LOOP anywhere must now translate into this new address.  The JUMPGT
	instruction references LOOP, and thus is part of what must change.

	The assembly language programmer is freed entirely from all these
	details.  He or she just inserts the new instruction textually as
	shown above.

	In contrast, the machine language programmer has to consider every
	instruction, like the JUMPGT instruction, so as to change all
	occurences of 204 to 205.  (The label LOOP used to represent the
	address 204 but now it represents 205).

	In general, a label can be referenced
	arbitrarily many times, by appearing in that many memory bins.  The
	machine language programmer must examine all of his or her instructions
	in order to replace all occurences of the 204 now by 205.  This large
	examination must be done each time the machine language programmer 
	inserts or deletes any instruction!  That makes machine language
	program modification extremely difficult and error-prone.

	Assembly language's great services allow:

	     1)   The use of names for instructions, e.g., LOAD and ADD (called

	     2)   The use of names for memory addresses, e.g., LOOP and START.
		    These names are called labels.

	     3)   The automatic assignment of addresses to the labels.

	  The third service must be provided during translation into machine
	  language.	 Machine language understands only addresses, not names.
	  (That's why machine language executes so quickly)!

	Assembly language often comes with what are called ~macros and
	~conditional ~assembly.  We will see these in the next chapter.

		BOX:	What are the advantages of assembly language?
		BOX:	Is it easier to program with names or with numbers?
		BOX:	How does assembly language ease program modification?


	1)	Translate the following into machine language.  Assume that
		the assembler assigns address 200 to the label ONE:


		(The contents of location ONE: is not the number 2, but the
		label whose name happens to be TWO).

		Translate it again, but with the following line inserted
		between the lines beginning with "ONE:" and "TWO:"


	2)	There is something fatally wrong with the following.
		The error prevents the completion of translation into machine