CHAPTER 5


		    INTRODUCTION TO COMPILATION



	  Whenever the computer executes a program, the computer is following
	  instructions represented in the computer's native language, called
	machine language.  Each machine language instruction performs only
	a tiny piece of the program, but it performs it fast, typically in
	under a microsecond.  Even if you are running a program that was
	specified in a high-level language, a language different from machine
	language, the computer still executes it by following machine language
	instructions.

	Examples of high-level languages include BASIC, FORTRAN, C, PASCAL
	LISP, ICL, and many more.  High-level languages boost programming
	productivity enormously.  As we shall see shortly, machine language is
	fraught with details.  A vast majority of the details have nothing
	whatsoever to do with the application, the application that required
	programming in the first place.  Mixing the application's domain
	  with the domain of machine language would introduce a multiplicative
	  growth in details, all of which would, of course, invite bugs.

	  Programs written in high-level languages must still ultimately execute
	  as machine language instructions.	 Some form of language translation is
	  necessary.  There are two primary ways by which high-level programs
	  wind up executing as machine language instructions.	 They are called
	  ~interpretation and ~compilation.	 An ~interpreter translates as it
	  goes.  A compiler translates the entire program prior to commencing
	  its execution.


5.1
Interpretation

	  The method called ~interpretation is perhaps the most intuitive.
	  Here, a single machine language program, written once by hand, examines
	  your high-level program character by character, and acts out the
	  intended meaning as it goes.  That same machine language program also
	  acts out the intended meaning of any other program specified in the
	  same high-level language.

	  For example, when given the following
	  fragment of high-level program specification:

		    A+B

	  the interpreter first sees the name A.	It looks up A in a
	  table that contains all names and their present values.  It grabs A's
	present value and then continues to read the program.  It sees the "+",
	at which point it stores in its memory the fact that it is in the
	middle of an addition.  The interpreter then sees the B, and so looks
	up B's value within the same table.	 Having in hand A's value and B's
	  value, and the fact that we are in the middle of an addition, the
	  interpreter adds those two values together, and thus completes the
	  interpretation for A+B.

	  This is probably what you would do if you executed this
	  program in your head.	 That's the main advantage of interpretation.
	You can follow its actions easily.

	Intepreters have the disadvantage of executing slowly.  The
	interpretation of A+B required two searches thru the table that holds
	all variable names and their values.  If there are many different
	variable names, the search will consist of many comparisons, each one
	failing until the name A is found.  This may require the execution of
	hundreds of machine language instructions.  After searching
	for A's and B's values, the actual addition requires only one or two
	machine language instructions.

	Clearly, if you wrote the A+B in machine language yourself, its
	execution could require only one or two instructions.  With
	interpretation, there may be a hundred fold increase in execution time.


5.2
Compilation

	Compilation is the other method by which a high-level program winds up
	executing as a machine language program.  Compilation
	actually takes the time to translate the high-level program into a
	new, custom machine language program.  The entirety of this translation
	occurs prior to beginning execution of the program.

	With compilation, unlike interpretation, each new high-level program
	translates into a distinct, new custom machine language program.

	  

	Figure 5.1 illustrates the decomposition of overall execution into
	compilation's two distinct phases: translation followed by execution.
	  The compiler's translation phase is also refered to as the
	~compilation phase.

	Can compilation ever perform faster than interpretation?  Both have
	to translate texts taken from the same file.  Both have to look up
	each of A and B upon seeing A+B.  Could compilation be inherently
	slower than interpretation, because of the tough looking job of
	producing automatically a new correct machine language program?

	Compilation performs much better than interpretation when you consider
	the existence of ~loops in programs.  Consider the fact that A+B
	probably appears within a loop, like the vast majority of all program
	fragments.  Looping, the repeated execution of a program fragment,
	gives the computer its real power.  Here is a loop that contains A+B,
	in its effort to compute Fibonacci numbers:

		REPEAT  100;

			DO	C:= A+B ;	"Next Fibonacci number"

				A:= B;
				B:= C;
			END

	This loop has 100 iterations.  With interpretation, where translation
	occurs as the program executes, the A+B will be seen 100 times, and
	each time A and B must be looked up.  That's two hundred look ups.
	  With compilation, which translates the whole program once, the look
	  ups for A and B occur only once for the A+B.	The fruits of that one
	  time translation are enjoyed for free in all 100 iterations.

	  You can see the benefits of compilation another way.  If there were
	  only one iteration, compilation and interpretation would take nearly
	  the same time.	For the other 99 iterations, compilation sails for
	  free, while interpretation redoes its translation effort 99 more times.

	  Compilation translates A+B once for each time that A+B ~appears in
	  the program specification.	Interpretation translates A+B each time
	  that A+B is ~executed.  Because of loops, the number of ~executions
	  exceeds the number of ~appearences in the program specification.
	  Compilation thus comes out the most advantageous method.

	  A compiled program often runs 100 times faster than the same program
	  run thru an interpreter.  With compilation, you can run the program
	  a second time without having to recompile it, if you save the machine
	  language program generated by the compiler.

	  A slight disadvantage of compilation is that the space used to
	  represent the machine language program is often greater than the space
	  taken by the program specification, which is all that an interpreter
	  has to hold.

	  One can make a system that lies halfway between interpretation and
	  compilation.  The given program can be translated into a representation
	  that is ~almost machine language.	 The program is translated into
	  ~pseudo-code, a datastructure which is then interpreted.	The speed
	  of the program's execution lies somewhere between that of
	interpretation and compilation.


5.3
Interactive Language Systems

	An interactive language system allows the user to repeatedly
	communicate with the system "live".  The user types in a statement
	and the system gives back an immediate response.  After that, the user
	is free to type in another statement.  This kind of live interaction
	is ideal.

	An example of an interactive language system is the operating system
	that runs on your computer.  You give it commands, such as to delete
	or copy a file, or to type out a file.  You get immediate responses
	for each command.

	In the past, it was thought that only interpreters could support
	interactive language systems.  Compilers of the day performed their
	translation in "batch mode".  The user had to supply the compiler with
	all of his or her statements at once, before seeing any of the
	responses.

	Some systems provided both a
	compiler and an interpreter for the same language.  You could then
	enjoy the efficiency of compilation and retain the interactive nature
	of interpretation.  A user would
	compile the bulk of his or her program specification first, using the
	compiler.  Then he or she would use the interpreter for interactive
	communication.  Those interactive communications could call upon the
	compiled (fast) programs.

	Providing both a compiler and an interpreter for the same language has
	a serious drawback.  Since they are two implementations for the same
	language, a given statement in the language might behave slightly
	differently depending on whether it was interpreted or compiled.  This
	disparity is surprisingly difficult to remove.  (Sometimes, the very
	definition of a language is nailed down only after observing the
	behavior of the finished translator).

	The best solution is to retain only the compiler.  Have the compiler
	translate each given statement into a machine language program, and
	then immediately jump to that machine language program.  When that
	program returns, the compiler goes back into its
	translation phase, ready to accept new user input.  That is, we
	create and run a new custom machine language program for each
	individual interaction with the user.  This method of
	interactive compilation delivers the best of both worlds while
	maintaining a single consistent behavior.

	Another traditional use for interpreters involved debugging.  The
	process of debugging requires that the programmer be able to stop
	program execution and to examine and modify the values in program
	variables.  This must occur interactively.

	The language in which the programmer specified these debugging needs
	was almost always different from the programming language itself.
	Having two languages like this seemed natural enough, because it was
	thought that the interactive debugging language had to be implemented
	by an interpreter, whereas the programming language had to be
	implemented by a compiler.

	Most often, the debugging language sorely lacked programming constructs
	like loops and if-then-else constructs.  Such programming capabilities
	during debugging are nearly essential, e.g., for finding the length of
	a linked list.

	With interactive compilation, the programmer can specify debugging
	needs right in the high-level programming language itself.  The full
	richness of the programming language, including loops and if-then-else
	constructs, is available.  The progammer can examine the values in
	variables exactly as he or she would do in the programming language
	itself.

	(For low-level languages, like machine or assembly language, it is
	useful to use a different, higher-level debugging language.  Only for
	high-level languages does it pay to make the debugging language be
	the same as the programming language).

		BOX:	Which runs faster, a compiled program or an
		BOX:	interpreted program?
		BOX:
		BOX:	Have you ever wished that the debugging language were
		BOX:	as rich as the high-level programming language?