CHAPTER 6
MACHINE AND ASSEMBLY LANGUAGE
6.1
The Target Language Of Compilation - Machine Language
Let's pursue compilation, the most effective method for treating
programs specified in high-level languages. Compilation refers to
the translation from a high-level language into the machine's native
language, machine language.
To find out what machine language is, we've got to see the computer's
~memory and its ~central ~processing ~unit ~(CPU).
Some of the details in the following lean towards a particular
computer, just to have a consistent discussion. Registers are
presumed, and their integration into the computer is taken from DEC's
PDP-10 series computers, perhaps the most fun machine to program.
This particular model lies about halfway between RISC and CISC
machines. RISC machines (Reduced Instruction Set Computers) such as
MIPS' R3000, Motorola 88000, or the Intel i860, require that all
computations be done in registers. In contrast CISC machines (Complex
Instruction Set Computers) like DEC's VAX computers can perform
computations even in the absence of registers. Our model requires
only that ~one of the two operands of a computation be in a register.
Variations from these assumptions have only a small impact on
programming in machine language.
6.1.1
Random Access Memory
The computer's main memory is a set of storage bins, where each bin
has a unique address. All information resides in those bins.
Figure 6.1 shows a picture of memory, a big rectangular region divided up into
separate bins. Each bin has a unique address which appears to the left
of the bin. That figure shows that the bin for address 2 contains a 5.
Bins addressed 4 and 5 contain the numbers 25 and 36 respectively.
This kind of memory is called ~random ~access ~memory, or RAM.
You can visit the bins in any order, e.g., you can visit bin 0 and
then bin 4 without having to visit the bins between them, bins 1,2, and
3. You can visit any bin at any time. The order in which you visit
bins can be ~random.
Main memory connects to the central processing unit (CPU) as shown
in figure 6.2.
The CPU drives the memory. The CPU specifies the
address of a bin to visit, in binary, on the set of ~address wires.
(Each wire holds a low voltage or a high voltage, representing binary
0 or 1). The read/write wire carries the CPU's wish to the memory,
expressing whether to ~read existing data from the bin or to ~write
new data into the bin. (Which bin is specified on the address wires).
The data going to or coming from the memory are held on the ~data
wires. If the CPU specifies a ~read, then the memory puts the data
taken from the addressed bin onto the data wires. The CPU is then
free to examine those data. If the CPU specifies a ~write, then the
CPU puts the new data onto the data wires. The memory takes those
data from the data lines and stuffs them into the addressed bin.
6.1.2
The CPU
The CPU follows instructions that are stored in main memory. Each bin
holds one instruction. (On some machines, an instruction can reside
in half a bin, or in more than one bin).
At any moment in time, the CPU is executing one instruction. The
address of that instruction in main memory can be found in the CPU's
PC (Program Counter).
Normally, the CPU executes instructions that reside in consecutive
bins. That is, the next instruction that the CPU will execute normally
resides in the address PC+1, the bin following the bin containing the
current instruction. Prior to the completion of the current
instruction's execution, the CPU performs
PC := PC + 1 ;
so that the PC holds the address of the next instruction.
The CPU repeatedly executes the following sequence of events:
1) Acquire the next instruction by reading from the memory bin
whose address is in the PC.
2) Perform PC:=PC+1; (so that the PC normally points to the next
instruction in memory).
3) Execute the acquired instruction.
The instruction's execution may cause main memory to be
read or written. Notice that the PC correctly contains the address of
the next instruction to execute.
A particular class of instructions, called ~jumps, exist to modify the
PC. The jump instruction puts an entirely new address into the PC.
This causes the CPU to follow the instructions starting at that new
address.
6.1.3
The Registers
A small piece of memory actually resides in the CPU. That memory is
called the ~registers. The registers act as the first bins in memory,
the bins addressed 0 thru say 15. (Any particular machine has some
small number of registers).
The registers live inside the CPU. The CPU can read the contents
of a register at the very same time that it reads the contents of a bin
in main memory! This delivers at once ~two numbers, one from a
register and one from main memory. (If both numbers came from
main memory, the CPU would have to access main memory ~twice, and would
hence have to wait longer to accomodate the second access to main
memory).
Most operators, like addition, consume two numbers at once, e.g., the
"+" in "A+B" consumes the numbers in A and B simultaneously. We make
A and B simultaneously accessible by forcing A to appear in a
register, while B can be in another register or in main memory.
6.1.4
Machine Instructions
Each instruction, like ADD, includes the specification of two
addresses. Those two addresses denote the two bins whose contents ADD
will sum up. (Some machines allow for a third address, which
dictates where the result of the ADD should be stored).
We will assume that the first of the two addresses
specifies a register (an address less than 16). This gives
simultaneous access to the two numbers that ADD will sum up.
We write the ADD instruction as follows:
ADD register_address , memory_address
An example follows:
ADD 1 , 333
This instruction sums up the contents of bin 1 and bin 333, and puts
the result back into bin 1 (register 1).
All instructions like ADD, which consume two numbers at once, always
put their results back into the register, the first address (at least
in our model). For example, if
bin 1 contains the number 58
and
bin 333 contains the number 2
then upon completion of the ADD instruction, register 1's contents are
changed, replacing the old 58 with the new 60. The contents of bin
333 remain unchanged.
The instruction
SUB 5 , 334
puts into register 5 the ~difference between the numbers in register 5
and bin 334.
Other arithmetic and logical instructions are
MUL register_address , memory_address (Multiply)
DIV register_address , memory_address (Divide)
AND register_address , memory_address (Logical AND)
OR register_address , memory_address (Logical OR)
XOR register_address , memory_address (Logical
exclusive-OR)
They all put the answer back into the register_address.
6.1.4.1
STORE and LOAD - Instructions That Move Data Between Registers And Main Memory
The instructions just shown require that at least one of their
operands resides in a register. The LOAD and STORE instructions are
introduced to move data between a register and a bin in main memory.
Neither instruction performs any arithmetic. The LOAD instruction
moves data from memory into a register. STORE does the opposite,
moving data from a register into memory. For example:
LOAD 1 , 333 moves the data in bin 333 to register
1. (The old contents of register 1 are
lost).
STORE 1 , 333 moves the data in register 1 into bin
333. (The old contents of bin 333 are
lost).
The reader may notice that to add the contents of bins 222 and 333,
we must use two instructions:
LOAD 1 , 222
ADD 1 , 333
The LOAD instruction is there to satisfy ADD's requirement that its
first operand reside in a register (here register 1).
It may appear that our requirement that the first operand always
reside in a register in fact doesn't buy us anything. We've had to
insert an extra instruction (LOAD) just to satisfy that requirement.
Why not break that rule, and write only
ADD 222 , 333 ?
This mythical ADD instruction would have to wait for two memory reads,
and would hence take as long as the LOAD followed by the real ADD.
LOADs and STOREs can be avoided of course if the desired operand is
already in a register. For example, to compute the expression
A * B + C * D
we can get by using only two LOADs:
LOAD 1 , A "Put A*B into register 1"
MUL 1 , B
LOAD 2 , C "Put C*D into register 2"
MUL 2 , D
ADD 1 , 2 "Add the two registers"
The intermediate results, A*B and C*D, are held in registers (1 and
2 resp.), and hence don't need any LOADs to access. (The text
appearing within the quotes (") are comments. They are not part of
the machine language program).
By holding intermediate results in registers, or even holding a copy of
memory data in registers, LOADs and STOREs can be removed, resulting
in a faster executing program. Both the human machine language
programmer or a compiler can produce a more optimal machine language
program by using registers in such a way as to minimize LOADs and
STOREs. The task of using registers intelligently is called the
~register ~allocation problem.
6.1.4.2
Exercises
1) Suppose the following memory addresses contain the following
values:
333: 1
444: 20
555: 10
(The address of the bin appears to the left of the colon, and
the contents of that bin appear to the right of the colon).
What will appear in register 1 after each of the following
machine language programs is executed?
a) LOAD 1 , 444
b) LOAD 1 , 444
ADD 1 , 333
c) LOAD 1 , 444
MUL 1 , 555
ADD 1 , 333
d) LOAD 1 , 444
MUL 1 , 555
ADD 1 , 333
STORE 1 , 444
e) LOAD 1 , 444
ADD 1 , 333
LOAD 2 , 555
SUB 2 , 333
MUL 1 , 2
STORE 2 , 444
What will bin 444 contain after each of the machine language
programs?
2) Write a machine language program which will put into bin 40
the sum of the two numbers residing in bins 33 and 34.
6.1.4.3
Memory For Variables
To translate the assignment statement:
C := A + B ;
we have to allocate a bin for each of the variables A, B, and C. Let's
allocate for those three variables the bins at addresses 222, 333, and
444 respectively.
We want to perform an ADD. We must first get one of A and B into a
register. We get A into a register via the following instruction:
LOAD 1 , 222 "(222 is the address for A)"
Now we perform the addition:
ADD 1 , 333 "(333 is the address for B)"
The answer is now in register 1. We complete the specified assignment
statement by putting the answer into C:
STORE 1 , 444 "(444 is the address for C)"
Taken together, the three instructions that implement
C := A + B ;
are
LOAD 1 , 222 "(222 is A)"
ADD 1 , 333 "(333 is B)"
STORE 1 , 444 "(444 is C)"
6.1.4.4
Memory For Programs
Machine instructions reside in memory, too. Let's put our program for
C := A + B ;
at address 1000:
1000: LOAD 1 , 222
1001: ADD 1 , 333
1002: STORE 1 , 444
Each line specifies an address (to the left of the colon) and what
we would like to have in that bin (to the right of the colon).
We can cause the CPU to execute this program by putting the number
(the address) 1000 into the CPU's PC.
We complete the program by specifying the bins for A, B, and C:
222: 0
333: 0
444: 0
Each of A, B and C contains a zero initially. Presumably, we could
call upon other programs prior to executing our program. Those other
programs might put new values into A, B, and C (bins 222, 333, and
444).
6.1.4.5
Instructions Are Represented By Numbers
We have specified the contents of bins 1000, 1001, and 1002 in terms
of easy to read machine instructions. Actually, each instruction is
represented by a single (large) number. That single large number
represents the instruction (e.g. ADD) along with its two operands'
addresses (e.g., 1 and 222).
The computer's manufacturer gives you a number for each instruction.
Let's assume that
LOAD is represented by the number 102
STORE is represented by the number 103
ADD is represented by the number 170
SUB is represented by the number 171
etc.
The number we put into bin 1000, for example, is a composition of
three numbers:
102 for LOAD
1 for (register) 1
222 for (memory address) 222
We translate
LOAD 1 , 222
into the number
102 1 222
which appears as follows when written all together:
1021222
Another example follows:
ADD 1 , 2
translates to
170 1 002
or
1701002
The complete machine language specification for the variables A, B, and
C and the program for
C := A + B ;
is:
222: 0 "(A)"
333: 0 "(B)"
444: 0 "(C)"
1000: 102 1 222
1001: 170 1 333
1002: 103 1 444
6.1.4.6
We've Reached The Bottom
At this point, we've reached the bottom. Everything, including
programs, are represented by numbers, numbers residing in bins of
main memory. This is truly ~machine ~language. Our easier to read
notations, using words like LOAD, is actually called ~assembly
~language. We will come back to assembly language shortly.
The CPU executes instructions represented as numbers most quickly.
All the numbers are actually represented in binary, 1's and 0's, or
what might be called TRUEs and FALSEs. The microchips that make up
the CPU operate in binary.
In fact, while we call this the lowest level, the microchip designers
call it the highest level. There are compilers that translate not
into machine language, but into full microchip designs. Because
microchips have been fabricated primarily in silicon, the process
of translating a specification into microchip designs is called
~Silicon ~Compilation.
BOX: Can you see how programs can be represented as
BOX: sequences of (large) numbers?
BOX:
BOX: Can you imagine how a computer executes such sequences?
6.1.4.7
Exercises
1) Translate the following into machine language (pure numbers):
222: 1
333: 1
500: LOAD 1 , 222
501: ADD 1 , 1
502: ADD 1 , 1
503 STORE 1, 333
What number resides in bin 333 after execution of this
program?
2) Consider the following program:
222: MUL 1 , 1
333: 1
500: LOAD 1 , 333
501: LOAD 2 , 222
502: STORE 2 , 503
503: ADD 1 , 1
504: STORE 1 , 333
Translate it into machine language. What value will appear
in 333 upon completion of execution? What number does register
2 contain when the instruction at 502 commences execution?
What is the contents of address 503 upon completion of this
program's execution?
This exercise shows the beginning of compilation. Part of this
program's execution participates in writing a machine language
program. This example modifies itself.
6.1.4.8
Jumps
So far, all our machine language programs contain no jumps. Execution
starts at some address, and continues executing consecutive
instructions until the end of memory is reached, at which time the
machine crashes.
The JUMP instruction is very simple. It merely puts a new address
into the CPU's PC (Program Counter). Recall that the CPU repeatedly
carries out the following:
1) Acquire the next instruction from the memory bin whose
address is in the PC.
2) PC := PC + 1 ; (so to point to the next instruction)
3) Execute the acquired instruction.
The execution of the JUMP instruction (part 3 above), e.g.,
JUMP 500
puts the address 500 into the PC. Although the PC was modified in
step 2, step 3, the execution of this JUMP, modifies the PC once again.
(You might notice that for JUMPs, the second step is entirely
unnecessary. However, the second step is always performed, so that
all non-JUMPs lead execution to the next instruction).
Consider the program:
222: 1
500: LOAD 1 , 222
501: ADD 1 , 1
502: STORE 1 , 222
503: JUMP 500
This program doubles the contents of bin 222 each time instructions
500 thru 502 are executed. The JUMP, at bin 503, causes execution to
resume at address 500. This program loops forever.
A variation of the JUMP instruction, called conditional JUMPs, jump
only some of the time. They look like:
JUMP? register_address , memory_address
The "?" is any of
EQ (Jump if the register's contents are EQual to zero)
NE (Jump if Not Equal to zero)
LE (Jump if register is Less than or Equal to zero)
LT (Jump if Less Than zero)
GE (Jump if Greater or Equal)
GT (Jump if Greater Than zero)
Each of these represents a condition. If the condition is met,
the jump occurs, otherwise no jump occurs.
Consider the following program:
111: 1
222: 2
333: 10
500: LOAD 1 , 333 "Make register 1 hold the count of 10"
501: LOAD 2 , 222 "Double bin 222"
502: ADD 2 , 2
503: STORE 2 , 222
504: SUB 1 , 111 "Subtract 1 from the count"
505: JUMPGT 1 , 501 "If the count is still positive, then
jump back to address 501. "
The instructions at 501 thru 503 are executed 10 times. Each time we
execute thoses instructions, we then decrement the count held in
register 1. (See instruction bin 504). After the decrement, we go
back (to 501) and repeat the whole thing again if the count (held in
register 1) has not yet reached down to zero.
When the instruction at 501 executes the first time, register 1
contains a 10. (Register 1 was just LOADed with the contents of bin
333, a 10). Each subsequent time that instruction at 501 is executed,
register 1 contains a number that is one smaller than the contents of
register 1 at during the previous execution. This decrementing occurs
when when the instruction at 504 is executed. Thus, upon reaching 501
the second time, register 1 contains a 9. On the third visit, register
1 contains an 8. Upon the tenth visit to 501, register 1 contains the
number 1. When the instruction at 504 gets executed, it will leave a
zero in register 1. Then the JUMPGT instruction (at address 505)
~doesn't jump. The contents of register 1 are no longer GT (greater
than) zero. This time, execution continues at address 506 (which we
haven't specified yet).
A clearer way to write this program involves indentation for the
instructions in the loop:
111: 1
222: 2
333: 10
500: LOAD 1 , 333
501: LOAD 2 , 222
502: ADD 2 , 2
503: STORE 2 , 222
504: SUB 1 , 111
505: JUMPGT 1 , 501
506: Next instruction
...
6.1.4.9
Exercises
1) Translate this latest program into pure machine language.
The computer's manufacturer has told us that the instruction
JUMPGT is represented by the number 123.
Suppose register 1 contains a 2 upon executing the JUMPGT
instruction (at bin 505). What instruction will execute next?
What instruction would be executed next if register 1 contained
a zero?
2) In the following program, how many times will the instruction
at 503 be executed?
111: 1
222: 2
333: 11
500: LOAD 1 , 333
501: LOAD 2 , 333
502: LOAD 3 , 222
503: ADD 3 , 3
504: STORE 3 , 222
505: SUB 2 , 111
506: JUMPGT 2 , 502
507: SUB 1 , 111
508: JUMPGT 1 , 501
509: Next instruction
This program is a two-dimensional loop. The inner loop
(502-506) uses register 2 to hold the count. The outer
loop (501, 507, and 508) uses register 1 to hold the count.
6.1.5
Epilog
We've had a good look at machine language. There are two things
however that remain to be shown.
One is called ~indexing. Any instruction's second operand address,
the memory address, can actually be computed. The address can be
obtained by adding in the contents of a register. Indexing is
introduced in Section 9.5.
A related feature we've passed up is the subroutine ~call instruction.
It is a jump instruction which has an additional side effect: It puts
the present contents of the PC into a register. With this information,
the called subroutine, when it is done, can ultimately jump back to
the bin following the call instruction. Chapter 18 introduces the
call instruction.
6.2
Assembly Language
Assembly language is one step above machine language. Assembly
language and machine language
correspond one-to-one. Each line of assembly language produces one
line of machine language. (In the presence of ~macros,
this mapping can be one-to-many. More about macros shortly).
Our previous programming examples were written somewhere between
machine and assembly language. They were like assembly language in
that we used easy to read names for instructions. They were like
machine language in that we represented all address by numbers, instead
of more convenient, easier to understand names.
For example, the last program shown in the Section 6.1.4.8 is
repeated here:
111: 1
222: 2
333: 10
500: LOAD 1 , 333
501: LOAD 2 , 222
502: ADD 2 , 2
503: STORE 2 , 222
504: SUB 1 , 111
505: JUMPGT 1 , 501
The following assembly language program does the same thing as
the machine language program:
ONE: 1
IT: 2
COUNT: 10
START: LOAD 1 , COUNT
LOOP: LOAD 2 , IT
ADD 2 , 2
STORE 2 , IT
SUB 1 , ONE
JUMPGT 1 , LOOP
END: Next instruction
Meaningful names have been used instead of actual addresses. This
assembly language program is easier to read. Those names for addresses
are called ~labels.
Assembly language is translated into machine language by a program
called the ~assembler. This is indeed a form of compilation (because
of the translation). Some people don't think of an assembler as a
compiler because the translation is so easy: It is a one-to-one
translation. Each instruction specified in assembly language turns
into one machine language instruction.
6.2.1
Assignment Of Addresses To Labels
The assembler actually assigns addresses to our labels, e.g., it
assigns addresses to ONE, IT, COUNT, START, LOOP, and END. This
is necessary because machine language understands only addresses,
not labels.
Each line in the assembly language program goes into a bin in memory.
Consecutive lines in the assembly language program go into
consecutive bins in memory. Each label is assigned the address of
the bin into which goes the line containing the "label:".
Thus, if the assembler chooses 200 as the address for ONE, the
following assignments are made to our labels:
ONE is 200
IT is 201
COUNT is 202
START is 203
LOOP is 204
END is 209
Our example program is easy to translate. Each label is defined before
it is referenced. That is, the line containing "label:" appears
before any other line in which the label appears.
For example, the label definition "COUNT:" appears in the third line,
before the fourth line where COUNT is referenced within the LOAD
instruction.
In general, it is actually possible to reference a label ~before the
label is defined. The following program is equivalent to our first
example:
START: LOAD 1 , COUNT
LOOP: LOAD 2 , IT
ADD 2 , 2
STORE 2 , IT
SUB 1 , ONE
JUMPGT 1, LOOP
ONE: 1
IT: 2
COUNT: 10
Here, the label COUNT is referenced in the first line, but its
definition appears on the last line. This situation is allowed, and
is called a ~forward ~reference.
BOX: What happens when the JUMPGT doesn't jump? (Example
BOX: #3 in Section 7.7 addresses this situation).
6.2.2
Assembly Language's Biggest Win
Programs written in assembly language are far easier to modify than
programs written in machine language. Ease of program modification is
extremely important, since you will modify the program during
debugging, and perhaps later, to change slightly the program's action.
Suppose we need to introduce a new instruction between the labels START
and LOOP, e.g., to come up with:
START: LOAD 1 , COUNT
New Instruction
LOOP: LOAD 2 , IT
...
JUMPGT 1 , LOOP
This action might be the result of debugging. Notice how many things
change because of this modification. The address assigned to the
label LOOP must now be one greater than it use to be. Any appearence
of LOOP anywhere must now translate into this new address. The JUMPGT
instruction references LOOP, and thus is part of what must change.
The assembly language programmer is freed entirely from all these
details. He or she just inserts the new instruction textually as
shown above.
In contrast, the machine language programmer has to consider every
instruction, like the JUMPGT instruction, so as to change all
occurences of 204 to 205. (The label LOOP used to represent the
address 204 but now it represents 205).
In general, a label can be referenced
arbitrarily many times, by appearing in that many memory bins. The
machine language programmer must examine all of his or her instructions
in order to replace all occurences of the 204 now by 205. This large
examination must be done each time the machine language programmer
inserts or deletes any instruction! That makes machine language
program modification extremely difficult and error-prone.
Assembly language's great services allow:
1) The use of names for instructions, e.g., LOAD and ADD (called
~mnemonics).
2) The use of names for memory addresses, e.g., LOOP and START.
These names are called labels.
3) The automatic assignment of addresses to the labels.
The third service must be provided during translation into machine
language. Machine language understands only addresses, not names.
(That's why machine language executes so quickly)!
Assembly language often comes with what are called ~macros and
~conditional ~assembly. We will see these in the next chapter.
BOX: What are the advantages of assembly language?
BOX:
BOX: Is it easier to program with names or with numbers?
BOX:
BOX: How does assembly language ease program modification?
6.2.3
Exercises
1) Translate the following into machine language. Assume that
the assembler assigns address 200 to the label ONE:
ONE: TWO
TWO: FOUR
THREE: JUMPGT 1,TWO
FOUR: ONE
(The contents of location ONE: is not the number 2, but the
label whose name happens to be TWO).
Translate it again, but with the following line inserted
between the lines beginning with "ONE:" and "TWO:"
FIVE: FIVE
2) There is something fatally wrong with the following.
The error prevents the completion of translation into machine
language.
ONE: TWO
FIVE: FOUR
THREE: JUMPGT 1,TWO
FOUR: ONE