CHAPTER 8
A SAMPLE COMPILER
Armed with the embedded assembly language, we now create a compiler.
We introduce rules of grammar like we did in chapter 1, and
specify the rules' semantics in terms of the embedded assembly
language.
Each rule of grammar with its semantics is very much like a macro
definition. You call such a macro via the syntax expressed in the
rule.
Some of our examples of macros used the type BASIC_PROCESS, such as
the LATER macro, which took in its parameter as a
BASIC_PROCESS (Section 7.7.3). BASIC_PROCESSes are very powerful
and useful, so useful in fact that chapter 4 introduced an
implicit way to make BASIC_PROCESSes in the "<FORM:...>".
Within the semantics of any rule, where we would usually create a
BASIC_PROCESS via the //...\\ notation, we omit the // and \\ (as well
as the context held in [...]).
The ~implicit creation of BASIC_PROCESSes within the
semantic specifications for rules will make our semantic specification
look slightly simpler than our macro definitions.
This sample language is different from ICL. However, it has
expressions including binary operators, statements such as the
assignment statement, complex conditionals, IF's and WHILE-loops. It
does not have functions (until Chapter 19).
We now begin our compiler with notations for expressions that denote
numbers.
8.1
Expressions
The language for which we write this compiler has a part-of-speech
<EXPR> to represent notations that express values. All values in our
language will be integers only.
Beware that the language we introduce
is not a subset of ICL. We use a slightly different syntax, to show
that one can write compilers whose languages are different from the
language in which semantics are specified.
We declare the part-of-speech EXPR as follows:
POS EXPR[8] : - ;
The "8" is there so as to ease our definition of operator precedence
(as shown in Chapter 3). The "-" indicates that the
semantics of any <EXPR> is an action that yields no value. Outside
the semantic specifications in rules, we would normally call this the
type BASIC_PROCESS.
We actually declare the part-of-speech, with an essential comment, as:
POS EXPR[8] : - ; "The action of the semantics is to
generate a machine language program
whose execution will put into register
1 the computed value represented by
the <EXPR>. "
The machine language program generated by an EXPR's semantics leaves
the ~answer in register 1, by the time it finishes executing.
Here are two rules which produce <EXPR>s:
<NUMBER:n> -> <EXPR[1]: LOAD(1, ADDRESS_OF(N) ); >
<ID:i> -> <EXPR[1]: LOAD(1, ADDRESS_OF_VARIABLE(I) ); >
The first rule says that any number is also a valid <EXPR>. The
semantics of that <EXPR> generates a (short) machine language program,
which leaves the answer in register 1. It LOADs into register 1 the
contents of a bin holding the given number N.
The second rule states that any <ID> can also be an <EXPR>. An <ID>
(IDentifier) is any name. For us, an <ID> is the name of a variable.
We presume that a pre-processor recognizes identifiers (names) and thus
creates <ID>s, or that some other rules of grammar generate <ID>s.
We declare the part-of-speech <ID> as follows:
POS ID : TEXT ;
The semantics of an <ID> is text, the name represented by the <ID>.
Our second rule's semantics generates a machine language program
which puts into register 1 the contents of the bin associated with
this <ID>. That is, the function ADDRESS_OF_VARIABLE turns a name
into an address, the address of the bin which always holds that
variable's value.
8.1.1
The ID/Address Associations
We define ADDRESS_OF_VARIABLE as follows. First, we define a new
datatype to represent the necessary table which associates addresses
with identifiers:
TYPE NAME_AND_ADDRESS = [ NAME: TEXT ADDRESS: LABEL ] ;
TYPE SYMBOL_TABLE = { NAME_AND_ADDRESS } ;
The type SYMBOL_TABLE is a set of name/address associations
(NAME_AND_ADDRESS). A NAME_AND_ADDRESS is a ~record having fields
named NAME and ADDRESS. The NAME field is TEXT. The ADDRESS field is
a LABEL, a LABEL which will denote the address of where that variable's
data are kept. (Section 23.4 shows more about records).
The type SYMBOL_TABLE is a ~list of elements of type NAME_AND_ADDRESS.
A SYMBOL_TABLE thus represents many name/address associations.
(Section 23.3 shows more about lists).
We declare a global variable to hold our symbol table:
VAR SYMBOL_TABLE = SYMBOL_TABLE ;
All name/address associations will be represented in the variable
SYMBOL_TABLE.
We define ADDRESS_OF_VARIABLE, which takes in a name and delivers
a label. It sees if there is already a name/address for the given
name, in which case it delivers the associated address. Otherwise, it
creates a new name/address pair for this new name and delivers the
newly allocated bin for that name:
DEFINE ADDRESS_OF_VARIABLE( NAME:TEXT ) = LABEL:
BEGIN VAR NA=NAME_AND_ADDRESS; L=LABEL;
IF FOR NA $E SYMBOL_TABLE;
THERE_IS NA.NAME = NAME
THEN NA.ADDRESS
ELSE DO L:= NEW;
SYMBOL_TABLE:= [NAME:NAME ADDRESS:L] <$
SYMBOL_TABLE;
LATER( //[L;] EQU(L);
PUT(0); \\ );
GIVE L
FI
END
ENDDEFN
This function is paraphrased in English as:
DEFINE ADDRESS_OF_VARIABLE( NAME: TEXT ) = LABEL:
BEGIN VAR NA = NAME_AND_ADDRESS; L = LABEL;
IF for all records NA in the SYMBOL_TABLE,
is there one whose name (NA.NAME) equals our given NAME?
THEN "there is a match."
Deliver the found NAME_AND_ADDRESS's ADDRESS field.
That's the address associated with our given NAME.
ELSE Create a new label L.
Put a new entry into the symbol table, a
NAME_AND_ADDRESS whose name is our given NAME, and
whose ADDRESS is our new label L.
LATER on, define L's value to be the address of a new
bin in memory.
Deliver our new label, (which from now on is associated
with our given NAME).
FI
END
ENDDEFN
BOX: What do the semantics of an <EXPR> do?
BOX:
BOX: If the same <ID> (TEXT) is given to
BOX: ADDRESS_OF_VARIABLE twice, does the same LABEL
BOX: come out both times?
8.2
Binary Operators
As we did in Chapter 3, we declare a part-of-speech BOP, short for
Binary OPerator:
POS BOP[8] : - ; "The action of the semantics is to
generate a machine language program
that puts into register 1 the
combination (e.g., ADD) of register 1
and the address in RIGHT_OPERAND."
A <BOP> takes two operands. One is in register 1, and the other is
in the bin whose address is in a new global variable RIGHT_OPERAND.
RIGHT_OPERAND is top-down context indicating to the <BOP> where the
~second operand resides in memory. The <BOP> combines
the value in that bin with the value in register 1, and puts the result
into register 1. We now declare this top-down context variable:
VAR RIGHT_OPERAND = ADDRESS_EXPRESSION ;
This variable will always be set up prior to invoking any <BOP>'s
semantics.
Here are two <BOP>s, "+" and "*":
+ -> <BOP: ADD( 1 , RIGHT_OPERAND ); >
* -> <BOP: MUL( 1 , RIGHT_OPERAND ); >
Notice how "+" generates a (short) machine language program which
puts into register 1 the sum of register 1 and the contents of the bin
whose address appears in RIGHT_OPERAND. The "+" rule uses the ADD
instruction whereas the "*" uses the MULtiply instruction.
Who sets up the variable RIGHT_OPERAND in the first place?
We set up RIGHT_OPERAND within the next rule, the big rule which
combines two <EXPR>s with a <BOP>. Recall from Section 3.2.1 that the
EXPR-BOP-EXPR rule, the rule that combines two EXPRs via a BOP, looks
like:
<EXPR[i]: left > <BOP[j]: op > <EXPR[k]: right > ->
IF i =< j & k < j THEN
<EXPR[j]:
... >
FI
The IF-THEN appearing here enforces operator precedence. Here it is
again, with the semantics, the "..." filled in:
<EXPR[i]: LEFT > <BOP[j]: OP > <EXPR[k]: RIGHT > ->
IF i =< j & k < j THEN
<EXPR[j]:
HOLDING SPARE_MEMORY:= SPARE_MEMORY + 1; "Allocate spare
bin in memory"
DO
<* RIGHT *>; "Get righthand side into register 1"
STORE(1,SPARE_MEMORY); "Store it in memory"
<* LEFT *>; "Get lefthand side into register 1"
HOLDING RIGHT_OPERAND:= SPARE_MEMORY;
DO <* OP *>; "Operate on register 1 and
SPARE_MEMORY. Answer is in
register 1"
ENDHOLD
ENDHOLD >
FI
The semantics of the resulting <EXPR> is enclosed by the HOLDING:
HOLDING SPARE_MEMORY := SPARE_MEMORY + 1 ;
DO ...
ENDHOLD
This allocates a spare bin in memory. We use that spare bin to hold
the righthand <EXPR>'s value, during the computation of the lefthand
<EXPR>'s value. We can't simply leave RIGHT's answer in register 1,
because LEFT will overwrite register 1 with a different value, the
value of the lefthand expression.
We are assuming that the variable
SPARE_MEMORY holds the highest address of used memory.
SPARE_MEMORY + 1, which we put back into SPARE_MEMORY, thus holds the
address of an unused bin in memory, ~a ~bin ~that ~we ~can ~use.
We have performed the assignment within a ~HOLDING so that SPARE_MEMORY
will return to its old value when we are done. When we're
done, we no longer need this spare memory bin. If someone else asks
for spare memory, like we just did here, they will reuse the bin that
we just finished using.
Notice that the heightened SPARE_MEMORY has an important effect on
the invocations of LEFT and RIGHT, our two sub-<EXPR>s. Because
SPARE_MEMORY points to our bin, the machine language programs produced
by LEFT and RIGHT will not use our bin. (They may use higher bins).
This is important to us because we require our bin (SPARE_MEMORY) to be
left untouched by LEFT's machine language program.
Once inside the HOLDING, we execute:
<* RIGHT *>; "Get righthand side into register 1"
STORE(1,SPARE_MEMORY); "Store it"
<* LEFT *>; "Get lefthand side into register 1"
This leaves register 1 holding the lefthand <EXPR>'s value, and
SPARE_MEMORY holding the righthand <EXPR>'s value.
The next thing we do is invoke the <BOP>. Notice how we tell the
<BOP> where its righthand operand is. We set RIGHT_OPERAND to
SPARE_MEMORY via the inner HOLDING.
The <BOP> (OP) generates a (short) machine language program which
combines the values in register 1 (our lefthand <EXPR>) and in
SPARE_MEMORY (our righthand <EXPR>). The <BOP> leaves the answer in
register 1, as any BOP is supposed to do. Our resulting <EXPR>'s
semantics has thus generated a (bigger) machine language program,
which leaves ~our answer in register 1, as is expected of any <EXPR>.
8.2.1
Examples
Let's see the machine language program generated by this rule for the
<EXPR>:
10 + 20
We will show on the left the semantics of the EXPR-BOP-EXPR rule,
and on the right, the instruction(s) generated by that line.
Let's assume that SPARE_MEMORY holds the address 3314. Presumably,
all higher addresses, like 3315, are free for us to use. Here it is:
HOLDING SPARE_MEMORY:= SPARE_MEMORY+1; "SPARE_MEMORY holds 3315"
DO
<* RIGHT *>; LOAD(1, ADDRESS_OF(10) );
the store; STORE(1, 3315 );
<* LEFT *>; LOAD(1, ADDRESS_OF(20) );
HOLDING RIGHT_OPERAND:= SPARE_MEMORY;
DO
<* OP *>; ADD(1, 3315 );
ENDHOLD
ENDHOLD
"Register 1 has the sum of 10 and 20".
The two LOAD instructions (to the right of each of RIGHT and LEFT) are
generated by our earlier rule:
<NUMBER:n> -> <EXPR: LOAD(1, ADDRESS_OF(n) ); >
Admittedly, the machine language code that we generate is not the most
efficient. More efficient code generation is considered in Section
8.9.
Here is another example. What does the EXPR-BOP-EXPR rule generate
for the follwoing <EXPR> where the <BOP> is "*":
(10+20) * 30 ?
The LEFT <EXPR> is "(10+20)" and the RIGHT <EXPR> is "30". It
generates code as follows:
HOLDING SPARE_MEMORY:= SPARE_MEMORY+1; "SPARE_MEMORY is 3315"
DO
<* RIGHT *>; LOAD( 1, ADDRESS_OF(30) );
the store; STORE( 1, 3315 );
<* LEFT *>; LOAD( 1, ADDRESS_OF(20) );
STORE( 1, 3316 );
LOAD( 1, ADDRESS_OF(10) );
ADD( 1, 3316);
HOLDING RIGHT_OPERAND:= SPARE_MEMORY;
DO
<* OP *>; MUL( 1, 3315 );
ENDHOLD
ENDHOLD
"Register 1 has the value (10+20)*30."
BOX: What would happen if we omit the outer HOLDING,
BOX: resorting to the bare assignment:
BOX:
BOX: SPARE_MEMORY := SPARE_MEMORY + 1 ;
BOX:
BOX: without the HOLDING?
BOX: Can you imagine a more efficient assembly language
BOX: program to evaluate (10+20)*30?
8.3
Statements
We provide one basic statement, an assignment statement:
<ID:i> := <EXPR:e> ->
<STATEMENT[1]:
<* e *>; "Get the righthand side into
register 1"
STORE(1, ADDRESS_OF_VARIABLE(i) );
>
Notice that the semantics of the resulting <STATEMENT> invokes the
righthand <EXPR> to generate code that puts the righthand <EXPR>'s
value into register 1. It then STOREs that value into the lefthand
identifier's associated memory bin.
We declare the new part-of-speech as follows:
POS STATEMENT[2]: - ; "The semantic action is to
generate a machine language
program."
The semantics of a <STATEMENT> are like that of an <EXPR>, except that
nothing is required of register 1.
We allow arbitrary sequences of statements, via the rule:
<STATEMENT[i]: s1 > <STATEMENT[1]: s2 > ->
<STATEMENT[2]: <*s1*> ;
<*s2*> ; >
The semantics of the resulting composite <STATEMENT> generates the
machine language programs for both of the given <STATEMENT>s. (Our
requirement that the second statement be a <STATEMENT[1]>, and the
fact that we produce a <STATEMENT[2]>, assures that statements will
be collected up only one way, left-to-right).
BOX: How does the assignment statement work?
BOX:
BOX: What other kinds of <STATEMENT>s might you like to
BOX: introduce?
8.4
Conditions
Conditions are used to make decisions. A condition represents
ultimately TRUE or FALSE. One example condition follows:
A < B (A less than B)
This condition is TRUE if A is less than B, and is FALSE otherwise.
Conditions form the IF-clause in an IF-THEN-ELSE, as in
IF A < B THEN ... ELSE ... FI
Conditions also form the WHILE-clause in a WHILE-loop, e.g.:
WHILE A < B DO ... END
(This notation for the WHILE statement is not quite legal in ICL, but
we will introduce this notation for our compiler).
The flowchart for a condition is shown in figure 8.1. Our conditions
will expose their decisions by either jumping or not jumping. They
will jump (to FALSE_LABEL) if false, or will fall thru (not jump) if
true.
Here is the machine language program for the condition "A<B":
LOAD(1,A); "A"
SUB(1,B); "A-B"
JUMPGE( 1 , FALSE_LABEL ); "Jump if false"
"Here if A < B"
The condition "A<B" jumps if A is not less than B. It falls thru
(doesn't jump) if A<B is true.
We declare a new part-of-speech, <CONDITION> as follows:
POS CONDITION[3] : - ; "The semantic action is to generate
a machine language program that
jumps to FALSE_LABEL if false,
otherwise it falls thru."
FALSE_LABEL is top-down context for a <CONDITION>. The user of a
<CONDITION> (e.g., in an IF-clause) tells the <CONDITION> where to jump
in case of false. We put into FALSE_LABEL the label we want the
<CONDITION> to jump to.
We EQU that label (define its value) ourselves,
and so we can instruct the <CONDITION> to jump anywhere we please.
Here is the declaration for this top-down context:
VAR FALSE_LABEL = LABEL ;
We form a <CONDITION> from two <EXPR>s and a comparator. That is,
A < B
is a <CONDITION> which is made out of the <COMPARATOR> "<". Let's
introduce the smaller part-of-speech <COMPARATOR>:
POS COMPARATOR: - ; "Like <CONDITION>, the semantic action
is to generate a machine language
program which jumps to FALSE_LABEL if
false, otherwise it falls thru.
Unlike <CONDITION>, a <COMPARATOR>
also requires that the difference
between its two operands, (e.g, A-B)
reside in register 1. "
A <COMPARATOR> thus gains access to its operands by finding their
difference in register 1. Here are two <COMPARATOR> rules:
= -> <COMPARATOR: JUMPNE( 1 , FALSE_LABEL ); >
> -> <COMPARATOR: JUMPLE( 1 , FALSE_LABEL ); >
These <COMPARATOR>s generate very short machine language programs. The
"=" comparator, e.g., in
A = B
finds the difference in register 1. Notice how the "=" rule jumps
to FALSE_LABEL when A-B is non-zero (i.e., A is ~not ~equal to B, the
false condition). Similarly, ">" jumps when false, i.e., when
A-B is less than or equal to zero (the false condition for A>B).
The following rule uses a <COMPARATOR>, and forms our first
<CONDITION>. This rule is meant to capture things like
<EXPR> < <EXPR>
as in "A<B":
<EXPR: LEFT> <COMPARATOR: COMPARATOR > <EXPR: RIGHT > ->
<CONDITION:
HOLDING SPARE_MEMORY:= SPARE_MEMORY + 1;
DO <* RIGHT *>;
STORE(1,SPARE_MEMORY); "Righthand side"
<* LEFT *>; "Lefthand side into register 1"
SUB(1,SPARE_MEMORY); "LEFT-RIGHT into
register 1"
<* COMPARATOR *>;
"Jump to FALSE_LABEL if false,
(otherwise fall thru)"
ENDHOLD >
Like our <EXPR>-<BOP>-<EXPR> rule, the semantics here allocates a
SPARE_MEMORY bin, to hold the right <EXPR>'s value while computing
the left <EXPR>'s value. The program:
<* RIGHT *>;
STORE(1,SPARE_MEMORY);
<* LEFT *>;
SUB(1,SPARE_MEMORY);
puts into register 1 the difference between the left and right <EXPR>s.
At this point, everything is set up sufficiently to invoke the
<COMPARATOR>, which is the last thing we do.
Our job as a <CONDITION> is to possibly jump to FALSE_LABEL. Because
we are a <CONDITION>, we know that somebody has set up FALSE_LABEL for
us to jump to. We let our <COMPARATOR> use the same FALSE_LABEL that
was passed down to us, so that when the <COMPARATOR> jumps, it acts
as our jump to FALSE_LABEL.
In case the comparator reports TRUE, by falling thru, we, the overall
<CONDITION>, will naturally fall thru too.
This passing the buck from <CONDITION> to <COMPARATOR> is possible
because the semantics of these two parts-of-speech are nearly
identical, possibly jumping to FALSE_LABEL. The only difference
between the semantics of <CONDITION> and <COMPARATOR> is the fact
that the <COMPARATOR> requires data to be in register 1. Our
<CONDITION> rule merely sets up rigister 1, and then passes the buck
to the <COMPARATOR>
BOX: Does this use of SPARE_MEMORY remind you of anything?
8.4.1
Example
The <CONDITION>:
10 > 20
generates the following code:
LOAD( 1, ADDRESS_OF(20) ); "the result of <*RIGHT*>;"
STORE( 1, 3315 ); "the store"
LOAD( 1, ADDRESS_OF(10) ); "the result of <*LEFT*>"
SUB( 1, 3315 ); "LEFT-RIGHT into register 1
(10-20)"
JUMPLE( 1, FALSE_LABEL ); "the result of *<COMPARATOR*>;"
This code jumps to FALSE_LABEL if LEFT-RIGHT, 10-20, is less than or
equal to zero. For this example, the false jump is always taken,
as "10>20" is always false.
BOX: What are the conventions for the parts-of-speech
BOX: <CONDITION> and <COMPARATOR>?
BOX:
BOX: How would you introduce another <COMPARATOR> ">="
BOX: (greater than or equal)?
8.5
Using <CONDITION>s - The IF and WHILE Statements
Recall that prior to invoking a <CONDITION> we are required to put a
label into FALSE_LABEL. We are also required ultimately to define the
value of that label (via EQU). Both of the following rules do that:
8.5.1
The IF Statement
Here is the IF-THEN-ELSE rule:
IF <CONDITION:IF> THEN <STATEMENT:THEN>
ELSE <STATEMENT:ELSE> FI ->
<STATEMENT:
BEGIN VAR OUT=LABEL;
HOLDING FALSE_LABEL:= NEW; "Create new label for
FALSE_LABEL"
DO <*IF*>; "Jump if false to
FALSE_LABEL"
"Here if true"
<*THEN*>; "THEN-clause"
OUT:= NEW;
JUMP(OUT); "Get all the
way out"
EQU(FALSE_LABEL); "Here when false"
<*ELSE*>; "ELSE-clause"
EQU(OUT); "(all the way out)"
ENDHOLD
END >
The semantics here assigns into FALSE_LABEL a new label. We do this
assignment in a HOLDING because we want our new FALSE_LABEL to persist
only for us. We want it to retain its original value when we exit.
Having made FALSE_LABEL a new label for us, we invoke the
if-<CONDITION>. If true, it doesn't jump, and we land at the machine
language program generated by the THEN-clause. (The THEN-clause is
supposed to be taken when the condition is true). After the THEN-
clause, we jump all the way out. We have to jump over the ELSE-
clause's machine language program.
After the THEN-clause jumps all the
way out, we generate the ELSE-clause's machine language program.
Notice that we EQU the FALSE_LABEL to be the start of the ELSE-clause.
This means that the IF-clause, if false, jumps right to the ELSE-
clause. (The ELSE-clause is supposed to execute when the condition is
false). We finally EQU the label OUT, so that the jump at the end of
the THEN-clause in fact jumps here, to the end of the whole IF-THEN-
ELSE.
Figure 8.2 shows this all graphically.
The HOLDING that we used to set the FALSE_LABEL is essential.
We've assumed that our FALSE_LABEL is left unchanged by the invocations
of the IF-, THEN-, and ELSE-clauses. However
the THEN-clause (or ELSE-clause) might itself have been formed with the
IF-THEN-ELSE notation, as in:
IF A < B THEN
IF C < D THEN ... ELSE ... FI
ELSE ... FI
This means that our THEN-clause, the inner IF-THEN-ELSE, uses
FALSE_LABEL like we have. Because this inner IF-THEN-ELSE uses
HOLDING upon the variable FALSE_LABEL, its use of FALSE_LABEL is
invisible to us. Thus, we, the outer IF-THEN-ELSE can continue to
assume FALSE_LABEL is ours, and that it appears unaffected no matter
what the THEN- and ELSE-clauses are.
Figure 8.3 shows such a nested occurence of the IF-THEN-ELSE construct.
Notice that there are two FALSE_LABELs, one used within the inner
IF-THEN-ELSE and one used by the outer IF-THEN-ELSE. Our use of
HOLDING lets both FALSE_LABELs exist, without interfering with one
another.
8.5.2
The WHILE Statement
The next rule implements the WHILE-loop. A WHILE-loop takes a
<CONDITION> as its WHILE-clause. It repeatedly executes the body of
the loop (a <STATEMENT>) while the <CONDITION> remains true. Upon
a false value, the loop terminates:
WHILE <CONDITION:C> DO <STATEMENT:S> END ->
<STATEMENT:
BEGIN VAR LOOP=LABEL;
HOLDING FALSE_LABEL:= NEW; "Create new label for
FALSE_LABEL"
DO LOOP:= NEW;
EQU(LOOP); "Beginning of loop"
" Test the condition ... "
<*C*>; "Get out if false (jump to
FALSE_LABEL)"
<*S*>; "the body of the loop"
JUMP(LOOP); "Go back for another
iteration"
EQU(FALSE_LABEL); "Here when we're done,
i.e., the condition
went false"
ENDHOLD
END >
Figure 8.4 shows a flowchart for this.
BOX: What happens in a WHILE statement's code if the
BOX: WHILE-condition is FALSE upon first encounter?
8.6
Combinations Of Conditions
<CONDITION>s may be ANDed and ORed together. Here is the AND (&)
rule:
<CONDITION:a> & <CONDITION:b> ->
<CONDITION: <*A*>;
<*B*>; >
Its semantics looks deceptively simple. The AND of two conditions is
supposed to yield false if either condition, A or B, yields false.
The semantics in fact do that. If A is false, it, and hence we,
jump to FALSE_LABEL. Notice that when A is false, we know the answer,
and do not have to even look at B.
If A is true, then A falls thru to B. If B is also true, we fall all
the way thru. Notice that we fall thru (true) precisely when both A
and B fall thru (are true). Either A or B can jump to FALSE_LABEL,
and so if either A or B is false, we report false.
Figure 8.5 shows the flowchart for this.
While the "&" is easy, the OR ("!") and negation ("NOT") are not so
simple. The asymmetry between "&" and "!" is based on the fact that
we chose to jump when false, instead of say, jumping when true. Here
is the negation rule:
NOT <CONDITION:C> ->
<CONDITION:
BEGIN VAR TRUE_L=LABEL;
TRUE_L:= NEW;
HOLDING FALSE_LABEL:= TRUE_L;
DO <*C*>; "if false, jump to TRUE_L"
ENDHOLD
JUMP(FALSE_LABEL); "It fell thru, so we report
false by jumping"
EQU(TRUE_L); "TRUE_L for us represents falling
thru"
END >
This looks complicated because we must exchange the meanings of
jumping when false and not jumping when true. In this process, we
generate an unconditional JUMP instruction (to FALSE_LABEL).
Notice that within the HOLDING, we are setting up FALSE_LABEL for the
invocation of C, the condition we're supposed to negate. However,
outside the HOLDING, e.g., at the
JUMP( FALSE_LABEL );
FALSE_LABEL is the one that was passed down to us. In other words, the
invocation of C does not see the FALSE_LABEL passed down to us. It
sees what we locally call TRUE_L, and so C jumps to TRUE_L when false.
Of course, when C is false, we're supposed to turn that into a true,
that is, we are supposed to fall thru. We EQU TRUE_L to be the end
of our generated machine language program, so that upon arrival at
TRUE_L, we do indeed fall thru. That is, if C is false, it jumps
to TRUE_L, which is how we fall thru (reporting true).
If C is true, C falls thru to the next instruction, the
JUMP( FALSE_LABEL );
Since we are outside the HOLDING, FALSE_LABEL is where we are supposed
to jump if ~we are false. Thus, if C is true (falls thru), we report
false by jumping to the given FALSE_LABEL.
Here is the rule for ORing, the "!":
<CONDITION:A> ! <CONDITION:B> ->
<CONDITION:
BEGIN VAR TRUE_L=LABEL;
TRUE_L:= NEW;
HOLDING FALSE_LABEL:= NEW;
DO <*A*>; "Jump if false to our local
FALSE_LABEL (just below)"
JUMP(TRUE_L); "A was true: get all the way
out"
EQU(FALSE_LABEL);
"Here when A was false"
ENDHOLD
<*B*>; "A was false, so now B by itself represents our
final answer."
EQU(TRUE_L); "(All the way out. We fall thru to
report true)"
END >
Again, like the "NOT" rule, we issue an unconditional JUMP.
8.7
Other Conventions For <CONDITION> and <COMPARATOR>
The unconditional jumps in the "!" and "NOT" rules can in fact be
eliminated. This will result in faster running programs.
"&" required no extra jumps because of the asymmetry that
says "jump if false" instead of "jump if true". If we change our
assumptions associated with both <CONDITION> and <COMPARATOR>, we
can make "&" and "!" look similar, and in fact have all three rules
produce no unconditional jumps.
We introduce a total of three top-down context global variables to
replace the one FALSE_LABEL:
VAR FALSE_LABEL = LABEL ; "Jump here if false"
TRUE_LABEL = LABEL ; "Jump here if true"
FALL_THRU_IF_TRUE = BOOL; "The situation where falling
thru is okay"
Now, a <CONDITION> or a <COMPARATOR> has two places it can jump to,
TRUE_LABEL and FALSE_LABEL. However, we don't require that a jump
always occur. FALL_THRU_IF_TRUE dictates whether falling thru is
allowed as an option for reporting true or for reporting false.
If FALL_THRU_IF_TRUE is true, then a <CONDITION> can report true by
either falling thru ~or jumping to TRUE_LABEL. This choice is new.
However, a false can be reported only by jumping to FALSE_LABEL.
Similarly, if FALL_THRU_IF_TRUE holds a false, then a false condition
can be reported in either of two ways: It can fall thru or jump to
FALSE_LABEL. This time, the choice occurs for reporting false.
A true condition can be reported only one way, by jumping to
TRUE_LABEL.
Here is the "=" <COMPARATOR> rule written in terms of our 3-variable
conventions:
= -> <COMPARATOR:
IF FALL_THRU_IS_TRUE THEN
"Jump if false..."
JUMPNE(1,FALSE_LABEL);
ELSE "Jump if true..."
JUMPEQ(1,TRUE_LABEL); FI >
This comparator issues a conditional jump. Depending on whether
falling thru means thru or false, we jump out when we're not supposed
to fall thru.
Here is the updated "&" rule:
<CONDITION:a> & <CONDITION:b> ->
<CONDITION:
HOLDING TRUE_LABEL:= NEW;
FALL_THRU_IF_TRUE:= TRUE;
DO <*A*>; "Report false by jumping out to
FALSE_LABEL"
"Here if A fell thru"
EQU(TRUE_LABEL);
"Here if A is true (either way)."
ENDHOLD
"A is true, so B delivers the entire answer..."
<*B*>;
>
8.8
Exercises
Each of these exercises is more involved than usual.
1) Introduce a rule which allows parentheses "()" in <EXPR>s like we
had in Chapter 1. Include the specification of precedence, as well as
the semantics.
2) Write with semantics the rule:
> -> <COMPARATOR>
Use the latest, 3-variable conventions for the part-of-speech
<COMPARATOR>.
3) Write with semantics the rules:
<CONDITION> ! <CONDITION> -> <CONDITION>
NOT <CONDITION> -> <CONDITION>
Use the latest, 3-variable conventions for the <CONDITION> part-of-
speech.
4) We have omitted any precedence requirements among the "&", "!", and
"NOT" rules on <CONDITION>s. As things stand now, &'s and !'s can
occur in any order, and it is not necessarily true the &'s combine
before !'s. Multiple interpretations can arise.
Write the "&", "!", and "NOT" rules, without semantics, so that
negations ("NOT") occur first, followed by "&", followed by "!".
That is, let
<CONDITION[1]> denote conditions devoid of &'s and !'s
<CONDITION[2]> denote conditions devoid of !'s
<CONDITION[3]> denote any condition.
Also, add the rule
( <CONDITION> ) -> <CONDITION>
like we have with <EXPR>s. Include the specification of precedence,
as well as the semantics.
5) Write the machine (or assembly) language program that will be generated
by our grammar for the <EXPR>:
A*B
Assume that SPARE_MEMORY initially holds the address 100. Be sure
to include the allocation of bins for the variables A and B.
6) Do the same for A*B+C.
7) Do the same for A*B+C*D.
8) Write the machine (or assembly) language that will be generated for
the <STATEMENT>:
A := A*B + C*D
9) Do the same for the <STATEMENT>:
IF A < B THEN A:= A*B + C*D
ELSE B:= A FI
10) Do the same for the <STATEMENT>:
IF A < B & C < D THEN A:= B
ELSE B:= A FI
Use our original, 1-variable (FALSE_LABEL) assumptions for <COMPARATOR>
and <CONDITION>.
11) Do the same for the following <STATEMENT>. State whether you are
using the 1-variable or 3-variable conventions for <COMPARATOR> and
<CONDITION>:
IF A < B ! C < D THEN A:= B
ELSE B:= A FI
12) Do the same for the <STATEMENT>:
WHILE A < B & C < D DO A:= A+1
C:= C+1 END
13) Do the same for the <STATEMENT>:
WHILE A < B & C < D DO
IF A < C THEN A:= A+1
ELSE C:= C+1 FI
END
14) Introduce with semantics the rule for an IF-THEN without an ELSE:
IF <CONDITION> THEN <STATEMENT> FI ->
<STATEMENT>
15) Introduce with semantics the following rule:
<ID> ::= <BOP> <EXPR> -> <STATEMENT>
An example using this rule is:
A ::= + 1
This is to mean the same thing as:
A := A + 1
That is, the <ID> on the left must automatically be made to appear
as the first thing to the right of the "::=".
8.9
Using Registers Efficiently In <EXPR>s
So far, our <EXPR>s do all of their computing taking advantage of
only one register, register 1. Also, our present <EXPR>s generate
the following less-than-optimal machine language program for "A+B":
LOAD(1,B); " from the rule <ID> -> <EXPR>
This is the invocation of the righthand
<EXPR>. "
STORE(1,spare_memory); " from in the <EXPR>-<BOP>-<EXPR> rule"
LOAD(1,A); " from the rule <ID> -> <EXPR>
This is the invocation of the lefthand
<EXPR>. "
ADD(1,spare_memory); " from the + -> <BOP> rule
(This is the invocation of the OP)."
This indeed leaves the answer in register 1. A more efficient machine
language program of A+B is:
LOAD(1,A);
ADD(1,B);
The trouble so far has been our requirement that ~all answers go thru
register 1. In the former, less efficient program, we indeed force
B to be in register 1, and then because we also force A to come thru
register 1, we must first save register 1 in SPARE_MEMORY. In the
latter, efficient program, B never makes it into register 1.
Let's change our assumptions about the semantics of <EXPR>. Now, the
invocation of an <EXPR> no longer puts the answer into register 1.
Instead, it puts the answer anywhere it wants, and the invocation
~yields the address where the answer resides:
POS EXPR[8] : ADDRESS_EXPRESSION ;
"The semantics generate a machine language program.
The result of that invocation is an
ADDRESS_EXPRESSION, where to find the answer after
the machine language program executes.
This invocation may ~allocate ~registers.
Our result, the ADDRESS_EXPRESSION, may be a register.
If it is, then it is the user's responsibility to
deallocate that register."
The second paragraph above about the semantics of <EXPR> speaks of
~register ~allocation. That is, we may put intermediate results in
registers. For example, to compute
A*B - C*D
we may ~allocate registers 1 and 2, as we do in the following machine
language implementation:
LOAD(1,A);
MUL(1,B); "A*B is in register 1"
LOAD(2,C);
MUL(2,D); "C*D is in register 2"
SUB(1,2); "The answer is in register 1.
Register 2 is no longer needed."
Register 1 is ~allocated by A*B, and register 2 is ~allocated by
C*D. The overall
<EXPR> - <EXPR>
computes its results from registers 1 and 2. Once that's done,
register 2 is no longer needed. (Register 1 is still required because
it holds our answer). This "<EXPR> - <EXPR>" therefore ~deallocates
register 2, so anybody else is free to use it from now on.
8.9.1
Register Management Routines
We will assume we have the following useful, simple functions:
1) NEW_REGISTER -> INT Allocates a new register and
yields that register's address.
2) FREE_REGISTER( INT ); Frees a previously allocated
register (so that it may
be reused).
3) IN_REGISTER( ADDRESS_EXPRESSION ) -> BOOL
Does the given ADDRESS_
EXPRESSION denote a register?
4) NOT_IN_REGISTER( ADDRESS_EXPRESSION ) -> BOOL
The opposite of (3).
8.9.2
The ID And NUMBER Rules
Here are the <ID> and <NUMBER> rules for our new <EXPR>s:
<ID:i> -> <EXPR[1]: ADDRESS_OF_VARIABLE(i) >
<NUMBER:n> -> <EXPR[1]: ADDRESS_OF(n) >
The semantics in each <EXPR> denotes an ADDRESS_EXPRESSION. This is
required by our new <EXPR> assumptions.
The address in each case tells where the data reside. In the <ID>
rule, the address is that of the bin associated with the <ID>. The
<NUMBER> rule delivers the address of a bin where the number resides.
Neither of these two rules generates any machine language programs
(although they do issue LATER actions). We have thus saved the two
LOADs into register 1 that were required by our old assumptions.
The semantics in an <EXPR> may issue machine language instructions with
the notation:
DO issue machine language instructions
GIVE address_expression
Section 22.1.6 documents this DO-GIVE construct.
The culmination of an <EXPR>'s semantics is always an
ADDRESS_EXPRESSION.
(You can see another DO-GIVE construction in the definition of
ADDRESS_OF_VARIABLE, in the ELSE-clause. There, we are culminating
with a LABEL).
8.9.3
The <EXPR>-<BOP>-<EXPR> Rule
Let's update the other rules that contain <EXPR>s, to accomodate our
new <EXPR> assumptions. To do the <EXPR>-<BOP>-<EXPR> rule, let's
modify our <BOP> assumptions. We redeclare <BOP>:
POS BOP[8] : - ; "Generates a machine language program
that combines the values in the bins
whose addresses are in LEFT_OPERAND
and RIGHT_OPERAND. Leaves the
answer in the bin whose address is in
LEFT_OPERAND."
We declare BOP's top-down global variables:
VAR LEFT_OPERAND = INT; " Always a register! "
RIGHT_OPERAND = ADDRESS_EXPRESSION; " Anywhere "
Let's re-introduce our two <BOP> rules:
+ -> <BOP[4]: ADD( LEFT_OPERAND, RIGHT_OPERAND ); >
* -> <BOP[3]: MUL( LEFT_OPERAND, RIGHT_OPERAND ); >
Indeed, each <BOP> operates on LEFT_OPERAND and RIGHT_OPERAND and
leaves the answer in LEFT_OPERAND.
Notice in the following rendition of the <EXPR>-<BOP>-<EXPR>
rule that we set LEFT_OPERAND and RIGHT_OPERAND prior to invoking
the <BOP>. Here is the updated <EXPR>-<BOP>-<EXPR> rule:
<EXPR[i]:X> <BOP[j]:Y> <EXPR[k]:Z> ->
IF i =< j & k < j THEN
<EXPR[j]:
BEGIN VAR LEFT,RIGHT = ADDRESS_EXPRESSION; REG=INT;
DO LEFT:= <*X*>; "Invoke X. This may generate machine
language code. That invocation yields
the address where the answer will be.
LEFT thus tells where the left
<EXPR>'s value is"
RIGHT:= <*Z*>; "Like LEFT, but for the righthand side"
"See note below"
"Make sure LEFT is in a register, as is required
ultimately by the <BOP>... "
IF NOT_IN_REGISTER( LEFT ) THEN
REG:= NEW_REGISTER; "Allocate a new register"
LOAD( REG, LEFT ); "Get LEFT into that
register"
LEFT:= REG; "From now on, LEFT denotes
this register."
FI
" LEFT now denotes a register. "
HOLDING LEFT_OPERAND:= LEFT;
RIGHT_OPERAND:= RIGHT;
DO <*Y*>; "Invoke the <BOP>, e.g., generate
ADD( LEFT, RIGHT );"
ENDHOLD
" The answer is in LEFT. "
" Free up the register that RIGHT came in... "
IF IN_REGISTER( RIGHT ) THEN
FREE_REGISTER( RIGHT ); FI
GIVE LEFT "Where our answer is."
END
> FI
Again, we start by generating machine language for each of the two
given <EXPR>s, X and Z. We hold in LEFT and RIGHT the addresses of
where those two <EXPR>s' values reside.
We arrange for LEFT to be in a register. If LEFT doesn't happen
already to be a register, we allocate a register (NEW_REGISTER) and
issue a LOAD instruction to get LEFT in there. We then set LEFT to
be the register, so that for the rest of this program, we can go on
assuming that LEFT denotes the address containing the lefthand
<EXPR>'s value.
Then we invoke the <BOP> (Y), having set LEFT_OPERAND and RIGHT_OPERAND
to the addresses containing the values of the lefthand and righthand
<EXPR>s. The code generated by the <BOP> leaves our final answer in
(register) LEFT.
Finally, we deallocate any registers allocated by our two given
<EXPR>s. We free up RIGHT if RIGHT denotes a register. (If RIGHT
is a register, then we know that the righthand <EXPR> allocated that
register, via NEW_REGISTER). The contents of that register will no
longer be required.
We don't free up LEFT, even if it was originally a register. That
register's contents are very important, as that register holds our
overall answer. Note how we deliver LEFT as our result, the
address (register) where our answer lies.
The note made in this rule, following the first two assignments into
LEFT and RIGHT, follows:
NOTE: If LEFT denotes a register, then during X's
invocation, that register was allocated. This keeps
~Z's machine language program from modifying that
register!
This is the bulk of a proof that the second <EXPR>'s machine language
code ~won't change the contents of LEFT. (We don't want the two
<EXPR>s to interfere with one another).
BOX: How do we know when we're done with a register, so
BOX: that we can free it?
8.9.4
The Assignment Statement
Here is the updated assignment rule:
<ID:i> := <EXPR:e> ->
<STATEMENT:
BEGIN VAR ANSWER = ADDRESS_EXPRESSION;
ANSWER:= <*e*>; "Compute the righthand <EXPR>.
ANSWER denotes where the answer is"
IF IN_REGISTER( ANSWER ) THEN
STORE( ANSWER , ADDRESS_OF_VARIABLE(i) );
FREE_REGISTER( ANSWER );
ELSE " Use register 1... "
LOAD( 1 , ANSWER );
STORE(1 , ADDRESS_OF_VARIABLE(i) ); FI
END >
If the <EXPR>'s value resides in a register, the THEN clause is taken,
costing us just one instruction. Notice that we free ANSWER if it is
a register. We're done with it. This action is required by our new
<EXPR> assumptions.
The righthand <EXPR> could be as simple as a single variable's name.
That is, ANSWER might denote a non-register, the bin in which the
variable's value resides. The righthand <EXPR> in:
A := B
is simply B. Since ANSWER (B) is not a register, we take the ELSE
clause, and hence generate the program:
LOAD(1,B);
STORE(1,A);
8.9.5
The <COMPARATOR> Rule
Here is the updated <COMPARATOR> rule (which uses <EXPR>s):
<EXPR:X> <COMPARATOR:C> <EXPR:Y> ->
<CONDITION:
BEGIN VAR LEFT,RIGHT = ADDRESS_EXPRESSION;
LEFT:= <*X*>;
RIGHT:= <*Z*>;
LOAD(1,LEFT);
SUB(1,RIGHT); " The difference in register 1. "
<*C*>; " Generate the conditional branch on
register 1. "
IF IN_REGISTER(LEFT) THEN FREE_REGISTER(LEFT); FI
IF IN_REGISTER(RIGHT) THEN FREE_REGISTER(RIGHT); FI
END >
As usual, we always ~free a register delivered by an <EXPR> as soon
as it is no longer needed. In this rule, if either LEFT or RIGHT
is a register, we free it.
8.9.6
Running Out Of Registers
We now update the <EXPR>-<BOP>-<EXPR> rule so as to be completely
bulletproof. Let's assume that NEW_REGISTER always yields an unused
address, even if that address is not an actual register. The new
things appear in lower case:
<EXPR[I]:X> <BOP[J]:Y> <EXPR[K]:Z> ->
IF I =< J & K < J THEN
<EXPR[J]:
BEGIN VAR LEFT,RIGHT = ADDRESS_EXPRESSION;
REG = INT;
ran_out_of_registers = bool;
DO LEFT:= <*X*>;
RIGHT:= <*Z*>;
ran_out_of_registers:= false; "(optimist)"
IF NOT_IN_REGISTER( LEFT ) THEN
REG:= NEW_REGISTER;
if reg >= 16 then
ran_out_of_registers:= true;
"Use register 1..."
load( 1 , left );
left:= 1;
else
LOAD( REG, LEFT );
LEFT:= REG; fi
FI
"LEFT denotes a register, possibly register 1"
HOLDING LEFT_OPERAND:= LEFT;
RIGHT_OPERAND:=RIGHT;
DO <*Y*>; "(the <BOP>)"
ENDHOLD
if ran_out_of_registers then
store( 1 , reg ); "Put answer into REG,
(a non-register), and
cease using register1"
left:= reg; "where our answer is"
fi
free_if_register( right );
"This new function frees the address
RIGHT if it is something that was
delivered by NEW_REGISTER, even a non-
register, like the address 17"
GIVE LEFT
END >
NOTE: This latest method uses register 1 without worrying about
saving its old contents. To make this method work, we must
forbid NEW_REGISTER from ever delivering register 1. That's
easy, by modifying NEW_REGISTER's table of register usage:
Mark register 1 as always being in use.
Register 1 has been used differently from any other register. We
freely damage its contents. This usage implies that register 1 cannot
be depended upon to remember a value, unless we use it immediately.
We used it immediately after LOADing it, in the <BOP>. Notice also
that we don't deliver register 1 as the answer's address. We deliver
the address in REG, which in this case is not a register.
The fact that we never deliver register 1 as our answer's address
saves us from another potential disaster. Notice that we freely
issued:
LOAD( 1 , LEFT );
This instruction would kill us if RIGHT's answer happened to be in
register 1. But fortunately, RIGHT's (or LEFT's) answer's address
is never register 1.
8.9.7
A Practical Matter
The rules of grammar we've shown aren't quite acceptable as they
stand. They won't make it thru the ICL compiler.
In the <EXPR>-<BOP>-<EXPR> rule, we sometimes write:
LEFT:= REG ;
LEFT has been declared to be an ADDRESS_EXPRESSION, while REG is an
INT. This assignment, shown in terms of its types:
address_expression := integer ;
is legal, because we defined the type ADDRESS_EXPRESSION to be ~either
an integer or a label. An integer is thus automatically convertible
into an ADDRESS_EXPRESSION. In the domain of datatypes, we have the
rule:
integer -> address_expression
However, we also made the assignment:
HOLDING LEFT_OPERAND := LEFT ;
...
LEFT_OPERAND, which has to denote a register, was declared to be an
INTeger. LEFT, in contrast, is a fully general ADDRESS_EXPRESSION.
This assignment appears as
integer := address_expression ;
This other direction of assignment is technically
illegal. After all, as far as ICL knows, LEFT, an ADDRESS_EXPRESSION,
might be a LABEL and not an INT!
However, we can easily ~prove that LEFT will denote a register. Just
look at the comment before the HOLDING. (The IF-statement before
that comment assures that LEFT is put into a register if LEFT isn't
already a register).
Let's introduce a conversion from ADDRESS_EXPRESSION to INT:
THE_REGISTER ( address_expression ) -> int
We use this function as in:
HOLDING LEFT_OPERAND := THE_REGISTER( LEFT ) ;
This legalizes our program. LEFT, an address_expression, becomes an
INT, and can thus be given to the INT variable LEFT_OPERAND.
Let's define this THE_REGISTER function. ICL won't let us go
unchecked, because as far as ICL knows, our proof might be flawed.
We define THE_REGISTER as follows:
DEFINE THE_REGISTER( X: ADDRESS_EXPRESSION ) = INT:
CASE X OF
ABSOLUTE: X "X is an INT here"
LABEL: "X is a LABEL here"
DO HELP;
GIVE 0 "(any integer)"
ENDCASE
ENDDEFN
(The CASE statement is shown in 23.5.2. We used such a
CASE statement like this, upon an ADDRESS_EXPRESSION, back in Section
7.4.1).
ICL requires that we accomodate the possibility of failure, which we
do by including the LABEL clause. THE_REGISTER cries HELP if X is
not an INTeger (a register).
We, on the other hand, have a proof that failure will never occur.
(We have proven that LEFT in fact denotes a register. The HELP is
in there because ICL doesn't know about our proof).
8.9.7.1
Implicit Conversion
We have another option in ICL. We've seen the rule
int -> address_expression
This rule exists because an ADDRESS_EXPRESSION can be either an INT
or a LABEL. That is, any INT can be seen also as an
ADDRESS_EXPRESSION.
THE_REGISTER implements the other direction:
THE_REGISTER( address_expression ) -> int
This turns an ADDRESS_EXPRESSION into an INT, and can possibly cry
HELP.
We can actually introduce the following rule, in which
"THE_REGISTER" never appears:
address_expression -> int
This rule would let us leave unchanged our program sepcifications!
The required conversion would occur implicitly. Consider:
HOLDING LEFT_OPERAND:= LEFT ;
Since the types don't match, the system would apply automatically
the conversion.
We introduce this coercion rule by writing:
LET X: ADDRESS_EXPRESSION BECOME INT BY
THE_REGISTER( X )
ENDDEFN
This is a valid ICL declaration (as shown in Section 21.4.4.5 or
22.4.3.3). It declares a coercion, a function without a name. This
function gets called implicitly, in ICL's effort to make sense in the
domain of datatypes.
The coercion would let us ignore other similar oversights, like in
IF IN_REGISTER( ADDRESS ) THEN STORE( ANSWER, ... ); FI
ANSWER is an ADDRESS_EXPRESSION, but STORE demands an INT. The
coercion will provide that translation implicitly.
Again, we have a proof that failure is not possible. (Look at the
IF-clause). We'll address proofs of program correctness again
in chapter 10.
Coercions, like functions, can be given limited visibility. We might
choose to expose this coercion for only the programs in this section.
8.10
Exercises
1) Write the machine (or assembly) language program generated by our
updated rules for:
a) A*B
b) A*B + C
c) A*B + C*D
d) A*B + C*D + E*F
e) A + C*D + E*F
f) A*B + (C*D + (E*F + G*H)))
g) A - B*C
h) B*C - A
2) Write an <EXPR> that will require at least eight registers by our
methods.