CHAPTER 18
FUNCTIONS AND THEIR IMPLEMENTATION
As time goes on, we intend to program the computer with more and more
complex jobs. One of the most effective ideas in computer science is
a method of building complex programs, known as divide and
conquer. The given large problem can often be broken down into several
smaller problems. Each smaller problem can itself be broken into
several even smaller problems. Eventually, the problem size reduces
so as to be programmed in a straight-forward manner.
18.1
Functions
Each small problem can be packaged as a ~function or ~procedure.
We will use these two terms interchangeably. A function or procedure
is a program along with a name. Once you program the
small problems this way, you have a set of functions, named programs.
For example, suppose you had to compute the ~square ~root of the
~absolute ~value of a given number. The easiest approach is to break
the problem into two smaller problems, the computation of ~square ~root
and the separate computation of ~absolute ~value. Once one figures out
the arithmetic expressions for each of those two functions, one can
package each into a function. Here are the definitions for these two
functions:
DEFINE SQUARE_ROOT( X: REAL ) = REAL:
some expression using X
ENDDEFN
DEFINE ABSOLUTE_VALUE( X: REAL ) = REAL:
some expression using X
ENDDEFN
The DEFINE line in each tells the name of the function, and what
parameters it takes in (one REAL) and what it delivers (a REAL). Each
parameter is assigned a name, X in both examples. That name X becomes
a variable, which contains the input parameter.
Having in hand our two functions, we can compute the original problem.
Here is one way:
Y:= the given number ;
Y:= ABSOLUTE_VALUE( Y ) ;
Y:= SQUARE_ROOT( Y ) ;
We call a function by writing its name followed by its parameter(s)
enclosed in parentheses. Here, we call ABSOLUTE_VALUE first and pass
that result into SQUARE_ROOT. Y holds the answer in the end. We
could write this solution more briefly as:
Y:= SQUARE_ROOT( ABSOLUTE_VALUE( the given number ) ) ;
Like before, the value resulting from ABSOLUTE_VALUE is fed directly
into SQUARE_ROOT.
This chapter introduces ways to implement such function definitions
and subsequent calls to the functions.
18.1.1
Parameters
The line
DEFINE ABSOLUTE_VALUE( X: REAL ) = REAL:
indicates the name of the function and also what parameters it takes
in, one REAL called X, and what it delivers, a REAL. Recall that REAL
denotes numbers including fractions. The ~body of this function
follows the DEFINE line. It can use the variable X. Wherever X
appears in the body, the given REAL parameter will be substituted.
For example, the body is included in:
DEFINE ABSOLUTE_VALUE( X: REAL ) = REAL:
IF X < 0 THEN -X ELSE X FI
ENDDEFN
If someone writes:
ABSOLUTE_VALUE( -3 )
which ~calls the function, the body is executed where X is -3:
IF -3 < 0 THEN -(-3) ELSE -3 FI
This IF statement sees that indeed -3 is less than 0, and so it
delivers its THEN clause, the -(-3), which computes to 3. This 3 is
the result of our call to ABSOLUTE_VALUE.
Great flexibility is achieved by defining functions with parameters.
Each call to a function can take advantage of the function's different
behaviors when given different input value(s) for the parameter(s).
One program, although written only once, can be called many times.
A function definition buries details. All the detail in the
function's body becomes invisible from all points of view outside the
function definition. When we call ABSOLUTE_VALUE, the details inside
its definition are invisible.
The only thing in common between the insides and the outsides of a
function are its parameters, e.g., the REAL coming in and the REAL
coming out of ABSOLUTE_VALUE. This isolation between insides and
outsides reduces drastically the potential for bugs. Nobody outside
of ABSOLUTE_VALUE's definition can damage ABSOLUTE_VALUE's correctness.
The details of ABSOLUTE_VALUE and the details of SQUARE_ROOT remain
permanently separated.
(We wonder in Section 19.10 what happens if the function itself tries
to put a value into the parameter).
18.1.2
Local Variables
In programming the function body, one may need some extra variables.
For example, let's define the function POWER_OF_2 which takes in a
number and yields the smallest power of two which is at least as
large as the given number:
DEFINE POWER_OF_2( X: REAL ) = REAL:
DO Y:= 0;
WHILE 2^Y < X; DO Y:= Y+1 ; END
GIVE 2^Y
ENDDEFN
This function uses a variable Y, to hold possible exponents. It sets
Y to zero initially. While 2-to-the-Y'th power (2^Y) is less than X,
we increment Y. We expect that eventually 2^Y will be as large as X.
When that loop terminates, 2^Y is greater than or equal to X,
and we give 2^Y as the answer.
The variable Y deserves some attention. We intend that its use be
limited to this function alone. If this variable Y were used somewhere
as well, our use of the variable could surprise that other point of
view, as we do put new values into Y.
To assure that the variable Y is ours alone, we declare Y to be a
~local ~variable. We enclose our program that uses Y in the
BEGIN...END construct. Following the BEGIN, we declare Y to be a
variable as in:
VAR Y = INT ;
This makes Y a local variable. Here is the program:
BEGIN
VAR Y = INT ;
DO Y:= 0;
WHILE 2^Y < X; DO Y:= Y+1 ; END
GIVE 2^Y
END
Inside the BEGIN...END, we can use Y freely, and also be assured that
nowhere outside the BEGIN...END will Y be visible. The BEGIN...END
thus defines the ~scope of the VAR (variable) declarations. This
establishes us, the insides of the BEGIN...END, as the sole owners of
Y. The airtight privacy assured by the BEGIN...END makes it possible
to write large systems without interference among its variables.
BEGIN...ENDs almost always appear within function definitions, as the
outermost construct. Our POWER_OF_2 function needs the variable Y,
so we enclose the function body with a BEGIN...END:
DEFINE POWER_OF_2( X: REAL ) = REAL:
BEGIN
VAR Y = INT ;
DO Y:= 0;
WHILE 2^Y < X; DO Y:= Y+1 ; END
GIVE Y
END
ENDDEFN
The function body has access to the parameter X and also the local
variable Y. Because Y is a local variable, the sole entities shared
between the insides of this function and anyone who calls this function
is the input parameter and the output result, and not Y.
BOX: What are parameters?
BOX:
BOX: What are local variables?
BOX:
BOX: Thru which of these two does a function communicate
BOX: with its caller?
18.2
The Implementation Of Function Definitions And Calls
18.2.1
Pushes And Pops - The Stack
A simple and very useful method of using memory is called the ~stack.
A stack is a block of memory (figure 18.1). You can ~push a new value
onto the stack, and ~pop a value off the stack.
The push operator appends a new bin at the end of the stack (figure
18.1(b)), where it stores the new value, push's operand. For each
push, there is ultimately a corresponding pop. Pop removes the last
bin from the end of the stack and yields as its value the contents of
that last bin (figure 18.1(c)).
The last bin in the stack is referenced by what is called the ~stack
~pointer, or ~SP for short. The stack pointer always contains the
address of the bin containing the most recently pushed data. We keep
that pointer in a register called ~SP.
In assembly language, as introduced in Chapters 6 and 7 and Sections
9.5 and 9.8, the push operator is implemented as follows. We increment
the stack pointer (SP) so as to point to the next available bin. We
then store the new data into that bin:
ADD( SP , ADDRESS_OF(1) ); "Increment SP"
STORE( ~new ~value , 0 \OFF_OF SP );
"Store the given value into the
bin whose address is in SP"
The ~new ~value denotes the value that is pushed onto the stack.
(The ADD and STORE instructions were introduced in Section 6.1.4 and
the ADDRESS_OF function appears in Section 7.7.4).
The pop operator undoes the push operator. It acquires the value on
the top of the stack, to become pop's result, and then decrements SP:
LOAD( 1 , 0 \OFF_OF SP ); "Get value on stack top into
register 1"
SUB( SP , ADDRESS_OF(1) ); "Decrement SP"
The result appears in register 1 (due to the LOAD).
The pop operator fetches what's on the end of the stack, that which
was left by the most recent push operation. After the pop, the stack
is in exactly the same shape it was prior to the preceding,
corresponding push.
Our stacks push by using a bin with a ~higher (not lower) address,
(unlike the VAX). Also, many machines can perform a push or a pop as
a single instruction.
BOX: What does ~push do?
BOX:
BOX: What does ~pop do?
BOX:
BOX: Are pushes always related to corresponding pops?
BOX:
BOX: What might happen if someone forgets to ~pop a pushed
BOX: value?
18.2.2
Calling Functions
We first consider the calling of functions ignoring both parameters
and local variables. We will introduce these shortly.
Machines generally have a special instruction for calling functions,
named CALL. It takes two operands, as follows:
CALL( register , address of program we are calling ) ;
This instruction is just like a JUMP in that it jumps to the address
given as the second operand. That address is presumed to be the
beginning of the called function's machine language program.
Besides jumping, the CALL instruction puts the ~return ~address into
the register. We call that register the ~return register. The return
address is always the address of the bin following the bin containing
the CALL instruction (figure 18.2).
The called function, when it is done, will JUMP back to the address
contained in that register. This will cause program execution to
resume at the bin following the CALL instruction. That is precisely
where we want program execution to resume after the called function is
finished.
Let's agree to use always the same register in all CALL instructions.
Let RETURN denote that one register. We call a function by executing:
CALL( RETURN , address of program );
To respond to the CALL, we will introduce at the end of each function
body an extra instruction to cause the function to return:
JUMP( 0 \OFF_OF RETURN );
This JUMP instruction jumps to the address in RETURN, which was set up
by the CALL instructions. (The address expression "0 \OFF_OF RETURN"
denotes the return address, the address in RETURN). Thus, the last
thing a function does is to jump back to the instruction following
the CALL instruction that got this function started.
18.2.2.1
Saving RETURN
We would be all done if it weren't for the fact that one function may
call another function. Consider the following function definition:
DEFINE FUNCTION1:
...
FUNCTION2;
...
ENDDEFN
FUNCTION1 calls FUNCTION2.
In assembly language, FUNCTION1 appears as:
FUNCTION1: ...
CALL( RETURN , FUNCTION2 ); "the call to FUNCTION2"
... "Oops!"
...
JUMP( 0 \OFF_OF RETURN ); "the return from FUNCTION1"
Recall that upon entry to FUNCTION1, RETURN points to where FUNCTION1
should return. Unfortunately, FUNCTION1's call to FUNCTION2 damages
the contents of register RETURN. After executing this CALL, and after
FUNCTION2 returns, RETURN will naturally contain the address of the
bin following this CALL instruction. That ~isn't the address where we,
FUNCTION1, are supposed to return.
In fact, upon executing our JUMP to return, we will not go back to from
where FUNCTION1 was called, but instead we will go back to the bin
following the most recently executed CALL instruction. That most
recently executed CALL instruction is our call to FUNCTION2. Thus,
upon executing our final JUMP instruction, we will go to the point in
our program labelled "Oops!". We will never return. We will
reexecute the tail of this program forever!
Clearly, if we call another function, we must protect the present
contents of the register RETURN. Immediately upon entry to a function,
we will push RETURN onto the stack. Also, just prior to the return
JUMP instruction, we will pop RETURN off the stack. Because we save
and then restore RETURN, in the interim RETURN can be used for other
purposes, such as calling other functions.
We now write FUNCTION1 as:
FUNCTION1: "push RETURN onto the top of the stack:"
ADD( SP , ADDRESS_OF(1) );
STORE( RETURN, 0 \OFF_OF SP );
"The function body:"
...
CALL( RETURN, FUNCTION2 ); "(a call to another
function)"
...
"Our correct return:"
"Pop the stack so as to restore our return address
into RETURN:"
LOAD( RETURN , 0 \OFF_OF SP );
SUB( SP , ADDRESS_OF(1) );
" Return now: "
JUMP( 0 \OFF_OF RETURN );
We save our RETURN address on the stack, and then restore it at the
end, when we need to return. This saving and restoring allows us to
make calls to other functions, and still remember our particular
return address.
FUNCTION1 can even call itself safely. For example, substitute the
appearence of FUNCTION2 by FUNCTION1. When we call ourself, this
second incarnation of ourself saves its return address on the stack,
at a ~higher ~bin than the one used by our first incarnation. A
higher bin is used because we incremented SP each time we enter the
function. Thus all of our incarnations each saves its return address
in a different bin, and hence all return correctly to where each had
been called from.
We now establish the convention that upon return from a called
function, ~the ~stack ~appears ~unchanged. We've made this so by
popping the stack upon return, to undo the push at the beginning.
This stack invariant is in fact necessary. We expect SP to ~remain
~unchanged upon each function call, because we restore RETURN from an
address expression involving SP.
BOX: Why must RETURN be saved?
BOX:
BOX: Why can one function call itself recursively and
BOX: still be able to return correctly in each instance?
18.2.3
Passing Parameters
We pass a parameter to a function by pushing the value onto the stack
~prior to calling the function.
For example, we call POWER_OF_2, which takes one parameter, by:
"Push the parameter onto the stack:"
ADD( SP , ADDRESS_OF(1) );
STORE( parameter_value , 0 \OFF_OF SP );
"The call:"
CALL( RETURN , POWER_OF_2 );
The first thing that POWER_OF_2 does, of course, is to save RETURN on
the stack:
POWER_OF_2: ADD( SP, ADDRESS_OF(1) );
STORE( RETURN , 0 \OFF_OF SP );
"the body:"
...
Upon finally entering "the body", the stack looks like figure 18.3. We
have pushed onto the stack both the parameter and our return address.
At this point, the parameter resides at "-1 \OFF_OF SP" and the return
address is in "0 \OFF_OF SP".
In our definition of POWER_OF_2, we specified that X will be the name
by which we will refer to the parameter. (See how X appeared on the
DEFINE line). Let's associate with X the address expression
"-1 \OFF_OF SP", so that wherever X appears in the function's body,
we substitute X with this address expression.
Thus, we translate:
Y:= X*X ;
into:
LOAD( 1, -1 \OFF_OF SP ); "Get X into register 1"
MUL( 1, -1 \OFF_OF SP ); "X*X in register 1"
STORE( 1, Y );
18.2.3.1
Returning Pops The Parameters Off The Stack
Upon return, POWER_OF_2, like any function, must restore its return
address by popping it off the stack. Not only must it pop the return
address, it must also pop the parameter off the stack. This is
necessary so as to leave the stack appearing unchanged.
Before the full call, the parameter and the return address were ~not
on the stack.
Thus, the entirety of POWER_OF_2 looks like the following. Recall
that upon entry here, the one parameter has already been pushed onto
the stack.
POWER_OF_2: "push RETURN on to the stack:"
ADD( SP , ADDRESS_OF(1) );
STORE( RETURN, 0 \OFF_OF SP );
"Now both the parameter and the return address are on
the stack."
"the body:"
...
"the return:"
LOAD( RETURN, 0 \OFF_OF SP ); "Grab RETURN off stack"
SUB( SP , ADDRESS_OF( 2 ) ); "Pop ~both the return
address and parameter
off the stack"
JUMP( 0 \OFF_OF RETURN );
Notice that we decrement SP by 2 at the end. This effectively pops
off both the return address and the parameter.
Our conventions used so far are summarized:
1) Upon calling a function:
a) Push the parameter(s) onto the stack
b) CALL the function, leaving the return address in
register RETURN.
2) Upon entry into a function:
a) Push RETURN onto the stack
3) Upon exit from the function:
a) Pop RETURN off the stack
b) Decrement SP (the stack pointer) so as to effectively
pop off all the parameters as well.
Indeed, each full call, including parameter passing, leaves the stack
appearing unchanged upon return.
18.2.4
Multiple Parameters
We pass multiple parameters by pushing each one onto the stack. The
last parameter pushed is accessed within the function via the address
expression "-1 \OFF_OF SP", like before. The second to last parameter
resides at "-2 \OFF_OF SP". In general, the K'th to the last parameter
resides at "-K \OFF_OF SP". See figure 18.4.
As before, the called function associates each of these address
expressions with the name of the parameter. For example, the following
function definition makes the association of address expressions to
its parameters:
DEFINE F( A,B,C: INT D: REAL ):
"D now denotes -1 \OFF_OF SP"
"C now denotes -2 \OFF_OF SP"
"B now denotes -3 \OFF_OF SP"
"A now denotes -4 \OFF_OF SP"
...
ENDDEFN
Upon leaving, this function restores RETURN and then pops the four
parameters off the stack:
LOAD( RETURN , 0 \OFF_OF SP );
SUB( SP , ADDRESS_OF( 5 ) );
Figure 18.4 shows the stack after entry and having pushed RETURN onto
the stack. To restore the old SP, we decrement SP by 5. We always
decrement SP by 1 plus the number of parameters, so as to effectively
pop all the parameters and the one return address off the stack.
18.2.5
The Frame Pointer (FP)
Our function calling method is nearly correct. Consider the
following function call to G within the body of F:
DEFINE F( A,B,C: INT D: REAL ):
"D now denotes -1 \OFF_OF SP"
"C now denotes -2 \OFF_OF SP"
"B now denotes -3 \OFF_OF SP"
"A now denotes -4 \OFF_OF SP"
G(A,B);
ENDDEFN
As we are about to call G, the stack looks like figure 18.5(a). Let's
call G. We first push the first parameter, A, onto the stack. Figure
18.5(b) shows the status. Notice how "-1 \OFF_OF SP" has moved up one
bin. This is natural since we've incremented SP. Unfortunately,
having pushed A already onto the stack, SP has moved, and so the
address expression "-3 \OFF_OF SP" no longer denotes B, but C instead
(figure 18.5(b))!
When we are in the middle of pushing parameters onto the stack, SP
moves, which throws off our address expression assignments.
One way to solve this problem is to introduce another register
different from SP which "freezes" the value in SP. We call that
register FP, short for "frozen pointer" or "frame pointer".
Let's have the beginning of all our functions set FP to SP first thing:
F: ADD( SP, ADDRESS_OF(1) ); "Push RETURN as before"
STORE( RETURN, 0 \OFF_OF SP );
LOAD( FP, SP ); "New: set FP to SP"
"the function body:"
...
Now throughout this functions body, let's use FP instead of SP. We
associate with the K'th parameter the address expression:
K-5 \OFF_OF FP instead of K-5 \OFF_OF SP
The first parameter (K=1) resides at "-4 \OFF_OF FP". Figure 18.6
is an updated 18.5. Part (a) shows the stack after F is called and its
initial program fragment is done executing. The only thing new is the
introduction of FP. Part (b) shows us in the middle of calling G. SP
has been incremented, but that no longer hurts us. Upon pushing the
second parameter, B, the address expression "-3 \OFF_OF FP" for B still
denotes F's input parameter B.
18.2.5.1
Saving FP
Since we will be using FP to reference our parameters, we expect FP
to remain unchanged throughout our function body. That is, even after
we call G and G returns, we expect FP to be what it was.
Unfortunately, as we've set things up so far, FP will be changed by
the first function we call. (Recall, that function's initial program
fragment overwrites FP with a new value from SP. Upon return from
that function call, FP is no longer what we expect it to be).
We must therefore have each function save and later restore the value
in FP, so that calls leave FP unchanged.
We now save on the stack not only RETURN, but FP as well. Figure
18.7(a) shows our desired situation after entry into our function,
having executed our initial program fragment. The following is our new
function entry and exit:
F: ADD( SP , ADDRESS_OF(1) ); "Push RETURN"
STORE( RETURN , 0 \OFF_OF SP );
ADD( SP , ADDRESS_OF(1) ); "Push FP"
STORE( FP, 0 \OFF_OF SP );
LOAD( FP, SP ); "Set our FP"
"function body:"
...
"Return process:"
LOAD( RETURN, -1 \OFF_OF FP ); "Restore RETURN"
LOAD( FP, 0 \OFF_OF FP ); "Restore FP"
SUB( SP , ADDRESS_OF( 2 + number_of_arguments ) );
"Restore SP to status before
our being called"
JUMP( 0 \OFF_OF RETURN );
Notice that we decrement SP by two plus the number of arguments in this
function. The two takes off the storage used for RETURN and old FP.
The rest removes the parameters from the stack.
As figure 18.7(a) shows, we must now assign slightly different address
expressions to our parameters:
Parameter K is now K-6 \OFF_OF FP
We now have a correct function calling mechanism.
BOX: What kind of address expression is associated with
BOX: parameter variables?
BOX:
BOX: Why is FP necessary (assuming that pushes and pops can
BOX: occur)?
BOX:
BOX: Upon function exit, how far is SP decremented?
18.2.6
Implementing Local Variables
Now we are ready to introduce local variables. Consider the function:
DEFINE F( A,B,C: INT D: REAL ):
BEGIN VAR X,Y,Z = REAL;
"body"
END
ENDDEFN
Figure 18.7(a) shows how F's initial program fragment leaves the stack.
This accounts for parameters.
Figure 18.8(a) shows how we would like it to be, so as to provide
storage for our three local variables X, Y, and Z. We modify once
more F's initial and final program fragments:
F: "unchanged..."
ADD( SP , ADDRESS_OF(1) ); "Push RETURN"
STORE( RETURN, 0 \OFF_OF SP );
ADD( SP , ADDRESS_OF(1) ); "Push FP"
STORE( FP, 0 \OFF_OF SP );
LOAD( FP, SP ); " Set FP for us "
"new..."
ADD( SP, ADDRESS_OF( number_of_locals ) );
" Allocate space for locals"
"function body:"
...
"Return process:"
LOAD( RETURN , -1 \OFF_OF FP ); " Restore RETURN "
LOAD( FP , 0 \OFF_OF FP ); " Restore FP "
SUB( SP, ADDRESS_OF( 2 + number_of_locals
+ number_of_arguments ) );
JUMP( 0 \OFF_OF RETURN );
The initial program fragment has a new ADD, which increments SP by
the number of locals, thus achieving figure 18.8(a). The final program
fragment is changed so as to decrement SP enough to deallocate the
locals, the parameters, and the two bins that hold RETURN and old FP.
18.2.6.1
A More Efficient Rendition
For purposes of efficiency, let's rewrite the initial program fragment
nearly equivalently as follows:
F: STORE( RETURN , 1 \OFF_OF SP ); "Save RETURN"
STORE( FP , 2 \OFF_OF SP ); "Save FP"
LOAD( FP, SP ); "Set FP for us"
ADD( SP , ADDRESS_OF( 2 + number_of_locals ); "Set SP"
"body:"
...
This more efficient program fragment removes the first two ADD
instructions, and leaves things as shown in figure 18.9.
As indicated in the figure:
the K'th parameter can be accessed by K-4 \OFF_OF FP
and
the I'th local can be accessed by I+2 \OFF_OF FP
The updated final program fragment is:
"Return process:"
LOAD( RETURN , 1 \OFF_OF FP ); " Restore RETURN "
LOAD( FP , 2 \OFF_OF FP ); " Restore FP "
SUB( SP, ADDRESS_OF( 2 + number_of_locals
+ number_of_arguments ) );
JUMP( 0 \OFF_OF RETURN );
We now have a function calling mechanism which accounts for locals and
parameters.
BOX: What kind of address expression is associated with
BOX: local variables?
BOX:
BOX: Why is it that a function can call itself recursively,
BOX: and still have local variables that don't interfere
BOX: with one another thru the recursive call?
BOX:
BOX: What two values, besides parameters and locals, are
BOX: saved on the stack?
BOX:
BOX: Does this calling convention have to be implemented in
BOX: software (as we have done) or can it be done in
BOX: hardware? What would be the advantages and
BOX: disadvantages?
18.2.7
Other Conventions For Parameters And Locals
Section 19.9 introduces another way to implement parameters and locals.