CHAPTER 10
MODIFICATION
IN
ASSOCIATIVE MEMORY
Since you can "write" into memory, it is possible to ~modify existing
data. We introduce a model of memory that accounts for modifications
via two distinct means. Modifications can apply to ~variables or to
~objects. The distinction is very effective. In a programming
language that supports this model, like ICL, proofs of program
correctness become possible, and the task of debugging eases greatly.
Modifying variables is by far the most common intent. For example,
~all modifications associated with arithmetic are always modifications
to variables only. We call this kind of modification ~subjective
~modification.
Beyond arithmetic, most programming languages lose this kind of
simplicity. Unfortunately, in the absence of this model, modifications
~intended ~for ~variables might inadvertantly be implemented as
modifications to objects. Debugging these kinds of errors form some
of the most time consuming of debugging tasks.
Chapter 12 introduces a general parser whose implementation is based
on this model. The safety afforded by the model makes possible the
straightforward and general parsing method, as well as its proof
of correctness (Chapter 13).
We begin with the other kind of modification, modification to objects.
10.1
Objective Modification
Not only can pointers drastically reduce memory consumption, they can
also make the updating of a database very easy.
Consider figure 10.1. The PERSON blocks are shared among the family
and club blocks. This allows us to represent Mary exactly once.
A year from now, we will need to increment Mary's age. We can do that
simply by replacing the 34 in the Mary block with a 35. That one
modification will affect all points of view. From the points of view
of the family and two club blocks, Mary's age looks incremented.
(This is much easier than updating all appearences of the Mary block
as shown in the non-pointer rendition in figure 10.2).
Any modification which is meant to be visible to all points of view
is called an ~objective modification. The term ~objective contrasts
the term ~subjective. An objective fact is seen by everyone. In
contrast, a subjective fact may be seen from only one point of view.
The difference is between ~all or ~exactly ~one points of view.
~All is objective. ~Exactly ~one is subjective.
If P points to the Mary block, then we set her age to 35 ~objectively
via:
@(P).AGE := 35 ;
The @(...) dictates that the modification be apparent from what the
"..." points to. P points to the Mary block, and so the Mary block
itself is objectively modified. The modification becomes apparent
from all points of view that see the Mary block.
In contrast, the subjective modification:
P.AGE := 18 ;
has no @, and it affects only one point of view, the point of view
of the variable P. P now references a block which is just like the
Mary block, except that its AGE is 18.
10.2
Subjective Modification
Whenever we deal with numbers on the computer, all our modifications
are subjective. Consider the following program:
I:= 3 ;
J:= I ;
This leaves 3 in each of the variables I and J. If we then execute:
I:= 4 ;
we expect that I will be 4 and J will be 3.
Each of these assignment statements affects only one point of view,
the point of view of the variable appearing on the lefthand side of
the ":=". We modified variables and ~not numbers.
Would you be surprised if the program left a 4 in both I and J!
That's what would happen if the last assignment were objective,
written as:
I:= 3 ;
J:= I ;
@(I):= 4 ;
This last assignment modifies not the ~variable I, but the ~number
referenced in I, the ~number 3. This changes the number 3 to now be
just like the number 4. Having modified 3 to mean 4, J sees only the
present 4 (where the old 3 was). The following program:
WRITE(3);
prints a 4.
Objective modification on numbers appears absurd. Who wants all 3's
to suddently become 4's? We never mean to modify ~numbers. We mean
only to modify ~variables.
No programming languages intentionally offers objective modification
on numbers. One loophole in FORTRAN however can inadvertently perform
an objective modification:
ADD(1,2,3)
N = 1
...
SUBROUTINE ADD(I,J,K)
I = J + K
END
The call to ADD turns the number 1 into the number 5
instead. N will come out holding a 5. (To see this effect on some
computers, the numbers in this example must be made larger than
a particular constant, like 2^18).
While objective modification upon numbers is absurd, many programming
languages support ~only objective modification for all data represented
by pointers.
Let's define a type of data that is represented by a
pointer. Figure 10.3(a) shows an instance of the type POINT, declared
as follows:
TYPE POINT = [ X,Y : REAL ] ;
POINT is meant to be a vector in two dimensions.
Let's have two variables of type POINT, P and P1. Figure 10.3(b) shows
P holding a point, having executed:
P := [ X: 5 Y: 10 ] ;
Part c shows the situation after executing:
P1 := P ;
P1 and P point to the same place. Using a pointer to represent a POINT
is helping us here. We need move only one bin's worth of data from
P into P1. There are now two points of view, P, and P1, onto the
block. For the moment, memory is saved by sharing that one block.
Let's introduce another statement afterwards:
P.X := 20 ;
P now has 20 as its x-coordinate. What is P1.X? That is, what would
we see upon executing:
P1 := P ;
P.X := 20 ;
WRITE( P1.X ); ?
If P and P1 were numbers, we would expect P1.X to be unchanged. After
all, we never specified a modification to P1. Nowwhere in the program
does it say to modify P1.
Figure 10.3(d) shows the situation after we execute the
P.X := 20 ;
We affect no block. We make the change apparent only from one
point of view, the variable P. Thus, the modification specified upon
P leaves P1 unaffected. The assignment to P.X is subjective.
BOX: What is the difference between subjective and
BOX: objective modification?
BOX:
BOX: Programs that perform arithmetic use always which
BOX: form of modification?
BOX:
BOX: Have you ever been bitten by (someone else's routine)
BOX: that inadvertantly performs objective modification
BOX: (e.g., overwrites one of your blocks)?
10.2.1
Copy-On-Write
If we were to have put the 20 right over the 10, as in:
@(P).X := 20 ;
we would have the situation in figure 10.3(e). This objective
modification affects not the variable P, but the block that P points
to.
This modification is not to a variable, a single point of view,
but to a block, which might be referenced many times over. It
might be referenced from other blocks or variables. This
objective modification is seen from all points of view.
In contrast, the subjective modification:
P.X := 20 ;
affects only P's point of view. We ~copy the block referenced
by P ~before ~writing in the new X value (figure
10.3(d)). We call this ~copy-on-write. To affect no other point of
view, we copy the block and then update it. Only the new copy sees
the change, and all other points of view are oblivious to this
modification.
It may seem unnatural to form a new copy at this point. The copy
we made for this subjective modification, shown in figure 10.3(d)
as the slanted block, is not an extra expense. Consider that the
program:
P.X := 20 ;
is equivalent to the program:
P:= [ X: 20 Y: P.Y ] ;
This latter assignment specifies a new block for P to point to. That
new block has the X as 20. The other field, Y, is preserved from
P. The situation here is identical to the former situation,
where P in any case is shown in figure 10.3(d).
10.2.2
Another Example
Let's consider another example of subjective modification. First, we
make a new type, BOX, which will reference two POINTs:
TYPE BOX = [ LOW, HIGH : POINT ] ;
The box is two points, the lower-left corner (LOW) and the upper-
right corner (HIGH). Consider the following program, assuming that B
is of type BOX:
P:= [ X: 5 Y: 10 ] ;
B:= [ LOW: P HIGH: P+P ] ;
P1:= P ;
P.X := 20 ;
The effect of each of these statements is shown in figure 10.4(a-d).
The last assignment affects P's X-coordinate, but affects nothing
else, because of our subjective interpretation for such assignments.
We do not expect the last assignment to affect B and in fact it
doesn't. The slanted blocks denote new blocks that were created since
the previous frame. Notice in part (d) that only P's point of view is
changed, implemented by creating a new (slanted) block, via
~copy-on-write.
The difference between figure 10.4(d) and (e) is merely the
introduction of another point of view, B1, which is caused by
executing:
B1 := B ;
The following (subjective) modification affects B's point of view:
B.LOW := B.HIGH ;
Figure 10.4(f) shows this new situation. Copy-on-write makes a new
(slanted) copy of the old BOX block, at the bottom of the figure.
That new block is now referenced by B. (B1 still points to the old
BOX block). This latest assignment does not affect P or P1 or B1,
as we expect. Notice in the new block that the LOW and HIGH fields
point to the same block (of type POINT), just where the old box (B1)'s
HIGH field points.
Finally, let's execute:
B.LOW.X := 1 ;
Figure 10.4(g) shows the situation. In this example, we have two
levels of modification. There are two periods in the above assignment.
First, we copy B's LOW point, to implant the 1 into
its X field. That new point is the block at the lower-right in
figure 10.4(g). Then we copy B itself so as to implant in its LOW
field the new point. B now points to the new BOX block.
That's equivalent to the assignment:
B := [ LOW: [X: 1 Y: B.LOW.Y ]
HIGH: B.HIGH ] ;
This assignment specifies a new POINT block, referenced within a new
BOX block, just as shown in figure 10.4(g).
We have left unreferenced the small looking block, which B used to
point to. Assuming there is no other reference to that block, a
periodic process called garbage collection will find a way to reuse
that block (Part 8).
BOX: Do you expect that an assignment statement like:
BOX:
BOX: P.X := 35 ;
BOX:
BOX: to affect other variables or data?
10.2.3
Copy-On-Write Tends To Maximize Sharing
Copy-on-write copies as little as possible, just enough to implant
the specified modification.
Everything that isn't copied is shared between the original
and the new copy (figure 10.5).
The new memory generated by copying-on-write represents the ~difference
between the old and new data. Things that aren't different between
them remain shared for the most part.
Since copy-on-write introduces new sharing, memory may be used more
efficiently. Figure 10.4 was formed by copying-on-write. Notice that
many blocks are shared by other blocks and by variables.
A subjective modification affects of course only the point of view of
one variable. Although the variable's new point of view is shared
with nobody else at present, further appearences of that variable
in the program will lead to new data structures that share the
variable's point of view.
BOX: Have you ever seen an operating system that employs
BOX: a copy-on-write methodology on a ~per ~page basis?
10.2.4
Copy-On-Write Is Required Only For A Few Language Constructs
Only a few language constructs need be concerned about copy-on-write,
for implementing subjective modification. Only those
constructs that specify a modification to existing data need be
concerned with copy-on-write.
The only constructs in ICL that specify modifications to portions of
existing data are record selection (Section 23.4),
P.X := ... ;
and list indexing (Section 23.3):
LIST[5] := ... ;
These constructs, when used on the lefthand side of ":=", are the only
constructs that need to copy on write.
(The modification of ~arrays in ICL has only the possiblity for
objective modification. Array indexing on the lefthand side of
assignment statements must be specified always with the "@"
(Section 23.10)).
These few language constructs introduce a little overhead, the time
taken to make a new copy. All other language constructs are entirely
oblivious to copy-on-write, and incur absolutely no overhead.
One block is copied upon executing
B.LOW := ~something ;
Two blocks are copied upon executing:
B.LOW.X := ~something ;
There are two dots here, selecting the LOW field and then selecting the
X field. In general, if N dots appear on the lefthand side of the
":=", then we copy N blocks. N is usually small, rarely exceeding 2.
Many programming languages offer only one way to build records
(blocks). They require:
P := NEW_MEMORY ;
@(P).X := 20 ;
@(P).Y := 10 ;
instead of:
P := [X:20 Y:10] ;
These programming languages ~force you to specify ~modifications in
order to build any record. Given our "[...]" notation for building
records, most appearences of record modification disappear. Copy-
on-write is thus not involved nearly as much as you might imagine.
BOX: Must the "+" in "X+Y" be concerned with subjective
BOX: modification?
10.2.5
A Block Of Memory Is Younger Than Those It Points To
With subjective modification, no block is ever overwritten with new
data. Subjective modification instead copies the block and then puts
the modification in the new block. This means that the original block
continues to hold its original data forever.
With subjective modification, a block holds the same data that was
put into the block at the time the block was created. Let's think of
a block's age as the time elapsed since that block's creation.
When a block is created, the pointers emenating from that block
reference ~existing blocks. The referenced blocks must exist
presently, because otherwise we couldn't even have pointers to them.
This indicates that the new block is ~younger than the blocks it
points to. When the new block was created, the referenced blocks
already were in existence.
The only way to modify data in an existing block is to use objective
modification. Subjective modification would leave that block unchanged
since it copies the block and writes only into the copied block. With
objective modification, you can insert a pointer into the block at any
time, long after the block was created. That inserted pointer may
point to a very young block, a block just created. That is, the
modified block can now be older than some of the blocks it points to.
With adherence to subjective modification, we can associate a block
with its ~time ~of ~creation. We know that block will persist
unchanged for all time since that time of creation. This kind of fact
is often used to formulate proofs, indicating that what ~was true back
then continues to be true ~now. We will use this kind of proof to show
the completeness of the parser introduced in chapter 12.
BOX: What is the oldest block in a cycle?
10.2.6
Cycles Require Objective Modification To Exist
Using only subjective modification, you can never create a cycle.
Consider the cycle in figure 9.13(a). Let's use the ~time ~of
~creation to prove that subjective modification alone couldn't create
that cycle.
Since each block has a time of creation, we can pick the one block in
the cycle that is the youngest, i.e., has the most recent time of
creation. On purely sequential computers, no two blocks have the same
time of creation. Therefore, there is a unique youngest block in that
cycle.
The youngest block points to another block in the cycle. Since we
assume only subjective modification, that youngest block can point
only to a younger block. Unfortunately, all the other blocks in the
cycle are older, because of our choosing the youngest block. That
youngest block therefore cannot point to another block in the cycle.
We have a contradiction here because the youngest block in the cycle
cannot be a member of the cycle!
Subjective modification can create the whole cycle ~with ~one ~pointer
~missing. Each block points to a younger block. One can therefore
get from the youngest block back to the oldest block. However,
you can't get from the oldest block back to the youngest block.
Implanting that final pointer, so as to complete the cycle, requires
objective modification. We objectively modify the oldest block so
as to point to the youngest block. This objective modification allows
us to violate subjectivity's requirement that you can point only to
older blocks. This is akin to violating causality.
10.3
Proofs Of Program Correctness
Subjective modification, which affects only the point of view
of a variable, is very safe. Such single-point-of-view
modifications have no side effects. There are no unwanted surprises,
surprises apparent from other points of view. All specified
modifications have only local effect. This freedom from side effects
makes possible proofs of program correctness.
If you perform only subjective modification, you can look at a page of
program listing and prove correctness (or detect a bug).
You can follow the action, resting assured that no modifications
except for those apparent on the page will occur!
You can even call functions on (other) pages and rest assured that none
of your blocks of memory will be changed by those functions. Remember,
subjective modification affects only variables.
(In the rare case that a function shares global variables with this
page of program listing, those ~variables might be affected by the
function. However, global variables can be managed safely so as to
minimize surprises. The HOLDING construct shown earlier ~tames
global variables).
If limited to only objective modification, program proofs would have to
become enormously complicated. The foreign notion of machine address
would have to be factored into the proof, ~at ~all ~statements ~in ~the
~program! The possible points of view that
might be affected must be considered in each statement, including
all statements in any functions you might call.
The vast majority of modifications are meant to be subjective, like
we expect with numbers. Subjective modification is so safe that ICL
makes that its default.
However, there are a few occasions where objective modification is
desired. We've seen one example, where we want to update Mary's age,
to be apparent from all points of view. We also used objective
modification upon LABELs in Section 7.3.2.
The @(...) operator exposes all objective modifications in your
program listing. They stand out, and so can be reconsidered easily
upon tracking down a bug or forming a program proof.
By making subjectivity the default, you need be concerned about
points of view only at the relatively rare instances where you use the
@.
@'s usually occur inside relatively few short functions, functions that
were created to have objective effects. Each such function performs
one (or more) objective modifications so as to leave your database
(blocks) in consistent shape. These few functions thus serve to
modify blocks in safe, consistent manners.
Places where you know you want objective modification are easily
proven correct, since that was your intention in the first place. ~All
other modifications (subjective modifications) need no consideration
of points of view, and hence are also easily proven correct.
Proofs of program correctness become more essential within complex
programs. Most often, you can't test the program exhaustively. Moves
towards easing program correctness proofs will fascilitate vastly more
complex programs, as the future will demand.
10.4
Variables Act As Scaffolding Around A Building
Our variables point to blocks, but no block can point to a variable.
Figure 10.6 illustrates this. A block can point to the same place
that a variable points (figure 10.6(b)). This happens when the
variable appears in an expression. In contrast, the block can never
point to the variable itself (figure 10.6(c)).
Since no data can lead to a variable, a variable can be changed only
if you see an assignment ~to ~that ~variable in your program listing.
Also, a change in that variable's contents will truly be visible from
only one point of view, the variable's, which is ~invisible to all
blocks and all other variables.
Program variables, all variables apparent in a program listing, are not
part of the universe over which your program works. They are a part
of the programming language.
We can think of our data structures as a big building. Variables act
as scaffolding around that building. Variables, like scaffolding,
give access to places in the building. They are necessary during
construction.
However, the building itself does not include the scaffolding. The
scaffolding can be removed and the building continues to exist.
Similarly, our data structures do not include program variables. The
data structures continue to exist independently of the contents of
any variables used during the construction.
10.5
Objective Points Of Evolution - Object Variables
Sometimes, your domain may contain its own notion of "variable".
Consider that the domain of arithmetic deals with numbers and has no
notion of "variable". A program that performs arithmetic has
program variables that hold ~numbers. They never hold variables.
In contrast, the domain of algebra does include a notion of
"variable". A program that performs algebra has program variables
that hold equations and possibly their variables. Here, we have
program variables holding algebraic variables.
Variables that are part of your domain, like algebraic variables, are
called ~object ~variables. They can be referenced within other
objects (blocks). They are ~not variables in the programming
language.
To put a new value into an algebraic variable, you use objective
modification, so that the modification becomes apparent from all
points of view, e.g., from the points of view of all equations that
reference this algebraic variable.
10.6
Another Attempt To Achieve Subjective Modification - Copy-On-Reference
The overhead of copy-on-write may be objectionable to some people.
As long as variables hold only pointers (not blocks), there is no
better way to achieve subjective modification than with copy-on-write.
Let's try another approach to achieve subjective modification, called
~copy-on-reference.
We could store in a variable not merely a pointer to data, but
the data itself. Figure 10.7(a) shows the variable B as consisting
of four bins, holding the information in the type BOX. Now if you
say:
B.LOW.Y := 12 ;
we can implement this by storing a 12 directly into B's third bin.
No copying is necessary. Only the variable B will see this
modification.
Keep in mind that we continue to tolerate no pointers to variables.
B's four bins are indeed invisible to all points of view except B's.
Thus, this modification to the box B is in fact subjective.
Suppose B1 is another variable of type BOX. The assignment
B1 := B ;
cannot merely copy a pointer to the B box into B1. Doing so
would leave B1 pointing to the variable B, and that is forbidden.
(Such would render a modification to B also visible to B1, and vice
versa). Thus, this assignment must copy all four bins from B into B1.
As long as we write directly into a block of memory, we must assure
that no two pointers point to that same block of memory, lest a
modification become apparent from two points of view. Instead of
sharing a block of memory, separate copies of that block of memory
must be maintained. With separate copies, each one can be modified
(without copy-on-write) and the modification will be apparent from
exactly one point of view.
We have seen that sharing must be forbidden in order to achieve
subjective modification in the absence of copy-on-write. Without
sharing, pointers cease to bring any benefit, and hence pointers can
just as well be banned.
In the absence of pointers, every assignment, e.g.,
X := Y ;
must perform a complete copy from Y into X. If Y represents a set
of 1000 boxes, then 4000 bins have to be copied! (No box can be
shared between X and Y, lest a modification specified from X's
point of view affects Y's point of view). This cost is especially
counter-intuitive when you discover slow execution times for the simple
looking program:
X := Y ;
Z := Y ;
or for the function call:
F(Y,Y)
Both assignments and the function call must make copies of Y. The
size of each copy can be arbitrarily large. In fact, Y might be a
whole integrated-circuit design, consisting of millions of boxes!
This ~copy-on-reference policy requires the following. Each appearance
of a variable in the program implies that its contents must be copied.
With copy-on-reference, you pay for copying upon each ~reference to a
variable. In contrast, with copy-on-write, you pay for copying upon
each ~modification.
With copy-on-write, each modification requires a small copy, but
each reference to a variable incurs no overhead. In either of:
X := Y ;
X := Y ;
or
F(Y,Y)
only one bin is copied.
What happens most often, modifications or references? Usually we
perform a modification only in preparation for a reference. This
argues that each modification has at least one corresponding
subsequent reference.
With more references that modifications, paying only for the
modifications sounds the cheapest. But even if there were more
modifications than references, the copy-on-write expense is merely the
copying of one or two blocks (see Section 10.2.4). In contrast, the
copy-on-reference (non-pointer) cost for a reference is arbitrarily
large.
Thus, the use of pointers and copy-on-write for subjective
modifications operates the most efficiently in general. However,
simple systems can be implemented efficiently without copy-on-write,
and without the guarantee of subjective modification. With
simple systems, the need for the safety of subjective modification
becomes less important, because the programmer can get his or her head
around the entire program. The programmer can take the time to
discover from which points of view each modification will become
apparent.
However, the current software crisis refers to large programming
tasks, which comprise nearly all real-world programs. Subjective
modification and the use of pointers are necessary to quickly
achieve working, practical programs.
Oddly enough, with subjective modification, the programmer can in fact
believe that the copy-on-reference method is used. He or she can
believe that each appearence of a variable delivers a new copy. The
programmer can then ignore the notion of copy-on-write entirely, at
least until an objective modification is desired. With complex
systems, the safety of subjective modification, and the efficiency of
pointers, provides inexpensively for the same behavior offered by the
more intuitive but more expensive copy-on-reference method.
10.7
Exercises
1) In figure 10.4(f), what variables will "see" the effect of each of the
following:
a) @(P) := [X:1 Y:2] ;
b) @(P1) := [X:1 Y:2] ;
c) @(B).LOW := [X:1 Y:2] ;
d) @(B.LOW) := [X:1 Y:2] ;
e) B.LOW := [X:1 Y:2] ;
f) @(B1.LOW) := [X:1 Y:2] ;
Is @(B.HIGH):= ... ; the same as @(B1.HIGH):= ... ; ?
2) If X references 100 bins of memory, how many bins must be copied for
the assignment:
Y := X ;
with each of the two methods:
a) Copy-on-write
b) Copy-on-reference
3) Copy-on-write copies N blocks for the assignment statement:
X.A.B.C. ... .D := ... ;
when N single dots appear on the lefthand side. What's the largest
value for N have you ever seen in a program?