CHAPTER 23
DATATYPES, TYPE CONSTRUCTORS
AND
TYPE-SPECIFIC EXPRESSIONS
Something must be known about a datum if it is ever to be used. For
example, if you wish to add this datum to another, one must know how
the datum is represented. Is the datum 16-bits, 32-bits, or 64-bits
long? Is it represented as a fixed point number (INTeger), or as a
floating-point (REAL) number? These questions must be answered before
we can even begin to use a datum.
We associate with any datum what we call its ~type. The type tells
how we may operate on the datum. A type is basically a set of
properties or assumptions that we associate with certain data.
There may be many different data all of which follow the properties
associated with a type. Those data are said to be ~instances of that
type. We also say that the data are ~of ~that ~type.
Since different types of data may obey different properties, it is
essential that no two types be confused. For example, the
representation of an INTeger is different from the representation of a
REAL ~even ~for ~the ~same ~number! ~Type-checking refers to the
enforcement that data of different types never become confused.
Type-checking requires that contexts which demand an INTeger number are
in fact always given INTeger numbers. For example, a simple ~factorial
function may require that its operand be of type INTeger. Passing a
REAL instead of an INTeger should be noted as an error by type-
checking.
Types introduce distinctions, like the distinction between INT and
REAL. While these distinctions are necessary for the computer,
sometimes the human programmer would like to ignore such distinctions.
The programmer can declare coercions or polymorhpic functions (Sections
23.4.4.4 thru 23.4.4.6) so as to partially or completely render
distinct types as interchangeable. Thus, the human can overlook the
original distinctions even though the computer can't.
With polymorphism and coercion, ~type-checking serves not only to issue
error messages when data of distinct types are confused, but also
plays an active role in ~completing the program specification.
Type-checking can use coercions to translate from one type to another
type, which happens when you supply data of one type in a context which
requires a datum of a different type.
This chapter deals with ~datatypes (or ~types). We use our model
language ICL to illustrate a rich set of types and ~type ~constructors.
ICL comes with a few builtin types, like INT and REAL. It also
provides a set of distinct means by which to create new types from old
ones. For example, the ~list ~of type constructor forms a new list
type whose elements are all of some specified (other) type.
23.1
The Part-Of-Speech TYPEX
The part-of-speech TYPEX denotes notations that specify new types.
Each type constructor presented is represented by a rule of grammar.
For example, we introduce the rule:
{ TYPEX } -> TYPEX
to allow for the formation of lists. The builtin type INT (which
itself is a TYPEX), enclosed in these curly-brackets, denotes the type
~list ~of ~INTegers.
Along with each type constructor like this one, we introduce ICL's
notations for dealing with ~data of that type. For example, this type
constructor introduces a notation which selects the I'th element from
a list:
EXPR [ EXPR ] -> EXPR
The first EXPR is a list and the second EXPR (in "[...]") is an
INTeger, like I. Thus,
list [ 7 ]
is the seventh element of the given list.
ICL has the following builtin types:
INT, REAL, BOOL, CHAR, TEXT, and POINT.
ICL has type constructors to form:
lists, records, variants, processes, disk-resident data, and
more.
Much of the contribution of any new programming language is the forms
of types that it allows. ICL's set of type constructors can and will
be augmented as new, useful type constructors are imagined. Already,
it is a rich set.
We begin with the builtin types.
23.2
The Simplest TYPEX - Refering To A Type By Its Name
ID -> TYPEX
The ~name of an existing type is the simplest form of TYPEX. We now
introduce ICL's built-in types.
23.2.1
ICL's Builtin Types
1) INT (short for INTEGER)
INTegers are whole numbers, both positive and negative numbers and
zero. ICL represents INTegers by a 32-bit word. The syntax rule:
NUMBER -> EXPR
accomodates INTegers and REALs.
2) REAL
REALs are numbers that may include a decimal point, i.e., contain
fractions.
3) BOOL (short for BOOLEAN)
A BOOLean value is either TRUE or FALSE, as supported by the rules:
true -> EXPR
false -> EXPR
4) CHAR (short for CHARACTER)
A CHAR is specified by enclosing one character between two occurences
of the single quote, e.g.,
'a'
The single quote character is specified via:
''''
The following syntax rule support CHARacters and general TEXT strings:
TEXT -> EXPR
5) TEXT
A TEXT is written by writing any sequence of characters, starting and
ending with a single quote ("'"). You include a single quote within
TEXT by writing two of them. Thus:
'abcd' is the TEXT abcd
'john''s' is the TEXT john's
'b' is the TEXT and the CHAR b
Any characters, including carriage-returns may appear in TEXT. TEXT
containing only one character is also an instance of type CHAR.
Subtlety:
A slight preference is given to the CHARacter interpretation
over the TEXT interpretation. That is, in case of a potential
ambiguity, an interpretation which view the text as a CHAR
will beat an interpretation that views the text as a TEXT.
Such ambiguities are rare. (We haven't seen any yet).
To override this default, use the ~type ~disambiguation
notation and write:
TEXT :: 'A'
This EXPR is of type TEXT and not of type CHAR.
6) POINT
A POINT is a two-dimensional vector, a pair of REALs. POINTs were
introduced into ICL for applications involving integrated circuit
microchip layouts.
A POINT is formed by putting a "#" between two REALs, as in:
~REAL # ~REAL -> ~POINT
(The actual syntax rule that supports this is:
# -> BOP ).
A POINT is examined via either of these two rules:
~POINT . x -> ~REAL
~POINT . y -> ~REAL
Each extracts one coordinate from the given POINT. (The syntax
rule:
EXPR . ID -> EXPR
supports this notation).
The only difference between a POINT and a general record (Section 23.4)
is that the "#" notation is available for forming POINTs and not
for records.
Arithmetic applies to POINTs as complex numbers.
23.3
Lists
{ TYPEX } -> TYPEX
This forms a new type, a list all of whose elements are of the same
type. The type of all the elements is the TYPEX appearing between
the curly brackets.
Curly brackets are used generally for list formation.
This form of TYPEX is used in the following example type declaration:
TYPE REALS = { REAL } ;
A REALS is a list of individual REALs. Each element in an REALS is of
type REAL.
Any list has what is called its ~element ~type, the type of all its
elements, the type appearing between the curly brackets.
Following are all the ICL notations that deal with lists.
{ EXPR ; ... ; EXPR } -> EXPR
Synthesize a list whose elements are specified between the curly
brackets and separated by semicolons.
The EXPRs must be of the same type, a type which is the element-type of
some declared list type. The result is of that list type.
In datatype space, our example declaration of the type REALS
contributes this datatype rule:
{ ~REAL ; ~REAL ; ... ; ~REAL } -> ~REALS
Let's recast our syntax rule involving "{...}" as follows:
{ REXPR ; ... ; REXPR } -> EXPR
where we have:
EXPR -> REXPR
These two rules together support the presentation just given, but
here we introduce a new part-of-speech, REXPR.
An REXPR may be something an EXPR can't be, a "range":
collect EXPR QUANTIFIER -> REXPR
This notation specifies not one element, but rather a dynamic
number of elements. This contributes one element for each iteration
caused by the QUANTIFIER. That is, for each iteration caused by the
QUANTIFIER, evalute the EXPR and contribute that as an element of the
list.
The EXPR in this construct must be of the same type as any other
element EXPRs appearing in the list specification.
NOTE: Elements of value NIL will ~not be included in the resulting
list. In general, NIL values never appear in lists; NIL
values are dropped from lists.
We do guarantee that EXPRs in lists ~will ~be
~evaluated in left-to-right order. This is relevant if any of the
EXPRs produces side-effects.
A variation on this COLLECT notation is available, and is semantically
identical:
QUANTIFIER collect EXPR -> REXPR
You may use one or the other notation, depending on whether you want
to emphasize the QUANTIFIER, as in this latest notation, or the EXPR,
as in the former notation.
Example:
Each of the following two EXPRs denotes a list of ten elements,
the numbers from 1 to 10:
{ 1 ; 2 ; 3 ; 4 ; 5 ; 6 ; 7 ; 8 ; 9 ; 10 }
and
{ COLLECT I FOR I FROM 1 TO 10; }
(The one semicolon you see in this latter example is
part of the FOR-quantifier. It is not part of the list
notation itself).
The following EXPR also produces the same list:
{ 1 ; 2 ; COLLECT I FOR I FROM 3 TO 8; ; 9 ; 10 }
Most lists are specified simply via:
{ collect EXPR QUANTIFIER }
Example:
Given a list L, the following list is identical to the list L:
{COLLECT X FOR X $E L;}
The FOR-quantifier sets X to each element in L. We ~collect
those X elements together to form a new list. Thus, L and this
resulting list have identical elements in exactly the same
order.
The following list is like L except that the elements are one
larger than the originals:
[COLLECT X+1 FOR X $E L;}
The following list is like the list L, except that it contains
only the positive elements in L:
{COLLECT X FOR X $E L; WITH X>0 ; }
(The QUANTIFIER
FOR X $E L; WITH X > 0 ;
sets X to each ~positive element in L).
EXPR [ EXPR ] -> EXPR
Index into the list. The first EXPR must be of a list-type, and
the second EXPR must be of type INTeger. The resulting EXPR, an
element, is of the list-type's element type. Thus, our example
declaration of the type REALS effectively introduces the datatype rule:
~REALS [ ~INT ] -> ~REAL
This notation represents the i'th element in the list, where i is the
second EXPR's INTeger value.
The first EXPR, the list, will need to be enclosed in parentheses
"()" if it contains any BOPs. That is, the righthand "[EXPR]" binds
to the shortest possible EXPR to its left, even if type consistency
is lost. For example:
L1 $$ L2 [5]
groups as
L1 $$ ( L2[5] )
and not as:
(L1 $$ L2) [5]
If the index exceeds the length of the list, you get NIL, 0, or FALSE
as the result.
If the index is zero or negative, you will get a runtime error message.
This construct also has meaning on the lefthand sides of assignment
statements. In this ~target role, this construct means ~replace ~the
~i'th ~element ~of ~the ~list. This is a subjective modification.
Note that since NIL values are never members of lists, the assignment
of a NIL value via this notation will effectively ~delete that element
from the list, thus shortening the list's length by one.
Example:
If L is a list, then
L[1] is the first element in that list, and
L[K] is the K'th element.
Example:
The assignment:
L[5] := an element value ;
affects the variable L so that L's 5'th element appears to
have the new value. If that "element value" happens
to be NIL, then this affects the variable L so that the fifth
element appears to be ~deleted from the list.
NOTE: Indexing on lists is not necessarily an efficient operation.
Its execution time is proportional to the value of the integer
index.
We recommend that you use the "$E" FOR-quantifier
profusely to access elements in a list, instead of indexing.
EXPR [ EXPR - ] -> EXPR
Extract a tail from a list. The first EXPR must be of a list-type,
and the second EXPR must be of type INTeger. The result is an
instance of the list-type (~not the element-type like with indexing).
Our example declaration of the type REALS has thus introduced the
datatype rule:
~REALS [ ~INT - ] -> ~REALS
The result is a ~tail or sub-list of the given list (the first EXPR).
The result is the tail starting from the i'th element all the way to
the end of the list, where i is the INTeger value of the second EXPR.
That is, the result is like the given list except that we skip the
first i-1 elements. The result has length n-i+1 where n is the
length of the given list.
See the previous rule, indexing, for grouping considerations.
If the index exceeds the length of the list, the result is NIL.
If the index is zero or negative, you get a runtime error message.
This construct also has meaning on the lefthand sides of assignment
statements. In this ~target role, this construct means ~replace ~the
~tail, ~from ~i ~onward, of the list. This is a subjective
modification.
Example:
If L is a list of 10 elements, then
L [ 2 - ]
is the same list with the first element removed, a list of
length 9. Similarly, the list:
L [ K - ]
denotes the list L with the first K-1 elements removed.
The list:
L[1-]
is exactly L.
Example:
If L is a list, then the assignment:
L [ 5 - ] := a new list ;
affects the variable L so that its first 4 elements appear
unchanged, but its fifth and higher elements become the
"new list". That is, its fifth element appears to be what was
the first element of the new list, and its 6'th element appears
to be what was the second element of the new list, etc.
In other words, this first removes all but the first
four elements of L, and then appends to L the new list.
If the "new list" is itself NIL, then this assignment clips
the list L so that L has only four elements.
<$ -> BOP
$> -> BOP
$$ -> BOP
Please refer to Section 22.1.3 where all BOPs are presented. "<$" is
left-append, "$>" is right-append, and "$$" is concatenation.
reverse( EXPR ) -> EXPR
The first EXPR must be of a list-type, and the result is of that same
type.
The resulting list appears in reverse order.
refresh( EXPR ) -> EXPR
The given EXPR must be of some list type, and the result is of the
same type. Logically, this operator is an identity.
You might imagine that all lists are represented with elements in
order, as shown in figure 23.1(a). In fact, the curly-bracket notation
for forming lists produces lists that look like this. The left-append
("<$") preserves this property, as shown in figure 23.1(b).
In contrast, the right-append operator ("$>") and concatenation
operator ("$$") produce a different kind of structure. In the
implementation, they always left-append some new block of memory,
although logically it represents the right-appended element or the
concatenated list. Figure 23.1(c) shows what right-append produces.
Figure 23.1(d) shows what concatenation produces.
The right-append and concatenation operators produce memory blocks
each with a special tag within the first bin, to distinguish among
the operators. In the figure, the tag appears enclosed in a circle.
Neither of these two operators takes time to step to
the end of a list. Thus, all three of our operators ("$<", "$>",
and "$$") work very quickly.
Our general tree-like representation requires special interpretation
whenever we wish to step thru elements in a list. This special
interpretation occurs automatically for you.
The REFRESH operator renders ~any list as though it had been created
via only the left-append operator.
If you use the @-operator (Section 22.1.11), you may require that the
list be refreshed. Here, you can rely on the fact that the logical
order matches the implemented order. The REFRESH operator will render
any list to be this standard, refreshed implementation.
23.3.1
Why This Tagged Implementation?
The choice to implement our list operators this quick way is
based on two considerations: efficiency and ICL's default
subjective modification (Chapter 10).
One could coneivably walk to the end of the list and force the
last block now to point to our new last element.
That would indeed gain the effect of right-appending to our
list. Unfortunately, this would be an ~objective modification,
as it affects data in an existing block of memory (the last
block in the original list).
This objective modification has the disadvantage of making
the modification apparent from more than one point of view.
For example, if someone else pointed to our list, or even to
the second block in our list, the modification to the last
block would be visible from that someone else's point of view.
That is, starting from our first (or second) block, they would
reach what was our last block, and then find the new last
block beyond that. They would experience a lengthening of
their lists, which share blocks in common with our list.
We could maintain subjective modification by copying the list
prior to sticking the new last block onto the end, as shown in
figure 23.2(a). This removes the disadvantage of objective
modification, as we affect no existing blocks.
Whether we copy the list, or merely walk the existing list, our
right-append operator would consume time proportional to the
length of the list. Right-appending to a list of length N
would consume N time.
23.3.2
Avoiding An N-Squared Cost For Lists Built With Right-Appends
Consider what happens if you were to create a list of length N
entirely by repeated use of the right-append operator. This
scenario is common, and this time-N implementation of right-
append, repeated N times, would give an N-squared cost to the
construction of the list of length N.
In contrast, our tagged right-append implementation consumes
time independent of the length of the list. Thus, repeated
use of our tagged right-append implementation forms a list of
length N in only N time.
A subsequent access of the list will take
time proportional to N. (In fact, it will take exactly twice
as long as accessing a refreshed list. The interpretation of
a list formed by right-appends requires a temporary
construction of a new list of length N).
If you plan to access the same list many times, it is most
efficient to build the list, then REFRESH it, and then access
it many times. REFRESH takes only time proportional to the
length of the list. Thus, lists built freely with all three
operators, being REFRESHed once, and then accessed repeatedly,
all take time which is linear in N, the length.
23.3.3
Maintaining A Refreshed List At All Times
A few applications may benefit by building
a list that is refreshed at all times. Standard list
processing techniques, using the @-operator, can achieve this
if you can tolerate objective modification upon the list.
(You can always tolerate objective modification if you know
that nobody else points to the list; or if you know that all
points of view that can see the list are in fact also supposed
to see the modification).
For example, given the list in figure 23.2(b), you can keep
with you at all times a pointer to the end of the list, called
LAST in the figure. Figure 23.2(c) shows the list:
{ LAST[1] ; new_last }
The objective assignment:
@(LAST) := { LAST[1] ; new_last } ;
forms figure 23.2(d). Our last block has been overwritten with
the first block of this list of length 2. At this point,
we have preserved REFRESHedness and yet right-appended the new
element quickly. We finish the job by updating LAST so as to
point to the new last element:
LAST := LAST[2-] ;
sorted_by ID : EXPR increasing -> RHUOP
This SORTED_BY construction is a RHUOP, a righthand (postfix) unary
operator. For ease of description, let's include its parameter
(to the left of this new construct), as in the following EXPR:
EXPR sorted_by ID: EXPR increasing
The first EXPR must be a list, and the resulting EXPR is of that same
list type.
The second EXPR, and the "ID:" before it are present to
~rank the elements in the given list. The ID denotes a variable, whose
name you choose. That variable will hold each element in the list.
The ranking EXPR (following the "ID:") can read that variable, and must
deliver a value of type REAL. Elements with higher REAL values will
appear after elements with lower REAL values.
The resulting EXPR (list) is the first EXPR (list) reordered so that
the real-EXPR (following the SORTED_BY) would evaluate to a ~strictly
~increasing sequence of REALs.
Example:
Suppose L is a list of POINTs. The following EXPR is also a
list of POINTs:
L SORTED_BY P: P.X INCREASING
This entire EXPR represents the list L sorted so that earlier
elements (points) lie farther to the left (have smaller
X-coordinates) than do latter elements.
Example:
The list of points:
L SORTED_BY P: ABS(P) INCREASING
is ordered so that later elements lie farther from the origin
than do earlier elements.
Example:
Because this SORTED_BY construct is a RHUOP, we can sort the
list MY_LIST with the "::=" assignment notation, as in:
MY_LIST::= SORTED_BY P: ABS(P) INCREASING ;
Example:
Suppose L is a list of INTegers. The following EXPR is the
same list sorted in increasing order:
L SORTED_BY I: I INCREASING
NOTE: If the given list has more than one element which yields the
same REAL, then all but the first of those elements will be
dropped from the resulting list.
If you wish to preserve "duplicates", use the "non-decreasing"
rule coming up next.
sorted_by ID : EXPR decreasing -> RHUOP
sorted_by ID : EXPR non_increasing -> RHUOP
sorted_by ID : EXPR non_decreasing -> RHUOP
All four of these SORTED_BY rules taken together give you the
capability to sort in INCREASING or DECREASING order (losing duplicate
elements), or in NON_DECREASING or NON_INCREASING order (which preserve
duplicate elements).
Note: In case duplicates are kept, they will appear in the same order
within the resulting list as they appeared within the given
list.
FOR EXPR $e EXPR ; -> QUANTIFIER
This QUANTIFIER is the easiest and most efficient way to access
elements in a list. See Section 22.3.1.
23.4
Records
Records are introduced to overcome a limitation of lists. ~All
elements in a list are required to be of the same type. Records, in
contrast, conglomerate together elements whose types may differ. For
example, a single record can contain a name (TEXT) and a phone number
(INT).
23.4.1
Declaring A Record Type
We declare a new record type by writing a sequence of components,
enclosed within square-brackets:
[ ID : TYPEX ID : TYPEX ... ID : TYPEX ] -> TYPEX
Each component is a pair: a name for that component and a type which
will serve as that component's representation. (Each component
includes the name so that we will have a simple way to refer quickly
to that component without regard for the others). For example, the
record type:
[ NAME: type for a person's name
ADDRESS: type for a geographical location
NUMBER: type for a telephone number ]
can serve well as an entry in a telephone book. We might declare a new
type called TELEPHONE_ENTRY with:
TYPE TELEPHONE_ENTRY = [ NAME: PERSON_NAME
ADDRESS: WORLD_LOCATION
NUMBER: TELEPHONE_NUMBER ] ;
This says that the new type TELEPHONE_ENTRY has three components, named
NAME, ADDRESS, and NUMBER. Each component is represented respectively
by the types PERSON_NAME, WORLD_LOCATION, and TELEPHONE_NUMBER.
Each "ID:TYPEX" can be replaced by the notation:
ID , ID , ... , ID : TYPEX
This is a brief way to declare many components all of which are
represented by the same type. For example:
TYPE T = [ A,B : INT ] ;
is entirely equivalent to:
TYPE T = [ A: INT B: INT ] ;
We can view a record type as the Cartesian product over its component
types. That is, an instance of TELEPHONE_ENTRY is a member of the set
defined by the Cartesian product:
all PERSON_NAMEs X all WORLD_LOCATIONs X all TELEPHONE_NUMBERs
Following are all notations relevant to records:
[ ID : EXPR ID : EXPR ... ID : EXPR ] -> EXPR
Synthesize a record whose components have the names specified by the
IDs and the corresponding values specified by the EXPRs.
That is, the type declaration:
TYPE T = [ ID(1): TYPEX(1) ID(2): TYPEX(2) ...
ID(k): TYPEX(k) ] ;
effectively introduces the following rule for record synthesis:
[ ID(1): ~TYPEX(1) ID(2): ~TYPEX(2) ...
ID(k): ~TYPEX(k) ] -> ~T
See the example coming up shortly.
---------------- Parentheses in previous paragraph mean subscripting! ---
You may omit entire components. Unspecified components, when accessed
later, will show as NIL, 0, or FALSE (depending on the component's
type).
The order of component specification is unimportant.
NOTE: We do ~not guarantee that the EXPRs will be evaluated in order
from left-to-right.
This construct has meaning also on the lefthand sides of assignment
statements: Unload the components of the given record into the
specified EXPRs (variables).
Example:
The declaration:
TYPE T = [ N: INT P: POINT ] ;
makes T denote the record type having components named N and
P, of types INT and POINT respectively.
Example:
If R is a variable of the record type:
[ N: INT P: POINT ]
then the assignment:
R := [ N: 20 P: 3#4 ] ;
sets R to an instance whose N component is 20 and whose P
component is the point 3#4.
The assignment:
R := [ N: 20 ] ;
sets R to a new record whose N component is 20 and whose P
component is NIL.
Example:
If I is an INTeger variable and Q is a POINT variable, then:
[ N: I P: Q ] := R ;
sets I to R's N component, and sets Q to R's P component.
Similarly, the assignment:
[ N: I P: X#Y ] := R ;
sets I to R's N component, and X and Y to the x- and y-
coordinates of R's P component.
Example:
Given the record type TELEPHONE_ENTRY, which represents one
line in a telephone book, we can go on to define the datatype
of which an entire telephone book is an instance, namely:
TYPE TELEPHONE_BOOK = { TELEPHONE_ENTRY } ;
A telephone book thus consists of an arbitrary number of
TELEPHONE_ENTRYs.
If the variable TELEPHONE_BOOK is of type TELEPHONE_BOOK, we
can access all entries in the phone book by:
FOR X $E TELEPHONE_BOOK;
This presumes that X is a variable of type TELEPHONE_ENTRY.
We can access concisely the NAME and ADDRESS components of each
entry in TELEPHONE_BOOK via:
FOR [NAME:N ADDRESS:A] $E TELEPHONE_BOOK;
Each iteration sets the variables N and A to each name and
address in the phone book.
This is entirely equivalent to:
FOR X $E TELEPHONE_BOOK;
EACH_DO [NAME:N ADDRESS:A] := X; ;
except that this latter form also sets the variable X.
EXPR . ID -> EXPR
Extract a field from a record. The first EXPR must be of a record-
type. That record type must have a component of the name denoted by
the ID. The result is of the type associated with that component.
That is, the record declaration:
TYPE T = [ ID(1):TYPEX(1) ID(2):TYPEX(2) ...
ID(k):TYPEX(k) ] ;
introduces the extraction notations:
~T . ID(1) -> ~TYPEX(1)
~T . ID(2) -> ~TYPEX(2)
...
~T . ID(k) -> ~TYPEX(k)
---------------- Parentheses in previous paragraph mean subscripting! ---
This construct has meaning also on the lefthand sides of assignment
statements: Replace the specified component (ID) in the record (EXPR)
by a given value.
Example:
Refer to the example given in the previous rule. The EXPR
R . N
denotes R's N component. The assignment:
R . N := 12 ;
affects the variable R so that its N component is now 12.
This is a subjective modification in that only the variable R
is affected.
The EXPR to the left of the ".ID" may require parentheses: This
notation chooses the smallest possible EXPR to the left, even
disregarding type consistency. For example:
A \FUNCTION B . X
groups as:
A \FUNCTION (B . X)
and not as:
(A \FUNCTION B) . X
23.4.1.1
Example: A Recursive Record
The following declares a new record type, TREE:
TYPE TREE = [ LEFT_PART: TREE
VALUE: INT
RIGHT_PART: TREE ] ;
This type declaration is said to be ~recursive because the new
type TREE also appears in the definition, in the LEFT_PART and
RIGHT_PART fields.
The following is a TREE that has no LEFT_PART nor RIGHT_PART:
[ VALUE: 2 ]
The following is a more complex TREE:
[ LEFT_PART: [ VALUE: 1 ]
VALUE: 2
RIGHT_PART: [ VALUE: 3 ] ]
Figure 23.3 illustrates it. (More complex examples of
constructing nested records appear in Section 9.7.2).
The following function takes in a TREE and yields the sum
of the VALUE fields throughout the TREE:
DEFINE SUM( T: TREE ) = INT:
IF DEFINED(T) THEN
SUM( T.LEFT_PART ) + T.VALUE +
SUM( T.RIGHT_PART )
ELSE 0 FI
ENDDEFN
If the TREE T is defined (non-NIL), then the sum is formed by
acquiring the SUM over the LEFT_PART, the SUM over the
RIGHT_PART, and adding those together with T's VALUE field.
If T isn't defined (i.e., it is NIL), then 0 is returned.
(T might be NIL because the THEN clause calls SUM on
T.LEFT_PART. T.LEFT_PART might be NIL, as in our first
example TREE, "[VALUE:2]" ).
Recursive types, like TREE, are most often processed by
recursive functions, like SUM, which calls itself with sub-
trees.
EXPR ++ [ ID : EXPR ID : EXPR ... ID : EXPR ] -> EXPR
The first EXPR must be of a record-type, the same type as implied by
the record notation following the "++".
The result is of the same type as the first EXPR.
The result is a new record that is identical to the first EXPR except
that the specified components (to the right of the "++") are
overridden.
The first EXPR may require parentheses. This notation chooses the
smallest possible EXPR to the left of the "++", even disregarding type
consistency. For example,
A \FUNCTION B ++ ...
groups as
A \FUNCTION (B ++ ...)
and not as:
(A \FUNCTION B) ++ ...
Example:
The assignment:
B:= B ++ [LOW: 0#1] ;
is equivalent to:
B.LOW := 0#1 ;
Example:
The assignment:
WIRE:= WIRE ++ [PATH:SP WIDTH:0] ;
is equivalent to the two assignments:
WIRE.PATH:= SP ;
WIRE.WIDTH:= 0 ;
NOTE: This new construct may be more efficient than the use of
multiple assignment statements.
In addition:
You may use this new notation with the "::=" assignment notation.
Examples:
B::= ++ [ LOW: 0#1 ] ;
WIRE::= ++ [PATH:SP WIDTH:0] ;
@(WIRE)::= ++ [PATH:SP WIDTH:0] ;
23.5
Variants
either
ID = TYPEX
ID = TYPEX
...
ID = TYPEX
endor -> TYPEX
Our third type constructor gathers together unrelated types to form
a new type which admits a well-defined uncertainty in representation.
A given instance of the new type may be represented by any ~one of the
given types.
To help serve as an example, let's build two new types, a list and a
record:
TYPE POLYGON = { POINT } ; " a sequence of vertices "
TYPE BOX = [LOW: POINT HIGH: POINT]
"the lower-left and upper-right corners of a
rectangle whose edges are horizontal and
vertical"
The details of these two types are irrelevant, except that the they
make the following notations legal:
~POLYGON [ 1 ] -> ~POINT
and
~BOX . LOW -> ~POINT
Let us define a type called SHAPE, which will denote either a single
polygon or a single box. We declare SHAPE by writing:
TYPE SHAPE = EITHER
STATE1 = BOX
STATE2 = POLYGON
ENDOR ;
This says that a SHAPE is either a single BOX or a single POLYGON,
but not both. (This declaration also introduces some seemingly
irrelevant words, STATE1 and STATE2. Let's ignore these for now).
This declaration says two things about SHAPE:
1) Any BOX is a SHAPE and any POLYGON is a SHAPE. That is, we
gain the rules:
~BOX -> ~SHAPE
~POLYGON -> ~SHAPE
2) It is ~not true that all SHAPEs are BOXes nor that all SHAPEs
are POLYGONs. That is, we most definitely do not have the
rules:
~SHAPE -> ~BOX No!
~SHAPE -> ~POLYGON No!
The first property about SHAPE says that:
[LOW: 0#0 HIGH: 1#1]
which is a BOX, is also a SHAPE. Also:
{ 0#0 ; 1#2 ; 2#0 }
which is a POLYGON, is also a SHAPE.
Thus if we declare:
VAR S = SHAPE ;
then the following assignments are legal:
S:= [LOW: 0#0 HIGH: 1#1] ;
S:= { 0#0 ; 1#2 ; 2#0 } ;
The first assignment sets S to represent a BOX. The second assignment
sets S to represent a POLYGON.
In contrast, the second property about SHAPE says that the shape S
itself is uncertain. Since S is not always a BOX, the EXPR:
S.LOW
is illegal because S might be a POLYGON. Similarly, since S is not
always a POLYGON, the EXPR:
S[5]
is illegal since S might be a BOX.
Any type declared with our EITHER...ENDOR is called a ~variant type.
The actual representation of an instance is ~variable in the domain of
datatypes.
23.5.1
Creating Instances Of Variants
We have seen already one way to create instances of a variant type.
We write nothing extra. Any instance of one of the possible types
passes equally well as an instance of the variant type. In general,
a variant type T defined by:
TYPE T = EITHER
STATE1 = T1
STATE2 = T2
...
STATEk = Tk
ENDOR ;
introduces the rules:
~T1 -> ~T
~T2 -> ~T
...
~Tk -> ~T
each of which requires no specification whatsoever beyond an expression
whose type is any one of T1, T2, ..., Tk.
In addition to this implicit creation of instances of a variant type,
we have also a more verbose, and sometimes necessary notation. This
verbose notation makes use of the "noise" words STATE1, STATE2, and so
on. That is, not only do we have the rule:
~T1 -> ~T
but we also have the rule:
state1 :: ~T1 -> ~T
Thus, referring to our SHAPE example, we have not only the rules:
~BOX -> ~SHAPE
~POLYGON -> ~SHAPE
but we also have the rules:
state1 :: ~BOX -> ~SHAPE
state2 :: ~POLYGON -> ~SHAPE
For example:
STATE1:: [LOW:0#0 HIGH:1#1] is a SHAPE, and so is
STATE2:: { 0#0 ; 1#2 ; 2#0 } but
STATE2:: [LOW:0#0 HIGH:1#1] is not a SHAPE.
This "::" notation is supported by the syntax rule:
ID :: EXPR -> EXPR
This rule applies to the smallest EXPR possible, even disregarding type
consistency. For example:
A :: X + Y
groups as
(A :: X) + y
and not as:
A :: (X + Y)
This verbose notation is rarely required. It is required only when an
ambiguity might arise. For example, the declaration:
TYPE T = EITHER
STATE1 = BOX
STATE2 = POLYGON
STATE3 = BOX
ENDOR ;
refers more than once to the same type, BOX. Why we would ever make
such a declaration is not necessarily obvious. But if we did, the BOX
expression:
[ LOW: 0#0 HIGH: 1#1 ]
could be interpreted as a T in two ways: Either it is a T in STATE1
or it is a T in STATE3. Assuming that it makes a difference, we can
write:
STATE1 :: [LOW:0#0 HIGH:1#1]
or
STATE3 :: [LOW:0#0 HIGH:1#1]
to specfy explicitly in which state the BOX is to be represented.
This kind of ambiguity cannot occur with our SHAPE type because it
references no type twice.
23.5.2
Examining Instances Of Variant Types
Because of the inherent uncertainty in any variant type, each
examination of a variant object must always first resolve the
uncertainty. We provide exactly one way to do this, the CASE
statement. The expression:
CASE variant_object OF
STATE1: something1
STATE2: something2
...
STATEk: somethingk
ENDCASE
is something like a giant IF-THEN-ELSE sequence. It read like:
IF variant_object is in STATE1 THEN something1
else IF variant_object is in STATE2 THEN something2
...
else IF variant_object is in STATEk THEN somethingk
However, there is something fundamentally unique about the CASE
construction which is entirely absent from any IF-THEN-ElSE.
The CASE statement maintains a very definite separation between the
uncertainty associated with a variant object and the various
certainties which arise only after the variant object is examined.
That is, if S is a variant object (e.g., a SHAPE), the following
different things are known about S at different positions throughout
the CASE construction:
" S is uncertain "
CASE S OF
STATE1: " S is certain, in fact, S is a BOX "
STATE2: " S is certain, in fact, S is a POLYGON "
ENDCASE
" S is uncertain "
This separation between uncertainty and certainty is actually
implemented by changing the variable S's type throughout the CASE
construction:
" S is of type SHAPE "
CASE S OF
STATE1: " S is of type BOX "
STATE2: " S is of type POLYGON "
ENDCASE
" S is of type SHAPE "
For example, the actual legality of some expressions depends on where
the expression appears. For example:
" S.LOW and S[1] are both illegal "
CASE S OF
STATE1: " S.LOW is legal, but S[1] is illegal "
STATE2: " S[1] is legal, but S.LOW is illegal "
ENDCASE
" S.LOW and S[1] are both illegal "
Following is a more detailed presentation of ICL's variant CASE
constructs. There is one for STATEMENTs and one for EXPRs, just like
the IF-THEN-ELSE has:
case ID of
ID : STATEMENT
ID : STATEMENT
...
ID : STATEMENT
endcase -> STATEMENT
case ID of
ID : EXPR
ID : EXPR
...
ID : EXPR
endcase -> EXPR
This syntax differs from the similar looking scalar CASE construct (yet
to come) only in that the case-EXPR here ~must ~be an ID, the name of a
variable.
That variable must be of a variant type.
That variable ~cannot be a non-variant type, ~even if there is
a coercion that maps the non-variant type into a variant
type.
All the IDs preceeding the colons must be names that appear in the
variant type's declaration (as IDs, e.g., STATE1 and STATE2).
This construction chooses the STATEMENT (or EXPR) that follows the
"ID:" naming the ~one state in which the case-variable's value resides.
This construct ~changes ~the ~type of the case-variable within each
of the clauses. The case-variable takes on the specific type implied
by the "ID:", the type following that ID in the variant type's
declaration.
The first rule, which involves STATEMENTs, does ~nothing if the
case-variable's state (e.g., STATE1) is not included among the
"ID:"s.
The second rule, which involves EXPRs, issues a ~runtime
~error if the case-variable's value resides in a state
not included among the "ID:"s.
This second rule requires, like an IF-THEN-ELSE, that all EXPRs be
of the same type. That type is the type of the result EXPR.
In addition:
You may substitute one occurence of "ID" with the keyword
"ELSE", to capture all representations not covered by the other IDs.
If you include an ELSE clause, the case-variable does ~not change type
in that ELSE clause. It retains its original variant type in the
ELSE clause.
If you include an ELSE clause, something will always be executed, and
no runtime error will ever be issued.
Example:
Suppose we have declared the type NUMBER as follows:
TYPE NUMBER = EITHER
SIMPLE = REAL
COMPLEX= POINT
ENDOR ;
If N is a variable of type NUMBER, then we may write that
number via the following:
CASE N OF
SIMPLE: WRITE(N); "N is of type REAL"
COMPLEX: WRITE(N); "N is of type POINT"
ENDCASE
We can assign into N via either of:
N:= 3.14 ; or
N:= 1#2 ;
In our WRITE example, N as assigned here first, prints 3.14.
If N is assigned via the second assignment, then our WRITE
example will print 1#2.
(It is a coincidence that both clauses of the CASE statement
look identical. There are at least two definitions for WRITE,
one for REALs and one for POINTs. Different WRITE functions
are chosen in the clauses. The SIMPLE clause chooses the
REAL-WRITE, because N is a REAL in that clause. The second
clause calls the POINT-WRITE function, because in this clause
N is of type POINT).
Example:
The following function delivers the ~real ~part of a NUMBER:
DEFINE REAL_PART( N:NUMBER ) = REAL:
CASE N OF
SIMPLE: N "(N is a REAL)"
COMPLEX: N.X "(N is a POINT)"
ENDCASE
ENDDEFN
If N is in the SIMPLE state, then the REAL N is the answer.
If N is in the COMPLEX state, then the POINT's x-coordinate
is the answer.
23.5.3
Example: A Recursive Variant
Suppose we would like to invent a new type PICTURE. Suppose our
application will form PICTUREs by any of three means:
1) Any ~POLYGON is to be a PICTURE
2) Any ~superposition of a set of PICTUREs is to be a PICTURE
3) Any PICTURE ~displaced by some amount is a PICTURE.
When we examine a PICTURE, we intend to deal with each of these three
possibilities.
Let's declare PICTURE as a variant having these three states:
TYPE PICTURE = EITHER
ONE = POLYGON
TWO = PICTURES
THREE = MOVED_PICTURE
ENDOR ;
Presumably the type POLYGON is already defined. The state
TWO references a new type, PICTURES (a set of PICTUREs) which we
declare next:
TYPE PICTURES = { PICTURE } ;
State THREE refers to a new type MOVED_PICTURE. A MOVED_PICTURE is
meant to be a PICTURE and a displacement (POINT) to be applied to that
PICTURE:
TYPE MOVED_PICTURE = [ MOVE: PICTURE BY: POINT ] ;
Presumably when we go to plot a PICTURE, in state THREE we will apply
the displacement (the BY field) to all coordinates as we plot the
original (unmoved) PICTURE (the MOVE field).
Notice that the type PICTURE is recursive. It refers to the types
PICTURES and MOVED_PICTURE, which themselves refer back to the type
PICTURE. These three type declarations together are perfectly valid.
23.5.3.1
A Recursive Function To Examine The Variant
Let's define a recursive function that examines a PICTURE. One very
useful calculation to be made upon a PICTURE is its ~minimum ~bounding
~box. The minimum bounding box (MBB) is the smallest box that
contains the entire PICTURE. (BOXes are rectangles having only
vertical and horizontal edges).
Before we write this MBB function for the type PICTURE, we will need
the following functions, which we will assume are given:
a) polygon_mbb( ~POLYGON ) -> ~BOX
b) ~BOX \max ~BOX -> ~BOX
c) ~BOX \at ~POINT -> ~BOX
The first function delivers the MBB for a single POLYGON in isolation.
The second function takes the ~maximum of two BOXes. It delivers the
smallest BOX that contains both of the given BOXes. The final function
moves a BOX by a displacement in two-dimensions by a POINT.
We define MBB via:
DEFINE MBB( P: PICTURE ) = BOX:
BEGIN VAR Q = PICTURE ;
CASE P OF
ONE: "P is a POLYGON..." POLYGON_MBB(P)
TWO: "P is a set of PICTUREs..."
\MAX MBB(Q) FOR Q $E P;
THREE: "P is a MOVED_PICTURE..."
MBB(P.MOVE) \AT P.BY
ENDCASE
END
ENDDEFN
In the state TWO, P is a PICTURES, and we use Q to get at each PICTURE
in that set. We compute the MBB of each Q (by a recursive call), and
form the cummulative ~maximum of all the Qs' MBBs. (This is done using
the BOP-EXPR-QUANTIFIER rule). The resulting BOX in this case is the
smallest BOX that contains all the Qs, all the sub-pictures. It is
computed in terms of the sub-pictures' MBBs.
The third state computes its MBB by first acquiring the ~unmoved
picture's MBB ("MBB(P.MOVE)"). It then moves that MBB (\AT) by
exactly the same amount that the picture is supposed to be moved
(P.BY). That moved MBB is the MBB of the moved picture.
23.5.3.2
A Major Program Modification Implemented Simply By A Coercion Pair
We will use the type PICTURE and the MBB function to illustrate a
major program modification that involves a change in a datatype. The
change affects the legality of all existing programs that deal in
PICTUREs.
We will introduce a place in PICTURE to hold the PICTURE's
~pre-computed MBB. This will substantially speed up the MBB function.
The trick will be to ~automatically get all the pre-computations of
MBBs inserted at all the right places in all the old programs ~without
~editing ~them ~at ~all.
You may have been in a similar situation yourself: You have a datatype
and many programs dependent on it. You would like to use a new
datatype, and have one or a few functions take advantage of it.
However, you are left with all the other old programs that will break
upon instituting the change in datatype.
The following shows how to make the datatype change and yet in one
fell swoop render all old programs completely compatible with the
modification. There may be ~many old programs involved, tons of
program listings. Rather than modifying them, our method involves the
declaration of a pair of coercions, going from one type to another,
and back again.
Thus, major program modifications can be made with surprising ease.
23.5.3.2.1
An Observation About The Type PICTURE
Our type PICTURE has a delightful efficiency of representation. Figure
23.4 shows a picture where one pattern is repeated several times.
Suppose that repeated picture, a triangle and a box, resides in
variable P. We form figure 23.4 via:
{ P ;
[MOVE: P BY: 100#0] ;
[MOVE: P BY: 200#0] ;
[MOVE: P BY: 300#0] ;
[MOVE: P BY: 400#0] }
This picture P is referenced five times.
Even though you see five triangles and five boxes, only one of each is
represented. Figure 23.5 shows the memory representation of figure
23.4. Near the bottom of the figure, where P points, is the picture P,
formed by:
{ the_polygon ; the_box }
The picture P is shared among five points of view. In general, P
could be a picture having thousands of shapes. By sharing P five ways,
we save much memory.
Despite this nice memory sharing, our MBB function will not take
advantage of it. For each reference to the picture P, it will
recompute P's MBB independently. P's MBB will be computed five
times in this example.
Suppose we wish to compute P's MBB only once. One way to do that
would be to compute P's MBB once and store it with the picture itself.
That way, upon each of the five references to the picture P, P's MBB
will not be recomputed because its precomputed MBB can be taken
instead.
23.5.3.2.2
A Change In Datatypes
This observation may give us a solution to supress the MBB's
recomputation. Let's store a picture's MBB with the picture itself.
Thus, a PICTURE together with its MBB is the kind of object we would
like to deal with.
The following introduces the type M_PICTURE, a PICTURE with its MBB:
TYPE M_PICTURE = [ BODY: PICTURE MBB: BOX ] ;
Given an M_PICTURE, we acquire its MBB instantly by writing:
M_PICTURE . MBB
If we could replace all occurences of the type PICTURE with the
type M_PICTURE, we would not have to recompute MBBs.
Even if there already exists many lengthy programs that deal with the
type PICTURE, we can in one fell swoop replace the type PICTURE with
the type M_PICTURE without having to go back and modify those many
programs. It takes basically four steps to pull this off.
Step 1
First, we modify the type PICTURE so as to reference M_PICTUREs instead
of PICTUREs. This way, whenever a sub-picture is acquired (e.g., P),
that sub-picture's MBB will be included, because that sub-picture will
now be of type M_PICTURE, which has an MBB.
Our modified types follow:
TYPE PICTURE = EITHER
ONE = POLYGON
TWO = ~M_PICTURES
THREE = ~MOVED_PICTURE
ENDOR ;
TYPE ~M_PICTURES = { ~M_PICTURE } ;
TYPE ~MOVED_PICTURE = [ MOVE: ~M_PICTURE BY: POINT ] ;
We've replaced all appearences of the type PICTURE with the type
M_PICTURE, except for the one occurence of the name PICTURE preceding
the EITHER. Thus, the type PICTURE now references only M_PICTUREs.
Unfortunately, upon modifying the type PICTURE, all our old programs
break down. Those programs expect to see a PICTURE where we now
present an M_PICTURE instead.
Step 2
We render all our old programs compatible with our new types not by
modifying those programs, but instead by making the types PICTURE and
M_PICTURE interchangeable. We provide the coercions:
~M_PICTURE -> ~PICTURE
and
~PICTURE -> ~M_PICTURE
These make the two types interchangeable from the programmer's point
of view.
Now, any (old) context that demands a PICTURE will get a
PICTURE even if presented with an M_PICTURE.
Where our new PICTURE type delivers an M_PICTURE (where it used to
deliver a PICTURE), this coercion will apply, changing the given
M_PICTURE into a PICTURE. The first coercion thus renders compatible
old programs that ~examine PICTUREs.
The second coercion renders compatible old programs that ~create
PICTUREs. Our PICTURE specification:
[ MOVE: P BY: 100#0 ]
supplies the PICTURE P as the value for the MOVE field. However, our
change in type definitions requires the MOVE field to be now an
M_PICTURE instead of a PICTURE. The second coercion, from type PICTURE
to M_PICTURE, makes this leap for us.
Figure 23.6 illustrates that the picture P is coerced to an M_PICTURE,
so as to be valid for the specification for the MOVE field.
Similarly, the specification:
{ P ;
[ MOVE: P BY: 100#0 ] ;
[ MOVE: P BY: 200#0 ] ;
[ MOVE: P BY: 300#0 ] ;
[ MOVE: P BY: 400#0 ] }
is a list of PICTUREs. This list of PICTUREs, in order to form a
single PICTURE, must now be interpretted as a list of M_PICTUREs. Here
again, the coercion from PICTURE to M_PICTURE makes this so for each
element, without any modification to this specification.
Figure 23.7 illustrates this.
We actually introduce these two coercions via:
LET X: M_PICTURE BECOME PICTURE BY
X.BODY ENDDEFN
LET X: PICTURE BECOME M_PICTURE BY
[ BODY: X MBB: MBB(X) ] ENDDEFN
The coercion from M_PICTURE to PICTURE merely extracts the M_PICTURE's
BODY field, a PICTURE. The other coercion, from PICTURE to
M_PICTURE, forms an M_PICTURE whose BODY is the given PICTURE, and
whose MBB is the one computed from the given PICTURE.
This overall second step has rendered interchangeable the types
PICTURE and M_PICTURE (from the programmer's point of view, but not
from the computer's). Thus, as far as the programmer is concerned,
all the old programs will still work. (Of course, they must first be
recompiled, in light of the new types and coercions).
Step 3
As of now, we have our new desired type M_PICTURE, and we've rendered
all old programs compatible with this change of types. We could stop
now, and still have working programs, although nothing yet has been
gained. Our MBB function still calls itself recursively to discover
the MBBs of sub-pictures. It still will compute P's MBB five times in
our example.
We finish our overall modification by rewriting just our MBB function,
so as to now take advantage of the precomputed MBBs in sub-pictures
(since sub-pictures are of type M_PICTURE now). Our modified MBB
function follows:
DEFINE MBB( P: PICTURE ) = BOX:
BEGIN VAR Q = M_PICTURE ;
CASE P OF
ONE: POLYGON_MBB( P )
TWO: \MAX Q.MBB FOR Q $E P;
THREE: P.MOVE.MBB \AT P.BY
ENDCASE
END
ENDDEFN
Now this function doesn't call itself recursively upon sub-pictures.
It merely extracts the precomputed MBB from each immediate sub-picture
(of type M_PICTURE).
The MBB function is now fast, as it includes no recursive calls.
We've now taken advantage of the precomputed MBBs, whose existence is
made possible by our new type M_PICTURE. Except for one detail,
we have assured that the picture P will have its MBB computed only
once.
Step 4
Let's make the new PICTURE now be what we've been calling M_PICTURE.
By doing this, all old programs, which still refer to the name PICTURE,
will wind up using our new type (M_PICTURE).
Let's rename our two types:
1) Rename PICTURE now to PICTURE1
and
2) Rename M_PICTURE to PICTURE
Again, these renames need be made only in the small amount of program
text we've introduced or modified just now.
Before we perform this name change, consider that our current
declaration of the variable P has P being of type PICTURE. That means
that after the rename, P will be of the type we called M_PICTURE.
Thus, our assignment into P:
P := the triangle and the box ;
stores an M_PICTURE into P, so P now has its own precomputed MBB.
Upon the five references to P:
{ P ;
[ MOVE: P BY: 100#0 ] ;
[ MOVE: P BY: 200#0 ] ;
[ MOVE: P BY: 300#0 ] ;
[ MOVE: P BY: 400#0 ] }
each reference to P has already its computed MBB. P's MBB will never
again be computed.
23.5.3.2.3
Summary
We took our one type PICTURE and modified it so as to include something
new, a precomputed MBB. This change in datatypes was accomodated by
four steps:
1) Invent the new, desired type, M_PICTURE
Modify the old type PICTURE so as to reference the new type
M_PICTURE instead of PICTURE, where desired.
2) Introduce a pair of coercions, so that the old type (PICTURE)
and the new type (M_PICTURE) become entirely interchangeable
as far as the programmer is concerned.
This step renders all old programs compatible with the change.
3) Modify the function(s) that can take advantage of the new
representation. In our example, that function was MBB.
4) Switch the names of the new and old types. Thus, the old
name, which is used all over, will denote the new type, and
in our case, will have precomputed MBBs.
When all the old programs are recompiled, the new types and coercions
will cause the program to be recompiled differently. The new coercions
will be used everwhere that they are needed, so as to retain
compatibility.
23.6
Processes
A ~process in ICL denotes the ~potential ~for ~future ~execution of
a program. That is, a program may be packaged and passed around as
data, like all other kinds of data. A process can be ~invoked at a
later time. The invocation of a process causes the packaged program
to execute for the first time.
The notation //...\\ is used to package a program, and hence to create
a process. (The "..." is the program).
Given a process, we invoke it with the notation <*...*>. (The "..." is
the process).
For example, the following EXPR is a process:
//
CRLF;
WRITE( TIME_OF_DAY );
\\
This process, when invoked, will print a carriage-return and line-feed
(CRLF) and then write the time of day (as of when this process is
invoked). For example, we can assign this EXPR to a variable:
X:= // CRLF;
WRITE( TIME_OF_DAY ); \\ ;
This does ~not cause the time of day to be printed. It sets X to hold
the potential for this inner program's future execution. In the
future, we might invoke X, as in:
<* X *> ;
At this future time, the packaged program is executed and hence prints
out the time of day as of this invocation. (TIME_OF_DAY is a
parameterless function that delivers possibly a different time of day
each time it's called).
The //...\\ represents a break in time, as shown:
present
// future \\
present
The program in the //...\\ is executed only in the future, when this
process is invoked. Surrounding the //...\\, we are in the present.
Consider the following STATEMENTs:
I:= TIME_OF_DAY ;
X:= // CRLF;
WRITE( TIME_OF_DAY ); \\ ;
WRITE(I);
WRITE(TIME_OF_DAY);
The present execution of these sets I to the present TIME_OF_DAY, sets
X to a process, and then WRITEs I and TIME_OF_DAY. The last two WRITEs
print (nearly) the same thing, the present time of day.
The WRITE within the //...\\ has not executed yet. It will occur in
the future, when we execute:
<* X *> ;
At that future point, we will see the future time of day. This will
be different from the two times of day printed when X was assigned
this process.
23.6.1
Preserving Values From The Present Into The Future
Because the //...\\ contains the future, we must be careful how values
~presently available become available in the ~future. A present value
is the value in I.
Suppose we wanted our process to print not the time
of day as of when the process will be invoked, but instead, print the
value in I, the time of day as of when the process was created (when
we assigned it to X).
One can imagine that the following will set X to a process that ~will
print our ~present time of day:
I:= TIME_OF_DAY ;
X:= // CRLF;
WRITE(I); \\ ;
This process in X ~will print I, a value ~presently in I.
Unfortunately, if I is a local variable, I may be long gone by the time
the future arrives, when we will invoke X.
Because local variables have finite lifetimes, we must forbid access to
local variables from within processes. We simply can't afford to have
a process's future invocation reference a non-existent variable,
like the local variable I. (Global variables, however, can be
referenced).
Therefore, the example process which references I is in fact illegal:
X := // CRLF;
WRITE(I); \\ ;
However, we can bring into the future the values presently held in
local variables. For example, the following retains access to I's
value via a new notation:
I:= TIME_OF_DAY ;
X:= //[I;]
CRLF;
WRITE(I); \\ ;
This is now a legal program.
By enclosing in square-brackets the local variable(s) whose value(s)
you want to retain, you can in fact transfer present values into the
future. You may continue to reference the variable(s) within the
//...\\.
23.6.2
Processes Can Take Inputs And Deliver Values Upon Each Invocation
Processes are broken into four groups, based on the answers to the
two following questions: Does the process take any inputs ~when ~it
~is ~invoked? Does it produce a value as a result of its invocation?
The simplest form of process takes in no inputs and produces no output.
This kind of process is denoted by the following rule:
// \\ -> TYPEX
This deaf-mute type of process is given the name we've used throughout
this text, as declared by:
TYPE SIMPLE_PROCESS = // \\ ;
Instances of SIMPLE_PROCESSes are formed like we have seen (e.g., at
the beginning of the previous section), via the syntax rule:
* // STATEMENT \\ -> EXPR
Following the "//", values of local variable can be retained by
appending the notation:
[ ID ; ID ; ... ID ; ]
as in:
//[ID;ID;...ID;] STATEMENT \\
SIMPLE_PROCESSes are invoked like we have done, via the syntax rule:
* <* EXPR *> ; -> STATEMENT
(The terminating semicolon is required, and was chosen so as to end
in the same way that assignment statements end, with a semicolon).
The next type of process delivers an output upon invocation.
// TYPEX \\ -> TYPEX
This type of process produces a value upon invocation. The ~type
of the invocation's result appears here inside the //...\\.
Example:
Consider the following type declaration:
TYPE CHAR_PRODUCER = // CHAR \\ ;
A CHAR_PRODUCER delivers a CHAR upon each invocation. For
example, let's declare X:
VAR X = CHAR_PRODUCER ;
and assign it a value:
X := // 'A' \\ ;
The future invocation of X, <*X*>, represents the CHARacter
'A'. That is:
WRITE( <*X*> );
prints the character 'A'.
ICL has a function named TTY_INPUT that delivers the next
character typed at the terminal. The CHAR_PRODUCER:
// TTY_INPUT \\
will deliver upon each invocation the next character from the
terminal.
Example:
Suppose we have a function UPPER_CASE which maps a CHAR to a
CHAR:
upper_case( ~CHAR ) -> ~CHAR
We can actually take any CHAR_PRODUCER and render it so that
it always delivers uppercase CHARaracters.
We define UPPER_CASE1, which maps a CHAR_PRODUCER to a
CHAR_PRODUCER:
DEFINE UPPER_CASE1( X: CHAR_PRODUCER ) = CHAR_PRODUCER:
//[X;]
UPPER_CASE( <*X*> )
\\
ENDDEFN
UPPER_CASE1's result is a CHAR_PRODUCER. This CHAR_PRODUCER
retains access to X, the given CHAR_PRODUCER, so that it can
invoke X so as to receive the next character.
The next character, <*X*>, is then rendered UPPER_CASE and it
stands as the result of the new CHAR_PRODUCER. Thus our new
CHAR_PRODUCER always yields each character delivered by the
given CHAR_PRODUCER rendered in uppercase.
Example:
Consider the declaration:
TYPE TEXT_PRODUCER = // TEXT \\ ;
An example TEXT_PRODUCER delivers 'AM' or 'PM' dependent on the
time of day:
// IF TIME_OF_DAY < '12:00:00' THEN 'AM'
ELSE 'PM' FI \\
The general type declaration:
TYPE T0 = // T1 \\ ;
delivers effectively the following rules:
// ~T1 \\ -> ~T0
and
<* ~T0 *> -> ~T1
These are supported by the syntax rules:
* // EXPR \\ -> EXPR
and
* <* EXPR *> -> EXPR
As with all processes, values present in local variables can be
retained by following the "//" with:
[ ID ; ID ; ... ; ID ]
// ( TYPEX , TYPEX , ... , TYPEX ) \\ -> TYPEX
This process type denotes a process that takes inputs and produces no
output. This is our first encounter with processes that take inputs
upon invocation. The ~types of the inputs appear here between the
parentheses.
The declaration:
TYPE T(0) = //(T(1),T(2),...,T(k))\\ ;
effectively delivers two rules, one for invocation and one for
synthesis. The first rule shows invocation:
<* ~T(0) *> ( ~T(1), ~T(2), ..., ~T(k) ) ; -> STATEMENT
It is like our other invocations except that inputs
are passed in upon invocation, enclosed in parentheses following the
<*...*>. Except for the "<*" and "*>", this looks exactly like a
normal procedure call. This rule is supported by the general syntax
rule:
* <* EXPR *> ( EXPR , EXPR , ... , EXPR ) ; -> STATEMENT
---------------- Parentheses in previous paragraph around "0", "1"
---------------- "2", and "k" mean subscripting! -----------------------
The second rule shows synthesis:
// ( ID(1):TYPEX(1) ... ID(k):TYPEX(k) ) STATEMENT \\ -> ~T(0)
It is like our other process's syntheses, except that the inputs must
be specified following the "//". The inputs are specified exactly
as they are for functions. Within the parentheses, each input is
specified as:
ID : TYPEX
This declares the input's type (the TYPEX). The ID names the variable
that will hold that input. Such input variables may be referenced
within the STATEMENT enclosed in //...\\. This synthesis notation
is supported by the general syntax rule:
* // ( ID : TYPEX ... ID : TYPEX ) STATEMENT \\ -> EXPR
---------------- Parentheses in previous paragraph around "0", "1"
---------------- and "k" mean subscripting! ---------------------------
Example:
Consider the type declaration:
TYPE CHAR_CONSUMER = // ( CHAR ) \\ ;
This type of process takes in one input, of type CHAR. (The
parentheses indicate that CHAR is an ~input, as opposed to an
output). Here is a CHAR_CONSUMER:
// ( C : CHAR )
WRITE( C );
\\
We assign it into the variable X:
X:= //(C:CHAR) WRITE(C); \\ ;
We invoke X and pass in the required character via:
<*X*>( 'A' ) ;
This invocation causes the CHARacter 'A' to be printed.
One can create similar CHAR_CONSUMERs that put the character
out to the disk, tape, or a UNIX pipe. The type CHAR_CONSUMER
thus provides a device-independent representation for any
device that consumes a character.
Example:
The following function will deliver a CHAR_CONSUMER that feeds
a given CHAR_CONSUMER only uppercase characters:
DEFINE UPPER_CASE2( X: CHAR_CONSUMER ) = CHAR_CONSUMER:
//[X;] (C:CHAR)
<*X*>( UPPER_CASE(C) );
\\
ENDDEFN
The resulting CHAR_CONSUMER, when invoked, takes in a CHAR via
the variable C. The invocation causes the UPPER_CASE rendition
of that character to be fed to X, the given CHAR_CONSUMER.
Notice how we invoke X so as to feed it that new character.
(X is enclosed in square-brackets after the "//" so as to
retain access to the given CHAR_CONSUMER, X).
This UPPER_CASE2 function provides a device-independent way to
render any device as one which translates its given characters
into uppercase.
Finally, here is the fourth kind of process:
// TYPEX ( TYPEX , TYPEX , ... , TYPEX ) \\ -> TYPEX
This process type denotes processes that take inputs and produce a
result. The declaration:
TYPE T = // T(0) ( T(1), T(2), ..., T(k) ) \\ ;
effectively delivers the following rule for invocation:
<* ~T *> ( ~T(1) , ~T(2) , ... , ~T(k) ) -> ~T(0)
That is, the invocation of an object of type T, given the correct types
of data for the inputs (T(1),T(2),...,T(k)), delivers an object of
type T(0), the process T's output type.
---------------- Parentheses in previous paragraph around "0", "1"
---------------- "2", and "k" mean subscripting! -----------------------
This notation is identical to the notation used earlier to invoke a
non-value-returning process, except that here the terminating
semicolon is absent. This notation is supported by the general syntax
syntax rule:
* <* EXPR *> ( EXPR , EXPR , ... , EXPR ) -> EXPR
The following rule shows synthesis:
// ( ID(1):TYPEX(1) ... ID(k):TYPEX(k) ) ~T(0) \\ -> ~T
This notation is identical to the notation used earlier to synthesize
a non-value-returning process, except that inside the //...\\, an EXPR
(of type T(0)) is required instead of a STATEMENT.
---------------- Parentheses in previous paragraph around "0", "1"
---------------- and "k" mean subscripting! -----------------------
This notation is supported by the general syntax rule:
* // ( ID:TYPEX ... ID:TYPEX ) EXPR \\ -> EXPR
The inputs are specified like we've seen already, just like within the
first line of a function.
Example:
Consider the type declaration:
TYPE FUNCTION = // REAL ( REAL ) \\ ;
A FUNCTION is a process that takes in one REAL and delivers a
REAL.
Suppose F is a variable of type FUNCTION. We assign it the
function "SIN(X)+1" via:
F:= //(X:REAL) SIN(X)+1 \\ ;
We invoke F and pass it its input via:
<*F*>( 3.6 )
This EXPR denotes the value SIN(3.6)+1. That is:
WRITE( <*F*>( 3.6 ) ) ;
prints that value.
Example:
The following function squares any given FUNCTION:
DEFINE SQUARE( F: FUNCTION ) = FUNCTION:
//[F;](X:REAL)
<*F*>(X) ^ 2
\\
ENDDEFN
F is retained in square-brackets so that the resulting FUNCTION
can invoke F in its effort to compute the square of F's value.
We pass our input, X, into that invocation of F.
Example:
The following function performs numeric integration:
DEFINE INTEGRATE( F: FUNCTION LOW,HIGH,DX: REAL )
= REAL :
BEGIN VAR X=REAL;
+ <*F*>(X) * DX FOR X FROM LOW TO HIGH BY DX;
END
ENDDEFN
(The function body is built from the rule that supports
cummulative operators:
BOP EXPR QUANTIFIER ).
23.6.3
In All Processes
1)
Following the "//", you can retain values in local variables by
writing either:
[ ID ; ID ; ... ID ; ]
or
{ ID ; ID ; ... ID ; }
Such specification does not alter the type associated with the //...\\.
There is a difference between the square- and curly-brackets. This
difference revolves around what might happen if the process assigns a
new value into any of the variables (the IDs).
With curly-brackets, the new values will be retained for the next
invocation. With square-brackets, only the original values are
retained, and the newly assigned values are lost upon the next
invocation.
For example:
I:= 5;
X:= //{I;} CRLF; WRITE(I);
I::= + 1 ; \\ ;
sets X to a process whose repeated invocation produces the lines:
5
6
7
8
...
If square-brackets were used instead of the curly-brackets, all the
lines would show 5.
2)
Each "ID;" between square- or curly-brackets can in fact be a general
assignment statement, as in the EXPR:
// [ I:= TIME_OF_DAY; ]
CRLF;
WRITE(I);
\\
Such enclosed assignments are carried out now, when the process is
being synthesized, and not later (during invocation). This example is
equivalent to:
DO I:= TIME_OF_DAY ;
GIVE //[I;] CRLF;
WRITE(I); \\
3)
New local variables can be declared within a process, as in:
// BEGIN VAR J=TEXT;
J:= TIME_OF_DAY;
WRITE(J);
END
\\
It is only local variables declared ~outside the //...\\ that may need
to be enclosed by the square- or curly-brackets.
Local variables declared inside the //...\\ naturally need to be
initialized upon each invocation, like within any function.
23.7
private TYPEX -> TYPEX
This forms a new type whose representation is identical to the given
TYPEX. The new and original types however are distinguished
linguistically. They are not interchangeable except via an
explicit notation.
For example, we might have a type INTEGERS, a list of any old integers.
In contrast, we may be writing programs that deal only with ~sorted
lists of integers, where the integers are assumed to be in increasing
order. A sorted list of integers is represented exactly like any list
of integers. However, we want to distinguish sorted lists from
unsorted lists.
We introduce a ~new type SORTED_INTEGERS as follows:
TYPE SORTED_INTEGERS = PRIVATE INTEGERS ;
SORTED_INTEGERS and INTEGERS are now distinct types although they are
represented alike.
You can associate with new PRIVATE types any properties you wish, such
as sortedness. While you are aware of such special properties, ICL
need not know of them. ICL blindly keeps the two types distinct
except for explicit notations that turn one type into the other.
To turn an INTEGERS into a SORTED_INTEGERS, the following rule is
contributed by the PRIVATE declaration:
sorted_integers ::: ( ~INTEGERS ) -> ~SORTED_INTEGERS
The name of the private type, SORTED_INTEGERS, followed by three
colons is the only way to create new instances of type SORTED_INTEGERS.
This "sorted_integers:::(...)" is a ~stamp of approval. Thus, you
control explicitly the creation of SORTED_INTEGERS. It is your
responsibility to guarantee that the INTEGERS in this case is sorted.
While a stamp of approval is required to go from INTEGERS to
SORTED_INTEGERS, a different stamp of approval goes the other way:
publicize ::: ( ~SORTED_INTEGERS ) -> ~INTEGERS
Here, the word "publicize" followed by three colons lets the private
type be seen as the original (public) type.
Thus, to go between the original type and the private type in either
direction requires a special stamp of approval.
To check a program's integrity, you need verify only that the "..." in
each occurence of "SORTED_INTEGERS:::(...)" is in fact sorted. For the
rest of your program specification, ICL assures that SORTED_INTEGERS
and any other type (like INTEGERS) can't be confused.
23.7.1
Coercions To Sew Up The Distinction
Often, like in this example, we may wish that the private type
(SORTED_INTEGERS) be available also as an instance of type INTEGERS
without requiring any stamp of approval. After all, a sorted list of
integers is also a perfectly valid list of integers. We introduce a
coercion from type SORTED_INTEGERS to INTEGERS as follows:
LET X: SORTED_INTEGERS BECOME INTEGERS BY
PUBLICIZE:::(X) ENDDEFN
Within this coercion, the "PUBLICIZE:::(...)" stamp of approval is
required. However, from now on, no such stamp is required. A
SORTED_INTEGERS appearing in a context that demands an INTEGERS will
be allowed, as this coercion will provide the translation implicitly.
Now, the only stamp of approval required is the
"SORTED_INTEGERS:::(...)", in going from INTEGERS to SORTED_INTEGERS.
We can make this latter stamp also implicit, by introducing a second
coercion:
LET X: INTEGERS BECOME SORTED_INTEGERS BY
SORTED_INTEGERS:::( X SORTED_BY I:I INCREASING )
ENDDEFN
Here we sort the integers and simultaneously supply the stamp of
approval.
Given both coercions, the type INTEGERS and SORTED_INTEGERS become
entirely interchangeable. If you want sorted integers, demand the type
SORTED_INTEGERS.
For example, suppose we define a function WRITE_SORTED which needs as
its input a sorted list of integers:
DEFINE WRITE_SORTED( X: SORTED_INTEGERS ):
BEGIN VAR I=INT;
FOR I $E X; DO CRLF; WRITE(I); END
END
ENDDEFN
This function can assume its input is sorted, even if this function
is given an unsorted list. (If it were called with an
unsorted list, an instance of type INTEGERS, then to make the call
possible, the coercion from INTEGERS to SORTED_INTEGERS will be
invoked, thus delivering sorted integers).
In looking at a program listing, you may have variables of type
INTEGERS. You may discover perhaps greater program efficiency if you
require one or more of those variables to contain always sorted lists.
To make the change, all you have to do is change the variable's
declaration to use the type SORTED_INTEGERS instead of INTEGERS. Upon
recompilation, you can rest assured that no matter what, the variable
will always hold a sorted list.
23.7.2
Another Example - The Incorporation Of Units
Some applications deal with physical units, like ~seconds, ~hours,
~inches, or ~feet. For clear specification, you may wish to require
the specification of units. For example, a function BOIL_WATER may
require an ~amount ~of ~time to specify how long to boil. We may wish
it to be called via either of:
BOIL_WATER( 20 \MINUTES ) ;
or
BOIL_WATER( 1.5 \HOURS ) ;
This specification is clearer than:
BOIL_WATER( 20 ) ;
because here the units are unspecified. Just upon reading this,
to make sense of it, you have to recall what units BOIL_WATER assumes.
Let's go on to require the specification of units as follows. First,
we invent the type TIME:
TYPE TIME = PRIVATE REAL ; "Units are ~seconds"
This makes TIME a new type which is represented by a REAL, whose units
we will agree are always seconds, say. The specification "20" is of
type REAL, but ~not of type TIME.
Let's provide for the specification of TIME in terms of minutes and
hours. We introduce the rules:
~REAL \minutes -> ~TIME
~REAL \hours -> ~TIME
The following function definitions makes this so:
DEFINE MINUTES( X:REAL ) = TIME:
TIME:::( X*60 )
ENDDEFN
DEFINE HOURS( X:REAL ) = TIME:
TIME:::( X*3600 )
ENDDEFN
When we apply the stamp of approval "TIME:::(...)", we first turn the
given REAL into a number of seconds, as is required by our agreement
about the type TIME.
Notice that we provide no coercion from REAL to TIME, and so "20" is
itself not TIME.
However, we may wish to allow TIME to become a REAL implicitly. (That
REAL will always be a number of seconds). For example, many programs
may want to get at the REAL value of TIME. We relieve the need for
"PUBLICIZE:::(...)" via the coercion:
LET T: TIME BECOME REAL BY
PUBLICIZE:::( T ) ENDDEFN
Thus, expressions like:
A * T + B
can be written even if T is of type TIME. This expression is
definitely of type REAL and ~not TIME.
23.8
scalar ( ID , ID , ... , ID ) -> TYPEX
This forms a new type. Instances of this new type are precisely
the set of IDs, and one more, namely NIL. (NIL is the value taken
on by an unspecified scalar, e.g., an unspecified component in a
record).
Consider the following declaration:
TYPE TRAFFIC_LIGHT = SCALAR( GREEN, YELLOW, RED ) ;
Instances of TRAFFIC_LIGHT are precisely the names GREEN, YELLOW, RED,
and NIL. This is supported by the syntax rule:
ID -> EXPR
Scalars may be compared for equality via the syntax rules:
= -> BOP (equal)
<> -> BOP (not equal)
Scalars may also be examined by the ~scalar CASE construction:
case EXPR of
ID: STATEMENT
ID: STATEMENT
...
ID: STATEMENT
endcase -> STATEMENT
This rule also exists for EXPRs:
case EXPR of
ID: EXPR
ID: EXPR
...
ID: EXPR
endcase -> EXPR
The case-EXPR must be of a scalar type, like TRAFFIC_LIGHT. Each of
the IDs must be one of the names in the scalar type, or the name ELSE.
The case-EXPR is evaluated and then one of the "ID: STATEMENT" (or
"ID: EXPR") is executed, the one having the matching name (ID).
If no ID matches the value in the case-EXPR, then nothing is executed
unless there is an ELSE clause, in which case the ELSE clause
executes. In the absence of an ELSE-clause, when no match
is found, the CASE construct for STATEMENTs executed nothing. For the
EXPR form, a runtime error is issued and the value NIL becomes the
value for the whole CASE construct.
For example, the following examines the TRAFFIC_LIGHT variable
PRESENT_LIGHT:
CASE PRESENT_LIGHT OF
GREEN: NEXT_LIGHT:= YELLOW;
YELLOW: NEXT_LIGHT:= RED;
RED: NEXT_LIGHT:= GREEN;
ENDCASE
This can also be rendered as:
NEXT_LIGHT:= CASE PRESENT_LIGHT OF
GREEN: YELLOW
YELLOW: RED
RED: GREEN
ENDCASE ;
23.9
disk TYPEX -> TYPEX
This forms a new type, a potentially ~disk-resident version of the
given TYPEX.
Let's use the following example declaration:
TYPE DISK_OBJECT = DISK CORE_OBJECT ;
No matter how large the "core_object" might be, its "disk_object"
representation consumes minimal memory when the core_object is swapped
out. Figure 23.8 shows a disk_object along with its core_object.
If you hold onto an instance of DISK_OBJECT, you give ICL the freedom
to swap-out its core_object. In contrast, if you hold onto an instance
of CORE_OBJECT, you supress the possibility of its being swapped out.
This declaration immediately delivers the following rules. The first
concerns swap-in. We get the coercion
~DISK_OBJECT -> ~CORE_OBJECT
That is, you needn't say anything to gain access to the core_object.
Just treat the disk_object as though it were a core_object. If the
core_object is presently swapped out, it will be swapped in
automatically.
The following rule creates a new disk_object given a core_object:
~CORE_OBJECT \disk -> ~DISK_OBJECT
After obtaining a disk_object this way, if you hold on only to the
disk_object and not the core_object, ICL will be given the freedom to
swap out the core_object, in order to make more memory available.
(An example using this notation follows shortly).
Subtleties:
The ~sharing of memory blocks is exposed in ICL by the use of
the @-operator. Therefore, we must mention that sharing can
be lost upon the first swap-out/swap-in sequence.
1) Sharing between distinct disk_objects:
Initially, two disk objects' in-core representations may share
some memory blocks in common (figure 23.9). This sharing is
lost when either of the two disk objects has its in-core
representation swapped out. Such initial sharing is lost
forever. This is an effect like ~entropy.
For example, let's declare the following:
VAR D1, D2 = DISK_OBJECT ;
C1 = CORE_OBJECT ;
Supposing we've set something into C1, and then say:
D1:= C1 \DISK ;
D2:= C1 \DISK ;
Before either of D1 or D2 gets swapped out, any objective
modification to C1 will be apparent from the points of view
of D1 and D2, e.g., the assignment:
@(C1) := something ;
will modify C1, and each of D1 and D2 will see the same effect.
However, once D1 and D2 swap out, the effect of that
assignment will not be seen by D1 or D2.
Because you cannot control what swaps when, you cannot rely on
the preservation of sharing within the in-core representations
of ~distinct disk objects.
Consolations:
Sharing is always preserved within a single disk object.
Figure 23.10 shows such sharing.
If a ~disk ~object is shared between two other objects, even
disk objects, like in figure 23.11, that sharing will be
preserved forever.
2) Locking structures in memory:
You can control swapping to the extent that you can lock an
object in core.
You lock a object in memory by maintaining a reference to the
disk_object's in-core representation. For example, the
following locks in the object D1:
C1:= D1 ;
The in-core representation of D1 stays in core at least until
the reference from C1 ceases to exist, e.g., until you write:
C1:= NIL ;
or the local variable C1 goes away, e.g., at function
termination.
3) Modifications via the @-operator
Applications of the @-operator upon the in-core representation
of a disk_object ~may or ~may ~not be remembered from the
points of view of disk_objects!
An @-assignment is remembered securely only if the disk_object
was locked in-core at the time you performed the @-assignment.
(You don't have to keep it locked forever. Just keep it locked
during the actual @-assignment).
All the subtleties mentioned here are of concern only from the points
of view of disk_objects.
Nothing here, including swap-out, affects any other part of ICL. That
is, references to memory blocks within in-core representations will not
be ripped out from under you just because a disk_object swaps out.
Rather, those memory blocks cease to be part of the disk_object, even
when the disk_object swaps back in.
See Chapter 33 on memory management to see how disk_objects are
implemented.
23.9.1
Virtual Memory
All disk-resident data resides in what is called a ~virtual ~memory
(VM) file(s). Thus, entire databases can be kept in a VM file.
Whenever you start up ICL, you specify a VM file which holds the
database you want to use. That VM file is opened for read-only access.
All changes to data go not back into the original VM file, but into
a brand new VM file, whose name you also specify upon ICL start up.
Thus, each session with ICL reads from one VM file and writes to
another VM file.
Upon completion of an ICL session, the second, (write), VM file
becomes a valid database. It contains only the ~differences between
the new database and the old. Both the new and old databases can
persist. After a sequence of ICL sessions, you wind up with a sequence
of VM files making up your database.
We provide a program that ~collapses such a sequence of VM
files into one, stand-alone VM file (database). The program doesn't
change the data in the database, it merely organizes it into one file.
It also ~garbage ~collects the database, so that unreferenced data
cease to exist and take up disk space.
Databases of size 150 megabytes incur a cost of about two hours
elapsed time. Such full collapses have in practice been required
only about twice a year. Partial collapses (where some of the
original sequence of files is preserved) may take 15 minutes, and
might occur once a week.
Chapter 35 describes an implementation of this ~piggy-back
virtual memory system. Chapter 33 describes garbage collection in
conjunction with the disk.
23.10
Arrays
array of TYPEX -> TYPEX
Arrays are like lists in that each represents repeated occurences of
data of some type. Lists are the best to use because of the rich
set of operators associated with lists (like appending). Lists cease
to be advantageous when random access to its elements is required
often, and the number of elements involved is not small. (ICL existed
without arrays for nearly a decade. On the few occasions when they
were needed, tree structures were used instead).
Consider the declaration:
TYPE RANDOM_ACCESSED_REALS = ARRAY OF REAL ;
This introduces the following rules. First, we get:
new( ~INT ) -> ~RANDOM_ACCESSED_REALS
This creates a new array having the number of elements specified by
the INTeger. All the elements are initialized to 0, FALSE, or NIL.
The next rule provides access to individual elements, a method called
~indexing, like we have with lists:
~RANDOM_ACCESSED_REALS [ ~INT ] -> ~REAL
Unlike with lists though, this notation is ~not valid on the lefthand
sides of assignment statements.
On the lefthand sides of assignment statements, you must use the
following notation:
@( ~RANDOM_ACCESSED_REALS ) [ ~INT ] -> EXPR
All modifications to arrays are objective, and hence the appearence of
"@". (Subjective modification, a copy-on-write policy, is generally
useless in conjunction with random access).
An array is represented by one big block of memory, whereas a list is
represented by many small blocks of memory.
A two-dimensional array is created by forming an array of arrays, as
in:
TYPE TWO_DIMENSIONS = ARRAY OF RANDOM_ACCESSED_REALS ;
A 256 by 512 TWO_DIMENSIONS is created via:
TWO_DIMENSIONS:= NEW(256) ;
FOR I FROM 1 TO 256; DO @(TWO_DIMENSIONS)[I] := NEW(512); END
The two-dimensional index "3,5" is written as:
TWO_DIMENSIONS[3][5]