CHAPTER 13
THE PARSER ALWAYS WORKS
This entire chapter may be skipped. It resorts to mathematical logic
to prove that this parser works. It may lend clarity to the parser's
properties and operation.
The following arguments are meant to apply to general rewrite grammars
as well as context-free grammars. You might choose, however to think
of a give-phrase as having length one.
Let's prove that if the parser terminates, then it works. It works
if the following is true:
If the input string is a member of the grammar's language,
then the grammar's root part-of-speech will exist, spanning
the whole input string.
Also the opposite: If the input string is ~not a member
of the language, then there will be no
full-spanning occurence of the root
part-of-speech.
The appearence of the root part-of-speech can directly commence
semantic processing, as discussed in the Section 12.2.1.
13.1
What It Means For A CHOICE_OF_PHRASES To Be ~Fully ~Ambiguated
We introduce now a formalism from which we can prove that the parser
works.
A CHOICE_OF_PHRASES is ~fully ~ambiguated when all possible rules have
been applied to all visible phrases throughout the CHOICE_OF_PHRASES.
That is, all rules have applied everywhere possible in the
CHOICE_OF_PHRASES.
Let's restate this definition more specifically, in two parts:
A CHOICE_OF_PHRASES, C, is ~fully ~ambiguated when both of
the following are true:
a) for every visible phrase emenating from C:
If that phrase matches some rule's want-phrase,
then also in C is the rule's give-phrase.
The want- and the give-phrases have identical
LEFTs (spans).
Figure 13.1 shows a want-phrase together with a give-
phrase that have identical spans.
and
b) The LEFT fields in all PHRASE blocks in C are
themselves ~fully ~ambiguated.
For example, the righthand end of figure 12.6 shows a fully ambiguated
CHOICE_OF_PHRASES, C. The visible phrase
<FORM> + <TERM>
emenates from C. The variables P3,P2,P1 point out this visible phrase.
This visible phrase is an occurence of the want-phrase of the rule:
<FORM> + <TERM> -> <FORM>
C is fully ambiguated because this rule's give-phrase, <FORM>, also
emenates from C. Both of the phrases have identical LEFTs. Both
point to the left end of the figure.
Consider the following scenario. Let C be a CHOICE_OF_PHRASES which is
fully ambiguated:
1) A friend picks ~any visible phrase anywhere in C.
The visible phrase does ~not have to emenate from C, it
could be any visible phrase emenating from any
CHOICE_OF_PHRASE ~accessible from C's point-of-view.
Figure 13.2 shows a visible phrase somewhere throughout
C. Your friend's phrase is labelled in the figure as
the "want-phrase".
2) You pick the CHOICE_OF_PHRASES from which that
visible phrase emenates. Call it RIGHTHAND.
3) RIGHTHAND is fully ambiguated, because:
Since RIGHTHAND is accessible from C, we can get from
C to RIGHTHAND by taking LEFT pointers, from one
CHOICE_OF_PHRASES to the left neighboring
CHOICE_OF_PHRASES, repeatedly. We start at C.
Since C is fully ambiguated, part B of the definition
assures that each of its PHRASE block's LEFT fields
are themselves also fully ambiguated.
Each time we take a step leftwards, we are leaving one
fully ambiguated CHOICE_OF_PHRASES, and entering
another fully ambiguated CHOICE_OF_PHRASES. Thus,
when we reach RIGHTHAND, we have deduced that
RIGHTHAND, like all CHOICE_OF_PHRASES encountered, is
fully ambiguated.
4) If your friend's chosen visible phrase happens to match some
rule's want-phrase, then the same rule's ~give-phrase ~also
~appears ~in ~RIGHTHAND, by part A of the definition, applied
to RIGHTHAND. The give-phrase and the want-phrase share the
same span.
All this means is that if you see a rule's want-phrase ~anywhere
throughout a fully ambiguated CHOICE_OF_PHRASES, that rule's give-
phrase also appears, sharing the same span.
~Sharing ~the ~same ~span means:
a) Both phrases emenate from the same CHOICE_OF_PHRASES,
and
b) The leftmost PHRASE block in each phrase have identical
values in their LEFT fields.
Figure 13.1(a) shows a want- and a give-phrase having the same span in
a visible phrase. Figure 13.1(b) shows the same for a general
rewrite rule. The give-phrase has length greater than one.
13.2
Full-Spanning Visible Phrases
Imagine the input string. Imagine now a sequence of full-spanning
strings, the first of which is the input string. Each string in the
sequence is identical to the previous string, except that one rule has
been applied. Some occurence of this rule's want-phrase in the
previous string has been replaced now by an occurence of the rule's
give-phrase.
Suppose a string in that sequence is a full-spanning visible phrase
emenating from C. If C is fully ambiguated, then the next string in
the sequence also appears as a full-spanning visible phrase, emenating
from C. Why?
The first string, a visible phrase, is a leftward going
sequence of PHRASE blocks. Somewhere in that sequence is a
sub-sequence that represents the occurence of the want-phrase,
shown in figure 13.2.
Take the whole sequence of PHRASE blocks and form a new
sequence by abandoning the sub-sequence that represents the
occurence of the rule's want-phrase. Take instead the
guaranteed occurence of the give-phrase, whose span matches
that of the want-phrase (figure 13.1(a)).
Because the spans are identical, we know that the rightmost
PHRASE block from each of the want-phrase and give-phrase
reside in the same CHOICE_OF_PHRASES. This is assured by
part A of the definition of "fully ambiguated". Our new
sequence thus chooses the give-phrases's rightmost PHRASE block
instead of the want-phrase's rightmost PHRASE block.
We substitute the want-phrase sub-sequence by the give-phrase
sequence. Since both the want- and give-phrases share the same
LEFT, both phrases meet towards the left. No matter which
of the phrases you travel left from, the leftward sequence is
identical. Thus, this new sequence of PHRASE blocks, which
contains the give-phrase, is itself a visible phrase. The new
sequence (like the original) obeys the property that each
PHRASE block is taken from the LEFT of a previous (right
neighboring) PHRASE block.
The new string, a sequence of PHRASE blocks, represents the
next string in the envisioned sequence of rewritten, full-
spanning strings.
Thus, if C is fully ambiguated, and if the input string is a visible
phrase in C, then all strings in the envisioned sequence of strings are
visible phrases in C.
------ The previous three lines are to be italics ------
If the input string is a member of the grammar's
language, then you can envision a sequence of rewritten strings which
ends with the grammar's root part-of-speech, sharing the same span with
the input string. That sequence too is represented in C, as is any
imagined sequence of rewrites, starting from the input string.
We will show that GIVE preserves the property of being fully
ambiguated. First, we assume this behavior of GIVE and show
completeness.
13.3
Completeness
Our method for taking user input has the following properties:
1) The input string is a full-spanning visible phrase in the
rightmost CHOICE_OF_PHRASES, the final value in C.
(See Section 12.4).
2) Because we set C to NIL before calling GIVE, C is always
fully ambiguated.
a) C upon entry to GIVE is fully ambiguated.
(A NIL CHOICE_OF_PHRASES is fully ambiguated
because it has no phrases. No rule's want-
phrase appears, and so it is vacuously true
that every want-phrase has a give-phrase).
b) C after GIVE is fully ambiguated, because GIVE
preserves the property of being fully ambiguated
(Section 13.3.1).
3) In the end, C is fully ambiguated.
4) If the given input string is a member of the language defined
by the grammar, then in the end, C has at least two full-
spanning visible phrases: the input string and a full-spanning
occurence of the grammar's root part-of-speech.
(This was just proven in the previous section).
This shows that the parser works at least when given a legal string,
a string that is a member of the language. Both the input string
and a full-spanning occurence of the root part-of-speech exist as
visible phrases.
We have relied on the following.
13.3.1
GIVE Preserves The Fully Ambiguated Property
Suppose C is presently fully ambiguated. A call to GIVE will leave C
still fully ambiguated.
Let C0 denote C prior to the call to GIVE. Let C1 be C after the call.
Let's prove part A of the definition of "fully ambiguated":
1) Pick any visible phrase emenating from C1. Suppose that it
matches a rule's want-phrase.
2a) If the chosen phrase is also in C0, then because C0 is
fully ambiguated (by assumption), the rule's give-
phrase already exists in C0. It shares the same span
with the want-phrase. Because GIVE never removes
elements from C, that give-phrase is also in C1.
2b) If the chosen phrase is not in C0, then we know that its
rightmost PHRASE block was introduced into C by a call to GIVE.
(GIVE is the only place that C is affected).
3) GIVE called the grammar at that time. At that time, the chosen
phrase was a visible phrase, starting from that rightmost
PHRASE block.
We know that ~now the chosen want-phrase is a visible
phrase starting from that rightmost PHRASE block.
Our consistent use of subjective
modification means that we never affect existing data.
Thus, the PHRASE block's LEFT has remained unchanged
since that PHRASE block was created. Thus, what we
see now starting from that PHRASE block was also
visible when the PHRASE block was created.
4) Since the chosen want-phrase was visible when the grammar was
called, the rule's (outermost) IF-statement matched the new
PHRASE block. The complex quantifier inside the THEN-clause
matched the rest of the chosen visible phrase (among others).
One of the iterations of that quantifier set P(1),
P(2), ... P(n) to the chosen occurence of the want-
phrase.
---------------- Parentheses in previous paragraph mean subscripting! ---
(We deduce that this iteration exists because
the chosen want-phrase,
PHRASE(k+l) ... PHRASE(k+1) PHRASE(k)
obeys the property that
PHRASE(i) is a member of PHRASE(i-1).LEFT
since it is a visible phrase. Each dimension
of the quantifier looks at ~all PHRASE(i)s
emenating from PHRASE(i-1).LEFT, and so it will
see ours).
---------------- Parentheses in previous paragraph mean subscripting! ---
5) Upon matching the chosen want-phrase, the rule then applied.
It called GIVE with the give-phrase. Both the want-phrase and
give-phrase share the same span, because:
a) Both phrases emenate from C:
When we call GIVE, it puts the new give-phrase
into C. C thus contains (the rightmost block
of) the give-phrase.
Now we show that at that time, C contained
also the rightmost block of the want-phrase.
Because of pure subjective modification,
what appeared then still appears now.
Note that we, the grammar, got called
from GIVE. GIVE had just been given a PHRASE
block. ~That ~PHRASE ~is ~the ~rightmost
~element ~in ~our ~want-phrase.
GIVE sees only visible phrases that
have as their rightmost PHRASE block,
the block passed into GIVE. Since our
rule is applying upon that want-phrase,
GIVE was passed our want-phrase's
rightmost PHRASE block.
Prior to calling us (the grammar), GIVE had
just put that PHRASE block onto C. Now both
the want- and give-phrases emenate from C.
(Their rightmost PHRASE blocks are
members of C).
b) The LEFT of both phrases are identical, because the
PHRASE passed into GIVE has as its LEFT the LEFT of the
matched phrase, P(1).LEFT, the LEFT field of the
want-phrase's leftmost PHRASE block.
---------------- Parentheses in previous paragraph mean subscripting! ---
6) Since the chosen want-phrase and the new give-phrase share the
same span, part A of the definition is proven.
13.3.2
Part B Of The Definition
Let's prove part B of the definition of "fully ambiguated". We will
proceed by assuming that part B of the definition is false. We will
then encounter a contradiction.
Consider the LEFT field of each new PHRASE block passed to GIVE.
If all the PHRASE blocks passed to GIVE have fully ambiguated LEFTs,
then all elements on C (put there by GIVE) all have fully ambiguated
LEFTs. This means part B of the definition is true for C.
Playing the devil's advocate, consider now the first call to GIVE
whose LEFT field is not fully ambiguated. By assumption, all previous
calls to GIVE got LEFTs which were fully ambiguated. Since C contains
only PHRASE blocks passed to GIVE, all LEFTs in C are fully ambiguated,
up to now. Thus, the value in C after each of those earlier calls was
fully ambiguated as per both parts of the definition. (We proved part
A earlier, and we can assume part B for all previous calls to GIVE).
All previous calls to GIVE left C fully ambiguated. Our first call
to GIVE with a non-fully ambiguated LEFT can occur in any one of three
places:
1) When taking user input:
LEFT is set to NIL initially. Thus, the first call to
GIVE receives a fully ambiguated LEFT.
Each subsequent call to GIVE delivers a LEFT field
which is taken from C, i.e.,
LEFT:= C ;
C:= NIL ;
GIVE( [LEFT:LEFT POS:<pos>] );
That value in C was the result of the previous (left
neighboring) call to GIVE.
By assumption, every LEFT passed to GIVE earlier, and
hence all LEFT fields in C, are fully ambiguated.
Part B of the definition thus holds for C. Combined
with part A, we know that C is fully ambiguated.
Thus the LEFT just passed to GIVE, which came from C,
must be fully ambiguated.
This contradicts our choice that this GIVE got a non-
fully ambiguated LEFT.
2) For context-free rules:
A context-free rule calls GIVE with a PHRASE block
whose LEFT is taken from the want-phrase. The want-
phrase was produced by earlier GIVEs. By assumption,
those earlier GIVEs all received PHRASE blocks with
fully ambiguated LEFTs.
Our new PHRASE block, the one now being passed to
GIVE, has as its LEFT one of those fully ambiguated
LEFTs.
This contradicts our choice that this GIVE got a non-
fully ambiguated LEFT.
3) For general rewrite rules:
Notice that the LEFT field for each GIVE is taken
from either:
a) An existing LEFT (from the leftmost PHRASE
block in the want-phrase).
or
b) A value in C following a call to GIVE.
The argument for context-free rules shows that (a), an
existing LEFT, is indeed fully ambiguated. Thus (a)
is not the source for a non-fully ambiguated LEFT.
The argument for taking user input shows that the
the value in LEFT passed to GIVE, upon executing:
LEFT:= C;
C:= NIL;
GIVE( [LEFT:LEFT POS:<pos>] );
is fully ambiguated.
This contradicts our choice that this GIVE got a non-
fully ambiguated LEFT.
We conclude that there is no first call to GIVE passing in a LEFT that
is not fully ambiguated. This implies that all GIVEs receive fully
ambiguated LEFTs, and hence Part B of the definition is proven.
BOX: What does it mean for a CHOICE_OF_PHRASES to be fully
BOX: ambiguated?
BOX:
BOX: Why does GIVE ~preserve the fully ambiguated property
BOX: on the variable C?
13.4
Only Legal Strings Are Accepted
We claim that ~all full-spanning visible phrases are in fact derivable
by performing rewrites upon the input string. We prove that here.
For shorthand, we will refer to full-spanning visible phrases as
"strings".
13.4.1
When One String Is ~Older Than Another
Given two strings, they may be identical starting at the left,
containing the same PHRASE blocks. Let's think of each string as a
~rightward going sequence of PHRASE blocks. (This is opposite the
direction we've used so far).
Suppose that only the first K-1 blocks in each string are identical.
This means that:
PHRASE(k) in the first string
is distinct from
PHRASE(k) in the second string.
Let's name these two PHRASE blocks:
PHRASE1(k) is the PHRASE(k) in the first string, and
PHRASE2(k) is the PHRASE(k) in the second string.
Each PHRASE(k) has a time when it was passed into GIVE. Let's denote
this time via the notation:
Give_Time( PHRASE(k) )
Since no two PHRASE blocks are GIVen at the same moment, one came
before the other, i.e.,
Give_Time( PHRASE1(k) ) < Give_Time( PHRASE2(k) )
or
Give_Time( PHRASE1(k) ) > Give_Time( PHRASE2(k) )
We say that the first string is "older than" the second string if
Give_Time( PHRASE1(k) ) < Give_Time( PHRASE2(k) )
---------------- Parentheses in previous paragraph around "k"
---------------- mean subscripting! ------------------------------------
13.4.2
Transitive Closure Of ~Older ~Than
Now suppose we have three strings, such that
string(1) is older than string(2)
and
string(2) is older than string(3)
We now show that the transitive property holds, i.e., that
string(1) is older than string(3)
Let PHRASE1(k) be the first (leftmost) PHRASE block in string(1) that
is not also in string(2). By "is older than", we have:
Give_Time( PHRASE1(k) ) < Give_Time( PHRASE2(k) )
Let's do the same for the second pair of strings. Let PHRASE(j) be
the first (leftmost) PHRASE block in string(2) that differs from
PHRASE(j) in the third string. By "is older than", we have
Give_Time( PHRASE2(j) ) < Give_Time( PHRASE3(j) )
We now show that string(1) is "older than" string(3):
1) If j=k, then we have:
Give_Time( PHRASE1(k) ) < Give_Time( PHRASE3(k) )
We also know that the (K-1)'th PHRASE block, and each one
further to the left, are identical between string(1) and
string(3).
2) If j < k, then string(1) and string(3) differ at PHRASE(j).:
a) We know that:
PHRASE1(j) = PHRASE2(j)
since j < k, and only at K (further to the right) can
string(1) and string(2) differ.
This equality makes the following true:
Give_Time( PHRASE1(j) ) = Give_Time( PHRASE2(j) )
b) We have also that:
Give_Time( PHRASE2(j) ) < Give_Time( PHRASE3(j) )
since string(2) is older than string(3), and j is the
(first) disagreement between string(2) and string(3).
Put together, we have:
Give_Time( PHRASE1(j) ) < Give_Time( PHRASE3(j) )
We also know that all three strings are identical up to
but not including PHRASE(j).
Thus, string(1) is "older than" string(3).
3) If k < j, then perform step (2), exchanging string(1) and
string(2). (This makes j < k ).
We now know that "is older than" is transitive. It is also anti-
reflexive because:
string(1) is older than string(2)
and
string(2) is older than string(1)
cannot both be true. (Compare the Give_Times).
---------------- Parentheses in previous paragraph around "1", "2",
---------------- "3", "k", "j" mean subscripting! ---------------------
13.4.3
When Does The First Illegal String Show Up?
Let's return to C, the rightmost CHOICE_OF_PHRASES. We say that a
string is "legal" if and only if it can be derived from the input
string. We show that all strings are legal:
Given an illegal string, who made it?
We proceed by forming a contradiction.
Pick an "oldest" illegal string, S.
1) The grammar didn't make it, because:
Consider the latest rule application that formed S.
It occured by noticing an instance of the want-phrase
somewhere, and then completing S by introducing the
give-phrase.
Since the want-phrase and the give-phrase share the
same span, we can take our illegal string S and replace
the occurence of the give-phrase with an occurence of
the want-phrase.
This new string, say S1, which has the want-phrase must
itself be illegal:
(If S1 were legal, then the one rewrite of
replacing the want-phrase with the give-phrase,
(producing S), would map S1 (legal) to another
legal string (S). This contradicts our choice
of S to be illegal).
S1 is "older" than S:
(S1 and S are identical to the left of the
want-phrase (also the give-phrase). The want-
phrase had to exist prior to this rule's
application, lest it never apply. Thus, all
the PHRASE blocks in the want-phrase are older
than all the PHRASE block(s) in the give-
phrase. In particular, the leftmost PHRASE
block in the want-phrase is older than the
leftmost PHRASE block in the give-phrase.
Given that S1 is illegal and that it is older than S,
we have an even older illegal string, contradicting
our choice of S as being the oldest illegal string.
2) The taking of user input didn't make S:
We've just shown that no PHRASE block in S was
generated by the grammar. The only other PHRASE blocks
are generated while taking user input.
We could just as well turn off the grammar, supress-
ing all calls to the grammar. S would still appear.
However, the only string
introduced while taking user input is the input
string. This implies that S is the input string, and
hence is legal. This contradicts our choice that S be
illegal.
Therefore, all strings in C are legal strings.
13.4.4
A Detail - The Entire Give-Phrase Came From One Rule Application
Our argument thus far assumed that the illegal string S contained
some rule's give-phrase. We assumed actually that S contained
~all the PHRASE blocks generated by one application of one rule.
That was how we could focus in on an actual application, and prove
that an older occurence of the want-phrase had to exist.
Consider the latest rule application that formed S, the one that
~triggered most recently. (A rule ~triggers precisely when its want-
phrase is recognized).
We claim that ~all the PHRASE blocks in the generated give-phrase are
also contained in S:
No subsequent rule application can affect S, by assumption.
Thus, we could just as well have turned off the grammar while
generating our give-phrase. S would still come to exist.
With the grammar turned off, figure 13.3 shows our give-phrase.
Each CHOICE_OF_PHRASES contains only a single PHRASE block.
(Multiple PHRASEs in a CHOICE_OF_PHRASES can be generated only
upon (further) rule applications).
Thus, the only way to arrive at any block in the give-phrase
is to pass thru the rightmost PHRASE block. (We will prove in
the next section that a PHRASE block can be referenced from
only one CHOICE_OF_PHRASES, the ones you see in the figure).
Thus, if any of the give-phrase's PHRASE blocks is in S,
then the rightmost PHRASE block is also in S.
Upon arriving at any PHRASE block in the give-phrase, there is
no way out except to travel leftward along all PHRASE blocks
in the give-phrase. Since S is full spanning, S contains at
least the leftmost block in the give-phrase. Thus, all blocks
of the give-phrase are in S.
This says in conclusion that the blocks generated by some rule's
application are all contained in S. Now that we have the rule
application, we can proceed as before. We deduce the existence of the
rule's want-phrase, where all blocks in the want-phrase are older than
all the blocks in the give-phrase. As before, we exchange the want-
phrase with the give-phrase, and arrive at the contradiction, by
producing an illegal string even older than S.
13.4.5
More Properties Of GIVE
Notice the following about GIVE and the grammar:
* GIVE receives always a ~new PHRASE block.
(We use always the notation:
GIVE( [LEFT: p.LEFT POS: part-of-speech] );
when calling GIVE, from the grammar or when taking
user input. The [...] always creates a new block).
* Only GIVE contributes elements to C.
* GIVE only augments C, never removing an element from C.
The following useful properties are then derived:
* A PHRASE block is a member of only one CHOICE_OF_PHRASES
(All new PHRASE blocks come into GIVE. GIVE puts
each onto C once. GIVE can't put it onto C ever
again because GIVE is called only with ~new PHRASE
blocks. Since nowhere else are elements added to C,
a given PHRASE block is referenced from at most one
CHOICE_OF_PHRASES).
* Given a visible phrase, it has a unique rightmost
CHOICE_OF_PHRASES. It holds the visible phrase's rightmost
(first) PHRASE block.
BOX: BOX: When is one string older than another?
BOX:
BOX: Given an oldest illegal string, how did we find an
BOX: even older illegal string?
BOX:
BOX: Can two CHOICE_OF_PHRASES reference the same PHRASE
BOX: block?