CHAPTER 11

A BRIEF VISIT WITH AMBIGUITY

For us, ambiguity refers to a ~multiplicity of interpretations. Within that multiplicity, differing interpretations can coexist ~and even be incompatible with one another. The interpretations are independent of one another. As the needs of the future become known, only one or a few interpretations may actually be used. The ability of the future to pick among the multiple interpretations eases the otherwise difficult, if not impossible, task of predicting the future. By preserving multiple possible interpretations, the future can be accommodated optimally. Ambiguity is thus a natural way to deal with the uncertain future.

This contrasts with another meaning sometimes associated with ambiguity. Ambiguity might be thought of as a ~loss of information, where details might be murky. In contrast, what we mean by ambiguity is a ~gain of information, above and beyond one single interpretation.

Most humor is based on exposing ambiguity, triggering multiple, often incompatible interpretations. Ambiguity is ~fun. It relieves us for a moment from the belief that we live in a serious, unambiguous universe. It exposes the real nature of human information.

11.1 Ambiguity In Parsing

A parser takes in an input sequence of characters and tries to understand it. All it has in hand is a set of rules of grammar. Our parser proceeds by rewriting portions of the input text. We discovered in Chapters 1 and 2 that false rewrites can occur, which impose dead ends on the parsing process. Our solution, to avoid ever getting stuck, is to never commit wholeheartedly to any rewrite. Each rewrite doesn't actually erase a found (~want) phrase. It only ~contributes the replacement (~give) phrase as an ~additional ~phrase. Both phrases exist, and such multiplicity of interpretations is ~ambiguity.

A rewrite operation is a modification. We consistently apply the subjective form of modification (Chapter 10), copy-on-write, so that no existing data are changed.
Phrases that exist at some time continue to exist unchanged, until all the parsing is done (footnote). Thus, old and new phrases ~coexist.

BOX: Is ambiguity a ~gain or a ~loss of information?

BOX: (footnote: We always employ subjective modification for syntax data. However, we will use objective modification in one place, but it will be applied only to semantic data).

11.2 Ambiguity In Languages

A simple example of ambiguity in language occurs with the word "flies". It is both a noun and a verb, but it is ~uncertain which. The noun interpretation appears in the sentence:

        FLIES populate the world.

The verb form is necessary in:

        The airplane FLIES to Paris.

Both forms are necessary in the sentence:

        Each of the FLIES FLIES to the apple.

Since we insist that a word can exist all by itself, that self must have both a noun and a verb interpretation. This is not one combined interpretation, but two distinct, unrelated interpretations. Any ~one interpretation will be used for an occurrence of the word "flies", never both at once. (To have both interpretations in a sentence, the word must appear twice).

To illustrate the evolution of ambiguity, consider that the phrase "male flies" is also ambiguous:

        The MALE FLIES are sterilized.

        The MALE FLIES towards the female.

Figure 11.1(a) shows an ambiguous interpretation for "flies", and 11.1(b) shows one for "male". Figure 11.1(c) shows them put together. Figure 11.1(d) shows the meaningful interpretation used by the first sentence above. Figure 11.1(e) shows the second sentence.

If we disregard the literal words "male" and "flies", figures 11.1(a) and 11.1(b) show two interpretations for each. Figure 11.1(c) shows the four possible total interpretations, which we can write as a set of unambiguous meanings:

        ----- noun -------- noun ----
        --- adjective ----- noun ----
        ----- noun -------- verb ----
        --- adjective ----- verb ----

Of the four interpretations, only two of them "resonate" (make sense).
That is, only the middle two trigger two rules of the English language:

        adjective noun  ->  noun_phrase
        noun verb       ->  sentence

There might be no rules corresponding to the other two interpretations:

        noun noun       ->  ??
        adjective verb  ->  ??

This illustrates disambiguation, the discarding of some ambiguity as more context is seen. Four interpretations have been reduced to two full-spanning interpretations (figure 11.1(f)).

BOX: Can the word "it" have multiple meanings? Consider
BOX: the sentences:
BOX:
BOX:     The apple and the car are red.  It...

11.3 Programming Languages

Although we wish a programming language to be unambiguous, we will do well to tolerate ambiguity ~during the translation task. For example, the phrase

        1 + 2

is a valid arithmetic expression. It is easiest to let that thought occur, even if we later discover that the 1+2 appears in

        1 + 2 * 3

The overall expression (where multiplications occur before additions) is incompatible with the old 1+2 thought. That's okay, because no thought ever destroys another thought in this system. The thought of the whole expression simply doesn't reference the 1+2 thought. If we didn't tolerate ambiguity, we would have to carefully suppress the 1+2 thought. When we preserve the 1+2 thought along with other thoughts (e.g., 2*3), we are tolerating ambiguity, the existence of ~multiple (although incompatible) interpretations.

Non-syntactic, or semantic, ambiguities can arise too. The number 1 is ambiguous semantically: Is it an integer or a real number? Further context, such as an enclosing assignment statement, may disambiguate this situation. For another example, the function ABS (absolute value) may be ambiguous because it can apply to integers, reals, or points. The particular ABS can be decided only when more context appears.

11.4 Constraints Imposed By Restrictive Methods

Most other methods for language processing, e.g., LR(k)[] and its variations, don't tolerate ambiguity.
Because of this, the kind of language that can be processed by LR(k) is restricted. It is so restricted that the following is true: Given two languages X and Y, their union, the language that accepts sentences from either, may be impossible to implement using LR(k), even if each of X and Y were LR(k) languages. That is to say, even if you could process each of English and Spanish independently, you couldn't process their union, the language that accepts sentences in either English or Spanish. Ambiguity is essential, as the word "no" has a meaning in both English and Spanish.

In contrast to those unambiguous methods, where the language processing task must be careful to suppress all but a single train of thought, we instead encourage free thinking, where all possible thoughts are allowed. It is easier to allow multiple thoughts than it is to suppress all but one thought. Even though most of the thoughts won't be compatible with the overall understanding of the sentence, it is still more practical to allow thoughts than to disallow all but one thought.

11.5 Lots Of Ambiguity

One might become alarmed at the allowance of all thoughts, because there are so many of them. Given an input phrase consisting of N characters (or words), there can easily be 2^N possible thoughts, an exponential number. For example, figure 11.2 shows a four word phrase, where each word (e.g., A1) has two possible interpretations (A1 and A2). (Figure 9.9 shows the representation more explicitly). We can generalize that to N words. By picking one or the other interpretation for each of the N words, we have 2^N unambiguous thoughts.

We will nonetheless represent and process all those thoughts within only polynomial cost, and in practice, at a nearly linear cost. We do this by ~sharing the common parts of thoughts (phrases), and representing only the essential differences among thoughts. For example, the representation of those 2^N unambiguous thoughts takes only 2*N memory.
Also, notice how the same A1 is shared among all 2^(N-1) thoughts that begin with A1. The ambiguity A1/A2 is also represented once and shared throughout the 2^N unambiguous interpretations.

We process all thoughts in parallel, instead of pursuing one complete thought at a time. With all thoughts grown together, common elements among thoughts will arise together. We will process the commonalities only once, and share the benefit of that processing among the different thoughts.

We will proceed by finding all meanings for the first word in a sentence before bringing in the second word. All possible thoughts involving the first two words together are generated prior to bringing in the third word. The full ambiguity is thus grown left-to-right. By the time we reach the righthand end, we have all meanings for the whole sentence.

BOX: What advantage is there in processing multiple
BOX: interpretations in parallel?
BOX:
BOX: Why can we tolerate an exponential number of
BOX: interpretations?
BOX:
BOX: Do you come up with multiple interpretations when
BOX: viewing a work of art?
BOX:
BOX: If a spray can were advertised as being both a floor wax
BOX: ~and a dessert topping, what ambiguity is exposed?

11.5.1 Ambiguous Semantics

The factoring, or sharing, of ambiguity inherent in this ambiguous phrase representation relieves one source of exponential cost. However, one further step is needed to kill all exponential costs, which can arise with ambiguous grammars. Figure 11.3 shows an ambiguous phrase where some parses are duplicated, e.g., there are two "A"s sharing the same span. This kind of duplication arises very naturally, e.g., if there are two distinct means by which an "A" interpretation can be seen. Once we eliminate such duplicates, we will relieve all causes of exponential cost. To remove such duplication without losing one or the other's meaning, we will need the notion of ambiguous semantics.
One "A" phrase will survive, but its associated meaning will be ambiguous, so as to represent faithfully the two possible original meanings. Ambiguous semantics allows an exponential (even infinite) number of unambiguous meanings to be represented in only polynomial memory. We will see how to process all exponentially many meanings in only polynomial time, unless of course someone wants us to ~enumerate the exponentially many meanings. However, in practice, after the meanings are processed, usually one (or a few) survive the constraints imposed on those meanings, and hence we never enumerate all the meanings. (Part 7 shows how to use ambiguous semantics).

BOX: What gives rise to ambiguous semantics?
BOX:

11.6 Principle Of Ambiguity (An Uncertainty Principle)

There is a principle of ambiguity that feels like the uncertainty principle in quantum mechanics. The "uncertainty principle" in quantum physics states that you can trade off precision between a particle's position and its momentum. You cannot get precise measurements on both. The greater the precision in the particle's position, the greater the uncertainty of that particle's momentum, or vice versa.

The "ambiguity principle" in language processing states that the ~shorter the phrase under consideration, the ~greater ~its ~ambiguity. Longer phrases disambiguate their words more than shorter phrases. The longer phrase has more context. The other words in the phrase limit the number of possible meanings for any word in that phrase. We've seen disambiguation occur with the word "flies" after more of the sentence is considered.

This principle of ambiguity also applies in another way. The smaller a ~slice of a domain you consider, the greater its ambiguity. For example, if you consider only the syntax of a language, one slice in the domain of language, some things remain inherently uncertain until subsequent semantic processing. Ambiguity lets us separate out systems as may be convenient for human thought.
Where one system can't predict which of several possibilities will be the right one in the next system, ambiguity, the representation of all possibilities between systems, is required.

11.6.1 Perhaps In The Deepest Sense, Real Information Is Ambiguous

The uniform tolerance of ambiguity mimics the essential structure of real, complex information. When a speaker talks, the listener freely associates multiple possible meanings. This is necessary because no one association approximates the speaker's thought very well. Together, the multiple possible meanings provide the most complete approximation of the speaker's thought. (Some of the meanings may fall to the side as further utterances from the speaker are heard).

The existence of ambiguity in information itself can also be seen in one of the unintuitive properties of quantum mechanics. It involves the nature of questions you can ask of the theory (figure 11.4). It may be possible to know:

        Is the ball under one of cup A or cup B?

and yet be unable to know either of:

        Is the ball under cup A?
        Is the ball under cup B?

The unintuitive part is that you would probably attempt to answer the original question by asking each of the separate unaskable questions. The information delivered by the theory is inherently ambiguous. You can't reduce the ambiguous datum (A or B) into two unambiguous, separable data (A, B). It is in human character to see ~two regions. In the imagination, they are separable. Here, real physical information, in human terms, is ambiguous.

BOX: What is the principle of ambiguity?
BOX:
BOX: Given a string of length zero, why is there
BOX: infinite ambiguity?

11.6.2 Concession To Computers

One of the most striking aspects of a computer, besides its speed, is the fact that each operation can deal with only a very limited amount of data, e.g., on the order of 32 bits. The computer cannot in one operation see a whole sentence.
Due to this nature of computing, we are limited in language processing to seeing at any one time only a very limited portion of a sentence. We don't have the luxury of seeing the whole sentence at once. The ambiguity principle then forces us to deal with large amounts of ambiguity.

11.6.3 Ambiguity Everywhere

For example, our parser is employed twice in ICL. First it performs syntax analysis, and later it performs datatype analysis. Both applications benefit by allowing ambiguity. Ambiguity may first appear during syntax processing. Some of that ambiguity will be reduced as longer and longer phrases are seen. However, some ambiguity may remain even after all of syntax is done. That remaining ambiguity is passed on to the next pass, the datatype analysis pass.

An example of a syntax ambiguity that passes on to the datatype analysis pass is:

        A + B # C

ICL has two different rules that involve "+":

        real + real    ->  real
and
        point + point  ->  point

That is, "+" can add two REALs or can add two two-dimensional POINTs. ("+" cannot add a REAL and a POINT). ICL has another rule which forms a POINT given its two REAL coordinates:

        real # real  ->  point

Thus, 1#2 is the point whose x-coordinate is 1 and whose y-coordinate is 2,

        1+2#3    is the point 3#3   (using the  real + real -> real  rule)
        1#2+3#4  is the point 4#6   (using the  point + point -> point  rule).

Reconsider the expression:

        A + B # C

If A, B, and C are all REALs, then the expression must group as follows:

        (A + B) # C

A+B must be evaluated ~before forming the whole POINT. "+" operates on REALs in this example. (The other grouping, doing the "#" first, would leave "+" in an impossible situation, adding a REAL (A) and a POINT (B#C)).

In contrast, if A were a POINT, then the expression must group as:

        A + (B # C)

The point B#C must be formed first, so that "+" can operate on POINTs. (Again, the other grouping, doing "+" first, is impossible lest "+" be forced to add a POINT (A) to a REAL (B) in A+B).
Thus, the ~types of A, B, and C will dictate how the parentheses should go. This uncertainty in grouping is a necessary syntax ambiguity. Both groupings must be considered. The choice will ultimately be resolved by a subsequent pass, datatype analysis, when the types of A, B, and C first become known. (The syntax pass must be ~completed ~in ~its ~entirety first, so that we can even ~see the declarations of the variables A, B, and C).

The datatype analysis pass will accept initial ambiguities from the syntax pass. It will also generate its own. For example, the syntactically unambiguous phrase:

        3

becomes ambiguous in the datatype analysis pass, where it is both an INTeger and a REAL. We presumably have the coercion:

        INT  ->  REAL

so that all INTegers also make sense as REALs. Coercions, like this one, and polymorphic functions contribute to type-pass ambiguity. (Polymorphic functions are functions that are defined for more than one datatype. The functions ABS and "+" are polymorphic functions because each is defined over ~several types. "+" works for INTs, REALs, and POINTs. ABS is defined for both INTs and REALs).

The type-pass also resolves some ambiguities, both ambiguities from syntax and from the type-pass itself. For example, the ambiguous phrase:

        1 + 2

(which is both a REAL and an INTeger) is disambiguated within the larger context:

        R := 1 + 2 ;

Assuming R is a REAL variable, the INT/REAL ambiguity is resolved.

Notice that the time from the generation of an ambiguity until the resolution of that ambiguity can be great. For example, the expression:

        ABS( 1 + 2 )

is both a REAL and an INTeger (because ABS is defined both for INTs and REALs). Here, the ambiguity that started with the "1" and with the "2" is not resolved until the outer context appears:

        FACTORIAL( ABS( 1 + 2 ) )

Presuming that FACTORIAL is defined only for INTegers, the INT/REAL ambiguity is resolved. If FACTORIAL were defined both for INTegers and REALs, then this phrase would remain ambiguous.
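The type-driven choice between the two groupings can be sketched in code. The sketch below is not ICL itself: the rule table mirrors the ICL rules quoted earlier, while the function names and the representation of types are invented for illustration.

```python
# A sketch of how datatype analysis can pick between the two syntactic
# groupings of  A + B # C .  A grouping survives only if every operator
# in it matches some rule.

RULES = {
    ('+', 'real', 'real'): 'real',      # real + real   -> real
    ('+', 'point', 'point'): 'point',   # point + point -> point
    ('#', 'real', 'real'): 'point',     # real # real   -> point
}

def result_type(op, left, right):
    """Return the result type of applying op, or None if no rule fits."""
    return RULES.get((op, left, right))

def viable_groupings(a, b, c):
    """Given the types of A, B, and C, return the groupings of
    A + B # C  that satisfy some rule for every operator."""
    survivors = []
    # Grouping 1:  (A + B) # C
    t = result_type('+', a, b)
    if t is not None and result_type('#', t, c) is not None:
        survivors.append('(A + B) # C')
    # Grouping 2:  A + (B # C)
    t = result_type('#', b, c)
    if t is not None and result_type('+', a, t) is not None:
        survivors.append('A + (B # C)')
    return survivors

print(viable_groupings('real', 'real', 'real'))    # only (A + B) # C survives
print(viable_groupings('point', 'real', 'real'))   # only A + (B # C) survives
```

Note that both groupings must be carried through the syntax pass; the table lookup that discards one of them cannot run until the declarations of A, B, and C have been seen.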
BOX: How can ambiguity be resolved? Relate this to the
BOX: principle of ambiguity.

11.7 Common Sense, Or "Resistance", Cuts Down On Ambiguity

Unlike the FACTORIAL function, the WRITE function is defined for both INTegers and REALs. Thus, the following remains ambiguous:

        WRITE( ABS( 1 + 2 ) ) ;

Does it write a REAL number or an INTeger number? This ambiguity will be resolved by the use of common sense, or "resistances". We pick the interpretation that involves the fewest coercions. We minimize the number of conversions of data from one type to another type. Each coercion, e.g., the one from INTeger to REAL, contributes a "resistance" into any interpretation that uses the result of that coercion.

Notice that the REAL interpretation of:

        ABS( 1+2 )

requires at least one application of that coercion, and hence has an associated resistance of at least one. (The INT-to-REAL coercion must be applied either to both of "1" and "2", so that the REAL ABS can apply, or to the INT ABS's INT result, so as to render a REAL). The INTeger interpretation uses no coercions, and hence has zero resistance. The INTeger interpretation will be chosen because it incurs the smaller resistance. A sentence (e.g., a program) will ultimately be called ambiguous only if there is more than one interpretation that has the same, minimal resistance.

NOTE: The reduction of ambiguity by resistances is a post-process. It executes only after all possible meanings are acquired. We must still develop all possible meanings. Thus, resistances do not cut down on any ambiguity until all ambiguity is developed.

NOTE: It will be easiest to continue reading this text by ignoring entirely the notion of resistance. Thus, the phrase 3 is still ambiguous, acting as either an INT or a REAL. The implementation of resistances appears in Chapter 27.

BOX: How does common sense serve to reduce ambiguity?
BOX:

11.8 Representation Of Ambiguous Phrases

The expression ABS( 1 + 2 ) can be represented by the string:

----- ABS ----- ( ----- 1 ----- + ----- 2 ----- ) -----

The fact that each of "1" and "2" is also an INTeger can be added to this representation:

----- ABS ----- ( ----- 1 ----- + ----- 2 ----- ) -----
                    \       /       \       /
                     \-INT-/         \-INT-/

The fact that any INT can be viewed also as a REAL can be added:

----- ABS ----- ( ----- 1 ----- + ----- 2 ----- ) -----
                  \ \       / /   \ \       / /
                   \ \-INT-/ /     \ \-INT-/ /
                    \       /       \       /
                     \-REAL/         \-REAL/

Because "+" is defined both upon INTs and REALs, we can get:

----- ABS ----- ( ------- 1 ----- + ----- 2 ------- ) -----
                  \ \ \       / /   \ \       / / /
                   \ \ \-INT-/ /     \ \-INT-/ / /
                    \ \       /       \       / /
                     \ \-REAL/         \-REAL/ /
                      \                       /
                       \-------- INT --------/
                        \                   /
                         \------ REAL -----/

Finally, the interpretations that involve ABS can be added:

----- ABS ----- ( ------- 1 ----- + ----- 2 ------- ) -----
   \              \ \ \       / /   \ \       / / /    /
    \              \ \ \-INT-/ /     \ \-INT-/ / /    /
     \              \ \       /       \       / /    /
      \              \ \-REAL/         \-REAL/ /    /
       \              \                       /    /
        \              \-------- INT --------/    /
         \              \                   /    /
          \              \------ REAL -----/    /
           \                                   /
            \-------------- INT --------------/
             \                               /
              \------------ REAL -----------/

How many distinct phrases can you see in this ambiguous phrase?
There are 13 of them:

------ ABS ----- ( ----- 1 ----- + ----- 2 ----- ) -----
------ ABS ----- ( ---- INT ---- + ----- 2 ----- ) -----
------ ABS ----- ( ---- REAL --- + ----- 2 ----- ) -----
------ ABS ----- ( ----- 1 ----- + ---- INT ---- ) -----
------ ABS ----- ( ---- INT ---- + ---- INT ---- ) -----
------ ABS ----- ( ---- REAL --- + ---- INT ---- ) -----
------ ABS ----- ( ----- 1 ----- + ---- REAL --- ) -----
------ ABS ----- ( ---- INT ---- + ---- REAL --- ) -----
------ ABS ----- ( ---- REAL --- + ---- REAL --- ) -----
------ ABS ----- ( ------------ INT ------------ ) -----
------ ABS ----- ( ------------ REAL ----------- ) -----
----------- INT ----------------------------------------
----------- REAL ---------------------------------------

We will refer to each of these phrases as an "unambiguous" phrase, because it offers no alternatives.

11.8.1 Locality Of Ambiguity

The ambiguous phrase just shown represents ambiguity efficiently. It preserves the "locality of ambiguity". Notice how much more efficient it is than the 13 separate phrases just shown. In our ambiguous phrase, local ambiguities do not reach out across the whole phrase; they remain local.

11.8.2 Another Example Of Exponential Ambiguity At Polynomial Cost

Consider the phrase:

        1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9

Each number can be interpreted both as an INTeger and as a REAL. How many distinct unambiguous phrases would it take to represent all those possible meanings? It would take 2-to-the-9th power, because each number can independently be one of two things, an INT or a REAL. There are 9 such independent choices. (If "+" were not addition, all 512 possibilities might in fact give 512 different answers (footnote)).

The ambiguous phrase, which preserves locality of ambiguity, consumes memory proportional to the length of the phrase. The separate unambiguous representations consume an exponential amount of memory, and processing time.
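The linear-memory-versus-exponential-readings claim can be checked with a small sketch. The list-of-alternative-sets representation below is invented for illustration; it is not the book's actual phrase structure, but it exhibits the same locality of ambiguity.

```python
# A sketch of "locality of ambiguity": each position stores only its own
# alternatives, so memory grows linearly with phrase length while the
# number of unambiguous readings grows exponentially.

from math import prod

# The phrase  1 + 2 + ... + 9 :  each number is ambiguously INT or REAL.
ambiguous_phrase = [{'INT', 'REAL'} for _ in range(9)]

# Memory: one small set of alternatives per number -- 2*N cells in all.
cells = sum(len(alternatives) for alternatives in ambiguous_phrase)

# Distinct unambiguous readings: the product of the independent choices.
readings = prod(len(alternatives) for alternatives in ambiguous_phrase)

print(cells)     # 18   (2*N memory)
print(readings)  # 512  (2^N readings)
```

Enumerating the 512 readings as separate phrases would copy each position 256 times; the shared representation writes each alternative exactly once.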
We will do all our language processing within polynomial cost, as a function of the phrase length. However, we will be representing, and processing, what amounts to an exponential number of meanings. We will see how to maintain a "fully ambiguated" ambiguous phrase in only polynomial time. We will also see how to represent an exponential number of ~meanings in only polynomial memory. We will even see how to process important aspects of an exponential number of meanings in only polynomial time.

BOX: How does forbidding locality of ambiguity affect the
BOX: cost of representing ambiguity?

(Footnote: Here is a grammar which gives 512 distinct meanings. In each of the following rules, let FACTOR be the smallest power of ten greater than "a+b", where "a" and "b" are the two given values (from the lefthand side of each rule):

        INT + INT    ->  INT   by  a+b + 1*FACTOR
        INT + REAL   ->  REAL  by  a+b + 2*FACTOR
        REAL + INT   ->  REAL  by  a+b + 3*FACTOR
        REAL + REAL  ->  REAL  by  a+b + 4*FACTOR

We prove that there are 2^N distinct answers by induction on the phrase length, N. The inductive hypothesis is: A phrase of length N gives rise to 2^N - 1 distinct REAL meanings, and one INT meaning. Each REAL meaning is N digits long, where the high-order digit is less than five. By induction, each of the last two rules, with a REAL to the left of the "+", delivers 2^(N-1) - 1 interpretations, via that REAL to the left of the "+". Together they deliver 2^N - 2 REALs. In addition, the first two rules give one more REAL and one INT meaning).

11.9 Rules Of Grammar

A grammar rule represents a phrase replacement. It says "Whenever you see the phrase XXX, feel free to replace it by another phrase, YYY". For example, the rule

        INT  ->  REAL

says that any occurrence of INT can also be viewed as a REAL. The overall function of a language processor is to apply all grammar rules at all places within the given phrase.
For example, the grammar rule

        Literal_number  ->  INT

applies everywhere possible, i.e., at the "1" and the "2" in the given phrase, ABS(1+2):

----- ABS ----- ( ----- 1 ----- + ----- 2 ----- ) -----
                    \       /       \       /
                     \-INT-/         \-INT-/

Because all grammar rules apply also to all results of grammar rules, the rule INT -> REAL is applied to those newly formed INTs:

----- ABS ----- ( ----- 1 ----- + ----- 2 ----- ) -----
                  \ \       / /   \ \       / /
                   \ \-INT-/ /     \ \-INT-/ /
                    \       /       \       /
                     \-REAL/         \-REAL/

11.9.1 Grammar Rules Maintain Locality Of Ambiguity

We apply rules by having them add interpretations to an ambiguous phrase. For example, the rule:

        INT  ->  REAL

could take the phrase

----- ABS ----- ( ----- 1 ----- + ----- 2 ----- ) -----
                    \       /       \       /
                     \-INT-/         \-INT-/

and augment it to be:

----- ABS ----- ( ----- 1 ----- + ----- 2 ----- ) -----
                  \ \       / /   \ \       / /
                   \ \-INT-/ /     \ \-INT-/ /
                    \       /       \       /
                     \-REAL/         \-REAL/

Each rule naturally wants to make a local rewrite. It would be cumbersome for this rule to produce instead a set of unambiguous phrases:

----- ABS ----- ( ----- INT ----- + ----- INT ----- ) -----
----- ABS ----- ( ----- REAL ---- + ----- INT ----- ) -----
----- ABS ----- ( ----- INT ----- + ----- REAL ---- ) -----
----- ABS ----- ( ----- REAL ---- + ----- REAL ---- ) -----

This local nature of rewrites naturally gives rise to ambiguous phrases that preserve the locality of ambiguity.

BOX: Why do grammar rules naturally maintain locality
BOX: of ambiguity?

11.9.2 Grammar Rules Behave Like A Resonance Chamber

A grammar, which is a set of grammar rules, takes in a phrase and ambiguates it so that all possible applications of all possible rules at all possible locations become represented. The original phrase acts as the initial input or "noise" entering a resonance chamber. The grammar "resonates" in all possible ways, producing resonances of resonances, etc., as does a physical resonance chamber. Only when all resonances are achieved does the grammar terminate its effort.
In other words, the computer takes in the initially incomprehensible user input, and understands that input only in terms of well defined resonances. The structure of the resonances holds the essential, comprehensible meaning of the given user input. The receiver of information transmitted through language has at hand ~only knowledge of the language itself. Understanding must occur in terms of the language, in terms of grammar rules.

BOX: Why does understanding occur in terms of resonances
BOX: (associations sprung by grammar rules)?
BOX:
BOX: How is this similar to the vastly simpler AM radio
BOX: communication?
BOX:
BOX: What is an art form that thrives on resonances?
BOX:
BOX: What are some fields of science where resonances play a
BOX: big role?
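The resonate-until-quiet behavior can be sketched for the simplest case: unary rules applied over a single span. The rule names mirror ones used in this chapter; the code itself is an invented illustration, not the book's parser, and real grammars would also combine adjacent spans.

```python
# A sketch of the "resonance chamber": rules are applied to the phrase,
# and to the results of rules, until no new interpretation appears.
# Interpretations are never erased -- a rewrite only *adds* a phrase.

RULES = {
    'Literal_number': 'INT',   # Literal_number -> INT
    'INT': 'REAL',             # INT -> REAL
}

def resonate(interpretations):
    """Grow a span's set of interpretations to a fixpoint."""
    grown = set(interpretations)
    changed = True
    while changed:
        changed = False
        for phrase in list(grown):
            result = RULES.get(phrase)
            if result is not None and result not in grown:
                grown.add(result)      # contribute an additional phrase
                changed = True
    return grown

print(sorted(resonate({'Literal_number'})))
# ['INT', 'Literal_number', 'REAL']
```

Termination comes from the same observation the chapter relies on: each pass can only add interpretations, and once nothing new arises, all resonances have been achieved.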