CHAPTER 3 PRECEDENCE GRAMMARS AND PARAMETERIZED PARTS-OF-SPEECH In this chapter, we provide a more concise and efficient method for expressing the <FORM> grammar. This will apply to a whole class of grammars, called precedence grammars. We introduce ~array, or ~parameterized parts-of-speech to aid in this endeavor. Finally, in Section 3.5, we show another use of parameterized parts-of-speech, where we allow for grammars with flexible word order. You may have noticed a pattern emerging in the <FORM> grammar. Each time we introduced a new operator, we invented a new part-of-speech. <FORM> is for "+" and <TERM> is for "*". If we introduce a new operator, say "^" for exponentiation, which we want to group before "*" and "+", we introduce a new part-of-speech, say ATOM0. Here is the whole <FORM> grammar with the new operator. (The semantics are unspecified): <NUMBER> -> <ATOM0> ( <FORM> ) -> <ATOM0> <ATOM0> -> <ATOM> <ATOM> ^ <ATOM0> -> <ATOM> <ATOM> -> <TERM> <TERM> * <ATOM> -> <TERM> <TERM> -> <FORM> <FORM> + <TERM> -> <FORM> Let's reorder these rules to expose a pattern: <NUMBER> -> <ATOM0> <ATOM0> -> <ATOM> <ATOM> -> <TERM> <TERM> -> <FORM> ( <FORM> ) -> <ATOM0> <ATOM> ^ <ATOM0> -> <ATOM> <TERM> * <ATOM> -> <TERM> <FORM> + <TERM> -> <FORM> 3.1 Coercion Rules The first set of rules all have one item on each side of the "->". They apply without consuming any additional characters from the original string. (The other rules consume at least one additional character, e.g., (,),+,*,^). Rules having only one item on each side of the "->" (or "::=") are called coercions. Whenever the parser sees a <NUMBER>, it will unconditionally consider all the coercion rules. That is, the derivation 1 <NUMBER> will automatically lead at least to the consideration of the derivation: 1 <NUMBER> <ATOM0> <ATOM> <TERM> <FORM> That is, whenever you produce a <NUMBER> you will also produce <ATOM0>, <ATOM>, <TERM>, and <FORM>. That is five times as expensive as producing the <NUMBER> alone, and yet this will always happen. 3.2 Operator Precedence We can avoid the expense of the cascading coercions just shown. We also make it very easy to introduce new operators, with any chosen grouping order, or "precedence". Let us substitute one array of parts-of-speech, called EXPR, in place of <ATOM0>, <ATOM>, <TERM>, and <FORM>. We rename each of the original parts-of-speech: EXPR[1] is ATOM0 EXPR[2] is ATOM EXPR[3] is TERM EXPR[4] is FORM The rules now appear as: <NUMBER> -> <EXPR[1]> <EXPR[1]> -> <EXPR[2]> <EXPR[2]> -> <EXPR[3]> <EXPR[3]> -> <EXPR[4]> ( <EXPR[4]> ) -> <EXPR[1]> <EXPR[2]> ^ <EXPR[1]> -> <EXPR[2]> <EXPR[3]> * <EXPR[2]> -> <EXPR[3]> <EXPR[4]> + <EXPR[3]> -> <EXPR[4]> We can abandon the coercion rules entirely if we modify the latter set of rules. For example, we can remove the coercion <EXPR[1]> -> <EXPR[2]> if we modify all occurences of <EXPR[2]> so as to say "<EXPR[2]> or <EXPR[1]>". <EXPR[1]>'s will no longer need to rewrite to <EXPR[2]>'s in order to be accepted as <EXPR[2]>'s. We write <EXPR[ =< 2 ]> instead of <EXPR[ 2 ]> to say "either <EXPR[2]> or <EXPR[1]>". Any <EXPR[1]> will be accepted wherever <EXPR[=<2]> is accepted, and hence we no longer need to rewrite <EXPR[1]>s to <EXPR[2]>s. In general, we will replace all occurences (on the lefthand sides) of <EXPR[i]> with <EXPR[ =< i ]> Thus, whenever you wrote <EXPR[i]>, you will accept not only [i], but also [i-1], [i-2], etc. as well. This relieves the need for the coercions from [i-1] to [i], from [i-2] to [i-1], etc. We now write the entire <FORM> grammar simply as: <NUMBER> -> <EXPR[1]> ( <EXPR[=<4]> ) -> <EXPR[1]> <EXPR[=<2]> ^ <EXPR[=<1]> -> <EXPR[2]> <EXPR[=<3]> * <EXPR[=<2]> -> <EXPR[3]> <EXPR[=<4]> + <EXPR[=<3]> -> <EXPR[4]> In effect, the "+" rule says "I'll take anything with a ^, *, or + to my left (=<4), and anything with a ^ or * on my right (=<3)". Notice the index appearing on the righthand side. The "^" rule has [2], the "*" rule has [3], and the "+" rule has [4]. These indices denote the "precedence" of each operator: ^ has precedence 2 * has precedence 3 + has precedence 4 Operators of lower precedence combine before operators of higher precedence. (We are using the term "precedence" with a reversal in its meaning. You can reverse the indices (to go from 4 to 1 as opposed to going from 1 to 4) if you change all "=<"s to ">="s. Then our policy would read more naturally: "Operators with higher precedence bind before operators of lesser precedence". We continue this discussion, however, with our original, reversed meaning of precedence). 3.2.1 Introducing The Part-Of-Speech BOP Let's introduce a new array of parts-of-speech BOP, short for "Binary OPerator". We declare it like we declare any array part-of-speech: POS BOP[4] : a_semantic_type ; We leave the semantic type unspecified for now. This declaration creates all at once the parts-of-speech <BOP[0]>, <BOP[1]>, <BOP[2]>, <BOP[3]>, and <BOP[4]>. The new part-of-speech BOP is said to be parameterized because ~which of the five parts-of-speech is variable. Similarly, EXPR is declared by: POS EXPR[4] : a_semantic_type ; We've seen parameterized use of EXPR, as in: <EXPR[ =< 2 ]> EXPR's parameter, a number between 0 and 4, is thus used as a condition for rule application. Here is the <FORM> grammar using the new parts-of-speech EXPR and BOP: ( <EXPR[=<4]> ) -> <EXPR[1]> <NUMBER> -> <EXPR[1]> ^ -> <BOP[2]> * -> <BOP[3]> + -> <BOP[4]> <EXPR[=<i]> <BOP[i]> <EXPR[<i]> -> <EXPR[i]> for i from 2 to 4. The operator ^, *, and + rewrite to <BOP> prior to combining with <EXPR>s. This grammar is equivalent to the former grammar, but it shows operators' precedences most clearly. That last rule is officially three rules, where i takes on the three values 2,3, and 4. Finally, we write all three instances of the last rule as a single conditional rule: <EXPR[i]> <BOP[j]> <EXPR[k]> -> <EXPR[j]> when i =< j and k < j. The i,j, and k on the lefthand side are passive. Whenever an <EXPR> <BOP> <EXPR> is seen, the system sets the variables i, j, and k to the indices found in the matched phrase. This rule applies only when the condition i =< j & j < k is satisfied. We write this rule officially as: <EXPR[i]: x > <BOP[j]: y > <EXPR[k]: z > -> IF i =< j & k < j THEN <EXPR[j]: f(x,y,z) > FI Conditional rules are specified by using an IF-THEN programming notation on the righthand side. (The trailing FI ends our IF-THEN notation, as shown in Sections 21.4.2.3 or 22.2.3). 3.2.2 How Parts-Of-Speech Are Matched In the absence of array parts-of-speech, a single part-of-speech is represented simply by a unique number. We match a part-of-speech in the rule against a part-of-speech in the string by simple equality. With array parts-of-speech, e.g., EXPR and BOP, we use inequalities to check for matching. That is, the rule <EXPR[i]> <BOP[j]> <EXPR[k]> -> ... is paraphrased as: < anything between EXPR[0] and EXPR[4] > < anything between BOP[0] and BOP[4] > < anything between EXPR[0] and EXPR[4] > -> ... These matches by inequalities set the variables i, j, and k to the actual parts-of-speech matched. The introduction of array parts-of-speech has allowed us to abandon the coercion rules, and thus achieve greater parsing efficiency. It has also allowed for a more readable grammar, where the precedence of each operator is clearly specified, e.g., + -> <BOP[4]> The three rules that involved operators in the original <FORM> grammar are now replaced by one, clear conditional rule. Here is the new, economical derivation for 1+2*(3+4): 1 + 2 * ( 3 + 4 ) DIGIT DIGIT DIGIT DIGIT NUMBER NUMBER NUMBER NUMBER EXPR[1] EXPR[1] EXPR[1] EXPR[1] BOP[4] BOP[3] BOP[4] ----- EXPR[4] ------- ---------- EXPR[1] ----------- ------------- EXPR[3] --------------------- ----------------------- EXPR[?] ---------------------------- (The "?" is left as an excercise). 3.2.3 All Elements Of An Array Part-Of-Speech Have The Same Semantic Type Finally, note that an array part-of-speech has the identical semantic type for all of its individual parts-of-speech. That is, the type associated with <EXPR[1]> must be the same as the type associated with <EXPR[3]>. This is strictly required, once again, because of the way we match parts-of-speech. Since <EXPR> can match any of <EXPR[0]>, <EXPR[1]>, <EXPR[2]>, <EXPR[3]>, and <EXPR[4]>, those individual parts-of-speech must have one and the same type. That is, we need a unique answer to the question: What is the type of X in <EXPR[i]: x > ? That type must be independent of i, since we don't know a priori what particular value i will take on. Remember, we need to know the type of x (for all i) in order to be able to even begin examining what's in x. The <EXPR> grammar that replaces the <FORM> grammar follows these type constraints. We have since the beginning assigned the same type to <FORM>, <TERM>, and <ATOM>. The <EXPR[i]>'s thus naturally take on that same common type. 3.3 More Top-Down Context Let's introduce semantics into our new, brief <EXPR> grammar. The semantics will appear slightly different from before, now that all operators rewrite to the part-of-speech BOP prior to combining with EXPRs. We first declare the parts-of-speech we will be using: POS EXPR[4] : BASIC_PROCESS ; " Prints itself on the terminal " POS BOP[4] : BASIC_PROCESS ; " Prints itself on the terminal " Here is the grammar: <NUMBER:n> -> <EXPR[1]: //[N;] WRITE(N); \\ > ( <EXPR[i]:x> ) -> <EXPR[1]: x > + -> <BOP[4]: // WRITE('+'); \\ > * -> <BOP[3]: // WRITE('*'); \\ > ^ -> <BOP[2]: // WRITE('^'); \\ > <EXPR[i]:x> <BOP[j]:y> <EXPR[k]:z> -> IF I =< J & K < J THEN <EXPR[j]: //[X;Y;Z;] WRITE('('); <*X*>; <*Y*>; <*Z*>; WRITE(')'); \\ > FI When "+", "*", or "^" appears as a BOP, the semantics is to cause that character to appear on the terminal. The (big) EXPR-BOP-EXPR rule has semantics similar to before, except that we now invoke the BOP (<*Y*>) to get the operator to appear on the terminal. Now suppose we want "*" to print its operands in reverse order. As things stand now, there is nothing we can do to achieve this just by modifying the BOP rule: * -> <BOP[3]: // WRITE('*'); \\ > The BOP simply never has access to its operands. Those operands are presently available only in the EXPR-BOP-EXPR rule. Let's now modify the grammar so that each BOP has access to its operands. We will have the EXPR-BOP-EXPR rule not invoke the two <EXPR>s, but rather, have the rule pass the two <EXPR>s down to the <BOP> rules, so that each <BOP> can be responsible for printing its own operands. We declare two global variables for top-down context: VAR LEFT_EXPR, RIGHT_EXPR = BASIC_PROCESS ; We've declared them to be of type BASIC_PROCESS so that they can hold the semantics of <EXPR>s, which are to be passed down to the <BOP>. To our declaration of the part-of-speech BOP, we add a very important comment: POS BOP[4] : BASIC_PROCESS ; " The BASIC_PROCESS reads the global variables LEFT_EXPR and RIGHT_EXPR. " This important comment applies to all <BOP>s: Whenever we invoke the semantics of a <BOP> we always must first put values into the variables LEFT_EXPR and RIGHT_EXPR. No matter what, <BOP>s assume that valid values appear in those two global variables. Here is the updated grammar. Only the BOP rules and the one EXPR-BOP- EXPR rule need change: <EXPR[i]:x> <BOP[j]:y> <EXPR[k]:z> -> IF I =< J & K < J THEN <EXPR[k]: //[X;Y;Z;] WRITE('('); HOLDING LEFT_EXPR:= X; RIGHT_EXPR:=Z; DO <*Y*>; ENDHOLD WRITE(')'); \\ > FI + -> <BOP[4]: // <*LEFT_EXPR*>; WRITE('+'); <*RIGHT_EXPR*>; \\ > * -> <BOP[3]: // <*LEFT_EXPR*>; WRITE('*'); <*RIGHT_EXPR*>; \\ > ^ -> <BOP[2]: // <*LEFT_EXPR*>; WRITE('^'); <*RIGHT_EXPR*>; \\ > The EXPR-BOP-EXPR rule now invokes only the BOP, but sets LEFT_EXPR and RIGHT_EXPR to the operands prior to that invocation. Notice how each BOP rule's semantics takes on the responsibility of invoking LEFT_EXPR and RIGHT_EXPR, to get its operands printed out. Now, if we wanted to make "*" print out its operands in reverse order, we replace the "*" rule with: * -> <BOP[3]; // <*RIGHT_EXPR*>; WRITE('*'); <*LEFT_EXPR*>; \\ > 3.4 Exercises 1) Will 1+2 print as 1+2 or as (1+2)? 2) What is the "[?]" in the latest derivation shown? 3) The condition in the rule <EXPR[i]> <BOP[j]> <EXPR[k]> -> IF i =< j & k < j THEN <EXPR[j]:...> FI is i=<j and k<j. The fact that i is =< and j is < imposes an asymmetry. How does that asymmetry manifest itself in the formula 1+1+1? 4) Affect the condition in the EXPR-BOP-EXPR rule so as to have a right-to-left grouping just for operators of precedence 2. That is, "^" has precedence 2 and we want it alone to group right-to-left. Fill in the ...: <EXPR[i]:x> <BOP[j]:y> <EXPR[k]:z> -> IF ... THEN <EXPR[j]: f(x,y,z) > FI 5) In the expression 1+2*3, what values are in LEFT_EXPR and RIGHT_EXPR when the BOP "+" is invoked? That is, what would <*LEFT_EXPR*>; print and what would <*RIGHT_EXPR*>; print? How about for the + between the 3 and 4 in the expression 1+2*3+4*5, or in 3+4*(5+6)*7+8 ? 6) Introduce a rule which will admit "-" with the same precedence as "+", and another rule which will admit "/" like "*". Specify whether you are using the grammar with top-down context for BOPs (LEFT_EXPR and RIGHT_EXPR), or the previous rendition without top-down context. 7) Let's declare a new part-of-speech, UOP (for Unary OPerator): POS UOP : BASIC_PROCESS; " Prints itself on terminal " Add two rules to the <EXPR> grammar (without the top-down context). The first rule says that "-" can be a UOP. The second rule combines a UOP followed by an EXPR to become an EXPR. Arrange the second rule so that unary minus ("-") applies before any of the binary operators (BOPs). 8) Let's rewrite the EXPR grammar from scratch. We make the following new declarations: POS EXPR[4]: //INT\\ ; POS BOP[4]: //INT\\ ; "Reads the global variables LEFT_VALUE and RIGHT_VALUE" The "//INT\\" is an action type like BASIC_PROCESS, but its invocation delivers an INTeger. That is, //5\\ and //5+6\\ are expressions of type //INT\\. The invocation of these delivers an INT (5 and 11 in these examples). a) What will we see if we say: VAR X = //INT\\ ; X:= // 2 \\ ; WRITE( <*X*> ); What about: X:= // 5+6 \\ ; X:= //[X;] <*X*> * 2 \\ ; WRITE( <*X*> ); Let's declare the top-down context global variables: VAR LEFT_VALUE, RIGHT_VALUE = INT; We now specify the grammar: <NUMBER:n> -> <EXPR[1]: //[n;] n \\ > ( <EXPR[i]:x> ) -> <EXPR[1]: x > <EXPR[i]:x> <BOP[j]:y> <EXPR[k]:z> -> IF I =< J & K < J THEN <EXPR[j]: //[X;Y;Z;] HOLDING LEFT_VALUE:= <*X*> ; RIGHT_VALUE:=<*Z*> ; GIVE <*Y*> ENDHOLD \\ > FI + -> <BOP[4]: // LEFT_VALUE + RIGHT_VALUE \\ > * -> <BOP[3]: // LEFT_VALUE * RIGHT_VALUE \\ > ^ -> <BOP[2]: // LEFT_VALUE ^ RIGHT_VALUE \\ > b) When we invoke the semantics of the following <EXPR>s, what INTeger values do we get? 1+2*3 (1+2)*3 c) In the EXPR-BOP-EXPR rule, what types have the variables X, Y, and Z? What is the type of LEFT_VALUE and RIGHT_VALUE? d) Why does the HOLDING assignment appear as: HOLDING LEFT_VALUE:= <*X*> ; ... instead of HOLDING LEFT_VALUE:= X ; ... e) Why does the semantic specification in the "+" rule include the surrounding //...\\? f) Why don't the global variable LEFT_VALUE and RIGHT_VALUE appear as square-bracket context within the semantics specification of the three BOP rules? g) Suppose LEFT_VALUE and RIGHT_VALUE happen to be initialized to the value 1. Suppose we write the "+" rule using square-bracket context: + -> <BOP[4]: //[LEFT_VALUE;RIGHT_VALUE;] LEFT_VALUE + RIGHT_VALUE \\ > When 1+2 is seen as an overall <EXPR>, what number will we get upon invoking its semantics? Will the expression 1+2+3 have that same value? Why? (Recall that the BASIC_PROCESS //[LEFT_VALUE;RIGHT_VALUE;] LEFT_VALUE + RIGHT_VALUE \\ traps the values in LEFT_VALUE and RIGHT_VALUE ~during the rewriting process. This occurs before the invocation of any semantics). 9) Suppose we build the numbers grammar from scratch, as follows: POS DIGIT, NUMBER : //INT\\ ; POS NUM : //INT\\ ; " Reads the global variable RADIX. " Let's declare the top-down context variable RADIX: VAR RADIX = INT; RADIX:= 10 ; Here is the new numbers grammar: 0 -> <DIGIT: //0\\ > 1 -> <DIGIT: //1\\ > ... 9 -> <DIGIT: //9\\ > <DIGIT:d> -> <NUM: d > <NUM:n> <DIGIT:d> -> <NUM: //[N;D;] <*N*> * RADIX + <*D*> \\ > We've replaced the part-of-speech <NUMBER> with <NUM>. This grammar will interpret numbers as being in whatever base appears in RADIX. Two rules complete the grammar, and allow the specification for an input base: <NUM:n> -> <NUMBER: n > <NUM:n> BASE <NUM:n> -> <NUMBER: ... > a) Would it mean the same thing if we replaced the first of these last two rules by: <NUM:n> -> <NUMBER: //[n;] <*n*> \\ > That is, is //[N;] <*N*> \\ equivalent to N (Consider what happens when you invoke (<*...*>) either one). b) Fill in the "..." in the last rule. Make sure that the <NUM> to the right of BASE is interpretted in base ten. c) If someone accidently performs the assignment RADIX:= 9 ; the grammar just given will interpret numbers without BASE specification as being specified in base 9. Why? Fix the rule <NUM:n> -> <NUMBER: n > so to guarantee that the <NUM> will always appear in base ten. 3.5 Flexible Word Order By Using Masks Upon Indices An array part-of-speech offers a set of parts-of-speech under one name. Whenever any array part-of-speech is matched, an index is set, which distinguishes among the possible members. Consider our EXPR-BOP-EXPR rule that enforces precedence: <EXPR[i]:x> <BOP[j]:y> <EXPR[k]:z> -> IF i =< j & j > k THEN <EXPR[j]: ... > FI As usual, the indices, i,j, and k, are set up upon each match, whenever this rule is about to apply. Those indices are used here in an IF-statement, to allow only certain combinations of their values to apply. When the condition "i=<j & j>k" fails, the rule doesn't apply. The IF in this rule enforces precedence grouping. The condition is an arithmetic expression: We think of i, j, and k as integers, as we compare them with "=<" and ">". In contrast now, let's think of each integer value as a bit-pattern. Doing so will allow us to express concisely grammars that have ~free ~word ~order. For example, our model language ICL has a part which is free of word order. We want to accept a quantifier that can look like: FOR I ~FROM 1 ~TO 10 ; or like FOR I ~TO 10 ~FROM 1 ; Following the "FOR I" there are two clauses, a TO- and a FROM-clause. We wish that they may appear in any order. We wish also that only ~one of each kind of clause be admitted. We want to disallow for example: FOR I TO 10 TO 10 ; We begin to write such a grammar as follows. We introduce the part- of-speech <AQUANT> to denote "FOR I", or that followed by TO- and/or FROM-clauses: FOR <ID:x> -> <AQUANT: ... > <AQUANT:x> FROM <EXPR:e> -> <AQUANT: ... > <AQUANT:x> TO <EXPR:e> -> <AQUANT: ... > This grammar allows an AQUANT to have any number of FROM and TO clauses, in any order. We limit the appearence of FROM and TO clauses to only one each by the following notation. The subscripts are new: FOR <ID:x> -> <AQUANT(-FROM-TO): ... > <AQUANT(-FROM):x> FROM <EXPR:e> -> <AQUANT(+FROM): ... > <AQUANT(-TO):x> TO <EXPR:e> -> <AQUANT(+TO): ... > ----------------(parentheses in previous 3 lines mean subscripting)---- We think of the subscripted TO and FROM as two ~bits associated with AQUANT. We agree that: If the ~TO bit is on, then AQUANT already has a ~TO-clause, and IF the ~FROM bit is on, the AQUANT already has a ~FROM-clause. Thus: FOR I FROM 1 is an AQUANT with the FROM bit set (written as ~+FROM) and with the TO bit zero (written as ~-TO). FOR I TO 1 is an AQUANT with -FROM+TO FOR I FROM 1 TO 10 is an AQUANT with +FROM+TO FOR I TO 10 FROM 1 ditto FOR I is an AQUANT with -FROM-TO. Our first rule ("FOR <ID>") generates an AQUANT (on the righthand side) with both bits off (-TO-FROM), because no FROM nor TO clause exist in that AQUANT, yet. The second rule (FROM) insists that the given AQUANT ~not contain a FROM-clause (-FROM). The result from this rule is an AQUANT with the FROM bit set (+FROM). The second rule thus allows for the introduction of a FROM-clause only if one is not already present. Similarly, the TO rule allows for the introduction of a TO clause only if one is not already present (-TO). Once the TO clause is acquired, the resulting AQUANT has the TO bit set (+TO), and so will never again admit a TO clause. 3.5.1 Written In Our Official Notation We write these rules in our official notation as follows. We treat the AQUANT's index as a ~bit-pattern. We allocate two bits of information to be carried with the part-of- speech AQUANT via: POS AQUANT[4]: ... ; The index associated with an AQUANT can thus range from 0 to 4. We will use only the values 0 thru 3, to represent two bits of information. We will apply masks to the index, to test each of the two bits in the index. We let the names FROM and TO denote the masks: VAR FROM, TO = INT ; FROM:= 1; "Low-order bit" TO:= 2; "Next lowest-order bit" The rules follow: FOR <ID:x> -> <AQUANT[0]: ... > <AQUANT[a]:x> FROM <EXPR:e> -> IF (a & FROM) = 0 THEN <AQUANT[ a ! FROM ] : ... > FI <AQUANT[a]:x> TO <EXPR:e> -> IF (a & TO) = 0 THEN <AQUANT[ a ! TO ] : ... > FI The first rule generates an AQUANT with no bit set (0). It has neither a FROM nor a TO clause yet. The IF-clause in the second rule reads as: IF the FROM bit of ~a is off THEN ... (Section 22.1.2 introduce the operators ~& (and), and ~! (or)). Thus, the second rule applies only when a FROM clause is not present ( (a&FROM)=0 ). Similarly, the last rule, the TO-rule, applies only when a TO clause is not present ( (a&TO)=0 ). The FROM rule generates an AQUANT whose index is "a ! FROM". "A ! FROM" is identical to the given ~a, except that the FROM bit is set. Thus, the acquisition of a FROM clause turns on the FROM bit. Similarly, the TO-rule generates an AQUANT which has the TO bit set ("a ! TO"). Each rule tests only one bit in an AQUANT's index (~a) and sets only one bit. That one bit is used to indicate the presence or absence of a particular kind of clause. 3.5.2 More Than Two Clauses There may be more than two clauses. For example, our FOR-quantifier can actually contain four clauses beyond the FOR: FOR I FROM 1 TO 100 BY 3 IN 20 ; Again, we assign one bit for each of the four possible clauses, FROM, TO, BY, and IN. This time, we use four bits, and so declare the AQUANT part-of-speech as having an index between 0 and 16: POS AQUANT[16] : ... ; We declare our four masks via: VAR FROM, TO, BY, IN = INT; FROM:= 1; TO:= 2; BY:= 4; IN:= 8; We express the rules as: FOR <ID:x> -> <AQUANT[0]: ... > <AQUANT[a]:x> FROM <EXPR:e> -> IF (a & FROM) = 0 THEN <AQUANT[ a ! FROM ] : ... > FI <AQUANT[a]:x> TO <EXPR:e> -> IF (a & TO) = 0 THEN <AQUANT[ a ! TO ] : ... > FI <AQUANT[a]:x> BY <EXPR:e> -> IF (a & BY) = 0 THEN <AQUANT[ a ! BY ] : ... > FI <AQUANT[a]:x> IN <EXPR:e> -> IF (a & IN) = 0 THEN <AQUANT[ a ! IN ] : ... > FI Now, the four clauses can appear in any order, and duplicate occurences of a clause are forbidden. In general, conditions that involve combinations of bits can be programmed. For more uses, see Thompson[] and Blank[]. 3.6 Conclusion ~Array, or ~parameterized, parts-of-speech come with ~indices that can be involved in computations. Those computations can affect when rules may apply. We've seen precedence grammars, where those indices are thought of as integers. We've seen also where those indices are viewed as bit- patterns, basing conditions on the settings of individual bits in the index. Both ways of using an index can be combined as well. Continue to think of the index as a set of bits. You can use some of the bits to encode one (or more) integers.