- - - - - RELATIVE PRECEDENCE OF INFIX/RASFIX/LASFIX/PREFIX/POSTFIX OPERATORS
When properties (other than openfix/closefix) are combined, the parser considers all alternatives. If there is exactly one alternative that allows the expression to be extended, that alternative is taken. Otherwise, if the current expression can be terminated at the current point, this is done. Otherwise (cannot extend nor terminate, or there are two or more ways to extend), the parser stops with an error. So, for example, setting
(postarg+postbreak, prebreak+prearg) for atoms,
(postarg+postbreak, prefix) for '-' , and
(postfix, prebreak+prearg) for '^'
says that
* atoms do not operate on anything, but may be operated upon from either end,
* the - sign is a prefix operator, and may start an expression or
may be the operand of something to its left,and
* the ^ is postfix and may terminate the expression or may be operated upon from its right.
Therefore, ----A^^^---B^CDE^-C to be parsed as three expressions, ----A^^^,
Properties may be combined as follows:
prefix+postfix: means infix only (left/right/non associative iff prepower >/</= postpower)
prefix+postarg: means prefix only
prearg+postfix: means postfix only
prearg+postarg: means treat as closed expr (i.e. variable or literal)
prefix+postfix+prearg: either postfix or infix
prefix+postfix+postarg: either prefix or infix
prefix+prearg+postarg: either prefix or variable
postfix+postarg+prearg: either postfix or variable
prefix+prearg+postfix+postarg: either infix or prefix or postfix or variable
- - - - - RELATIVE PRECEDENCE OF INFIX/RASFIX/LASFIX/PREFIX/POSTFIX OPERATORS
Let
op be an infix/lasfix/rasfix operator with declared binding power
b. Such an operator can be seen as both prefixoid and postfixoid: it acts on things to its right as a prefix of binding power
br, and on things to its left as a postfix with binding power
bl, where
bl and
br are as given below:
if op is infix then br = b and bl = b
if op is lasfix then br = b + e and bl = b - e
if op is rasfix then br = b - e and bl = b + e
for a very small epsilon. In this schema, for a prefix or postfix operator op we must consider
if op is prefix then br = b + e and bl = 0
if op is postfix then br = 0 and bl = b + e
When in an expression we have a sequence ...
L X R ..., where
L is prefixoid,
R is postfixoid, and
X is is a closed, balanced expression, there is a "tug-of-war" between
L and
R to see who gets applied first to
X. The parser will interpret the expression as
...(... L X) R ........ if br(L) - bl(R) > e, and as
........ L (X R ...)... if br(L) - bl(R) < -e.
If -
e
d
br(
L) -
br(
R)
d
e the expression is considered to be ambiguous, and an error results. As a consequence of this rule, let
pre,
pos,
inf,
las, and
ras be operators of the obvious types and having same bindingPower. Let also
A,
B,
C be closed, balanced expressions. Then
pre A inf B pre A ras B A inf B pos A las B pos
A inf B inf C
A inf B las C A inf B ras C A las B inf C A ras B inf C
A las B ras C A ras B las C
are all invalid, whereas
pre A las B A las B las C pre A las B las C
A ras B pos A ras B ras C A ras B ras C pos
are valid, and parsed as
(pre A) las B (A las B) las C ((pre A) las B) las C
A ras (B pos) A ras (B ras C) A ras (B ras (C pos))
We remark that a lasfix behaves like a prefix operator whose result is a postfix one of slightly lower binding power. For example, to get the usual behavior of binary '+' and '-' we must define them as lasfix operators of same binding power. Then, in an expression like A - B + C - D we may think of the first - as a prefix operator, that when applied to B yields the postfix operator -B, that is in turn applied to A. Similarly, + applied to C yields the postfix operator +C, that is applied to the result of -B applied to A. The binding power of the postfix +C is lower (by 2e) than that of the "prefix" -, so - wins in the tug-of-war for B.
- - - - - OPERATORS WITH MULTIPLE PROPERTIES
Yes, Virginia, A lexeme L may have more than one property flag set.
When properties (other than openfix/closefix) are combined, the parser considers all alternatives. If it gets to a situation where no choice leads to a valid parsing, or where there are more than two ways to proceed with the parsing, it will complain.
More precisely, for every pair of conscutive non-matchfix lexemes (or balanced bracketed subexpressions) ... L R ..., the parser will try to determine, based only on the properties of L and R (and assuming that there are no syntax errors elsewhere) whether
case 1. L is to be applied to R, or to some expression beginning with R, or
case 2. R is to be applied to L, or to some expression ending with L, or
case 3. L ends a complete subexpression and R begins an unrelated one, or
case 4. L and R crash: each wants to apply to the other.
If L has only one property out of {prefix, prearg}, and R has only one of {postfix, postarg}, then exactly one of {1,2,3,4} holds:
case 1 if L.prefix and R.postarg
case 2 if L.prearg and R.postfix
case 3 if L.prearg and R.postarg
case 4 if L.prefix and R.postfix
Case 4 is a syntax error: the parser complains the input is invalid because of "operator conflict".
If L is prefix+prearg, and R is postfix+postarg, then we cannot decide between cases 1 and 2. The parser considers this a "serious" ambiguity, and stops with the message "operator type ambiguity".
If the options are (L.prefix, R.postfix+postarg), then the parser will assume R to be postarg (case 1), since the other choice would give an error. Similarly, if L is prefix+prearg and R is just postfix, then the parser considers L to be prearg (case 2).
If L is prefix+prearg but R is just postarg, then there is an ambiguity is between cases 1 and 3. The parser consider this a "minor" ambiguity, and resolves it by taking the alternative that leads to the longest expression, namely case 1 (L.prefix). Similarly, if the options are (L.prearg, R.postfix+postarg), then the parser will take R.postfix.
Except in case of error, the analysis above determines exactly one of {L.prefix, L.prearg}, and one of {R.postfix, R.postarg}, and exactly one of cases {1,2,3}, for all consecutive pairs L, R. Let's then put an arrow from L to R in case 1, from R to L in case 2, and no arrow in case 3. Define the strength of the arrow as being L.prepower in case 1, R.postpower in case 2. Then apply repeatedly the following procedure:
!!!! DESCRIBE IT
prep(X) be (if X.prefix THEN X.prepower ELSE 0),
posp(X) be (if X.postfix THEN X.postpower ELSE 0).
then the cases are
case 1 if prep(L) > 0, posp(R) = 0, (L.prefix and R.postarg)
case 2 if prep(L) = 0, posp(R) > 0, (L.prearg and R.postfix)
case 3 if prep(L) = 0, posp(R) = 0, (L.prearg and R.postarg)
case 4 if prep(L) > 0, posp(R) > 0, (L.prefix and R.prefix)
If L.prefix and L.prearg are both true, the parser considers each alternative in turn. Same if R.postfix and R.postarg are both true. Then if some alternative gives case 1 and some gives case 2, the parser complains the input is "ambiguous because of operator polymorphism". If all alternatives give case 4, the parser complains the input is invalid because of "operator conflict".
The valid combinations and their meanings are given below:
1. At most one of
L.
openfix and
L.closefix may be true. If
L.openfix is TRUE, the parser always treats
L as an openfix, that is, tries to pair it with a matching closefix
R (and generates an error if the latter is not found). After the two have been matched, the parsed subexpression
L...R is then treated as a single lexeme, with all the properties of
L, minus
L.openfix, plus
L.selfix.
For example, if we want '[...]' to be treated as a postfix subscripting operator, as in mesa, we may declare '[' as being openfix+postfix. Then the formula [a] will be treated as a postfix(+selfix) lexeme, and 'foo[a][b]' will parse into
( (bracket $b) ( ( bracket $a ) $foo))
(This is rougly equivalent to the subfix property of the original Juno parser).
2. At most one of L.infix, L.rasfix, and L.lasfix may be true. L.lasfix may be combined with L.prefix, and l.rasfix with L.postfix. All other combinations of infix/lasfix/rasfix with prefix/postfix are invalid. (Also, L.prefix and L.postfix are mutually exclusive.)
If
L.lasfix and
L.prefix are both true, then
L is treated as a
lasfix if it follows an expression to which it can be applied, otherwise it is treated as
prefix.
A natural example is '-' and '+', in formulas such as '-A+B-C*(-D)'.
Similarly, if L.rasfix and L.postfix are both true, then L is treated as a rasfix if it precedes a what looks like the beginning of a valid formula to which it can be applied. Otherwise L is treated as a postfix.
3. If
L.prefix and
L.selfix are both TRUE, then
L will be treated as a
prefix if it precedes what looks like the beginning of a valid formula to which it can be applied. Otherwise it is treated as a
selfix, i.e. a closed formula in itself.
In the subscript example, a bracketed subformula like '[a]' is treated as postfix (because '[' is openfix+postfix) and selfix (default). This allows 'expressions like '[a]
( (bracket $b) ( ( bracket $a ) $foo))
(This is rougly equivalent to the subfix property of the original Juno parser).
As another example, if we declare all atoms to be prefix+selfix, and '(' to be openfix+prefix+selfix, a formula like
'foo bar baz' parses to ( $foo ( $bar $baz ) )
'foo (bar,baz)' parses to ( $foo ( leftParen ( comma $bar $baz ) ) )
'gip (foo bar) baz' parses to ( $gip ( ( leftParen ( $foo $bar ) ) $baz ) )
'(foo bar), baz' parses to ( comma ( leftParen ( $foo $bar ) ) $baz )
i.e. atoms and closed expressions are treates as functions if they are applied to something, and as simple values otherwise.
A selfix+prefix lexeme is parsed as a prefix if it is followed by a valid expression, otherwise it is treated as a selfix. For example, we may want to use this as the default for atoms, if we want to allow 'foo x' where foo is a user-defined prefix operator.
Similarly, a selfix+postfix (resp. +infix) lexeme is treated as postfix (resp. infix) if it is preceded by a valid expression, otherwise it is treated as selfix.
Lexemes with more than one of {selfix, postfix, prefix, infix} make it easier to write ambiguous expressions. ?? IS IT HARD TO DETECT THEM ?? .