[Cyan]<PreISLCedar>Stolfi>Hercules>HerculesSyntaxDoc.tioga!1

HerculesSyntaxDoc.tioga

Written by Stolfi, February 16, 1984 7:26 pm

Last edited by Stolfi, April 18, 1984 8:53:50 pm PST

- - - - - RELATIVE PRECEDENCE OF INFIX/RASFIX/LASFIX/PREFIX/POSTFIX OPERATORS

When properties (other than openfix/closefix) are combined, the parser considers all alternatives. If there is exactly one alternative that allows the expression to be extended, that alternative is taken. Otherwise, if the current expression can be terminated at the current point, this is done. Otherwise (cannot extend nor terminate, or there are two or more ways to extend), the parser stops with an error. So, for example, setting

(postarg+postbreak, prebreak+prearg) for atoms,
(postarg+postbreak, prefix) for '-' , and
(postfix, prebreak+prearg) for '^'

says that
* atoms do not operate on anything, but may be operated upon from either end,
* the - sign is a prefix operator, and may start an expression or
may be the operand of something to its left,and
* the ^ is postfix and may terminate the expression or may be operated upon from its right.

Therefore, ----A^^^---B^CDE^-C to be parsed as three expressions, ----A^^^,

Properties may be combined as follows:

prefix+postfix: means infix only (left/right/non associative iff prepower >/</= postpower)
prefix+postarg: means prefix only
prearg+postfix: means postfix only
prearg+postarg: means treat as closed expr (i.e. variable or literal)

prefix+postfix+prearg: either postfix or infix
prefix+postfix+postarg: either prefix or infix

prefix+prearg+postarg: either prefix or variable
postfix+postarg+prearg: either postfix or variable

prefix+prearg+postfix+postarg: either infix or prefix or postfix or variable

- - - - - RELATIVE PRECEDENCE OF INFIX/RASFIX/LASFIX/PREFIX/POSTFIX OPERATORS

Let op be an infix/lasfix/rasfix operator with declared binding power b. Such an operator can be seen as both prefixoid and postfixoid: it acts on things to its right as a prefix of binding power br, and on things to its left as a postfix with binding power bl, where bl and br are as given below:

if op is infix then br = b and bl = b
if op is lasfix then br = b + e and bl = b - e
if op is rasfix then br = b - e and bl = b + e

for a very small epsilon. In this schema, for a prefix or postfix operator op we must consider

if op is prefix then br = b + e and bl = 0
if op is postfix then br = 0 and bl = b + e

When in an expression we have a sequence ... L X R ..., where L is prefixoid, R is postfixoid, and X is is a closed, balanced expression, there is a "tug-of-war" between L and R to see who gets applied first to X. The parser will interpret the expression as

...(... L X) R ........ if br(L) - bl(R) > e, and as

........ L (X R ...)... if br(L) - bl(R) < -e.

If -e d br(L) - br(R) d e the expression is considered to be ambiguous, and an error results. As a consequence of this rule, let pre, pos, inf, las, and ras be operators of the obvious types and having same bindingPower. Let also A, B, C be closed, balanced expressions. Then

pre A inf B pre A ras B A inf B pos A las B pos

A inf B inf C

A inf B las C A inf B ras C A las B inf C A ras B inf C

A las B ras C A ras B las C

are all invalid, whereas

pre A las B A las B las C pre A las B las C

A ras B pos A ras B ras C A ras B ras C pos

are valid, and parsed as

(pre A) las B (A las B) las C ((pre A) las B) las C

A ras (B pos) A ras (B ras C) A ras (B ras (C pos))

We remark that a lasfix behaves like a prefix operator whose result is a postfix one of slightly lower binding power. For example, to get the usual behavior of binary '+' and '-' we must define them as lasfix operators of same binding power. Then, in an expression like A - B + C - D we may think of the first - as a prefix operator, that when applied to B yields the postfix operator -B, that is in turn applied to A. Similarly, + applied to C yields the postfix operator +C, that is applied to the result of -B applied to A. The binding power of the postfix +C is lower (by 2e) than that of the "prefix" -, so - wins in the tug-of-war for B.

- - - - - OPERATORS WITH MULTIPLE PROPERTIES

Yes, Virginia, A lexeme L may have more than one property flag set.
When properties (other than openfix/closefix) are combined, the parser considers all alternatives. If it gets to a situation where no choice leads to a valid parsing, or where there are more than two ways to proceed with the parsing, it will complain.

More precisely, for every pair of conscutive non-matchfix lexemes (or balanced bracketed subexpressions) ... L R ..., the parser will try to determine, based only on the properties of L and R (and assuming that there are no syntax errors elsewhere) whether

case 1. L is to be applied to R, or to some expression beginning with R, or

case 2. R is to be applied to L, or to some expression ending with L, or

case 3. L ends a complete subexpression and R begins an unrelated one, or

case 4. L and R crash: each wants to apply to the other.

If L has only one property out of {prefix, prearg}, and R has only one of {postfix, postarg}, then exactly one of {1,2,3,4} holds:

case 1 if L.prefix and R.postarg

case 2 if L.prearg and R.postfix

case 3 if L.prearg and R.postarg

case 4 if L.prefix and R.postfix

Case 4 is a syntax error: the parser complains the input is invalid because of "operator conflict".

If L is prefix+prearg, and R is postfix+postarg, then we cannot decide between cases 1 and 2. The parser considers this a "serious" ambiguity, and stops with the message "operator type ambiguity".

If the options are (L.prefix, R.postfix+postarg), then the parser will assume R to be postarg (case 1), since the other choice would give an error. Similarly, if L is prefix+prearg and R is just postfix, then the parser considers L to be prearg (case 2).

If L is prefix+prearg but R is just postarg, then there is an ambiguity is between cases 1 and 3. The parser consider this a "minor" ambiguity, and resolves it by taking the alternative that leads to the longest expression, namely case 1 (L.prefix). Similarly, if the options are (L.prearg, R.postfix+postarg), then the parser will take R.postfix.

Except in case of error, the analysis above determines exactly one of {L.prefix, L.prearg}, and one of {R.postfix, R.postarg}, and exactly one of cases {1,2,3}, for all consecutive pairs L, R. Let's then put an arrow from L to R in case 1, from R to L in case 2, and no arrow in case 3. Define the strength of the arrow as being L.prepower in case 1, R.postpower in case 2. Then apply repeatedly the following procedure:

!!!! DESCRIBE IT

prep(X) be (if X.prefix THEN X.prepower ELSE 0),

posp(X) be (if X.postfix THEN X.postpower ELSE 0).

then the cases are

case 1 if prep(L) > 0, posp(R) = 0, (L.prefix and R.postarg)

case 2 if prep(L) = 0, posp(R) > 0, (L.prearg and R.postfix)

case 3 if prep(L) = 0, posp(R) = 0, (L.prearg and R.postarg)

case 4 if prep(L) > 0, posp(R) > 0, (L.prefix and R.prefix)

If L.prefix and L.prearg are both true, the parser considers each alternative in turn. Same if R.postfix and R.postarg are both true. Then if some alternative gives case 1 and some gives case 2, the parser complains the input is "ambiguous because of operator polymorphism". If all alternatives give case 4, the parser complains the input is invalid because of "operator conflict".

The valid combinations and their meanings are given below:

1. At most one of L.openfix and L.closefix may be true. If L.openfix is TRUE, the parser always treats L as an openfix, that is, tries to pair it with a matching closefix R (and generates an error if the latter is not found). After the two have been matched, the parsed subexpression L...R is then treated as a single lexeme, with all the properties of L, minus L.openfix, plus L.selfix.

For example, if we want '[...]' to be treated as a postfix subscripting operator, as in mesa, we may declare '[' as being openfix+postfix. Then the formula [a] will be treated as a postfix(+selfix) lexeme, and 'foo[a][b]' will parse into

( (bracket $b) ( ( bracket $a ) $foo))

(This is rougly equivalent to the subfix property of the original Juno parser).

2. At most one of L.infix, L.rasfix, and L.lasfix may be true. L.lasfix may be combined with L.prefix, and l.rasfix with L.postfix. All other combinations of infix/lasfix/rasfix with prefix/postfix are invalid. (Also, L.prefix and L.postfix are mutually exclusive.)

If L.lasfix and L.prefix are both true, then L is treated as a lasfix if it follows an expression to which it can be applied, otherwise it is treated as prefix.

A natural example is '-' and '+', in formulas such as '-A+B-C*(-D)'.

Similarly, if L.rasfix and L.postfix are both true, then L is treated as a rasfix if it precedes a what looks like the beginning of a valid formula to which it can be applied. Otherwise L is treated as a postfix.

3. If L.prefix and L.selfix are both TRUE, then L will be treated as a prefix if it precedes what looks like the beginning of a valid formula to which it can be applied. Otherwise it is treated as a selfix, i.e. a closed formula in itself.

In the subscript example, a bracketed subformula like '[a]' is treated as postfix (because '[' is openfix+postfix) and selfix (default). This allows 'expressions like '[a]

( (bracket $b) ( ( bracket $a ) $foo))

(This is rougly equivalent to the subfix property of the original Juno parser).

As another example, if we declare all atoms to be prefix+selfix, and '(' to be openfix+prefix+selfix, a formula like

'foo bar baz' parses to ( $foo ( $bar $baz ) )

'foo (bar,baz)' parses to ( $foo ( leftParen ( comma $bar $baz ) ) )

'gip (foo bar) baz' parses to ( $gip ( ( leftParen ( $foo $bar ) ) $baz ) )

'(foo bar), baz' parses to ( comma ( leftParen ( $foo $bar ) ) $baz )

i.e. atoms and closed expressions are treates as functions if they are applied to something, and as simple values otherwise.

A selfix+prefix lexeme is parsed as a prefix if it is followed by a valid expression, otherwise it is treated as a selfix. For example, we may want to use this as the default for atoms, if we want to allow 'foo x' where foo is a user-defined prefix operator.

Similarly, a selfix+postfix (resp. +infix) lexeme is treated as postfix (resp. infix) if it is preceded by a valid expression, otherwise it is treated as selfix.

Lexemes with more than one of {selfix, postfix, prefix, infix} make it easier to write ambiguous expressions. ?? IS IT HARD TO DETECT THEM ?? .

- - - - - !!!! MAYBE JUNK BELOW:

!!!! MAYBE JUNK BELOW:

An (external) Juno expression, <expr>, has one of the following forms:

X where X is a selfix lexeme, or

<op> <expr> where <op> is a prefix operator, or

<expr> <op> where <op> is a postfix operator, or

<expr> <op> <expr> where <op> is an infix operator, or

F <expr 1> <expr 2> ... <expr n> G where F is an openfix lexeme, G is
the matching closefix lexeme, and n is zero or more.

These rules are further modified by the binding powers of the operators involved. In general, an infix or prefix (resp. infix or postfix) operator placed to the left (resp. right) of an expression is considered to be applied to it only if the right (resp. left) binding power b of the operator is strictly less than the left (resp. right) cleavage resistance c of the expression. If b > c, then b is actually applied to some proper prefix of the expression. If b = c, then the expression is ambiguous, and the parser will complain ?? WILL IT ??

The right binding power rbp(X) of a closed expression X (a single lexeme or a complete openfix/closefix expression) is

zero if the lexeme is selfix, postfix, openfix, or closefix;
props.bindingPower+1/2 if prefix,
props.bindingPower+1/4 if left-associative infix,
props.bindingPower if right-associative infix

The left binding power is similarly defined, with left <--> right and prefix <--> postfix.

The left cleavage resistance of an expression E is:

infinite if the expression is a single lexeme, or an expression delimited
by a matching openfix/closefix pair;
if E is <prefix op> E1, then it is the left cleavage strength of E1;
if E is E1 <postfix op>, then it is MIN[left cleavage res. of E1, right b. p of the postfix op>]
???

The parser maps an input expression <expr> into aparse tree <expr*> (also called symbolic expression), with a format similar to that of LISP S-expressions. All lexemes except closefix and nullfix ones are reperesented in the parse tree:

- If <expr> consists of a single selfix lexeme X, then <expr*> is X.

- If <expr> has the form

<op> <expr> for a prefix operator <op>, or
<expr> <op> for a postfix operator <op>,

then <expr*> is the two-element list ( <op*> <expr*> ).

- If <expr> has the form

<expr1> <op> <expr2>

where <op> is an infix operator, then <expr*> is the three-element list

( <op*> <expr1*> <expr2*> ) .

- The expression

F <expr1> <expr2> ... <expr n> G

where F is openfix lexeme, G is the matching closefix, n is zero or more,
and each <expr i> is a valid complete Juno expression, is mapped into
the (n+1) element list

( F <expr1*> <expr2*> ... <expr n*> ) .

- - - - SYMBOLIC EXPRESSIONS

The parsing of a juno text <x> produces a symbolic expression <x'> according to the table below:

x - -> x - - - - if x is atom or integer or string
f(<args>) - -> (<leftPren> $f <args'>) - - - - note: three elements when '(' is subfix
a op b - -> ($op $a $b) - - - - if op is infix operator
(<args>) - -> (<leftParen> <args'>) - - - - note: two elements when '(' is matchfix
x, y - -> (<comma> $x $y) - - - - comma is an infix operator (as ';', 'then', etc.)
x, y, z - -> (<comma> $x (<comma> $y $z)) - - infixes associate to the right

- - - - PROPOSED EXTENSIONS TO JUNO MOUSED-IN ACTIONS

Ideas for JunoStorage (and maybe Hercules) April 18, 1984 8:54:11 pm PST

stroke: same as draw
width: [width: REF REAL] or [p1, p2: Point]
ends: [ends: ATOM] (one of $butt, $square, $round)

paint: [color: {ATOM or REF Graphics.Color}] (if atom, one of $red, $lightRed, ...)
startfill, endfill: []
fillEdge: same as draw
(each edge in a separate action to allow editing. Should we have hierarchical actions?)