/Indigo/CaminoReal/ParsingDoc.tioga
Syntax-Directed Editing in CaminoReal
by Dennis Arnon, Kevin McIsaac, and Carl Waldspurger
© Copyright 1987 Xerox Corporation. All rights reserved.
Abstract: CaminoReal is a direct manipulation style user interface for mathematical software. Its fundamental module is a syntax-directed, WYSIWYG mathematical expression editor. This paper summarizes the design and implementation of the editor, which have been strongly molded by the environment in which it is used: (1) we need an internal form for expressions which can be easily evaluated and passed to an algebra system; (2) we want clean separation of translation of user input into internal form, from display of internal form; (3) we want character-grain incrementality, i.e. the display is updated after every user keystroke or mouse action.
Keywords: Mathematical Expression Editing, Parsing, Incremental Parsing, Syntax-Directed Editing, WYSIWYG
XEROX  Xerox Corporation
   Palo Alto Research Center
   3333 Coyote Hill Road
   Palo Alto, California 94304

For Internal Xerox Use Only
1. Introduction
2. Basic architecture
1. There are multiple possible linear input languages, and (1 or 2D) output "languages". Each is translated to, or created from, an abstract tree representation by a table-driven parser.
1. All user keystrokes and mouse actions get passed to the scanner/parser in the form of an input stream, consisting of a sequence of input tokens and "actions" ("actions" may typically be represented as control characters in the input stream). A user tool TipTable just defines a mapping from the actual keystrokes and mouse clicks to the set of characters and actionNames understood by the scanner.
2. Nonterminal symbols of the input grammar may appear in the input stream. They become placeholder nodes of that type. The typical situation in which this occurs is template expansion.
2. The scanner may mark a token as "possibly unfinished", in which case subsequent token/actions will be resends of this one until there is a final resend in which it is marked "finished". When it gets a possibly unfinished token, the parser saves an audit trail from its previous state to the new one brought about by the token (but it actually moves as though the token were finished). This is accomplished by sending it a "checkpoint" action. When it gets a resend of that token, it backs up to the previous state and reprocesses using the revised token (this is accomplished by first sending it a "rollback" action, followed by the new token). The reason for this mode of action is so that the display can be driven from the abstract tree.
3. Note that "possibly unfinished" tokens fit well with template expansion - a template name is sent to the parser as possibly unfinished, then when the scanner sees the ^E, it sends a "rollback" action to the parser, followed by the new stream of tokens.
3. The scanner has a buffer. It sends one token or action at a time to the parser, upon request from the parser. The scanner only reads ahead in the input stream if its buffer is empty. Examples of uses of the buffer: Template expansions go into the scanner buffer. Or as an integer is built, char by char, the scanner is sending integer tokens, marked possibly unfinished, to the parser. Finally the integer is terminated by a comma; the scanner puts both integer and comma in its buffer, and the next time the parser asks for a token/action, it sends the integer marked finished, at the next token/action request, it sends the comma.
2. The output display is generated only from the current abstract tree.
3. Abstract trees
1. Nodes in the abstract tree may be either functionNames, atoms (e.g. integer, real, symbol), or nonterminalSymbols (which will display as appropriately-typed placeholders). A functionName can be any expression; if it not an atom (i.e. an identifier), Display Functional is the only option.
4. Parsing and Editing
General
1. The input grammar consists of pure functional notation, augmented by operators with specified fixities and precedences. Only unary operators may be postfix. We do not expect to recognize space as multiplication unless some equivalent alternate whitespace character is defined.
2. Each "function" in the expression tree comes from some production of the grammar. Every production that has nonterminals on the right hand side corresponds to a function (both of these statements assume absence of "boring" productions, e.g. for n-aryness). We assume a one-one correspondence of right side nonterminals and arguments in the function tree. For reasonableness, we assume that argument subtrees get put into the abstract tree in the same left to right order in which their corresponding nonterminals occur in grammar productions. Thus as we recognize a production from left to right, we are putting arguments into the abstract from left to right.
3. The input grammar is only used to get stuff into an abstract tree. When you are editing a existing a given abstract tree, e.g. replacing subtrees, you can do whatever you want in principle. Thus placeholder type becomes irrelevant: I can replace a placeholder node of any type by an arbitrary subtree e.g. by subtree copy. Thus in reality there is only one nonterminal in our "language", i.e. Expression. However, if I replace a node with the result of parsing an input stream, I have to supply a stream that the parser will accept.
4. We do not expect to do wraps in an input stream.
Selections
1. A selection (primary, copy, or move is (the complete subtrees of) one or more contiguous siblings. The rule should be that a single left click selects some subtree, then a right mouse click on a sibling extends it over all siblings in between, inclusive.
2. There need not be a current Primary selection at all times. When one is made, however, this implies that the trees which it includes are going to be replaced by one or more new trees. The possiblities are: (1) we select a rope, bug "parse", and insert the results as one or more new siblings replacing the Primary selection, as children of the same parent (new "sequence" parent inserted if Primary Selection is root) (2) we type keystrokes which are parsed and the results inserted as one or more new siblings replacing the Primary selection, (3) we copy (the current copy selection, which is one or more contiguous siblings) in on top of the Primary selection.
3. If there is no current Primary selection, and we are not in the middle of a parse, then it is meaningless to start typing parsable input. I.e. before we can parse, we have to "root" the parser, and this is done by making a primary selection. The first token the parser gets is either a syntax error or a legitimate replacement for (the subtree rooted at) that node; we do that replacement, and are off and running in the current parse.
4. There are two possible modes of implementing "rooting". One is that you start grafting the new abstract tree into the existing tree as it is built. The other is that you build it all somewhere and graft it in when it'd done. In either case, the new input is built by a recursive call to the parser.
The ParsePointer
A pointer to a node in the tree whose associated production is still incomplete, and for whose associated production the cursor currently sits between two symbols of rhs (as opposed to upon a nonterminal). I.e. a node all of whose current children are complete, and at which we're sitting to listen to the next input token.
The ParsePointer only "exists" between input tokens. I.e. each time we listen to an input token, the ParsePointer may move through several nodes, but eventually it stops somewhere and sits between two rhs symbols.
When we suspend a parse, we just save a ParsePointer, i.e. a pointer to a node. The user must live with the consequences if the state of the production at a saved node changes while a parse is suspended, or if the node is deleted from the abstract tree.
With each internal (non-leaf) node of the abstract tree which is on the path from the current ParsePointer to the root of the current parse, is associated a production of the grammar, and a specification of where we are (i.e. a cursor position) in that production. The possible cursor positions are:
a. Between two symbols: we have not yet seen any token "belonging to" the symbol to the right of the cursor. That next token may either cause us to begin modifying the most recently added child of this node (this can happen iff there is a nonterminal to the left of the cursor, and the next token causes a different parse of it, (e.g. E -> "sum [a,b]" in the input stream, followed by "*"), or to insert and begin work on a new rightmost child (e.g. next token is the comma separating two args of a function, when we see it and move the cursor past it, we insert a new child and set it to a placeholder of type Expression).
b. Upon some nonterminal. This means we have started it, i.e. seen a terminal which is a leaf of some legal subtree, but it is not yet complete. The ParsePointer is at the unique child node whose cursor is pointing between two characters, and not upon some nonterminal.
We will never be "after the last symbol of rhs of the production", for in that case, the production would be complete and state would have been flushed.
Parser motion
1. As soon as we complete a node's production, the ParsePointer moves to parent, we check for completion there, if so, move to its parent, etc. All this before looking at the next input token/action.
2. The root node is thought of as having a nonexistent "parent", i.e. a fictitious node whose production is <StartSymbol> -> Expression. A normal parser return occurs if this production is completed, which is accomplished if ParsePointer is this parent node, if its cursor is sitting upon the <StartSymbol> nonterminal, and we receive the "EndOfInput" token. We then return the root node, i.e. its child.
2. We can complete a node's production when its cursor is sitting to the right of the rightmost symbol of its production's rhs, and we know that the next token cannot cause us to modify the most recently added child of this node. Then we know we can move the move the ParsePointer to its parent. Whether or not we can move the parent's cursor off the nonterminal upon which it is sitting, to point between that nonterminal and the next symbol to the right, depends on whether we are know that the next token cannot cause us to modify this child that we are of the parent. Example: in f[sum[a,b],c], when we see the first ] we can complete the node for sum and move to its parent P, when we see the , we can shift P's cursor off the nonterminal for the sum node and past the terminal for "," to rest upon the nonterminal S corresponding to the node N for c, when we see the next ] we can complete N, move back to P, move P's cursor past S, and past the terminal for ], completing P, then we wait to see if more input (e.g. "*") that requires us to
2. Downward child creation occurs when a lookahead token enables us to decide that the nonterminal N to the right of the cursor can be expanded to a particular production (or perhaps we can do a whole sequence of such expansions). Then we move the cursor of the current ParsePointer upon that nonterminal, create a new child, associate the rhs of the production with the child, set the cursor appropriately in that rhs, and set a new ParsePointer appropriately. This is topdown parsing.
Parser Actions
^D (DeletePrimary) - Delete the current Primary selection, replace with a placeholder of type Expression (error if no current Primary selection), Primary select that.
^V (SavePrimary) - Save the current Primary selection, replace it with a placeholder of type Expression (error if no current Primary selection), Primary select that. (perhaps user interface should pop up a new window for this save expression).
^T -(ParsePointerSelect) PrimarySelect the current ParsePointer.
^U (CursorChildSelect) - Error if current ParsePointer's cursor is not pointing upon some nonterminal symbol of its production's rhs; assuming it is, PrimarySelect the child corresponding to it.
^K (PrimaryFirstChildSelect) - Move the current Primary Selection to the leftmost (first) child of current primary selection (error if no current Primary Selection)
^P (PrimaryParentSelect) - Move the current Primary Selection to the parent of current primary selection (error if no current Primary Selection)
^L, ^M (PrimarySiblingSelect) - Move the current Primary Selection to the left (right) sibling of current primary selection (error if no current Primary Selection)
^A (Abort) - Abort the current parse. Equivalent to ForceCompletionAll followed by ForceCompletion followed by EndOfInput.
^F -(ForceCompletion) Error if the current ParsePointer's cursor is not pointing to the right of the rightmost symbol of its production's rhs; assuming it is, mark this production (node) as complete, and move ParsePointer to parent node.
^G -(ForceCompletionAll) Do ForceCompletion until ParsePointer is the root of this parse.
^R -(Resume) Resume the topmost suspended parse. Simply leaves the current parse, if any (assumes
^S -(Suspend) Suspend the current parse.
^N -(NewParse) Start a new parse to replace the current Primary Selection.
Parser: PROC [ROPE] RETURNS [AbstractTree ← NIL];
overallTree, parsePointer: AbstractTree;
selections
scanBuffer:
for each char c in ROPE do
[overallTree, parsePointer, selections, scanBuffer] ← ParseCycle[c, overallTree, parsePointer, selections, scanBuffer];
Scanner Actions
^E (Expand) - template expansion
^W (WrapExpand) - template expansion, wrap around current Primary selection, set .
Editing with abstract trees
1. Basic commands:
a. replace one or more contiguous subtrees by one or more new trees that become children of the same parent, if any (tree copy, or parse input)
b. add sibling before (as a placeholder node),
c. add sibling after (as a placeholder node),
d. wrap function around and make ith arg (i.e. ith arg is hot),
e. exchange subtrees (swap)
f. delete the subtrees of the current primary selection as children of their parent.
2. The ones other than replace are implemented by tree modification to create appropriate new nodes, then a replace.
Display and user interface notes
1. Text Capture: this is your fallback if you can't get the right on-screen behavior for something you want to do - you select some subtree, dump its text out into a buffer, edit, reparse and stuff back into the tree. It is a preorder traversal of the tree that puts it in functional notation.
2. DisplayFunctional is a display mode. Can be done to different levels, e.g. display the top level function, but display its rules in accordance with whatever formatting rules apply.
3. One could imagine a "prompting" parser action mode where, when a cursor moves to a position between two symbols, the symbol to the right is a nonterminal, and the symbol to the left could not possibly get a different derivation depending on the next input token (e.g. the symbol to the left is a terminal symbol), we immediately put a child in the abstract tree for that nonterminal, whose "value" is a placeholder of that type, and mark it to be highlighted in the display (i.e. this is prompting). Note that this is sort of the same as making the new node the current selection, but we want to distinguish it from that because here we're not starting up a new parse. This clearly seems hard, so we won't do it. Thus in pure linear input mode, there is no cursor flashing on the screen, and no prompting. When you use a template, you will see placeholders, and as you use actions to move around and make and replace selections, each current selection you have will be highlighted.
3. Whenever you select and replace a function name, the display goes back to DisplayFunctional (because first the old name is deleted, at which point there is no name, so there is no formatting rule for it), the characters you type for the new name (if you enter it this way) are echoed back to you until, perhaps, you have typed in a name that some formatting rule in the database matches, at which point that rule takes over the display. Note that until you enter a "terminate parse" action, the scanner/parser is listening to your keystrokes, so you can continue typing characters and making the name longer, for which you will get either a different display, or DisplayFunctional, depending on what is or is not in the display database.
Templates
1. Template definitions can end with an action, which will suspend the current parse. The default is is no action following a template, which has the effect of just inserting the characters of the rhs of the template into the input stream of the current parse. Since the template database can be changed anytime, the validity/meaningfulness of an input stream that contains templates is dependent on the current database.
Template expansion is equivalent to the insertion of some sequence of tokens and actions into the scanner's input stream.
Example: suppose we have some currently selected node that we are going to replace with parsed input, and the input stream is:
"f[sumEa^2Rb^2D,x]D".
The bold characters are control characters; E means "replace with template", R means "move laterally to the right and select the next sibling if any, else select the whole parent subtree", and D means "terminate input stream, and select the whole tree".
Assume that the sum template in the database is "sum[%, %]F", where the % are placeholders, where the F means "select the first child".
When the template is expanded, the given input stream becomes
"f[sum[%, %]Fa^2Rb^2D,x]D"
We assume that the interaction of template expansions, subparses, and legal control characters in a parsed input stream has only the the effect of expanding (e.g. replace some leaf node with a tree) and moving around in the abstract tree, i.e. we are never deleting or rearranging subtrees during a parse, or adding siblings. Hence we know that the node with which a suspended parse is associated (or the subtree that may have replaced it) is always present in the tree in the same position relative to its parent and siblings.
With each internal (non-leaf) node of the abstract tree which is on the path from the current ParsePointer to the root of the current parse, is associated a production of the grammar, and a specification of where we are (i.e. a cursor position) in that production. The possible cursor positions are:
a. Between two symbols: we have not yet seen any token "belonging to" the symbol to the right of the cursor. That next token may either cause us to begin modifying the most recently added child of this node (this can happen iff there is a nonterminal to the left of the cursor, and the next token causes a different parse of it, (e.g. E -> "sum [a,b]" in the input stream, followed by "*"), or to insert and begin work on a new rightmost child (e.g. next token is the comma separating two args of a function, when we see it and move the cursor past it, we insert a new child and set it to a placeholder of type Expression).
b. Upon some nonterminal. This means we have started it, i.e. seen a terminal which can be a leaf of some legal subtree, but it is not yet complete. The ParsePointer is at the unique child node whose cursor is sitting between two characters, and not upon some nonterminal.
We will never be "after the last symbol of rhs of the production", for in that case, the production would be complete and state would have been flushed.
In general, in any downward motion in the abstract tree, the parser saves its state at a node before creating and moving (the current selection) to a child. Thus its easy to suspend a parse. When the parser is moving upward in the tree, i.e. recognizing productions, it deletes the saved information, makes the recognized subtree the current selection, and moves the parser (but not the current selection) to the parent to reassume its state (unless it is now complete, in which case we iterate). Thus in general, when we make a selection, we can simply resume the parse from the state it has there, if any, or carry out some tree action which may cause us to throw away the saved state.
Displaying the abstract tree is separate from the parse states that may be stored in it.
Thus the semantics of a parser restart are: do a text capture on the full expression tree up to and including the selection associated with this parser, position the cursor after the resulting last character, and resume. In such a situation, we do not need a notion of a selection; next
Remember we assume that a function in the abstract tree can have any number of arguments (identifier XYZ is not the same in the tree as function XYZ of zero arguments).
5. Box trees (Display)
1. There are separate "abstract" tree and "box" tree data structures. Each node (subtree) in the box structure points to the corresponding node (subtree) in the abstract structure. This is in contrast to the "convert DisplayExpr to Expr as needed" point of view in the current CaminoReal. In other words, you always have a pointer to the corresponding "semantics" of any displayed notation.
2. The box tree is also a functional tree - each node has a bounding box for the subtree rooted at it, a (possibly empty; nonempty iff list of function name nodes and list of children nodes are empty) rope or other sequence of glyphs to be painted in this bounding box, a (possibly empty) list of nodes for the "function name", and a (possibly empty) list of children nodes of the same type.
2. The display rule for a given function may associate multiple boxes with a given component of an abstract tree for it, e.g. an operator name which is *F ... ***, or the multiple plus signs in an n-ary plus. The rule is that if you select any of these boxes you are selecting that subtree of the abstract tree, and all the corresponding boxes get highlighted.
2. It is not necessarily the case that every function argument in the abstract tree will appear in a display for it, for example, the function "matrix" could be matrix[nRows, nCols, elt11, elt12, ...], and only the elts get printed. Presumably this should have a display rule that if fewer than two args, or first two args not ints, then DisplayFunctional, else do a matrix, filling in zeros if insufficient elts, and omitting extra elts.
3. It is assumed that the bounding boxes of disjoint subtrees of a box tree are disjoint.
6. Formatting language
1. We can make a distinction between the matching function, i.e. how we determine what display rule to apply, and the expressiveness of the rules themselves. For current start, follow rule that we use the first rule we match until it no longer applies, then search for another, Display Functional is the catchall. Current matching can be done simply by: is it the right name, and does the rule have as many or more arguments than the subtree I'm trying to display.
2. Display rules must be able to support n-ary-ness, define "imbedded symbols" (e.g. integral sign), specify fonts, specify scaling constraints (e.g. some box is as big as something else), specify alignment constraints, specify how many arguments are expected (n-ary is one option), and invoke other rules by name, i.e. specify to stick in an instance of some other notation (with placeholders for all args).
3. It would be nice to have display rules that can look arbitrarily deep into the structure of the things they are trying to display. E.g. what is internally represented as (a * summation) / b gets printed as (a/b) * summation. This seems to require general pattern matching. Having such rules may lose efficiency if they are always "enabled": if you try such rules every time you are looking for a match, you will be continually reaching way down in the tree. In any event, such rules bring home the point that in our new system, the display can change arbitrarily much with every keystroke.
4. Display rules should always support fewer than their expected no. of args; placeholders are used for empties.
5. (Rick 6/30/87): There is indeed a tradeoff between declarative and procedural representations of formatting specifications and constraints, i.e. between having formatting directives expressed in a special-purpose language, which is interpreted on the fly, allowing the user to edit formatting rules without recompiling, versus writing Cedar procedures. The latter give you full generality. Until you know what the formatting language should be, perhaps better to
7. Implementation changes to current system.
1. Get rid of use of tags; key off of position, or use local variables in formatting rules
2. Replace current "database of procedures" with "datafile", i.e. so user can change while he's running and see the changes without recompiling, and so users don't have to be programmers. Seems to be tradeoff of ease of use of datafile vs. generality of procedures - seems hard to have a rule language that will be good for expressing incremental update rules. What is tradeoff wrt efficiency?
3. We use Xerox character set throughout, i.e. users enter, and we display, characters from the XCS. Thus our task is just to define keyboard mapping and fonts.