*start* 22301 00024 USt Date: 4 Sept. 1981 4:39 pm PDT (Friday) From: Horning.pa Subject: Current Level 0/1 Interdoc status/rev. 32 To: Mitchell, Horning, Lampson [Candidate for circulation to Interdoc↑, with history log removed.] Edited by Jim H. on 4 Sept. 1981 4:03 pm PDT (Friday). Compressed syntactic example. Re-order sections, in preparation for circulation to Interdoc↑. Label ::= Mark | Source | Target | Link R&T&P&M&U > R&B&M&S&T external names > universal names remove op as literal $ hidden # Properties(present/absent)/Attributes(have values and scopes) Format for description of standard properties Editor levels Internal definition/invocation Nesting of nodes Links Hints to implementors: environment restoration points coding tricks for common structures ------------------- Open questions: We should rethink our character assignments. check our characterset for disjointness with Interpress.DoubtfulChars. enlarge op with a few more single-character operators? %, &, \ Better way to present semantics? Not done: ------------------- STANDARD CARDS 1. WE ARE DESIGNING A STANDARD FOR INTERCHANGE, NOT EDITING. 2. EVERY SCRIPT IS PARSED BEFORE BEING INTERPRETED. 3. STANDARDIZE CONCEPTS, NOT NAMES. 4. COMBINING DOCUMENTS IS AN EDITOR, NOT AN INTERCHANGE, TASK. 5. GENSYM IS AN EDITOR FUNCTION. 6. CHARACTER-SET AND LEXICAL ISSUES CAN BE RESOLVED AT THE END. 7. GET THE TECHNICAL BASIS RIGHT BEFORE WORRYING ABOUT THE POLITICAL ISSUES. ------------------- We envision an Interdoc script being input and viewed in any manner equivalent to the following: Parse the entire script, repeatedly - reducing each expression to its "dominant structure," containing only literals and structural delimiters, by replacing identifiers by the values to which they are bound in the current environment, by applying operators, and by removing binding items, - determining the properties of each node from its marks, - recording the sources and targets for all links, and - transforming the environment as indicated by the binding items (recording the relevant attributes for each node in a form convenient to the particular editor). BASIC INTERDOC SYNTACTIC EXAMPLE {Book.example! -- Links to this from Book@ and Book.example@ ExampleParagraph -- Invokes a definition $UniqueMark12356# -- Adds a mark Font←[Font | size←10*pt face←bold] factorial←'(LT[Value 2] | 1 | Value*factorial(Value-1))' a:='NOT[EQ[margins.left factorial[5]]]' margins.right←100 r=12.5*pt (a | margins.left←+5 margins.right←5 | margins.left+←10) -- conditional: Algol68 <text for this node> } -- Or, using short identifiers, and omitting comments and unnecessary spaces: {b.e!ep um#f←[f|s←10*pt fc←bo]fa←'(l[Value 2]|1|Value*fa(Value-1))' a:='n[e[m.l fa[5]]]'m.r←100 r=12.5*pt(a|m.l←+5 m.r←5|m.l+←10)<text for this node>} GRAMMAR item ::= value | binding | label value ::= term | node | sequence term ::= primary | primary op term op ::= "+" | "" | "*" | "/" primary ::= literal | invocation | application | selection literal ::= Boolean | integer | hexint | real | string invocation ::= name | universal name ::= id ( "." id )* id ::= letter ( letter | digit )* universal ::= "$" name application ::= invocation "[" item* "]" selection ::= "(" term "|" item* "|" item* ")" node ::= "{" item* "}" sequence ::= "(" item* ")" binding ::= name bindingMode rhs bindingMode ::= "=" | ":" | ":=" | "←" rhs ::= value | op term | "'" item* "'" | "[" [ invocation ] "|" binding* "]" label ::= mark | link mark ::= invocation "#" link ::= name "@" | name "!" | id "@!" SEMANTICS: REDUCTION AND BINDING R: expression, environment > expression -- Reduction B: expression, environment > environment -- Bindings R&B<construct>(E) denotes the pair R<construct>(E); B<construct>(E) [Unless explicitly given below, B<construct>(E) = E.] R<primary op term>(E) = R<primary>(E) op R<term>(E) R<literal>(E) = literal R&B<id>(E) = R&B<valOf(id, E)>(E) R&B<name "." id>(E) = R&B<valOf(id, R<name>(E))>(E) R<"$" name>(E) = "$" name R<invocation "[" item* "]">(E) = apply(invocation, R<item*>(E), E) R&B<"(" term "|" item1* "|" item2* ")">(E) = if R<term>(E) then R&B<item1*>(E) else R&B<item2*>(E) R&B<"{" item* "}">(E) = "{" R<"Sub" item*>([Null | "Outer" "=" E]) "}"; locVal("Outer", (B<"Sub" item*>([Null | "Outer" "=" E]))) R&B<"(" item* ")">(E) = "(" R<item*>(E) ")" ; B<item*>(E) R<>(E) = Nil R&B<item1 item*>(E) = R<item1>(E) R<item*>(B<item1>(E)); B<item*>(B<item1>(E)) R&B<name mode rhs>(E) = Nil; bind(name, mode, R<rhs>(E), E) <name mode op term> = <name mode name op term> -- Syntactic sugar R<"'" item* "'">(E) = item* --Usable only in rhs of binding R<"[|" binding* "]">(E) = [B<binding*>([Null | "Outer" "=" E]) | "Outer" "=" Null] R<"[" invocation "|" binding* "]">(E) = [B<binding*>([R<invocation>(E) | "Outer" "=" E]) | "Outer" "=" Null] R<invocation "#">(E) = R<invocation>(E) "#" R<link>(E) = link -- Subsidiary definitions for R&T bindingOf(id, E) = locBinding(id, whereBound(id, E)) -- Gets innermost binding valOf(id, E) = locVal(id, whereBound(id, E)) -- Gets innermost value whereBound(id, E) = -- Finds innermost binding locBinding(id, E) ~= None => E locBinding("Outer", E) ~= None => whereBound(id, locVal("Outer", E)) True => Null apply(invocation, value*, E) = CASE R<invocation>(E) OF "$equal" => value1 = value2 "$greater" => value1 > value2 . . . "$subscript" => value1[value2] -- value1: sequence, value2: int "$marks" => "(" M<inner(value1)> ")" "$links" => "(" L<inner(value1)> ")" "$sources" => "(" S<inner(value1)> ")" "$targets" => "(" T<inner(value1)> ")" "$contents" => "(" C<inner(value1)> ")" ELSE => R<invocation>([[Null | "Outer" "=" E] | "Value" "=" value*]) inner("{" value* "}") = value* bind(id, mode, value, E) = bindingOf(id, E) = "=" => E -- Can't rebind constants mode = ":=" => assign(id, value, E) -- Assign at right level True => [E | id mode value] bind(id "." name, mode, value, E) = [E | id bindingOf(id, E) bind(name, mode, value, valOf(id, E))] assign(id, value, E) = locBinding(id, E) = ":" => [E | id ":" value] bindingOf(id, E) = ":" => bind("Outer." id, ":=", value, E) True => E -- Can only assign to vars NOTATION FOR ENVIRONMENTS Environments bind identifiers to expressions, in various modes ("=", ":", ":=", "←"): Null denotes the "empty" environment [E | id m e] means "E with id mode m bound to e" locBinding(id, E) denotes the binding mode of id in E locBinding(id, Null) = None locBinding(id, [E | id' m e]) = if id=id' then m else locBinding(id, E) locVal(id, E) denotes the value locally bound to id in E locVal(id, Null) = Nil = "" locVal(id, [E | id' m e]) = if id=id' then e else locVal(id, E) SEMANTICS OF MARKS, LINKS, CONTENTS M: expression > expression -- Sequence of marks L: expression > expression -- Sequence of links S: expression > expression -- Sequence of source links T: expression > expression -- Sequence of target links C: expression > expression -- Content of node [These functions all return the non-value, Nil (which does not lengthen a sequence), except as specified below.] M<name "#"> = name M<"(" item* ")"> = M<item*> M<item1 item*> = M<item1> M<item*> L<id "@!"> = id L<"(" item* ")"> = L<item*> L<item1 item*> = L<item1> L<item*> S<name "!"> = prefixes(name) S<"(" item* ")"> = S<item*> S<item1 item*> = S<item1> S<item*> T<name "@"> = name T<"(" item* ")"> = T<item*> T<item1 item*> = T<item1> T<item*> C<value> = value C<label> = Nil prefixes(id) = id prefixes(name "." id) = name "." id prefixes(name) VALUE SPACE Expressions in an Interdoc script may denote literal values: Booleans: (F, T) integers: ... -3, -2, -1, 0, 1, 2, 3, ... reals: 1.2E5, . . . strings: <this is a string> marks: Xerox.Laurel.Message# links: A123@!, Paragraph.Example!, anId@ universal names: $Authority.Sub.id nodes sequences of values unevaluated expressions environments DISCUSSION Each environment, E, initially contains only its "inherited" environment (bound to the id Outer). Most bindings take place directly in E. To allow for "persistent" bindings, the value of a bind(id, ":=", val, E) will change E by rebinding id in the "innermost" environment (following the chain of Outers) in which it is bound, if that binding has the binding ":" (Var). Identifiers bound with binding "=" (Const) may not be rebound in inner environments. If the rhs of a binding is surrounded by single quotes, it will be evaluated in the environments where the name is invoked, rather than the environment in which the binding is made. When an id is referred to and locBinding(id, E)=None, then the value is sought recursively in locVal("Outer"). The (implicit) "outermost" environment binds each id to the "universal" name $id. Nodes are delimited by brackets. The contents of each node are implicitly prefixed by Sub, which will generally be bound in the containing environment to a quoted expression performing some bindings, and perhaps supplying some labels (marks and links). Parentheses are used to delimit sequence values. Square brackets are used to delimit the argument list of an operator application and to denote environment constructors, which behave much like records. Expressions involving the four infix ops (+, , *, /) are evaluated right-to-left (a la APL); since we expect expressions to be short, we have not imposed precedence rules. The notation for selections (conditionals) is borrowed from Algol 68: ( <test> | <true part> | <false part> ) This is consistent with our principles of using balanced brackets for compound constructions and avoiding syntactically reserved words; the true part and false part may each contain an arbitrary number of items (including none). A label N! on a node makes that node a "target" of the link N (and its prefixes); a label N@ makes it a "source." The "main" identifier of a link must be declared (using id@!) at the root of a subtree containing all its sources and targets. The link represents a set of directed arcs, one from each of its sources to each of its targets. Multiple target labels make a node the target of multiple links. A target label that appears only on a single node places it in a singleton set, i.e., identifies it uniquely. OTHER NOTES Conservative rules for editor treatment of Interdoc subtrees created by other editors: -It's OK to display a node if you understand at least one of its marks. -It's OK to edit a node if you understand ALL of its (local) marks, and either don't remove any of them or also understand ALL marks of its parent. -It's OK to copy a node if that doesn't move any labels outside their scope, and you understand ALL marks of its new parent. -it's OK to delete a (subtree rooted at a) node if you understand ALL marks of its parent. The "view" of the dominant structure is ALWAYS controlled by the marks on its nodes. (E.g., text is not always there to be "shown.") $hidden# means that a node is intended not to be viewed at the point where it occurs in the dominant structure; such nodes will generally be targets of links, placed remotely from their sources (uses) to extract attributes from an environment (e.g., page or figure numbers for references). We will try to use the term "property" to refer to a characteristic associated with a node by means of a label (mark or link) and "attribute" to refer to a characteristic whose value is determined by the environment. Properties are either present (locally) or absent, whereas attributes commonly have values that apply throughout a scope. Level 2 of the standard will be primarily concerned with the definition of standard properties that are expected to be shared among editors. For each standard property, it will describe - the associated universal mark that denotes it, - the assumptions that this mark implies about the contents of a node (values that must/may be present and their intended intepretation), - the assumptions that this mark implies about the environment of a node (attributes that must be present and their intended intepretation). Attributes are "relevant" to a node if they are assumed by any of its marks. In general, a node's environment will also contain bindings for many "latent" attributes that are either relevant to its ancestors (and inherited by default) or are potentially relevant to its descendants. Attributes are inherited only via environments following the dominant structure. Thus the choice of a dominant structure to represent scripts from a particular editor will be strongly influenced by expectations about inheritance. Put a value in contents if: Put a value in environment if: effect is local to node has scope is directly edited is only indirectly edited is to be bound locally needs delayed or global binding The presentation of this material could be clarified by a table that relates constructions in the notation to their intended uses and meanings. EDITOR LEVELS Just as with Interpress, the Interdoc standard will apply to editors with varying capabilities, and it will be important to define some structure to the space of possibilities. Dimensions in which we foresee reasonable variation are: - Invocation/Application: Only built-in definitions - definitions in script. - Dominant structure: Nodes cannot nest - nesting allowed. - Other structure: No links - links allowed. HINTS TO IMPLEMENTORS Environment restoration: Some constructs in the language are not allowed to have a "net effect" on the environment, even though they may contain internal bindings that will be in effect for some or all of their substructures. Thus, a program to process Interdoc scripts must make provision for restoration of saved environments. The amount of bookkeeping required to do this can be minimized by the following observations: - A rhs always produces a value (which may be an environment), which is then bound to a name in the environment that obtained when the binding was encountered. Thus, occurrence of a bindingMode (=, :, :=, ←) signals a potential restoration point. - Any application (but not every invocation), any term containing infix ops, and any mark will leave the environment unchanged. Thus the appearance of an operator, the bracket surrounding an argument list, or # signals a potential restoration point; the environment to be saved is the one BEFORE the invocation to its left. (One-character lookahead before invocation is prudent.) - Environments are NOT restored at the end of sequences. - The environment at the end of a node is the final value of the Outer component of the node's environment; unless the node contains persistent (:=) bindings, this will be the same as the environment just prior to the node. Coding tricks for common structures: - Attributes that tend to persist across boundaries in the dominant structure (e.g., Bravo character looks, which persist from paragraph to paragraph unless explicitly changed by the user) should generally be represented by VAR (:) bindings at a high-level node in the script; changes would then be represented by persistent (:=) bindings at the points where they occur. - Persistent bindings can also be used to record changes to "running" information (e.g., section number, figure number). - Information (e.g., marks) repeated in all subnodes of a node should be bound to Sub, and hence associated with the subnodes implicitly. - Very compact encodings can often be obtained by using single-character id's for common sequences of items; each editor can have its own set, tuned to its own usages. When reading a script produced by a different editor, it will be necessary to know the bindings for these abbreviations. CONSCIOUSLY POSTPONED Lambda expressions. Sets (Cf. Mitchell's Font example.) SET/LIST operators ($append $union ?) Extend selection to CASE? HISTORY LOG Bring the syntax up front. Further develop parallelism between grammar and semantic equations. Write semantic equations in terms of concrete syntax. Quote general expressions. V, E, C > R, T, E . [...] > <...> for quotation of script expressions. (E | id←e, m) > [E | id←e, m] for local binding. Introduce primary to disambiguate expression* , factor lhs from binding. Introduce Sub component to initialize nodes. Debug semantics of braces and dot. Mode > binding. Debug semantics of <id> (fix up indirection). Add VAL. Edited by Mitchell, 30 July 1981 9:21 pm PDT (Thursday): Changed grammar to allow more complete expression syntax; couldn't use "<" or ">" as operators because they delimit strings. Moved history log to end of message. Edited by Mitchell, 31 July 1981 12:20 pm PDT (Friday) Simplified expression syntax. Expressions with embedded binary operators are simply interpreted in a right-to-left fashion; e.g., x←a*b+c means x←a*(b+c). Fixed up semantic equations to reflect this. Exchanged the use of {}s and ()s. Edited by Mitchell, 7 Aug. 1981 4:40 pm PDT (Friday) Fixed error in semantics when exchanging the use of {}s and ()s. Edited by Horning 13 Aug. 1981 4:47 pm PDT (Thursday). E(id) > locVal(id, E) --Remove conflict with f(E). Outer > "Outer" Const > "=" id lookup rule modified (R & T<id>) [E | id←e, m] > [E | id m e] "." as infix op expressions are evaluated left-to-right (except for binding operator) Reverse VAL/ENV default for parens. bindq > bind binding > bindingMode expand definition of apply inline default T<construct>(E) = E add comments to semantic equations ------------------- R<>(E) = Nothing -- The empty expression -- Expression sequence R<e1 e*>(E) = R<e1>(E) R<e*>(T<e1>(E)) -- List insert T<e1 e*>(E) = T<e*>(T<e1>(E)) -- Composition R<literal>(E) = literal R<id>(E) = if bindingOf(id, E)=None then id else R<valOf(id, E)>(E) T<id>(E) = if bindingOf(id, E)=None then E else T<valOf(id, E)>(E) R<"IF(" e1 "," e2* "," e3* ")">(E) = if R<e1>(E) then R<e2*>(T<e1>(E)) else R<e3*>(T<e1>(E)) T<"IF(" e1 "," e2* "," e3* ")">(E) = if R<e1>(E) then T<e2*>(T<e1>(E)) else T<e3*>(T<e1>(E)) R<"NOT" p>(E) = if R<p>(E) then False else True R<p1 op p2>(E) = op = "." => R<p2>([R<p1>(E) | "Outer" = E]) op = "+" => R<p1>(E)+R<p2>(E) . . . R<n m op e>(E) = Nothing -- Empty list T<n m e>(E) = bind(n, m, R<e>(E), E) T<n m "'" e>(E) = bind(n, m, e, E) T<n m op e>(E) = bind(n, m, R<n op e>(E), E) R<"{" labels e* "}">(E) = "{" labels R<Sub e*>([Null | "Outer" = E]) "}" T<"{" labels e* "}">(E) = locVal("Outer", (T<"ENV("Sub e*")">(E))) R<"(" e* ")">(E) = R<e*>(E) R<"ENV(" e* ")">(E) = [T<"ENV(" e* ")">(E) | "Outer" = Null] T<"ENV(" e* ")">(E) = T<e*>([Null | "Outer" = E]) ------------------- Edited by Jim Horning 17 Aug. 1981 10:49 am PDT (Monday) R&T<> Nothing > "" Edited by Jim H. on 17 Aug. 1981 4:58 pm PDT (Monday) Remove side-effects from all expressions. Parentheses purely for grouping (don't hide environment transformations). #label > label ! labels within nodes Edited by Jim H. on 19 Aug. 1981 9:52 am PDT (Wednesday). Rewrite <n m op e> as syntactic sugar. structured labels re-introduce apply function in R&T<p1 op p2> correct syntax for "." % for opening an environment (also replaces ENV?) Edited by Jim H. on 19 Aug. 1981 6:55 pm PDT (Wednesday). Drop "%"; ENV() is now the only environment-constructing operator. Add SUB operator (first operand: sequence only, second: number only). Add atoms, as distinct from ids. Fix lhs op rhs syntax. Edited by Jim H. on 20 Aug. 1981 5:29 pm PDT (Thursday). resolve pending questions as per message of 20 Aug. 1981 12:29 pm PDT. distinguish syntactically between properties (marks) and labels. only the "main" id of a label is declarable. eliminate as an id character. eliminate op ids from grammar. restructure the grammar for "functional" notation for operators. update semantic equations for new grammar, etc. fix treatment of unbound qualified names (now produce Nil). Edited by Jim H. on 21 Aug. 1981 6:58 pm PDT (Friday). restore $val. move quoting to rhs, allow quoted primaries without parentheses. allow an op to be the rhs of a definition. eliminate the functions operate, apply, eval by back substitution. change semantics of () to allow "record" construction without $env. Edited by Jim H. on 24 Aug. 1981 6:08 pm PDT (Monday). "It's OK to edit a node if you understand ALL of its (local) properties, and either don't remove any of them or also understand ALL properties of its parent." "Put in contents if: Put in environment if: ..." Add connection syntax to syntactically rule out a+←'b. Edited by Jim H. on 25 Aug. 1981 11:33 am PDT (Tuesday). Syntactically separate label references and name invocation. Put in distinct syntax in rhs for environment construction. Informal semantics of labels. ( ... ) > [ ... ] in applications; permitting ( ... ) as a primary. Edited by Jim H. on 25 Aug. 1981 4:08 pm PDT (Tuesday). Add sequence as a nonterminal to the syntax. State the formal semantics of labels and properties. Reorder presentation (hopefully to improve readability). Edited by Jim H. on 28 Aug. 1981 2:08 pm PDT (Friday). ' ... ' in rhs Restore infix operators, right to left. Modify syntax to rule out more nonsense, add semantically meaningful nonterminals. Introduce special syntax for selections. Eliminate side-effects for $subscript (actually, all applications). Add application of defined functions. Note that Value[ ... ] allows use of temporary (hidden) local definitions, Nil[ ... ] allows placement of hidden nodes. ( ... ) creates list/sequence values (without hiding bindings). Tidy up definition of assign, using bind("Outer." ...). Introduce value nonterminal into grammar (rule out more nonsense). rhs ::= ... | "[" [ lookup ] "|" binding* "]" . Change nonterminal lookup to invocation. Remove $ name from literal (to invocation). Add node operators: $properties -- All #'s $marks -- All !'s $references -- All @'s $contents -- The rest (fringe) Restrict $subscript just to sequences, not nodes. *start* 21130 00024 USt Date: 8 Sept. 1981 2:10 pm PDT (Tuesday) From: Horning.pa Subject: Interdoc Level 0/1 DRAFT and notes To: Mitchell, Horning, Lampson, Guttag [Sorry, meant to send this out on Friday afternoon.] [Based on "Current Level 0/1 Interdoc status/rev. 32" of 4 Sept. 1981.] [Circulation to Interdoc↑ planned for Wednesday morning, 9 Sept. 1981.] STANDARD CARDS 1. WE ARE DESIGNING A STANDARD FOR INTERCHANGE, NOT EDITING. 2. EVERY SCRIPT IS PARSED BEFORE BEING INTERPRETED. 3. STANDARDIZE CONCEPTS, NOT NAMES. 4. COMBINING DOCUMENTS IS AN EDITOR, NOT AN INTERCHANGE, TASK. 5. GENSYM IS AN EDITOR FUNCTION. 6. CHARACTER-SET AND LEXICAL ISSUES CAN BE RESOLVED AT THE END. 7. GET THE TECHNICAL BASIS RIGHT BEFORE WORRYING ABOUT THE POLITICAL ISSUES. ------------------- We envision an Interdoc script being input and viewed in any manner equivalent to the following: Parse the entire script, repeatedly - reducing each expression to its "dominant structure," containing only literals and structural delimiters, by replacing identifiers by the values to which they are bound in the current environment, by applying operators, by evaluating selections, and by removing binding items, - determining the properties of each node from its marks, - recording the sources and targets for all links, and - transforming the environment as indicated by the binding items (recording the relevant attributes for each node in a form convenient to the particular editor). BASIC INTERDOC SYNTACTIC EXAMPLE {Book.example! -- Links to this from Book@ and Book.example@ ExampleParagraph -- Invokes a definition $UniqueMark12356# -- Adds a mark Font←[Font | size←10*pt face←bold] factorial←'(LT[Value 2] | 1 | Value*factorial(Value-1))' a:='NOT[EQ[margins.left factorial[5]]]' margins.right←100 r=12.5*pt (a | margins.left←+5 margins.right←5 | margins.left+←10) -- conditional: Algol68 <text for this node> } -- Or, using short identifiers, and omitting comments and unnecessary spaces: {b.e!ep um#f←[f|s←10*pt fc←bo]fa←'(l[Value 2]|1|Value*fa(Value-1))' a:='n[e[m.l fa[5]]]'m.r←100 r=12.5*pt(a|m.l←+5 m.r←5|m.l+←10)<text for this node>} GRAMMAR item ::= value | binding | label value ::= term | node | sequence term ::= primary | primary op term op ::= "+" | "" | "*" | "/" primary ::= literal | invocation | application | selection literal ::= Boolean | integer | hexint | real | string invocation ::= name | universal name ::= id ( "." id )* id ::= letter ( letter | digit )* universal ::= "$" name application ::= invocation "[" item* "]" selection ::= "(" term "|" item* "|" item* ")" node ::= "{" item* "}" sequence ::= "(" item* ")" binding ::= name bindingMode rhs bindingMode ::= "=" | ":" | ":=" | "←" rhs ::= value | op term | "'" item* "'" | "[" [ invocation ] "|" binding* "]" label ::= mark | link mark ::= invocation "#" link ::= id "@!" | name "@" | name "!" SEMANTICS: REDUCTION AND BINDING R: expression, environment > expression -- Reduction B: expression, environment > environment -- Bindings R&B<construct>(E) denotes the pair R<construct>(E); B<construct>(E) [Unless explicitly given below, B<construct>(E) = E.] R<primary op term>(E) = R<primary>(E) op R<term>(E) R<literal>(E) = literal R&B<id>(E) = R&B<valOf(id, E)>(E) R&B<name "." id>(E) = R&B<valOf(id, R<name>(E))>(E) R<"$" name>(E) = "$" name R<invocation "[" item* "]">(E) = apply(invocation, R<item*>(E), E) R&B<"(" term "|" item1* "|" item2* ")">(E) = if R<term>(E) then R&B<item1*>(E) else R&B<item2*>(E) R&B<"{" item* "}">(E) = "{" R<"Sub" item*>([Null | "Outer" "=" E]) "}"; locVal("Outer", (B<"Sub" item*>([Null | "Outer" "=" E]))) R&B<"(" item* ")">(E) = "(" R<item*>(E) ")" ; B<item*>(E) R<>(E) = Nil R&B<item1 item*>(E) = R<item1>(E) R<item*>(B<item1>(E)); B<item*>(B<item1>(E)) R&B<name mode rhs>(E) = Nil; bind(name, mode, R<rhs>(E), E) <name mode op term> = <name mode name op term> -- Syntactic sugar R<"'" item* "'">(E) = item* --Usable only in rhs of binding R<"[|" binding* "]">(E) = [B<binding*>([Null | "Outer" "=" E]) | "Outer" "=" Null] R<"[" invocation "|" binding* "]">(E) = [B<binding*>([R<invocation>(E) | "Outer" "=" E]) | "Outer" "=" Null] R<invocation "#">(E) = R<invocation>(E) "#" R<link>(E) = link -- Subsidiary definitions for R&T bindingOf(id, E) = locBinding(id, whereBound(id, E)) -- Gets innermost binding valOf(id, E) = locVal(id, whereBound(id, E)) -- Gets innermost value whereBound(id, E) = -- Finds innermost binding locBinding(id, E) ~= None => E locBinding("Outer", E) ~= None => whereBound(id, locVal("Outer", E)) True => Null apply(invocation, value*, E) = CASE R<invocation>(E) OF "$equal" => value1 = value2 "$greater" => value1 > value2 . . . "$subscript" => value1[value2] -- value1: sequence, value2: int "$marks" => "(" M<inner(value1)> ")" "$links" => "(" L<inner(value1)> ")" "$sources" => "(" S<inner(value1)> ")" "$targets" => "(" T<inner(value1)> ")" "$contents" => "(" C<inner(value1)> ")" ELSE => R<invocation>([[Null | "Outer" "=" E] | "Value" "=" value*]) inner("{" value* "}") = value* bind(id, mode, value, E) = bindingOf(id, E) = "=" => E -- Can't rebind constants mode = ":=" => assign(id, value, E) -- Assign at right level True => [E | id mode value] bind(id "." name, mode, value, E) = [E | id bindingOf(id, E) bind(name, mode, value, valOf(id, E))] assign(id, value, E) = locBinding(id, E) = ":" => [E | id ":" value] bindingOf(id, E) = ":" => bind("Outer." id, ":=", value, E) True => E -- Can only assign to vars NOTATION FOR ENVIRONMENTS Environments bind identifiers to expressions, in various modes ("=", ":", ":=", "←"): Null denotes the "empty" environment [E | id m e] means "E with id mode m bound to e" locBinding(id, E) denotes the binding mode of id in E locBinding(id, Null) = None locBinding(id, [E | id' m e]) = if id=id' then m else locBinding(id, E) locVal(id, E) denotes the value locally bound to id in E locVal(id, Null) = Nil = "" locVal(id, [E | id' m e]) = if id=id' then e else locVal(id, E) SEMANTICS OF MARKS, LINKS, CONTENTS M: expression > expression -- Sequence of marks L: expression > expression -- Sequence of links S: expression > expression -- Sequence of source links T: expression > expression -- Sequence of target links C: expression > expression -- Content of node [These functions all return the empty string, Nil (which does not lengthen a sequence), except as specified below.] M<name "#"> = name M<"(" item* ")"> = M<item*> M<item1 item*> = M<item1> M<item*> L<id "@!"> = id L<"(" item* ")"> = L<item*> L<item1 item*> = L<item1> L<item*> S<name "!"> = prefixes(name) S<"(" item* ")"> = S<item*> S<item1 item*> = S<item1> S<item*> T<name "@"> = name T<"(" item* ")"> = T<item*> T<item1 item*> = T<item1> T<item*> C<value> = value C<label> = Nil prefixes(id) = id prefixes(name "." id) = name "." id prefixes(name) VALUE SPACE Expressions in an Interdoc script may denote literal values: Booleans: (F, T) integers: ... -3, -2, -1, 0, 1, 2, 3, ... reals: 1.2E5, . . . strings: <this is a string> marks: Xerox.Laurel.Message# links: A123@!, Paragraph.Example!, anId@ universal names: $Authority.Sub.id nodes sequences of values unevaluated expressions environments DISCUSSION Each environment, E, initially contains only its "inherited" environment (bound to the id Outer). Most bindings take place directly in E. To allow for "persistent" bindings, the value of a bind(id, ":=", val, E) will change E by rebinding id in the "innermost" environment (following the chain of Outers) in which it is bound, if that binding has the binding ":" (Var). Identifiers bound with binding "=" (Const) may not be rebound in inner environments. If the rhs of a binding is surrounded by single quotes, it will be evaluated in the environments where the name is invoked, rather than the environment in which the binding is made. When an id is referred to and locBinding(id, E)=None, then the value is sought recursively in locVal("Outer"). The (implicit) "outermost" environment binds each id to the "universal" name $id. Nodes are delimited by brackets. The contents of each node are implicitly prefixed by Sub, which will generally be bound in the containing environment to a quoted expression performing some bindings, and perhaps supplying some labels (marks and links). Parentheses are used to delimit sequence values. Square brackets are used to delimit the argument list of an operator application and to denote environment constructors, which behave much like records. Expressions involving the four infix ops (+, , *, /) are evaluated right-to-left (a la APL); since we expect expressions to be short, we have not imposed precedence rules. The notation for selections (conditionals) is borrowed from Algol 68: ( <test> | <true part> | <false part> ) This is consistent with our principles of using balanced brackets for compound constructions and avoiding syntactically reserved words; the true part and false part may each contain an arbitrary number of items (including none). A label N! on a node makes that node a "target" of the link N (and its prefixes); a label N@ makes it a "source." The "main" identifier of a link must be declared (using id@!) at the root of a subtree containing all its sources and targets. The link represents a set of directed arcs, one from each of its sources to each of its targets. Multiple target labels make a node the target of multiple links. A target label that appears only on a single node places it in a singleton set, i.e., identifies it uniquely. OTHER NOTES Conservative rules for editor treatment of Interdoc subtrees created by other editors: -It's OK to display a node if you understand at least one of its marks. -It's OK to edit a node if you understand ALL of its (local) marks, and either don't remove any of them or also understand ALL marks of its parent. -It's OK to copy a node if that doesn't move any labels outside their scope, and you understand ALL marks of its new parent. -it's OK to delete a (subtree rooted at a) node if you understand ALL marks of its parent. The "view" of the dominant structure is ALWAYS controlled by the marks on its nodes. (E.g., text is not always there to be "shown.") $hidden# means that a node is intended not to be viewed at the point where it occurs in the dominant structure; such nodes will generally be targets of links, placed remotely from their sources (uses) to extract attributes from an environment (e.g., page or figure numbers for references). We will try to use the term "property" to refer to a characteristic associated with a node by means of a label (mark or link) and "attribute" to refer to a characteristic whose value is determined by the environment. Properties are either present (locally) or absent, whereas attributes commonly have values that apply throughout a scope. Level 2 of the standard will be primarily concerned with the definition of standard properties that are expected to be shared among editors. For each standard property, it will describe - the associated universal mark that denotes it, - the assumptions that this mark implies about the contents of a node (values that must/may be present and their intended intepretation), - the assumptions that this mark implies about the environment of a node (attributes that must be present and their intended intepretation). Attributes are "relevant" to a node if they are assumed by any of its marks. In general, a node's environment will also contain bindings for many "latent" attributes that are either relevant to its ancestors (and inherited by default) or are potentially relevant to its descendants. Attributes are inherited only via environments following the dominant structure. Thus the choice of a dominant structure to represent scripts from a particular editor will be strongly influenced by expectations about inheritance. Put a value in contents if: Put a value in environment if: effect is local to node has scope is directly edited is only indirectly edited is to be bound locally needs delayed or global binding The presentation of this material could be clarified by a table that relates constructions in the notation to their intended uses and meanings. EDITOR LEVELS Just as with Interpress, the Interdoc standard will apply to editors with varying capabilities, and it will be important to define some structure to the space of possibilities. Dimensions in which we foresee reasonable variation are: - Invocation/Application: Only built-in definitions - definitions in script. - Dominant structure: Nodes cannot nest - nesting allowed. - Other structure: No links - links allowed. HINTS TO IMPLEMENTORS Environment restoration: Some constructs in the language are not allowed to have a "net effect" on the environment, even though they may contain internal bindings that will be in effect for some or all of their substructures. Thus, a program to process Interdoc scripts must make provision for restoration of saved environments. The amount of bookkeeping required to do this can be minimized by the following observations: - A rhs always produces a value (which may be an environment), which is then bound to a name in the environment that obtained when the binding was encountered. Thus, occurrence of a bindingMode (=, :, :=, ←) signals a potential restoration point. - Any application (but not every invocation), any term containing infix ops, and any mark will leave the environment unchanged. Thus the appearance of an operator, the bracket surrounding an argument list, or # signals a potential restoration point; the environment to be saved is the one BEFORE the invocation to its left. (One-character lookahead before invocation is prudent.) - Environments are NOT restored at the end of sequences. - The environment at the end of a node is the final value of the Outer component of the node's environment; unless the node contains persistent (:=) bindings, this will be the same as the environment just prior to the node. Coding tricks for common structures: - Attributes that tend to persist across boundaries in the dominant structure (e.g., Bravo character looks, which persist from paragraph to paragraph unless explicitly changed by the user) should generally be represented by VAR (:) bindings at a high-level node in the script; changes would then be represented by persistent (:=) bindings at the points where they occur. - Persistent bindings can also be used to record changes to "running" information (e.g., section number, figure number). - Information (e.g., marks) repeated in all subnodes of a node should be bound to Sub, and hence associated with the subnodes implicitly. - Very compact encodings can often be obtained by using single-character id's for common sequences of items; each editor can have its own set, tuned to its own usages. When reading a script produced by a different editor, it will be necessary to know the bindings for these abbreviations. CONSCIOUSLY POSTPONED Lambda expressions. Sets (Cf. Mitchell's Font example.) SET/LIST operators ($append $union ?) Extend selection to CASE? ------------------- Open questions: We should rethink our character assignments. check our characterset for disjointness with Interpress.DoubtfulChars. enlarge op with a few more single-character operators? %, &, \ Which way to present semantics? Semantics of C<sequence>? Not done: ------------------- ACKNOWLEDGEMENTS Interpress, Interdoc working group, Lampson, Guttag, Donahue, others? APPENDICES ALTERNATE PRESENTATIONS OF THE SEMANTIC EQUATIONS 1. GRAMMATICAL FEATURE X SEMANTIC FUNCTION MATRIX FEATURES: FUNCTIONS: R C B M L S T term ::= primary op term + = primary ::= literal == invocation ::= id + + invocation ::= name "." id + + universal ::= "$" name == application ::= invocation "[" item* "]" + selection ::= "(" term "|" item1* "|" item2* ")" + + node ::= "{" item* "}" + = + sequence ::= "(" item* ")" + = + + + + + item* ::= item1 item* + + + + + + + binding ::= name mode rhs + rhs ::= "'" item* "'" + rhs ::= "[|" binding* "]" + rhs ::= "[" invocation "|" binding* "]" + mark ::= invocation "#" + + link ::= id "@!" = + link ::= name "@" = + link ::= name "!" = + Semantic function produces Nil or E or does not apply. + Non-trivial semantic equation. =For R: passes value unchanged; for C: value same as R. 2. PRESENTATION BY FEATURE term ::= primary op term R = C = R<primary>(E) op R<term>(E) B = E M = L = S = T = Nil primary ::= literal R = C = literal B = E M = L = S = T = Nil invocation ::= id R = R<valOf(id, E)>(E) B = B<valOf(id, E)>(E) invocation ::= name "." id R = R<valOf(id, R<name>(E))>(E) B = B<valOf(id, R<name>(E))>(E) universal ::= "$" name R = C = "$" name B = E M = L = S = T = Nil application ::= invocation "[" item* "]" R = apply(invocation, R<item*>(E), E) B = E selection ::= "(" term "|" item1* "|" item2* ")" R = if R<term>(E) then R<item1*>(E) else R<item2*>(E) B = if R<term>(E) then B<item1*>(E) else B<item2*>(E) node ::= "{" item* "}" R = C = "{" R<"Sub" item*>([Null | "Outer" "=" E]) "}" B = locVal("Outer", (B<"Sub" item*>([Null | "Outer" "=" E]))) M = L = S = T = Nil sequence ::= "(" item* ")" R = C = "(" R<item*>(E) ")" ?? C = "(" C<item*> ")" B = B<item*>(E) M = M<item*> L = L<item*> S = S<item*> T = T<item*> item* ::= "" R = C = M = L = S = T = Nil B = E item* ::= item1 item* R = R<item1>(E) R<item*>(B<item1>(E)) B = B<item*>(B<item1>(E)) C = C<item1> C<item*> M = M<item1> M<item*> L = L<item1> L<item*> S = S<item1> S<item*> T = T<item1> T<item*> binding ::= name mode rhs R = Nil B = bind(name, mode, R<rhs>(E), E) binding ::= name mode op term = <name mode name op term> -- Syntactic sugar rhs ::= "'" item* "'" R = item* rhs ::= "[|" binding* "]" R = [B<binding*>([Null | "Outer" "=" E]) | "Outer" "=" Null] rhs ::= "[" invocation "|" binding* "]" R =[B<binding*>([R<invocation>(E) | "Outer" "=" E]) | "Outer" "=" Null] mark ::= invocation "#" R = R<invocation>(E) "#" B = E C = L = S = T = Nil M = invocation link ::= id "@!" R = id "@!" B = E C = M = S = T = Nil L = id link ::= name "@" R = name "@" B = E C = M = L = S = Nil T = name link ::= name "!" R = name "!" B = E C = M = L = T = Nil S = prefixes(name) 3. PRESENTATION BY FUNCTION R: expression, environment > expression -- Reduction R<construct>(E) = CASE construct OF <primary op term> => R<primary>(E) op R<term>(E) <literal> => literal <id> => R<valOf(id, E)>(E) <name "." id> => R<valOf(id, R<name>(E))>(E) <"$" name> => "$" name <invocation "[" item* "]"> => apply(invocation, R<item*>(E), E) <"(" term "|" item1* "|" item2* ")"> => if R<term>(E) then R<item1*>(E) else R<item2*>(E) <"{" item* "}"> => "{" R<"Sub" item*>([Null | "Outer" "=" E]) "}" <"(" item* ")"> => "(" R<item*>(E) ")" <> => Nil <item1 item*> => R<item1>(E) R<item*>(B<item1>(E)) <name mode rhs> => Nil <"'" item* "'"> => item* <"[|" binding* "]"> => [B<binding*>([Null | "Outer" "=" E]) | "Outer" "=" Null] <"[" invocation "|" binding* "]"> => [B<binding*>([R<invocation>(E) | "Outer" "=" E]) | "Outer" "=" Null] <invocation "#"> => R<invocation>(E) "#" <link> => link B: expression, environment > environment -- Bindings B<construct>(E) = CASE construct OF <id> => B<valOf(id, E)>(E) <name "." id> => B<valOf(id, R<name>(E))>(E) <"(" term "|" item1* "|" item2* ")">(E) => if R<term>(E) then B<item1*>(E) else B<item2*>(E) <"{" item* "}"> => locVal("Outer", (B<"Sub" item*>([Null | "Outer" "=" E]))) <"(" item* ")"> => B<item*>(E) <item1 item*> => B<item*>(B<item1>(E)) <name mode rhs> => bind(name, mode, R<rhs>(E), E) ELSE => E M: expression > expression -- Sequence of marks M<construct> = CASE construct OF <name "#"> => name <"(" item* ")"> => M<item*> <item1 item*> => M<item1> M<item*> ELSE => Nil L: expression > expression -- Sequence of links L<construct> = CASE construct OF <id "@!"> => id <"(" item* ")"> => L<item*> <item1 item*> => L<item1> L<item*> ELSE => Nil S: expression > expression -- Sequence of source links S<construct> = CASE construct OF <name "!"> => prefixes(name) <"(" item* ")"> => S<item*> <item1 item*> => S<item1> S<item*> ELSE => Nil T: expression > expression -- Sequence of target links T<construct> = CASE construct OF <name "@"> => name <"(" item* ")"> => T<item*> <item1 item*> => T<item1> T<item*> ELSE => Nil C: expression > expression -- Content of node C<construct> = CASE construct OF <value> => value <label> => Nil *start* 24976 00024 USt Date: 8 Sept. 1981 4:17 pm PDT (Tuesday) From: Horning.pa Subject: Interdoc Level 0/1 DRAFT/1 and notes To: Mitchell, Horning [Based on "Interdoc Level 0/1 DRAFT and notes" of 8 Sept. 1981 2:10 pm PDT (Tuesday).] [Circulation to Interdoc↑ planned for Wednesday morning, 9 Sept. 1981.] STANDARD CARDS 1. WE ARE DESIGNING A STANDARD FOR INTERCHANGE, NOT EDITING. 2. EVERY SCRIPT IS PARSED BEFORE BEING INTERPRETED. 3. STANDARDIZE CONCEPTS, NOT NAMES. 4. COMBINING DOCUMENTS IS AN EDITOR, NOT AN INTERCHANGE, TASK. 5. GENSYM IS AN EDITOR FUNCTION. 6. CHARACTER-SET AND LEXICAL ISSUES CAN BE RESOLVED AT THE END. 7. GET THE TECHNICAL BASIS RIGHT BEFORE WORRYING ABOUT THE POLITICAL ISSUES. ------------------- We envision an Interdoc script being input and viewed in any manner equivalent to the following: Parse the entire script, repeatedly - reducing each expression to its "dominant structure," containing only literals and structural delimiters, by replacing identifiers by the values to which they are bound in the current environment, by applying operators, by evaluating selections, and by removing binding items, - determining the properties of each node from its marks, - recording the sources and targets for all links, and - transforming the environment as indicated by the binding items (recording the relevant attributes for each node in a form convenient to the particular editor). BASIC INTERDOC LEXICAL STRUCTURE AND EXAMPLES -- allow blank, CR, comma, semicolon as equivalent separators??? -- largely pirate grammar of literals from Interpress id ::= letter ( letter | digit )* Expressions in an Interdoc script may denote literal values: Booleans: (F, T) integers: ... -3, -2, -1, 0, 1, 2, 3, ... reals: 1.2E5, . . . strings: <this is a string> marks: Xerox.Laurel.Message# links: A123@!, Paragraph.Example!, anId@ universal names: $Authority.Sub.id nodes sequences of values unevaluated expressions environments SYNTACTIC EXAMPLE {Book.example! -- Links to this from Book@ and Book.example@ ExampleParagraph -- Invokes a definition $UniqueMark12356# -- Adds a mark Font←[Font | size←10*pt; face←bold] factorial←'(LT[Value, 2] | 1 | Value*factorial(Value-1))' a:='NOT[EQ[margins.left, factorial[5]]]' margins.right←100; r=12.5*pt (a | margins.left←+5; margins.right←5 | margins.left+←10) -- conditional: Algol68 <text for this node> } -- Or, using short identifiers, and omitting comments and unnecessary spaces: {b.e!ep;um#f←[f|s←10*pt;fc←bo]fa←'(l[Value,2]|1|Value*fa(Value-1))' a:='n[e[m.l,fa[5]]]'m.r←100;r=12.5*pt(a|m.l←+5;m.r←5|m.l+←10)<text for this node>} GRAMMAR script ::= versionId item item ::= value | binding | label value ::= term | node | sequence term ::= primary | primary op term op ::= "+" | "" | "*" | "/" primary ::= literal | invocation | application | selection literal ::= Boolean | integer | hexint | real | string invocation ::= name | universal name ::= id ( "." id )* universal ::= "$" name application ::= invocation "[" item* "]" selection ::= "(" term "|" item* "|" item* ")" node ::= "{" item* "}" sequence ::= "(" ( value | binding )* ")" binding ::= name mode rhs mode ::= "=" | ":" | ":=" | "←" rhs ::= value | op term | "'" item* "'" | "[" [ invocation ] "|" binding* "]" label ::= mark | link mark ::= invocation "#" link ::= id "@!" | name "@" | name "!" SEMANTIC FUNCTIONS R: expression, environment > expression -- Reduction C: expression > expression -- Contents B: expression, environment > environment -- Bindings M: expression > expression -- Marks L: expression > expression -- Links S: expression > expression -- Link sources T: expression > expression -- Link targets GRAMMATICAL FEATURE X SEMANTIC FUNCTION MATRIX FEATURES: FUNCTIONS: R C B M L S T term ::= primary op term + = primary ::= literal == invocation ::= id + + invocation ::= name "." id + + universal ::= "$" name == application ::= invocation "[" item* "]" + selection ::= "(" term "|" item1* "|" item2* ")" + + node ::= "{" item* "}" + = + sequence ::= "(" ( value | binding )* ")" + = + item* ::= item1 item* + + + + + + + binding ::= name mode rhs + rhs ::= "'" item* "'" + rhs ::= "[|" binding* "]" + rhs ::= "[" invocation "|" binding* "]" + mark ::= invocation "#" + + link ::= id "@!" = + link ::= name "@" = + link ::= name "!" = + Semantic function produces Nil or E or does not apply. + Non-trivial semantic equation. =For R: passes value unchanged; for C: value same as R. NOTATION FOR ENVIRONMENTS Environments bind identifiers to expressions, in various modes ("=", ":", ":=", "←"): Null denotes the "empty" environment [E | id m e] means "E with id mode m bound to e" locBinding(id, E) denotes the binding mode of id in E locBinding(id, Null) = None locBinding(id, [E | id' m e]) = if id=id' then m else locBinding(id, E) locVal(id, E) denotes the value locally bound to id in E locVal(id, Null) = Nil = "" locVal(id, [E | id' m e]) = if id=id' then e else locVal(id, E) PRESENTATION BY FEATURE [E is used to represent the value of the environment in which the feature occurs.] term ::= primary op term op ::= "+" | "" | "*" | "/" R = C = R<primary>(E) op R<term>(E) B = E M = L = S = T = Nil -- Both the primary and the term must reduce to numbers; the arithmetic operators are evaluated right-to-left (a la APL, without precedence) and bind less tightly than application. primary ::= literal literal ::= Boolean | integer | hexint | real | string R = C = literal B = E M = L = S = T = Nil -- The basic contents of a document. invocation ::= id R = R<valOf(id, E)>(E) B = B<valOf(id, E)>(E) where valOf(id, E) = locVal(id, whereBound(id, E)) -- Gets innermost value whereBound(id, E) = CASE -- Gets innermost binding locBinding(id, E) ~= None => E locBinding("Outer", E) ~= None => whereBound(id, locVal("Outer", E)) True => Null -- Both attributes and definitions are looked up in the current environment; depending on the current binding of id, this may produce values and/or bindings; if the binding's rhs was quoted, the expression is evaluated at the point of invocation. invocation ::= name "." id R = R<valOf(id, R<name>(E))>(E) B = B<valOf(id, R<name>(E))>(E) -- Qualified names are treated as "nested" environments. universal ::= "$" name R = C = "$" name B = E M = L = S = T = Nil -- Names prefixed with a $ are presumed to be directly meaningful, and are not looked up in the environment. application ::= invocation "[" item* "]" R = apply(invocation, R<item*>(E), E) B = E where apply(invocation, value*, E) = CASE R<invocation>(E) OF "$equal" => value1 = value2 "$greater" => value1 > value2 . . . "$subscript" => value1[value2] -- value1: sequence, value2: int "$contents" => "(" C<inner(value1)> ")" "$marks" => "(" M<inner(value1)> ")" "$links" => "(" L<inner(value1)> ")" "$sources" => "(" S<inner(value1)> ")" "$targets" => "(" T<inner(value1)> ")" ELSE => R<invocation>([[Null | "Outer" "=" E] | "Value" "=" value*]) inner("{" value* "}") = value* -- If the invocation does not evaluate to one of the standard external function names, the current environment is augmented with a binding of the value of the argument list to the identifier Value, and the value is the result of the invocation in that environment; this allows function definition within the language. selection ::= "(" term "|" item1* "|" item2* ")" R = if R<term>(E) then R<item1*>(E) else R<item2*>(E) B = if R<term>(E) then B<item1*>(E) else B<item2*>(E) -- This is a standard conditional expression/binding, using syntax borrowed from Algol 68. node ::= "{" item* "}" R = C = "{" R<"Sub" item*>([Null | "Outer" "=" E]) "}" B = locVal("Outer", (B<"Sub" item*>([Null | "Outer" "=" E]))) M = L = S = T = Nil -- Nodes have nested environments, and affect the containing environment only through persistent (:=) bindings to ids with outer VAR (:) bindings. The items of a node are implicitly prefixed with the id Sub, which may be bound to any information intended to be common to all subnodes in a scope. sequence ::= "(" item* ")" R = C = "(" R<item*>(E) ")" B = B<item*>(E) M = L = S = T = Nil -- Parentheses group a sequence of items as a single value; bindings in the sequence affect the environment of items to the right in the containing node, but labels are disallowed. item* ::= "" R = C = M = L = S = T = Nil B = E -- The empty sequence of items has no value and no effect; this is the basis for the following recursive definition. item* ::= item1 item* R = R<item1>(E) R<item*>(B<item1>(E)) B = B<item*>(B<item1>(E)) For F in {C, M, L, S, T}: F = F<item1> F<item*> -- In general, the value of a sequence of items is just the sequence of item values; binding items affect the environment of items to their right; Nil does not change the length of a result sequence. binding ::= name mode rhs R = Nil B = bind(name, mode, R<rhs>(E), E) where bind(id, mode, value, E) = CASE bindingOf(id, E) = "=" => E -- Can't rebind constants mode = ":=" => assign(id, value, E) True => [E | id mode value] bind(id "." name, mode, value, E) = [E | id bindingOf(id, E) bind(name, mode, value, valOf(id, E))] bindingOf(id, E) = locBinding(id, whereBound(id, E)) assign(id, value, E) = CASE locBinding(id, E) = ":" => [E | id ":" value] bindingOf(id, E) = ":" => bind("Outer." id, ":=", value, E) True => E -- Can only assign to vars -- This adds a single binding to E; bindings have no other "side effects" and no value. binding ::= name mode op term = <name mode name op term> -- This is just a convenient piece of syntactic sugar for the common case of updating a binding. rhs ::= "'" item* "'" R = item* -- A quoted rhs is evaluated at the point of invocation, rather than the point of binding. rhs ::= "[|" binding* "]" R = [B<binding*>([Null | "Outer" "=" E]) | "Outer" "=" Null] -- This creates a new environment value that may be used much like a record. rhs ::= "[" invocation "|" binding* "]" R =[B<binding*>([R<invocation>(E) | "Outer" "=" E]) | "Outer" "=" Null] -- This creates a new environment value that is an extension of an existing one. mark ::= invocation "#" R = R<invocation>(E) "#" M = invocation B = E C = L = S = T = Nil -- This gives the containing node the property denoted by the mark to which the invocation reduces. link ::= id "@!" R = id "@!" L = id B = E C = M = S = T = Nil -- This defines the scope of the set of links whose "main" component is id. link ::= name "@" R = name "@" S = name B = E C = M = L = T = Nil -- This identifies the containing node as a "source" of the link name. link ::= name "!" R = name "!" T = prefixes(name) B = E C = M = L = S = Nil where prefixes(id) = id prefixes(name "." id) = name "." id prefixes(name) -- This identifies the containing node as a "target" of each of the links that is a prefix of name. DISCUSSION Each environment, E, initially contains only its "inherited" environment (bound to the id Outer). Most bindings take place directly in E. To allow for "persistent" bindings, the value of a bind(id, ":=", val, E) will change E by rebinding id in the "innermost" environment (following the chain of Outers) in which it is bound, if that binding has the binding ":" (Var). Identifiers bound with binding "=" (Const) may not be rebound in inner environments. If the rhs of a binding is surrounded by single quotes, it will be evaluated in the environments where the name is invoked, rather than the environment in which the binding is made. When an id is referred to and locBinding(id, E)=None, then the value is sought recursively in locVal("Outer"). The (implicit) "outermost" environment binds each id to the "universal" name $id. Nodes are delimited by brackets. The contents of each node are implicitly prefixed by Sub, which will generally be bound in the containing environment to a quoted expression performing some bindings, and perhaps supplying some labels (marks and links). Parentheses are used to delimit sequence values. Square brackets are used to delimit the argument list of an operator application and to denote environment constructors, which behave much like records. Expressions involving the four infix ops (+, , *, /) are evaluated right-to-left (a la APL); since we expect expressions to be short, we have not imposed precedence rules. The notation for selections (conditionals) is borrowed from Algol 68: ( <test> | <true part> | <false part> ) This is consistent with our principles of using balanced brackets for compound constructions and avoiding syntactically reserved words; the true part and false part may each contain an arbitrary number of items (including none). A label N! on a node makes that node a "target" of the link N (and its prefixes); a label N@ makes it a "source." The "main" identifier of a link must be declared (using id@!) at the root of a subtree containing all its sources and targets. The link represents a set of directed arcs, one from each of its sources to each of its targets. Multiple target labels make a node the target of multiple links. A target label that appears only on a single node places it in a singleton set, i.e., identifies it uniquely. OTHER NOTES Conservative rules for editor treatment of Interdoc subtrees created by other editors: -It's OK to display a node if you understand at least one of its marks. -It's OK to edit a node if you understand ALL of its (local) marks, and either don't remove any of them or also understand ALL marks of its parent. -It's OK to copy a node if that doesn't move any labels outside their scope, and you understand ALL marks of its new parent. -it's OK to delete a (subtree rooted at a) node if you understand ALL marks of its parent. More work needed on definition vs. attribute, direct vs. indirect contents. The "view" of the dominant structure is ALWAYS controlled by the marks on its nodes. (E.g., numbers are often part of the content for purposes other than to be "shown.") $hidden# means that a node is intended not to be viewed at the point where it occurs in the dominant structure; such nodes will generally be targets of links, placed remotely from their sources (uses) to extract attributes from an environment (e.g., page or figure numbers for references). We will use the term "property" to refer to a characteristic associated with a node by means of a label (mark or link) and "attribute" to refer to a characteristic whose value is determined by the environment. Properties are either present (locally) or absent, whereas attributes commonly have values that apply throughout a scope. Level 2 of the standard will be primarily concerned with the definition of standard properties that are expected to be shared among editors. For each standard property, it will describe - the associated universal mark that denotes it, - the assumptions that this mark implies about the contents of a node (values that must/may be present and their intended intepretation), - the assumptions that this mark implies about the environment of a node (attributes that must be present and their intended intepretation). Attributes are "relevant" to a node if they are assumed by any of its marks. In general, a node's environment will also contain bindings for many "latent" attributes that are either relevant to its ancestors (and inherited by default) or are potentially relevant to its descendants. A "definition" is an attribute that is relevant precisely to those nodes that invoke it. We need some further thought (and syntax and semantics) for how to associate assumptions with marks. Attributes are inherited only via environments following the dominant structure. Thus the choice of a dominant structure to represent scripts from a particular editor will be strongly influenced by expectations about inheritance. Put a value in contents if: Put a value in environment if: effect is local to node has scope is directly edited is only indirectly edited is to be bound locally needs delayed or global binding The presentation of this material could be clarified by a table that relates constructions in the notation to their intended uses and meanings. EDITOR LEVELS Just as with Interpress, the Interdoc standard will apply to editors with varying capabilities, and it will be important to define some structure to the space of possibilities. Dimensions in which we foresee reasonable variation are: - Invocation/Application: Only built-in definitions - definitions in script. - Dominant structure: Nodes cannot nest - nesting allowed. - Other structure: No links - links allowed. - Bindings: Local only - CONST(=), VAR(:), and persistent (:=). - Selection: No conditionals - conditionals. - Numbers: Integers only - floating point. HINTS TO IMPLEMENTORS Scopes: The scope of a binding always begins (textually) at the point where the binding appears, and extends strictly to its right. This should simplify the processing of environments in a single left-to-right scan. Constant bindings are unchangeable within a scope. Environment restoration: Some constructs in the language are not allowed to have a "net effect" on the environment, even though they may contain internal bindings that will be in effect for some or all of their substructures. Thus, a program to process Interdoc scripts must make provision for restoration of saved environments. The amount of bookkeeping required to do this can be minimized by the following observations: - A rhs always produces a value (which may be an environment), which is then bound to a name in the environment that obtained when the binding was encountered. Thus, occurrence of a bindingMode (=, :, :=, ←) signals a potential restoration point. - Any application (but not every invocation), any term containing infix ops, and any mark will leave the environment unchanged. Thus the appearance of an operator, the bracket surrounding an argument list, or # signals a potential restoration point; the environment to be saved is the one BEFORE the invocation to its left. (One-character lookahead before invocation is prudent.) - Environments are NOT restored at the end of sequences. - The environment at the end of a node is the final value of the Outer component of the node's environment; unless the node contains persistent (:=) bindings, this will be the same as the environment just prior to the node. Coding tricks for common structures: - Attributes that tend to persist across boundaries in the dominant structure (e.g., Bravo character looks, which persist from paragraph to paragraph unless explicitly changed by the user) should generally be represented by VAR (:) bindings at a high-level node in the script; changes would then be represented by persistent (:=) bindings at the points where they occur. - Persistent bindings can also be used to record changes to "running" information (e.g., section number, figure number). - Information (e.g., marks) repeated in all subnodes of a node should be bound to Sub, and hence associated with the subnodes implicitly. - Very compact encodings can often be obtained by using single-character id's for common sequences of items; each editor can have its own set, tuned to its own usages. When reading a script produced by a different editor, it will be necessary to know the bindings for these abbreviations. - Common editing operations may be represented by definitions with short names (e.g., Look Keep 40 in Bravo might become "L.K[40]"). CONSCIOUSLY POSTPONED Lambda expressions. Sets (Cf. Mitchell's Font example.) SET/LIST operators ($append $union ?) Extend selection to CASE? ------------------- Open questions: We should rethink our character assignments. check our characterset for disjointness with Interpress.DoubtfulChars. enlarge op with a few more single-character operators? %, &, \ Not done: ------------------- ACKNOWLEDGEMENTS Interpress, Interdoc working group, Lampson, Guttag, Donahue, others? APPENDICES ALTERNATE PRESENTATIONS OF THE SEMANTIC EQUATIONS PRESENTATION BY FUNCTION R: expression, environment > expression -- Reduction R<construct>(E) = CASE <construct> OF <primary op term> => R<primary>(E) op R<term>(E) <literal> => literal <id> => R<valOf(id, E)>(E) <name "." id> => R<valOf(id, R<name>(E))>(E) <"$" name> => "$" name <invocation "[" item* "]"> => apply(invocation, R<item*>(E), E) <"(" term "|" item1* "|" item2* ")"> => if R<term>(E) then R<item1*>(E) else R<item2*>(E) <"{" item* "}"> => "{" R<"Sub" item*>([Null | "Outer" "=" E]) "}" <"(" item* ")"> => "(" R<item*>(E) ")" <> => Nil <item1 item*> => R<item1>(E) R<item*>(B<item1>(E)) <name mode rhs> => Nil <"'" item* "'"> => item* <"[|" binding* "]"> => [B<binding*>([Null | "Outer" "=" E]) | "Outer" "=" Null] <"[" invocation "|" binding* "]"> => [B<binding*>([R<invocation>(E) | "Outer" "=" E]) | "Outer" "=" Null] <invocation "#"> => R<invocation>(E) "#" <link> => link C: expression > expression -- Content of node C<construct> = CASE <construct> OF <value> => value <label> => Nil <item1 item*> => C<item1> C<item*> B: expression, environment > environment -- Bindings B<construct>(E) = CASE <construct> OF <id> => B<valOf(id, E)>(E) <name "." id> => B<valOf(id, R<name>(E))>(E) <"(" term "|" item1* "|" item2* ")">(E) => if R<term>(E) then B<item1*>(E) else B<item2*>(E) <"{" item* "}"> => locVal("Outer", (B<"Sub" item*>([Null | "Outer" "=" E]))) <"(" item* ")"> => B<item*>(E) <item1 item*> => B<item*>(B<item1>(E)) <name mode rhs> => bind(name, mode, R<rhs>(E), E) ELSE => E M: expression > expression -- Sequence of marks M<construct> = CASE <construct> OF <name "#"> => name <item1 item*> => M<item1> M<item*> ELSE => Nil L: expression > expression -- Sequence of links L<construct> = CASE <construct> OF <id "@!"> => id <item1 item*> => L<item1> L<item*> ELSE => Nil S: expression > expression -- Sequence of source links S<construct> = CASE <construct> OF <name "!"> => prefixes(name) <item1 item*> => S<item1> S<item*> ELSE => Nil T: expression > expression -- Sequence of target links T<construct> = CASE <construct> OF <name "@"> => name <item1 item*> => T<item1> T<item*> ELSE => Nil SEMANTICS: REDUCTION AND BINDING R: expression, environment > expression -- Reduction B: expression, environment > environment -- Bindings R&B<construct>(E) denotes the pair R<construct>(E); B<construct>(E) [Unless explicitly given below, B<construct>(E) = E.] R<primary op term>(E) = R<primary>(E) op R<term>(E) R<literal>(E) = literal R&B<id>(E) = R&B<valOf(id, E)>(E) R&B<name "." id>(E) = R&B<valOf(id, R<name>(E))>(E) R<"$" name>(E) = "$" name R<invocation "[" item* "]">(E) = apply(invocation, R<item*>(E), E) R&B<"(" term "|" item1* "|" item2* ")">(E) = if R<term>(E) then R&B<item1*>(E) else R&B<item2*>(E) R&B<"{" item* "}">(E) = "{" R<"Sub" item*>([Null | "Outer" "=" E]) "}"; locVal("Outer", (B<"Sub" item*>([Null | "Outer" "=" E]))) R&B<"(" item* ")">(E) = "(" R<item*>(E) ")" ; B<item*>(E) R<>(E) = Nil R&B<item1 item*>(E) = R<item1>(E) R<item*>(B<item1>(E)); B<item*>(B<item1>(E)) R&B<name mode rhs>(E) = Nil; bind(name, mode, R<rhs>(E), E) <name mode op term> = <name mode name op term> -- Syntactic sugar R<"'" item* "'">(E) = item* --Usable only in rhs of binding R<"[|" binding* "]">(E) = [B<binding*>([Null | "Outer" "=" E]) | "Outer" "=" Null] R<"[" invocation "|" binding* "]">(E) = [B<binding*>([R<invocation>(E) | "Outer" "=" E]) | "Outer" "=" Null] R<invocation "#">(E) = R<invocation>(E) "#" R<link>(E) = link SEMANTICS OF CONTENTS AND LABELS C: expression > expression -- Contents of node M: expression > expression -- Sequence of marks L: expression > expression -- Sequence of links S: expression > expression -- Sequence of source links T: expression > expression -- Sequence of target links [These functions all return the empty string, Nil (which does not lengthen a sequence), except as specified below.] C<value> = value C<label> = Nil C<item1 item*> = C<item1> C<item*> M<name "#"> = name M<item1 item*> = M<item1> M<item*> L<id "@!"> = id L<item1 item*> = L<item1> L<item*> S<name "!"> = prefixes(name) S<item1 item*> = S<item1> S<item*> T<name "@"> = name T<item1 item*> = T<item1> T<item*> prefixes(id) = id prefixes(name "." id) = name "." id prefixes(name) *start* 25613 00024 USt Date: 9 Sept. 1981 3:04 pm PDT (Wednesday) From: Horning.pa Subject: Interdoc Level 0/1 DRAFT/2 and notes To: Interdoc [Sorry for the long delay between updates. Jim M. and I have been receiving all the feedback we could handle from Butler and John Guttag. However, we think things have now stabilized quite a bit. This message focusses on things that have changed since "last time." Jim M. is working on expanding an outline for a more complete manual along the lines of the Interpress report; most of these pieces will fit in there somewhere.--Jim H.] STANDARD CARDS 1. WE ARE DESIGNING A STANDARD FOR INTERCHANGE, NOT EDITING. 2. EVERY SCRIPT IS PARSED BEFORE BEING INTERPRETED. 3. STANDARDIZE CONCEPTS, NOT NAMES. 4. COMBINING DOCUMENTS IS AN EDITOR, NOT AN INTERCHANGE, TASK. 5. GENSYM IS AN EDITOR FUNCTION. 6. CHARACTER-SET AND LEXICAL ISSUES CAN BE RESOLVED AT THE END. 7. GET THE TECHNICAL BASIS RIGHT BEFORE WORRYING ABOUT THE POLITICAL ISSUES. ------------------- We envision an Interdoc script being input and viewed in any manner equivalent to the following: Parse the entire script, repeatedly - reducing each expression to its "dominant structure," containing only literals and structural delimiters, by replacing identifiers by the values to which they are bound in the current environment, by applying operators, by evaluating selections, and by removing binding items, - determining the properties of each node from its marks, - recording the sources and targets for all links, and - transforming the environment as indicated by the binding items (recording the relevant attributes for each node in a form convenient to the particular editor). BASIC INTERDOC LEXICAL STRUCTURE AND EXAMPLES Allow blank, CR, comma, semicolon as equivalent separators??? Largely pirate grammar of literals from Interpress id ::= letter ( letter | digit )* Expressions in an Interdoc script may denote literal values: Booleans: (F, T) integers: ... 3, 2, 1, 0, 1, 2, 3, ... reals: 1.2E5, . . . strings: <this is a string> marks: Xerox.Laurel.Message# links: A123@!, Paragraph.Example!, anId@ universal names: $Authority.Sub.id nodes sequences of values unevaluated expressions environments SYNTACTIC EXAMPLE {Book.example! -- Links to this from Book@ and Book.example@ ExampleParagraph -- Invokes a definition $UniqueMark12356# -- Adds a mark Font←[Font | size←10*pt; face←bold] factorial←'(LT[Value, 2] | 1 | Value*factorial(Value-1))' a:='NOT[EQ[margins.left, factorial[5]]]' margins.right←100; r=12.5*pt (a | margins.left←+5; margins.right←5 | margins.left+←10) -- conditional: Algol68 <text for this node> } -- Or, using short identifiers, and omitting comments and unnecessary spaces: {b.e!ep;um#f←[f|s←10*pt;fc←bo]fa←'(l[Value,2]|1|Value*fa(Value-1))' a:='n[e[m.l,fa[5]]]'m.r←100;r=12.5*pt(a|m.l←+5;m.r←5|m.l+←10)<text for this node>} GRAMMAR script ::= versionId item item ::= value | binding | label value ::= term | node term ::= primary | primary op term op ::= "+" | "" | "*" | "/" primary ::= literal | invocation | application | selection | sequence literal ::= Boolean | integer | hexint | real | string invocation ::= name | universal name ::= id ( "." id )* universal ::= "$" name application ::= invocation "[" item* "]" selection ::= "(" term "|" item* "|" item* ")" sequence ::= "(" ( value | binding )* ")" node ::= "{" item* "}" binding ::= name mode rhs mode ::= "=" | ":" | ":=" | "←" rhs ::= value | op term | "'" item* "'" | "[" [ invocation ] "|" binding* "]" label ::= mark | link mark ::= invocation "#" link ::= id "@!" | name "@" | name "!" SEMANTIC FUNCTIONS R: expression, environment > expression -- Reduction C: expression > expression -- Contents B: expression, environment > environment -- Bindings M: expression > expression -- Marks L: expression > expression -- Links S: expression > expression -- Link sources T: expression > expression -- Link targets GRAMMATICAL FEATURE X SEMANTIC FUNCTION MATRIX FEATURES: FUNCTIONS: R C B M L S T term ::= primary op term + = primary ::= literal == invocation ::= id + + invocation ::= name "." id + + universal ::= "$" name == application ::= invocation "[" item* "]" + selection ::= "(" term "|" item1* "|" item2* ")" + + node ::= "{" item* "}" + = + sequence ::= "(" ( value | binding )* ")" + = + item* ::= item1 item* + + + + + + + binding ::= name mode rhs + rhs ::= "'" item* "'" + rhs ::= "[|" binding* "]" + rhs ::= "[" invocation "|" binding* "]" + mark ::= invocation "#" + + link ::= id "@!" = + link ::= name "@" = + link ::= name "!" = + Semantic function produces Nil or E or does not apply. + Non-trivial semantic equation. =For R: passes value unchanged; for C: value same as R. NOTATION FOR ENVIRONMENTS Environments bind identifiers to expressions, in various modes ("=", ":", ":=", "←"): Null denotes the "empty" environment [E | id m e] means "E with id mode m bound to e" locBinding(id, E) denotes the binding mode of id in E locBinding(id, Null) = None locBinding(id, [E | id' m e]) = if id=id' then m else locBinding(id, E) locVal(id, E) denotes the value locally bound to id in E locVal(id, Null) = Nil = "" locVal(id, [E | id' m e]) = if id=id' then e else locVal(id, E) PRESENTATION BY FEATURE [E is used to represent the value of the environment in which the feature occurs.] term ::= primary op term op ::= "+" | "" | "*" | "/" R = C = R<primary>(E) op R<term>(E) B = E M = L = S = T = Nil -- Both the primary and the term must reduce to numbers; the arithmetic operators are evaluated right-to-left (a la APL, without precedence) and bind less tightly than application. primary ::= literal literal ::= Boolean | integer | hexint | real | string R = C = literal B = E M = L = S = T = Nil -- The basic contents of a document. invocation ::= id R = R<valOf(id, E)>(E) B = B<valOf(id, E)>(E) where valOf(id, E) = locVal(id, whereBound(id, E)) -- Gets innermost value whereBound(id, E) = CASE -- Gets innermost binding locBinding(id, E) ~= None => E locBinding("Outer", E) ~= None => whereBound(id, locVal("Outer", E)) True => Null -- Both attributes and definitions are looked up in the current environment; depending on the current binding of id, this may produce values and/or bindings; if the binding's rhs was quoted, the expression is evaluated at the point of invocation. invocation ::= name "." id R = R<valOf(id, R<name>(E))>(E) B = B<valOf(id, R<name>(E))>(E) -- Qualified names are treated as "nested" environments. universal ::= "$" name R = C = "$" name B = E M = L = S = T = Nil -- Names prefixed with a $ are presumed to be directly meaningful, and are not looked up in the environment. application ::= invocation "[" item* "]" R = apply(invocation, R<item*>(E), E) B = E where apply(invocation, value*, E) = CASE R<invocation>(E) OF "$equal" => value1 = value2 "$greater" => value1 > value2 . . . "$subscript" => value1[value2] -- value1: sequence, value2: int "$contents" => "(" C<inner(value1)> ")" "$marks" => "(" M<inner(value1)> ")" "$links" => "(" L<inner(value1)> ")" "$sources" => "(" S<inner(value1)> ")" "$targets" => "(" T<inner(value1)> ")" ELSE => R<invocation>([[Null | "Outer" "=" E] | "Value" "=" value*]) inner("{" value* "}") = value* -- If the invocation does not evaluate to one of the standard external function names, the current environment is augmented with a binding of the value of the argument list to the identifier Value, and the value is the result of the invocation in that environment; this allows function definition within the language. selection ::= "(" term "|" item1* "|" item2* ")" R = if R<term>(E) then R<item1*>(E) else R<item2*>(E) B = if R<term>(E) then B<item1*>(E) else B<item2*>(E) -- This is a standard conditional expression/binding, using syntax borrowed from Algol 68. sequence ::= "(" item* ")" R = C = "(" R<item*>(E) ")" B = B<item*>(E) M = L = S = T = Nil -- Parentheses group a sequence of items as a single value; bindings in the sequence affect the environment of items to the right in the containing node, but labels are disallowed. Parentheses may also be used to override the right-to-left evaluation of arithmetic operators; an operand sequence must reduce to a single numeric value. node ::= "{" item* "}" R = C = "{" R<"Sub" item*>([Null | "Outer" "=" E]) "}" B = locVal("Outer", (B<"Sub" item*>([Null | "Outer" "=" E]))) M = L = S = T = Nil -- Nodes have nested environments, and affect the containing environment only through persistent (:=) bindings to ids with outer VAR (:) bindings. The items of a node are implicitly prefixed with the id Sub, which may be bound to any information intended to be common to all subnodes in a scope. item* ::= "" R = C = M = L = S = T = Nil B = E -- The empty sequence of items has no value and no effect; this is the basis for the following recursive definition. item* ::= item1 item* R = R<item1>(E) R<item*>(B<item1>(E)) B = B<item*>(B<item1>(E)) For F in {C, M, L, S, T}: F = F<item1> F<item*> -- In general, the value of a sequence of items is just the sequence of item values; binding items affect the environment of items to their right; Nil does not change the length of a result sequence. binding ::= name mode rhs R = Nil B = bind(name, mode, R<rhs>(E), E) where bind(id, mode, value, E) = CASE bindingOf(id, E) = "=" => E -- Can't rebind constants mode = ":=" => assign(id, value, E) True => [E | id mode value] bind(id "." name, mode, value, E) = [E | id bindingOf(id, E) bind(name, mode, value, valOf(id, E))] bindingOf(id, E) = locBinding(id, whereBound(id, E)) assign(id, value, E) = CASE locBinding(id, E) = ":" => [E | id ":" value] bindingOf(id, E) = ":" => bind("Outer." id, ":=", value, E) True => E -- Can only assign to vars -- This adds a single binding to E; bindings have no other "side effects" and no value. binding ::= name mode op term = <name mode name op term> -- This is just a convenient piece of syntactic sugar for the common case of updating a binding. rhs ::= "'" item* "'" R = item* -- A quoted rhs is evaluated at the point of invocation, rather than the point of binding. rhs ::= "[|" binding* "]" R = [B<binding*>([Null | "Outer" "=" E]) | "Outer" "=" Null] -- This creates a new environment value that may be used much like a record. rhs ::= "[" invocation "|" binding* "]" R =[B<binding*>([R<invocation>(E) | "Outer" "=" E]) | "Outer" "=" Null] -- This creates a new environment value that is an extension of an existing one. mark ::= invocation "#" R = R<invocation>(E) "#" M = invocation B = E C = L = S = T = Nil -- This gives the containing node the property denoted by the mark to which the invocation reduces. link ::= id "@!" R = id "@!" L = id B = E C = M = S = T = Nil -- This defines the scope of the set of links whose "main" component is id. link ::= name "@" R = name "@" S = name B = E C = M = L = T = Nil -- This identifies the containing node as a "source" of the link name. link ::= name "!" R = name "!" T = prefixes(name) B = E C = M = L = S = Nil where prefixes(id) = id prefixes(name "." id) = name "." id prefixes(name) -- This identifies the containing node as a "target" of each of the links that is a prefix of name. DISCUSSION Each environment, E, initially contains only its "inherited" environment (bound to the id Outer). Most bindings take place directly in E. To allow for "persistent" bindings, the value of a bind(id, ":=", val, E) will change E by rebinding id in the "innermost" environment (following the chain of Outers) in which it is bound, if that binding has the binding ":" (Var). Identifiers bound with binding "=" (Const) may not be rebound in inner environments. If the rhs of a binding is surrounded by single quotes, it will be evaluated in the environments where the name is invoked, rather than the environment in which the binding is made. When an id is referred to and locBinding(id, E)=None, then the value is sought recursively in locVal("Outer"). The (implicit) "outermost" environment binds each id to the "universal" name $id. Nodes are delimited by brackets. The contents of each node are implicitly prefixed by Sub, which will generally be bound in the containing environment to a quoted expression performing some bindings, and perhaps supplying some labels (marks and links). Parentheses are used to delimit sequence values. Square brackets are used to delimit the argument list of an operator application and to denote environment constructors, which behave much like records. Expressions involving the four infix ops (+, , *, /) are evaluated right-to-left (a la APL); since we expect expressions to be short, we have not imposed precedence rules. The notation for selections (conditionals) is borrowed from Algol 68: ( <test> | <true part> | <false part> ) This is consistent with our principles of using balanced brackets for compound constructions and avoiding syntactically reserved words; the true part and false part may each contain an arbitrary number of items (including none). A label N! on a node makes that node a "target" of the link N (and its prefixes); a label N@ makes it a "source." The "main" identifier of a link must be declared (using id@!) at the root of a subtree containing all its sources and targets. The link represents a set of directed arcs, one from each of its sources to each of its targets. Multiple target labels make a node the target of multiple links. A target label that appears only on a single node places it in a singleton set, i.e., identifies it uniquely. OTHER NOTES Conservative rules for editor treatment of Interdoc subtrees created by other editors: -It's OK to display a node if you understand at least one of its marks. -It's OK to edit a node if you understand ALL of its (local) marks, and either don't remove any of them or also understand ALL marks of its parent. -It's OK to copy a node if that doesn't move any labels outside their scope, you understand ALL marks of its new parent, and all relevant attributes have the same binding in the new environment. -it's OK to delete a (subtree rooted at a) node if you understand ALL marks of its parent. -It's OK to change a binding if you understand ALL marks of each node in its scope, or if it is for a standard attribute that you understand. More work needed on definition vs. attribute, direct vs. indirect contents. The "view" of the dominant structure is ALWAYS controlled by the marks on its nodes. (E.g., numbers are often part of the content for purposes other than to be "shown.") $hidden# means that a node is intended not to be viewed at the point where it occurs in the dominant structure; such nodes will generally be targets of links, placed remotely from their sources (uses) to extract attributes from an environment (e.g., page or figure numbers for references). We will use the term "property" to refer to a characteristic associated with a node by means of a label (mark or link) and "attribute" to refer to a characteristic whose value is determined by the environment. Properties are either present (locally) or absent, whereas attributes commonly have values that apply throughout a scope. Level 2 of the standard will be primarily concerned with the definition of standard properties that are expected to be shared among editors. For each standard property, it will describe - the associated universal mark that denotes it, - the assumptions that this mark implies about the contents of a node (values that must/may be present and their intended intepretation), - the assumptions that this mark implies about the environment of a node (attributes that must be present and their intended intepretation). Attributes are "relevant" to a node if they are assumed by any of its marks. In general, a node's environment will also contain bindings for many "latent" attributes that are either relevant to its ancestors (and inherited by default) or are potentially relevant to its descendants. A "definition" is an attribute that is relevant precisely to those nodes that invoke it. We need some further thought (and syntax and semantics) for how to associate assumptions with marks. Attributes are inherited only via environments following the dominant structure. Thus the choice of a dominant structure to represent scripts from a particular editor will be strongly influenced by expectations about inheritance. Put a value in contents if: Put a value in environment if: effect is local to node has scope is directly edited is only indirectly edited is to be bound locally needs delayed or global binding The presentation of this material could be clarified by a table that relates constructions in the notation to their intended uses and meanings. EDITOR LEVELS Just as with Interpress, the Interdoc standard will apply to editors with varying capabilities, and it will be important to define some structure to the space of possibilities. Dimensions in which we foresee reasonable variation are: - Invocation/Application: Only built-in definitions - definitions in script. - Dominant structure: Nodes cannot nest - nesting allowed. - Other structure: No links - links allowed. - Bindings: Local only - CONST(=), VAR(:), and persistent (:=). - Selection: No conditionals - conditionals. - Numbers: Integers only - floating point. HINTS TO IMPLEMENTORS Scopes: The scope of a binding always begins (textually) at the point where the binding appears, and extends strictly to its right. This should simplify the processing of environments in a single left-to-right scan. Constant bindings are unchangeable within a scope. Environment restoration: Some constructs in the language are not allowed to have a "net effect" on the environment, even though they may contain internal bindings that will be in effect for some or all of their substructures. Thus, a program to process Interdoc scripts must make provision for restoration of saved environments. The amount of bookkeeping required to do this can be minimized by the following observations: - A rhs always produces a value (which may be an environment), which is then bound to a name in the environment that obtained when the binding was encountered. Thus, occurrence of a bindingMode (=, :, :=, ←) signals a potential restoration point. - Any application (but not every invocation), any term containing infix ops, and any mark will leave the environment unchanged. Thus the appearance of an operator, the bracket surrounding an argument list, or # signals a potential restoration point; the environment to be saved is the one BEFORE the invocation to its left. (One-character lookahead before invocation is prudent.) - Environments are NOT restored at the end of sequences. - The environment at the end of a node is the final value of the Outer component of the node's environment; unless the node contains persistent (:=) bindings, this will be the same as the environment just prior to the node. Coding tricks for common structures: - Attributes that tend to persist across boundaries in the dominant structure (e.g., Bravo character looks, which persist from paragraph to paragraph unless explicitly changed by the user) should generally be represented by VAR (:) bindings at a high-level node in the script; changes would then be represented by persistent (:=) bindings at the points where they occur. - Persistent bindings can also be used to record changes to "running" information (e.g., section number, figure number). - Information (e.g., marks) repeated in all subnodes of a node should be bound to Sub, and hence associated with the subnodes implicitly. - Very compact encodings can often be obtained by using single-character id's for common sequences of items; each editor can have its own set, tuned to its own usages. When reading a script produced by a different editor, it will be necessary to know the bindings for these abbreviations. - Common editing operations may be represented by definitions with short names (e.g., Look Keep 40 in Bravo might become "L.K[40]"). CONSCIOUSLY POSTPONED Lambda expressions. Sets (Cf. Mitchell's Font example.) SET/LIST operators ($append $union ?) Extend selection to CASE? ------------------- Open questions: We should rethink our character assignments. check our characterset for disjointness with Interpress.DoubtfulChars. enlarge op with a few more single-character operators? %, &, \ Not done: ------------------- ACKNOWLEDGEMENTS Interpress, Interdoc working group, Lampson, Guttag, Donahue, others? APPENDICES ALTERNATE PRESENTATIONS OF THE SEMANTIC EQUATIONS PRESENTATION BY FUNCTION R: expression, environment > expression -- Reduction R<construct>(E) = CASE <construct> OF <primary op term> => R<primary>(E) op R<term>(E) <literal> => literal <id> => R<valOf(id, E)>(E) <name "." id> => R<valOf(id, R<name>(E))>(E) <"$" name> => "$" name <invocation "[" item* "]"> => apply(invocation, R<item*>(E), E) <"(" term "|" item1* "|" item2* ")"> => if R<term>(E) then R<item1*>(E) else R<item2*>(E) <"(" item* ")"> => "(" R<item*>(E) ")" <"{" item* "}"> => "{" R<"Sub" item*>([Null | "Outer" "=" E]) "}" <> => Nil <item1 item*> => R<item1>(E) R<item*>(B<item1>(E)) <name mode rhs> => Nil <"'" item* "'"> => item* <"[|" binding* "]"> => [B<binding*>([Null | "Outer" "=" E]) | "Outer" "=" Null] <"[" invocation "|" binding* "]"> => [B<binding*>([R<invocation>(E) | "Outer" "=" E]) | "Outer" "=" Null] <invocation "#"> => R<invocation>(E) "#" <link> => link C: expression > expression -- Content of node C<construct> = CASE <construct> OF <value> => value <label> => Nil <item1 item*> => C<item1> C<item*> B: expression, environment > environment -- Bindings B<construct>(E) = CASE <construct> OF <id> => B<valOf(id, E)>(E) <name "." id> => B<valOf(id, R<name>(E))>(E) <"(" term "|" item1* "|" item2* ")">(E) => if R<term>(E) then B<item1*>(E) else B<item2*>(E) <"(" item* ")"> => B<item*>(E) <"{" item* "}"> => locVal("Outer", (B<"Sub" item*>([Null | "Outer" "=" E]))) <item1 item*> => B<item*>(B<item1>(E)) <name mode rhs> => bind(name, mode, R<rhs>(E), E) ELSE => E M: expression > expression -- Sequence of marks M<construct> = CASE <construct> OF <name "#"> => name <item1 item*> => M<item1> M<item*> ELSE => Nil L: expression > expression -- Sequence of links L<construct> = CASE <construct> OF <id "@!"> => id <item1 item*> => L<item1> L<item*> ELSE => Nil S: expression > expression -- Sequence of source links S<construct> = CASE <construct> OF <name "!"> => prefixes(name) <item1 item*> => S<item1> S<item*> ELSE => Nil T: expression > expression -- Sequence of target links T<construct> = CASE <construct> OF <name "@"> => name <item1 item*> => T<item1> T<item*> ELSE => Nil SEMANTICS: REDUCTION AND BINDING R: expression, environment > expression -- Reduction B: expression, environment > environment -- Bindings R&B<construct>(E) denotes the pair R<construct>(E); B<construct>(E) [Unless explicitly given below, B<construct>(E) = E.] R<primary op term>(E) = R<primary>(E) op R<term>(E) R<literal>(E) = literal R&B<id>(E) = R&B<valOf(id, E)>(E) R&B<name "." id>(E) = R&B<valOf(id, R<name>(E))>(E) R<"$" name>(E) = "$" name R<invocation "[" item* "]">(E) = apply(invocation, R<item*>(E), E) R&B<"(" term "|" item1* "|" item2* ")">(E) = if R<term>(E) then R&B<item1*>(E) else R&B<item2*>(E) R&B<"(" item* ")">(E) = "(" R<item*>(E) ")" ; B<item*>(E) R&B<"{" item* "}">(E) = "{" R<"Sub" item*>([Null | "Outer" "=" E]) "}"; locVal("Outer", (B<"Sub" item*>([Null | "Outer" "=" E]))) R<>(E) = Nil R&B<item1 item*>(E) = R<item1>(E) R<item*>(B<item1>(E)); B<item*>(B<item1>(E)) R&B<name mode rhs>(E) = Nil; bind(name, mode, R<rhs>(E), E) <name mode op term> = <name mode name op term> -- Syntactic sugar R<"'" item* "'">(E) = item* --Usable only in rhs of binding R<"[|" binding* "]">(E) = [B<binding*>([Null | "Outer" "=" E]) | "Outer" "=" Null] R<"[" invocation "|" binding* "]">(E) = [B<binding*>([R<invocation>(E) | "Outer" "=" E]) | "Outer" "=" Null] R<invocation "#">(E) = R<invocation>(E) "#" R<link>(E) = link SEMANTICS OF CONTENTS AND LABELS C: expression > expression -- Contents of node M: expression > expression -- Sequence of marks L: expression > expression -- Sequence of links S: expression > expression -- Sequence of source links T: expression > expression -- Sequence of target links [These functions all return the empty string, Nil (which does not lengthen a sequence), except as specified below.] C<value> = value C<label> = Nil C<item1 item*> = C<item1> C<item*> M<name "#"> = name M<item1 item*> = M<item1> M<item*> L<id "@!"> = id L<item1 item*> = L<item1> L<item*> S<name "!"> = prefixes(name) S<item1 item*> = S<item1> S<item*> T<name "@"> = name T<item1 item*> = T<item1> T<item*> prefixes(id) = id prefixes(name "." id) = name "." id prefixes(name) *start* 01902 00024 USt Date: 15 Sept. 1981 10:28 am PDT (Tuesday) From: Horning.pa Subject: Notes from our Inter-X discussion yesterday To: Lampson, Mitchell cc: Horning Butler's global message was that the current presentation of the formal semantics was hard to grasp and confusing. It would be a mistake to circulate it at this point and frighten people off. We might be able to improve the situation by attempting to isolate a smaller semantic base, and explain more of the language as sugarings Literal values Structured value constructors Node Environment Vector ("Do we really need this?", he asks.) Bindings Generic operations Applications and Invocations Selections Specific functions Arithmetic Comparison Logical Subscript ... Butler questioned the desirability of a semantic class rhs, with values that could only be used in bindings--specifically, quotations and environments. A "free-standing" environment as a transformation would give us a way to bind a set of rhs values, but delay putting the bindings into the environment. Rather than quoting some rhs values, we should simply not evaluate an rhs unless it is explicitly requested. The ops should be presented as sugar for applications. The nonterminal "sequence" should be "vector" (or some such). The nonterminal "value" should be "content" (or some such). A mark should be <universal> "#", rather than <invocation> "#". Parentheses should NOT be used both to form vectors and to override precedence (use functional notation for the latter). VAR attributes and persistent bindings should NOT be used simply to record attributes that tend to persist across boundaries in the dominant structure, but ONLY to record the intent that something change "running" information. Butler prefers "Interscript" to either "Interdoc" or "Interedit", although he is uncertain why.