*start* 14263 00024 USt Date: 5 March 1982 2:25 pm PST (Friday) From: Mitchell.PA Subject: Interdoc syntax and semantics To: Interdoc.pa Reply-To: Mitchell.pa Here are the current versions of the Interdoc syntax and semantics that I promised to distribute to you all at this morning's meeting. I have laced the semantic equations with some explanatory text to help you wade through them, but I am under no illusions that this has made them easy to read. Please feel free to send me questions about them. Such questions will undoubtedly help us to clarify the presentation. Jim M. ------------------- GRAMMAR script ::= versionId item item ::= value | binding | label value ::= term | node term ::= primary | primary op term op ::= "+" | "-" | "*" | "/" primary ::= literal | invocation | application | selection | sequence literal ::= Boolean | integer | hexint | real | string invocation ::= name | universal name ::= id ( "." id )* universal ::= "$" name application ::= invocation "[" item* "]" selection ::= "(" term "|" item* "|" item* ")" sequence ::= "(" ( value | binding )* ")" node ::= "{" item* "}" binding ::= name mode rhs mode ::= "=" | ":" | ":=" | "_" rhs ::= value | op term | "'" item* "'" | "[" [ invocation ] "|" binding* "]" label ::= mark | link mark ::= invocation "#" link ::= id "@!" | name "@" | name "!" NOTATION FOR ENVIRONMENTS Environments bind identifiers to expressions, in various modes ("=", ":", ":=", "_"): Null denotes the "empty" environment [E | id m e] means "E with id mode m bound to e" locBinding(id, E) denotes the binding mode of id in E locBinding(id, Null) = None locBinding(id, [E | id' m e]) = if id=id' then m else locBinding(id, E) locVal(id, E) denotes the value locally bound to id in E locVal(id, Null) = Nil = "" locVal(id, [E | id' m e]) = if id=id' then e else locVal(id, E) SEMANTIC FUNCTIONS R: expression, environment --> expression -- Reduction R is used for evaluating right-hand sides: identifiers, expressions, etc. C: expression --> expression -- Contents C is basically used to indicate which evaluated expressions become part of the content of a node B: expression, environment --> environment -- Bindings B indicates the effect a binding has on an environment. B and R are mutually recursive functions (e.g., the evaluation of an expression may cause some bindings to occur as well) The following four semantic functions occur less frequently in any substantive way in the semantics below. You might wish to skip them until they occur in a nontrivial manner in the semantics. M: expression --> expression -- Marks M indicates when an identifier is to be included in the mark set for a node L: expression --> expression -- Links L indicates link declarations S: expression --> expression -- Link sources S indicates a link to the set of nodes having associated target links T: expression --> expression -- Link targets T indicates that the node is to be included in the target set of all the names which are prefixes of the name to which the expression should evaluate PRESENTATION BY FEATURE [E is used to represent the value of the environment in which the feature occurs.] term ::= primary op term op ::= "+" | "-" | "*" | "/" R = C = R(E) op R(E) B = E M = L = S = T = Nil -- Both the primary and the term must reduce to numbers; the arithmetic operators are evaluated right-to-left (a la APL, without precedence) and bind less tightly than application. primary ::= literal literal ::= Boolean | integer | hexint | real | string R = C = literal B = E M = L = S = T = Nil -- The basic contents of a document. invocation ::= id R = R(E) B = B(E) where valOf(id, E) = locVal(id, whereBound(id, E)) -- Gets innermost value whereBound(id, E) = CASE -- Gets innermost binding locBinding(id, E) ~= None => E locBinding("Outer", E) ~= None => whereBound(id, locVal("Outer", E)) True => Null -- Both attributes and definitions are looked up in the current environment; depending on the current binding of id, this may produce values and/or bindings; if the binding's rhs was quoted, the expression is evaluated at the point of invocation. -- When an id is referred to and locBinding(id, E)=None, then the value is sought recursively in locVal("Outer"). The (implicit) "outermost" environment binds each id to the "universal" name $id. invocation ::= name "." id R = R(E))>(E) B = B(E))>(E) -- Qualified names are treated as "nested" environments. universal ::= "$" name R = C = "$" name B = E M = L = S = T = Nil -- Names prefixed with a $ are presumed to be directly meaningful, and are not looked up in the environment. application ::= invocation "[" item* "]" R = apply(invocation, R(E), E) B = E where apply(invocation, value*, E) = CASE R(E) OF "$equal" => value1 = value2 "$greater" => value1 > value2 . . . "$subscript" => value1[value2] -- value1: sequence, value2: int "$contents" => "(" C ")" "$marks" => "(" M ")" "$links" => "(" L ")" "$sources" => "(" S ")" "$targets" => "(" T ")" ELSE => R([[Null | "Outer" "=" E] | "Value" "=" value*]) inner("{" value* "}") = value* -- If the invocation does not evaluate to one of the standard external function names, the current environment is augmented with a binding of the value of the argument list to the identifier Value, and the value is the result of the invocation in that environment; this allows function definition within the language. selection ::= "(" term "|" item1* "|" item2* ")" R = if R(E) then R(E) else R(E) B = if R(E) then B(E) else B(E) -- The notation for selections (conditionals) is borrowed from Algol 68: ( | | ) This is consistent with our principles of using balanced brackets for compound constructions and avoiding syntactically reserved words; the true part and false part may each contain an arbitrary number of items (including none). sequence ::= "(" item* ")" R = C = "(" R(E) ")" B = B(E) M = L = S = T = Nil -- Parentheses group a sequence of items as a single value; bindings in the sequence affect the environment of items to the right in the containing node, but labels are disallowed. Parentheses may also be used to override the right-to-left evaluation of arithmetic operators; an operand sequence must reduce to a single numeric value. node ::= "{" item* "}" R = C = "{" R<"Sub" item*>([Null | "Outer" "=" E]) "}" B = locVal("Outer", (B<"Sub" item*>([Null | "Outer" "=" E]))) M = L = S = T = Nil -- Nodes have nested environments, and affect the containing environment only through persistent (:=) bindings to ids with outer VAR (:) bindings. The items of a node are implicitly prefixed with the id Sub, which may be bound to any information intended to be common to all subnodes in a scope. item* ::= "" R = C = M = L = S = T = Nil B = E -- The empty sequence of items has no value and no effect; this is the basis for the following recursive definition. item* ::= item1 item* R = R(E) R(B(E)) B = B(B(E)) For F in {C, M, L, S, T}: F = F F -- In general, the value of a sequence of items is just the sequence of item values; binding items affect the environment of items to their right; Nil does not change the length of a result sequence. binding ::= name mode rhs R = Nil B = bind(name, mode, R(E), E) where bind(id, mode, value, E) = CASE bindingOf(id, E) = "=" => E -- Can't rebind constants mode = ":=" => assign(id, value, E) True => [E | id mode value] bind(id "." name, mode, value, E) = [E | id bindingOf(id, E) bind(name, mode, value, valOf(id, E))] bindingOf(id, E) = locBinding(id, whereBound(id, E)) assign(id, value, E) = CASE locBinding(id, E) = ":" => [E | id ":" value] bindingOf(id, E) = ":" => bind("Outer." id, ":=", value, E) True => E -- Can only assign to vars -- This adds a single binding to E; bindings have no other "side effects" and no value. -- Each environment, E, initially contains only its "inherited" environment (bound to the id Outer). Most bindings take place directly in E. To allow for "persistent" bindings, the value of a bind(id, ":=", val, E) will change E by rebinding id in the "innermost" environment (following the chain of Outers) in which it is bound, if that binding has the binding ":" (Var). Identifiers bound with binding "=" (Const) may not be rebound in inner environments. binding ::= name mode op term = -- This is just a convenient piece of syntactic sugar for the common case of updating a binding. rhs ::= "'" item* "'" R = item* -- If the rhs of a binding is surrounded by single quotes, it will be evaluated in the environments where the name is invoked, rather than the environment in which the binding is made. rhs ::= "[|" binding* "]" R = [B([Null | "Outer" "=" E]) | "Outer" "=" Null] -- This creates a new environment value that may be used much like a record. rhs ::= "[" invocation "|" binding* "]" R =[B([R(E) | "Outer" "=" E]) | "Outer" "=" Null] -- This creates a new environment value that is an extension of an existing one. mark ::= invocation "#" R = R(E) "#" M = invocation B = E C = L = S = T = Nil -- This gives the containing node the property denoted by the mark to which the invocation reduces. link ::= id "@!" R = id "@!" L = id B = E C = M = S = T = Nil -- This defines the scope of the set of links whose "main" component is id. -- A label N! on a node makes that node a "target" of the link N (and its prefixes); a label N@ makes it a "source." The "main" identifier of a link must be declared (using id@!) at the root of a subtree containing all its sources and targets. The link represents a set of directed arcs, one from each of its sources to each of its targets. Multiple target labels make a node the target of multiple links. A target label that appears only on a single node places it in a singleton set, i.e., identifies it uniquely. link ::= name "@" R = name "@" S = name B = E C = M = L = T = Nil -- This identifies the containing node as a "source" of the link name. link ::= name "!" R = name "!" T = prefixes(name) B = E C = M = L = S = Nil where prefixes(id) = id prefixes(name "." id) = name "." id prefixes(name) -- This identifies the containing node as a "target" of each of the links that is a prefix of name. DISCUSSION Each environment, E, initially contains only its "inherited" environment (bound to the id Outer). Most bindings take place directly in E. To allow for "persistent" bindings, the value of a bind(id, ":=", val, E) will change E by rebinding id in the "innermost" environment (following the chain of Outers) in which it is bound, if that binding has the binding ":" (Var). Identifiers bound with binding "=" (Const) may not be rebound in inner environments. If the rhs of a binding is surrounded by single quotes, it will be evaluated in the environments where the name is invoked, rather than the environment in which the binding is made. When an id is referred to and locBinding(id, E)=None, then the value is sought recursively in locVal("Outer"). The (implicit) "outermost" environment binds each id to the "universal" name $id. Nodes are delimited by brackets. The contents of each node are implicitly prefixed by Sub, which will generally be bound in the containing environment to a quoted expression performing some bindings, and perhaps supplying some labels (marks and links). Parentheses are used to delimit sequence values. Square brackets are used to delimit the argument list of an operator application and to denote environment constructors, which behave much like records. Expressions involving the four infix ops (+, -, *, /) are evaluated right-to-left (a la APL); since we expect expressions to be short, we have not imposed precedence rules. The notation for selections (conditionals) is borrowed from Algol 68: ( | | ) This is consistent with our principles of using balanced brackets for compound constructions and avoiding syntactically reserved words; the true part and false part may each contain an arbitrary number of items (including none). A label N! on a node makes that node a "target" of the link N (and its prefixes); a label N@ makes it a "source." The "main" identifier of a link must be declared (using id@!) at the root of a subtree containing all its sources and targets. The link represents a set of directed arcs, one from each of its sources to each of its targets. Multiple target labels make a node the target of multiple links. A target label that appears only on a single node places it in a singleton set, i.e., identifies it uniquely. GRAMMATICAL FEATURE X SEMANTIC FUNCTION MATRIX FEATURES: FUNCTIONS: R C B M L S T term ::= primary op term + = - - - - - primary ::= literal == - - - - - invocation ::= id + - + - - - - invocation ::= name "." id + - + - - - - universal ::= "$" name == - - - - - application ::= invocation "[" item* "]" + - - - - - - selection ::= "(" term "|" item1* "|" item2* ")" + - + - - - - node ::= "{" item* "}" + = + - - - - sequence ::= "(" ( value | binding )* ")" + = + - - - - item* ::= item1 item* + + + + + + + binding ::= name mode rhs - - + - - - - rhs ::= "'" item* "'" + - - - - - - rhs ::= "[|" binding* "]" + - - - - - - rhs ::= "[" invocation "|" binding* "]" + - - - - - - mark ::= invocation "#" + - - + - - - link ::= id "@!" =- - - + - - link ::= name "@" =- - - - + - link ::= name "!" =- - - - - + - Semantic function produces Nil or E or does not apply. + Non-trivial semantic equation. =For R: passes value unchanged; for C: value same as R. ------------------- *start* 16832 00024 USt Date: 12 May 1982 4:55 pm PDT (Wednesday) From: Mitchell.PA To: Interdoc.PA Subject: Interdoc syntax and semantics Categories: Save Reply-To: Mitchell.PA ------------------- The syntax has been brought up to date and the semantics have been updated to add some new features (1) the meaning of a tag (formerly mark) includes evaluating the tag name as well so that its default bindings can be obtained simply by writing the tag (e.g., PARA$ places the tag PARA on the node and evaluates PARA%). (2) the notion of a scope (the "unit" that owns an environment) has been added. ------------------- GRAMMAR script ::= versionId node versionID ::= "Interscript/Interchange/1.0 " content ::= term | node term ::= primary | primary op term op ::= "+" | "-" | "*" | "/" primary ::= literal | invocation | indirection | application | selection | vector literal ::= Boolean | integer | intSequence | real | string | universal universal ::= ucID ( "." ucID )* name ::= id ( "." id )* invocation ::= name indirection ::= name "%" application ::= ( name | universal ) "[" scope* "]" selection ::= "(" term "|" item* "|" item* ")" vector ::= "(" scope* ")" node ::= "{" nodeItem* "}" nodeItem ::= label | scope scope ::= binding* content content* binding ::= name mode rhs mode ::= "_" | "=" | ":=" rhs ::= content | op term | "'" item* "'" | "[" [ primary ] "|" binding* "]" item ::= label | binding | content label ::= tag | link tag ::= universal "$" link ::= id "@!" | name "@" | name "!" NOTATION FOR ENVIRONMENTS Environments bind identifiers to expressions, in various modes ("=", ":", ":=", "_"): Null denotes the "empty" environment [E | id m e] means "E with id mode m bound to e" locBinding(id, E) denotes the binding mode of id in E locBinding(id, Null) = None locBinding(id, [E | id' m e]) = if id=id' then m else locBinding(id, E) locVal(id, E) denotes the value locally bound to id in E locVal(id, Null) = Nil = "" locVal(id, [E | id' m e]) = if id=id' then e else locVal(id, E) SEMANTIC FUNCTIONS R and B are intended to propagate the effects of the environment into an expression. R: expression, environment --> expression -- Reduction R is used for evaluating right-hand sides: identifiers, expressions, etc. B: expression, environment --> environment -- Bindings B indicates the effect a binding has on an environment. B and R are mutually recursive functions (e.g., the evaluation of an expression may cause some bindings to occur as well) The following five functions all apply to expressions independent of environment and are intended to be used on the result of reducing an expression in an environment. C: expression --> expression -- Contents C is basically used to indicate which evaluated expressions become part of the content of a node The following four semantic functions occur less frequently in any substantive way in the semantics below. You might wish to skip them until they occur in a nontrivial manner in the semantics. T: expression --> expression -- Tags T indicates when an identifier is to be included in the tag set for a node L: expression --> expression -- Links L indicates link declarations LF: expression --> expression -- Links From LF indicates a link to the set of nodes having associated target links LT: expression --> expression -- Links To LT indicates that the node is to be included in the target set of all the names which are prefixes of the name to which the expression should evaluate PRESENTATION BY FEATURE [E is used to represent the value of the environment in which the feature occurs.] script ::= versionId node R = C = R(EXTERNAL) B=EXTERNAL T = L = LF = LT = Nil -- a script is evaluated in the pre-existing EXTERNAL environment common to all Interscript/Interchange/1.0 scripts term ::= primary op term op ::= "+" | "-" | "*" | "/" R = C = R(E) op R(E) B = E T = L = LF = LT = Nil -- Both the primary and the term must reduce to numbers; the arithmetic operators are evaluated right-to-left (a la APL, without precedence) and bind less tightly than application. primary ::= literal literal ::= Boolean | integer | intSequence | real | string | universal R = C = literal B = E T = L = LF = LT = Nil -- The basic contents of a document. universal ::= ucID R = C = ucID B = E T = L = LF = LT = Nil -- universals (all upper case) are presumed to be directly meaningful, and are not looked up in the environment. universal ::= universal "." ucID R = C = universal "." ucID B = E T = L = LF = LT = Nil -- a qualified universal also just stands for itself invocation ::= id R = R(E) B = B(E) where valOf(id, E) = CASE whereBound(id, E) = Null => MakeUniversal(id) whereBound(id, E) = Nil => Nil True => locVal(id, whereBound(id, E)) and whereBound(id, E) = CASE -- Gets innermost binding locBinding(id, E) ~= None => E locBinding("Outer", E) ~= None => whereBound(id, locVal("Outer", E)) E=EXTERNAL => Null True => Nil -- Makeuniversal(id) produces the universal corresponding to id (in the current version its uppercase equivalent) -- Both attributes and definitions are looked up in the current environment; depending on the current binding of id, this may produce values and/or bindings; if the binding's rhs was quoted, the expression is evaluated at the point of invocation. -- When id is referred to and locBinding(id, E)=None, then the value is sought recursively in locVal("Outer", E). The outermost environment, EXTERNAL, binds each id to an universal which is the uppercase version of the id. Otherwise, the value of the id is assumed to be Nil invocation ::= name "." id R = R(E))>(E) B = B(E))>(E) -- Qualified names are treated as "nested" environments. indirection ::= name "%" R = R(E))>(E) B = B(E))>(E) -- Indirection combines the facility for invocation plus recording the fact that the expansion resulted from evaluating a particular name (recording the indirection is not yet included in these semantics). application ::= name "[" scope* "]" R = apply(name, R(E), E) B = E where apply(name, value*, E) = CASE R(E) OF "EQUAL" => value1 = value2 "GREATER"=> value1 > value2 . . . "SUBSCRIPT"=> value1[value2] -- value1: sequence, value2: int "CONTENTS"=> "(" C ")" "TAGS" => "(" T ")" -- ?? this doesn't seem right "LINKS" => "(" L ")" "SOURCES" => "(" LF ")" "TARGETS"=> "(" LT ")" ELSE => R([[Null | "Outer" "=" E] | "Value" "=" value*]) and where inner("{" value* "}") = value* -- If the name does not evaluate to one of the standard external function names, the current environment is augmented with a binding of the value of the argument list to the identifier Value, and the value is the result of the invocation in that environment; this allows function definition within the language. selection ::= "(" term "|" nodeItem1* "|" nodeItem2* ")" R = if R(E) then R(E) else R(E) B = if R(E) then B(E) else B(E) -- The notation for selections (conditionals) is borrowed from Algol 68: ( | | ) This is consistent with our principles of using balanced brackets for compound constructions and avoiding syntactically reserved words; the true part and false part may each contain an arbitrary number of nodeItems (including none). vector ::= "(" scope* ")" R = C = "(" R(E) ")" B = B(E) T = L = LF = LT = Nil -- Parentheses group a sequence of values as a single, vector value; bindings in the sequence of scopes affect the environment of scopes to the right in the containing node, but labels are disallowed. Parentheses may also be used to override the right-to-left evaluation of arithmetic operators; an operand sequence must reduce to a single numeric value. node ::= "{" nodeItem* "}" R = C = "{" R<"Sub$" nodeItem*>([Null | "Outer" "=" E]) "}" B = locVal("Outer", (B<"Sub" nodeItem*>([Null | "Outer" "=" E]))) T = L = LF = LT = Nil -- Nodes have nested environments and affect the containing environment only through global (:=) bindings. The nodeItems of a node are implicitly prefixed with the id Sub, which may be bound to any information intended to be common to all subnodes in a scope. nodeItem* ::= "" R = C = T = L = LF = LT = Nil B = E -- The empty sequence of items has no value and no effect; this is the basis for the following recursive definition. nodeItem* ::= binding* content1 content* R = R(R(B(E)) B = B(B(B(E)) C = C(C(B(E)) For F in {T, L, LF, LT}: F = F