*start* 01712 00024 USt
Date: 3 May 1982 2:39 pm PDT (Monday)
From: Horning.pa
Subject: Class "Inheritance"
To: Interdoc

During a discussion Friday afternoon, Gael Curry remarked that there were two common kinds of inheritance in documents, but that Interdoc presently makes only one of them easy; furthermore, he suggested a solution that I think has a lot of appeal and fits into the Interdoc framework very nicely.

Interdoc makes it very natural for attributes to be inherited via the dominant hierarchy (indeed, we have suggested that a strong heuristic for choosing the dominant hierarchy is to look at the inheritance structure). This works very well for attributes (e.g., margins) that are relevant across many levels of the hierarchy and/or to a wide variety of node types, but not so well for attributes specific to particular "classes" of nodes (e.g., line width and style).

The suggestion is very simple: The occurrence of T$ in a node not only adds T to the set of tags of that node (formerly "marks"), it also includes T's value; i.e., it has the effect of T$T% in the current language.

Besides its simplicity, this idea has the following going for it:

-since the same name is used for both the tag and the included value, it should greatly reduce the amount of name invention required;

-in the typical case, it will shorten the script; and

-it systematizes something that we found ourselves doing in an ad hoc way in the examples.

The value associated with T can be established using the normal binding (and quotation) features of Interdoc. Note that it can include bindings and/or content. For tags defined in the standard, an initial value must be specified, even if it is just Null.

Jim H.
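
[Editorial illustration, not part of the original message.] The proposal above, that T$ both tags the node and includes T's value, can be sketched roughly as follows. This is only a toy model in Python under stated assumptions: the dictionary representation of a node, the item kinds "tag"/"bind"/"content", and the names build_node, NULL and LINE are all hypothetical and do not come from the Interdoc language.

```python
# Toy model: a tag's value is a bundle of default bindings and content.
# NULL stands in for the "just Null" initial value mentioned in the message.
NULL = {"bindings": {}, "content": []}


def build_node(items, env):
    """Process node items left to right.

    A ("tag", T) item adds T to the node's tag set AND includes T's value,
    i.e. it behaves like T$ followed by T% in the older language.
    """
    node = {"tags": set(), "bindings": {}, "content": []}
    for kind, payload in items:
        if kind == "tag":
            node["tags"].add(payload)
            value = env.get(payload, NULL)  # unbound tags contribute nothing
            node["bindings"].update(value["bindings"])
            node["content"].extend(value["content"])
        elif kind == "bind":
            name, val = payload
            node["bindings"][name] = val  # later bindings override defaults
        elif kind == "content":
            node["content"].append(payload)
    return node


# Hypothetical class-specific defaults for a LINE tag (line width and style).
env = {"LINE": {"bindings": {"width": 1, "style": "solid"}, "content": []}}
node = build_node([("tag", "LINE"), ("bind", ("width", 2))], env)
```

Here the node ends up tagged LINE, inherits the default style, and overrides only the width, which is the "class" style of inheritance the message describes.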
*start* 02140 00024 USt
Date: 3 May 1982 12:10 pm PDT (Monday)
From: Horning.pa
Subject: Re: Document Model - framing
In-reply-to: gcurry.ES's message of 26-Apr-82 15:55:33 PDT (Monday)
To: gcurry.ES
cc: Interdoc

[I think this has been superseded by our discussion Friday; I transmit it in its partially-completed form just for the record. -- Jim H.]

Gael,

Your message does a pretty good job of identifying issues that must be addressed. The following is an attempt to provide my "going in" answers (where I have them). It is intended in the same spirit of focusing the discussion (but I will not qualify each suggestion with all the appropriate caveats).

"InterDoc is silent on rendering": This standard card was developed in my absence; I'm glad to see some discussion of what it does and does not mean, since I've been a bit uncomfortable with a strong interpretation of it.

A prime requirement of Interdoc is that it must be able to transmit ANY content/form x structure/value information that an editor considers to be part of a document. For example, Bravo has two kinds of line breaks: those that result from explicit CRs, and those that result from the next word not fitting on the current line. I assume that a Bravo-like editor would transcribe the former into a script, but not the latter. Any conforming editor that "understands" TEXT$ must render these CRs as line breaks. However, since breaks of the second kind are not intrinsic to the document, and are not transcribed into the script, Interdoc must be silent on how to compute implicit line breaks.

I think we have a couple of notions (rendition and presentation) slightly confused. Scripts are something we interchange; documents are (the user's abstract model of) the data structure manipulated by the editor. Masters are sufficiently tightly bound to produce a hardcopy or screen representation.
I'd prefer to use "rendering" for the script-to-document, and "presentation" for the document-to-master (or screen, or page), translation.

                 Script
            |              /\
     Render |              | Transcribe
            \/             |
                Document
            |              /\
    Present |              | Infer
            \/             |
         Master (display, page)

*start* 01127 00024 USt
Date: 3 May 1982 2:22 pm PDT (Monday)
From: Horning.pa
Subject: Terminology
To: Interdoc

Nobody seems very pleased with Interdoc (or InterDoc) as the name for the standard we are working on. The only suggestion I have heard that aroused some enthusiasm was Interscript. I have been mildly negative on this because I didn't want to change "script" and wasn't too pleased with the prospect of repeatedly using phrases like "an Interscript script." However, I have hit on the following idea: Call the standard Interscript, and DEFINE script to be "the representation of an editable document in the Interscript language." Then we would always just say "script," rather than "an Interscript script." Comments, objections?

We also realized on Friday that there had been some confusion about the use of the term "rendition." "Import/Export" was suggested as a replacement for "Render/Transcribe," but that leads to connotation problems for Mesa programmers. Would you believe "Recover/Discover"? "Invoke/Evoke"? "Impress/Express"? "Replicate/Transcribe"? "Construe/Transcribe"? [I was afraid not!]

Jim H.

*start* 00550 00024 USt
Date: 3 May 1982 9:26 pm PDT (Monday)
From: Mitchell.PA
Subject: Re: Terminology
In-reply-to: Horning's message of 3 May 1982 2:22 pm PDT (Monday)
To: Horning
cc: Interdoc

I think "Interscript" is a fine replacement for "Interdoc".

I propose that we call the process of compiling a script into some document format "internalizing" (I) the script, the process of producing a script from an internal document format "externalizing" (E) the document, and the composition function E(I(script)) "transcription" (T).

Jim M.
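
[Editorial illustration, not part of the original message.] The three names proposed above compose as T = E(I(script)). The following Python sketch is only a toy under loud assumptions: the "internal document format" is taken to be a flat list of whitespace-separated tokens, and the sample script text is invented for illustration; no real editor's format is implied.

```python
def internalize(script: str) -> list[str]:
    """I: compile a script into a (toy) internal document, here a token list."""
    return script.split()


def externalize(document: list[str]) -> str:
    """E: produce a script from the (toy) internal document."""
    return " ".join(document)


def transcribe(script: str) -> str:
    """T: the composition E(I(script)), with no editing in between."""
    return externalize(internalize(script))


# Hypothetical script text; only its whitespace layout matters to this toy.
original = "{PARA$   <Hello>\n  <world> }"
copy = transcribe(original)
```

In this toy the transcribed script differs from the original character for character (the layout is normalized), yet both internalize to the same document.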
*start* 01047 00024 USt
Mail-from: Arpanet host SUMEX-AIM rcvd at 4-MAY-82 0810-PDT
Mail-from: SU-NET host SU-SHASTA rcvd at 4-May-82 0805-PDT
Date: Tuesday, 4 May 1982 08:07-PDT
To: Interdoc at PARC-MAXC
Subject: Re: Terminology
In-reply-to: Your message of 3 May 1982 9:26 pm PDT (Monday).
From: Brian Reid

"Interscript" is certainly better than "Interdoc", though for some reason I find the "rscr" hard to say.

I am not particularly thrilled with any of the words for the operations. "Internalizing" and "Externalizing" have the fewest semantic side effects, but I grew up in Washington DC and those words really trigger my Bogometer. Too many syllables, and too obviously artificial. It's ok if we don't need to use them much or write them in manuals much.

I always thought of the function E(I(script)) as being called "editing". I don't understand why it needs to be called something else. Agreed that there can exist null edits, that simply ingest something and then spit it out again. Hmmph.

Brian

*start* 01364 00024 USt
Date: 4 May 1982 11:31 PDT
From: Mitchell at PARC-MAXC
Subject: Re: Terminology
In-reply-to: reid@Shasta's message of Tuesday, 4 May 1982 08:07-PDT
To: Brian Reid
cc: Interdoc

Brian,

I don't know what a Bogometer is, but I quote the following definitions from the American Heritage Dictionary:

internalize tr.v. 1. To make internal.
externalize tr.v. 1. To make external; give external existence to.

These may be nouns that have been verbed, but I recall them being acceptable words when I was in elementary school participating in spelling contests (and that's a LONG time ago!).

E(I(script)) is nothing more than that. Let's assume that (a particular session of) editing can be described by a function Edit: then E(Edit(I(script))) includes editing. I didn't include Edit in the definition of "transcribe" because the composition script'=E(I(script)) is interesting itself if we would like to say something about the "equivalence" of script' and script.
It seemed natural to me to want to say what properties of a script are preserved under transcription, so that's why I called it that.

Remember: "the best is the enemy of the good" (A. Perlis).

Jim M.

P.S. "Hmmph" is not in the American Heritage dictionary, so I assume you meant

humph interj. Used to express doubt, displeasure, or contempt.

*start* 01110 00024 USt
Date: 4 May 1982 15:20 CDT
From: Johnston.DLOS at PARC-MAXC
Subject: Re: Terminology
In-reply-to: Mitchell's message of 4 May 1982 11:31 PDT
To: Mitchell.PA
cc: Brian Reid, Interdoc.pa

Jim,

Accuracy at the expense of comprehensibility results in a degradation of net communication. I personally don't have any particular problem with Internalize and Externalize except that, strictly accurate though they may be, they are also somewhat symbolic (obscure), which is a problem the entire industry already suffers from greatly. Of course, they may be better (more understandable) than transcribe and render.

What about encode and decode? These are somewhat understood in this context already, although your ideas as to accuracy would be welcome. I know it doesn't suggest that the script is a "script," but I feel they're reasonable terms for the action taking place.

Following your comments about E(I(script)), I must agree with you, particularly since equivalences of scripts are in question. No editing is necessarily implied in this function.

Rick

*start* 00906 00024 USt
Mail-from: Arpanet host SUMEX-AIM rcvd at 4-MAY-82 1343-PDT
Mail-from: SU-NET host SU-SHASTA rcvd at 4-May-82 1342-PDT
Date: Tuesday, 4 May 1982 13:44-PDT
To: Interdoc.PA at PARC-MAXC
Subject: Re: Terminology
In-reply-to: Your message of 4 May 1982 15:20 CDT.
From: Brian Reid

Mumble. I still consider E(I(script)) to be a degenerate special case of E(edit(I(script))), and I fail to see the need for two names, implying two separate concepts.
The important properties are:

(1) Idempotent equality: E(I(E(I(script)))) == E(I(script))
(2) Process equivalence: E(I(script)) equiv script

I claim that equality is a stronger notion than equivalence, and that both of these properties have to be met by a conforming editor.

I suspect that I will have difficulty teaching my verbal reflexes to say "Internalize" instead of "process into internal form".

*start* 02148 00024 USt
Date: 5-May-82 19:16:03 PDT (Wednesday)
From: Ayers.pa
Subject: Minutes of Interdoc Meeting of 30 April
To: Interdoc.pa

We began with a general discussion of form vs content. Mitchell suggested that this was a spectrum, but with very little not at one end or the other; he suggested tabs and a few other format effectors as the middle. This led into a tabs discussion and we had to back out after a bit ...

Scott suggested "Look Capitalized" as a touchstone; this proved fruitful. Is it distinct from font choice? People thought so. Note that it can't be the same as a replacement of character codes, because it is a style. After much discussion, we agreed that it is a legit look, akin to italic.

Are attributes back to "one damn thing after another"? [This was sparked by the preceding.] Can we parameterize? Brian suggested that it would help paragraph attributes to be able to say "First n lines style foo%". Note that there is another touchstone here: if you would like the paragraph "first line italic" but Interdoc doesn't support that OR your editor doesn't let you say it, then you will end up saying "first nine words italic", which may look ok now, but is really quite a different thing.

A sub-group: Brian and Scott to report back on parameterization, especially as it relates to paragraph properties.

Some worry on what ARE form and content. Note that these two words can be applied either to things in the Interdoc SCRIPT or to things in the DOCUMENT that the script describes. The latter is more the user's view; they are not necessarily the same.
Brian suggested that the document might be considered to contain "textual content", "structural content", "private data" and "styles".

Gael suggested that we need to come up with some names. We agreed that a "property" was a thing like "bold _ TRUE" and that a "reference to a style" was a thing like "emphasis%" in an ordinary node and that a "style" was a thing like "emphasis := something". It was suggested that the message system be used to suggest/vote on other words, including possible new words for "accept" and "emit" scripts.

Bob

*start* 00972 00024 USt
Date: 13-May-82 17:40:40 PDT (Thursday)
From: Ayers.PA
Subject: Interscript Meeting Friday 14 May 9:00 Bayhill 100G
To: Interdoc (should we change the dl name?)

Nomenclature: we have embraced the following words:

Interscript (replaces Interdoc). And a "script" IS an Interscript script, so we don't have to use the latter phrase.
Externalize and Internalize
Tag (e.g. PARA$)
Layer (i.e. "layer zero" and "layer one")

Last Friday's meeting discussed the document model .. perhaps prematurely. I won't recap the (partly baked) discussion because we expect that it will be a little better baked tomorrow.

There were some comments about the rarity of persistent (:=) assignment. We agreed that if there are just one or two particular needs for this construct [e.g. running figure numbers] then we would be better off by particularizing those few needs and eliminating persistent assignment from the general scheme of things.

Bob

*start* 01204 00024 USt
Date: 14 May 1982 12:32 pm PDT (Friday)
From: Mitchell.PA
Subject: semantics of Interscript without environments
To: Horning
cc: Mitchell

Here are the contents of my whiteboard. Comments and questions follow them.
-------------------
X: (reduced) exp -- context
RR: x * e -> (reduced) exp -- with node structure
R: x * e -> (reduced) exp
L: x * id -> exp -- the Lookup function
C(R(x,e)) projects to contents
Lab(R(x,e)) projects to labels

RR(x, e) = append(x, R(x, e))
RR({item*}, e) = { item* R({item*}, e) }

R(x, Nil) = Nil
R(x, item item*) = R(RR(x, item), item*)
R(x, label) = label
R(x, {item*}) = {R(x, item*)}
R(x, id "_" rhs) = id_R(x, rhs)
R(x, id.name "_" rhs) = id_{inner(id%) name_R(x, rhs)}
R(x, literal) = literal
R(x, id) = R(x, L(x, id))
R(x, name "%") = name%

L(Nil, id) = Universal(id)
L({item* id' "_" rhs}, id) = IF id=id' THEN rhs ELSE L({item*}, id) -- eval ids!!
L({item* nonbindingItem}, id) = L({item*}, id)
L(x, id.name) = L(L(x, id), name)
L({item* name' "%"}, name) = L({item* R(item*, name')}, name)

C, Lab similarly
-------------------

I think we forgot to make L(x, id) or R(x, id) evaluate ids whose values are quoted expressions.

*start* 02343 00024 USt
Date: 20-May-82 16:03:15 PDT (Thursday)
From: Ayers.PA
Subject: Meeting at Bayhill 100G at 9:00 21 May
To: InterDoc

Jim Mitchell and Jim Horning have been drafted for a two-three week project in CSL/Cedar.

I suggest that at this week's meeting we (whoever shows) try to work out how our various prototype implementations (820, Tajo, Star) are going to be developed and how they can lean on one another for parsers etc.

Notes from the meeting of 14 May:

We discussed the "document model" as it has been advanced so far. In particular, we discussed the "extrinsic" vs "intrinsic" characteristics of text. The model separates these by having the extrinsic things (i.e. the metrics of the text's container) be defined in a (say) "LAYOUTSTYLE$" node while the intrinsic (e.g. font) things remain in the (say) "PARA$". The two nodes can be connected by both being children of a new node, say "MARKUP$".

Several people had difficulties with this.
Their main point was that the "extrinsic" things are in fact few: say the set width and possibly left margin of a paragraph. Initial-line-indent, for instance, is not a geometrical metric because it depends on the point size of the font in use. The "document model"lers agreed to try baking this a little more.

We also discussed what it means to "understand" a tag [Editor's note: this continues from an earlier discussion sparked by Horning -- several people thought it was fair to claim to "understand", say, a PARA$ without allowing the user to see or edit some attribute of a paragraph -- after all, if the editor correctly remembers the font, whether he lets the user see or change it is an editor issue. Horning pointed out that an editor, by that reasoning, could claim to understand most any node, just stating that it happens to display and allow edits to none of the attributes.]:

To Understand a Tag:
  (Human Implementor)
    1. Knows set of relevant attributes and contents
    2. Knows all invariants that must be maintained
  (Editor)
    1. Is able to provide some rendition of node.
    2. Allows insert/delete of direct subnode.

To Implement a Tag: (Editor)
    1. Understands tag
    2. Able to render it.
    3. Able to change it.

To Fully Implement a Tag: (Editor)
    Implement all attributes and content.

*start* 04976 00024 USt
Date: 9 May 1982 12:34 am PDT (Sunday)
From: Mitchell.PA
To: Horning.PA,GCurry.ES,McGregor.PA
Subject: Precis of Friday's meeting on Document Models
Categories: Save
cc: Interdoc.PA,Mitchell.PA

Friday afternoon, Jim Horning, Gael Curry, and I met to talk about "document models". I think we made some progress in understanding a few things in that discussion, so here is a summary of it.

Gael suggested that we consider the "multiple coverings" idea in more detail. We soon mapped this notion into the following idea: Consider the list C of actual contents of a document obtained by enumerating the leaves of its abstract tree in left-to-right order.
Now for every kind of hierarchy that makes sense for a document, one can have a tree that represents that hierarchy and whose leaves are C (in Gael's terminology, each of these trees is a covering). We can use multi-colored brackets in place of the single kind used in Interdoc (or Lisp) to linearize (sorry, Brian) these multiple (chromatic?) trees with their common, shared leaf structure.

Wonderful generality has been attained, so let's list the various hierarchies of a document to see how often we get to use it. Certainly those two old favorites, Chapter-Section-Paragraph (CSP - logical structure) and Containers-Lines (CL - geometric structure), are in the list. How about the various aspects of fonts (italic, bold, etc.), since the document is structured into regions by (is covered by) what is or is not italic, is or is not bold, etc.? Well, this isn't such a good set of hierarchies for two reasons:

(1) The hierarchies are all uninteresting (they each just describe contiguous regions and have no more depth), which does not allow us to use them for scoping and thereby gain some economy of expression. Besides, we can readily indicate such regions in current Interdoc without resorting to extra hierarchies, e.g.

   { <...> italic_T <...> italic_F <...> }

(2) Moreover, what scoping we would like to have for font information has a strong tendency to follow the CSP hierarchy anyway for stylistic reasons.

So, font attributes don't seem like good candidates for being hierarchies in their own right.

We generated one more candidate hierarchy, "Star-like fields", but decided that they, too, tended to follow the CSP hierarchy. Thus we were left with our original two hierarchies, CSP and CL. Then it was noticed that a good deal of the CL hierarchy actually follows the CSP hierarchy as well, e.g., boundaries for paragraphs, frames for figures. There still seems to be some geometry that doesn't follow the CSP structure, e.g., page margins.
So we are now down to 1.5 interesting colors in our chromatic tree. Not much payoff for the complexity.

Then we discussed the notion of pages acting like frames and some of the Star plans for having linked frames that text is formatted through (i.e., if there is text still to be displayed when the first frame is full, continue displaying in the second, etc.). We found we could describe this by having a node tagged as FRAMEDTEXT$ with an attribute that describes the frame structure and a list of nodes to be displayed using that structure; e.g.,

   {FRAMEDTEXT$
      frames _ ([frame | h_4 w_4] [frame | h_2 w_5] ... )
      list-of-nodes-to-be-framed }

This seemed like such a good idea that we thought we ought to use it to describe the entire document if it is to be viewed as having some page structure; e.g.,

   {PAGEDTEXT$
      pageLayout _ ([page | ...] [page | ...] ... )
      list-of-nodes-to-be-framed }

This model doesn't attempt to say where the page breaks are (just like specifying the margins of a paragraph doesn't say where the actual line breaks are), but we could probably handle that using links to say where all the page breaks are; e.g.,

   {PAGEDTEXT$ pb@! -- declare the link class
      pageLayout _ ([page | ...] [page | ...] ... )
      {PAGEBREAKS$ pb@} -- contents is set of places with label pb!
      {<...> pb! <...>} ... } }

I am not sure that Gael and Jim H. agree, but I believe that the above says that the notion of chromatic trees is not necessary or desirable for Interdoc and also shows how to resolve the apparent conflict between the CSP and CL hierarchies.

This insight about document hierarchies led us to look harder at the uses for persistent bindings (":=") in scripts, viz., chapter, section, page, figure, etc. numbers and page headings. We now believe that we can make the rule that all persistent variables can be global to a document, which is a great simplifying assumption.
There is no need to declare persistent variables; the only difference between them and local variables is that one uses ":=" instead of "_" when assigning them. No local variable can have the same name as a persistent variable (since they are essentially global), and persistent variables disappear from environment values (records), where they have always behaved a little peculiarly.

Jim M.

*start* 01694 00024 USt
Date: 10-May-82 10:55:22 PDT (Monday)
From: Ayers.PA
To: Mitchell.PA
Subject: Re: Precis of Friday's meeting on Document Models
Categories: Save
In-Reply-To: Mitchell's message of 9 May 1982 12:34 am PDT (Sunday)
cc: Horning.PA,GCurry.ES,McGregor.PA,Interdoc.PA

I find the FRAMEDTEXT and PAGEDTEXT notions to be very intriguing. The parallel drawn by

"This model doesn't attempt to say where the page breaks are (just like specifying the margins of a paragraph doesn't say where the actual line breaks are)"

is especially provocative. I hope we hear more about this at the next meeting.

A note on the page break/line break parallel drawn in the above quote: The cases indeed seem very similar. Upon reflection, I see one difference that I would like to hear discussed.

Case one: a galley of lines-in-paragraphs
Case two: a set of nodes-in-pages.

In the first case, if you change margins, you do so at a paragraph end, and there is a partial line in the old style before the typography switches to the new margins. That is, the line-length switch does not occur at an arbitrary place in the text, but at a particular stylized place. In the second case, however, the text is broken at an arbitrary line, with no (user defined) typographic stylization marking the switch.

Put another way: If you have the Curry text-flows-between-linked-blocks view, in the lines-within-paragraph-galley case characters do NOT in fact flow from one line to another line OF DIFFERENT FORMATTING CHARACTERISTICS.
But in the second case, characters DO flow from one page/block to another page/block of different formatting characteristics.

Is this note germane?

Bob

*start* 02128 00024 USt
Date: 11 May 1982 12:48 pm PDT (Tuesday)
From: Horning.PA
To: Mitchell.PA,Ayers.PA
Subject: Re: Precis of Friday's meeting on Document Models
Categories: Save
In-Reply-To: Mitchell's and Ayers' messages of 10-May-82 (Monday)
cc: Interdoc.PA

I think Jim's message captured the essence of the discussion (stripped of the interesting but irrelevant side-issues).

The thing that surprised me most was his placement of the geometric structure in the environment. It seems to me more natural to let FRAMEDTEXT$ and PAGEDTEXT$ each have two content nodes, one for each contained hierarchy. The basic insight seemed to be that it is better to place these hierarchies side-by-side (perhaps with a few links) than to attempt superimposition by intermingling. Then an editor that knows about only one can edit it freely. (One can easily imagine separate editors for page layout and for galley preparation.)

Of course, within any style that forces certain correspondences between the two hierarchies (e.g., each chapter starts on a new page), the parallel structures need only be carried up to the known common level. (E.g., each CHAPTER$ node has a geometric and a content node. But a BOOK$ node has only a sequence of CHAPTER$ nodes.)

Since links point to nodes, it may be necessary to create dummy nodes to be the target of the links that tie the two structures together. We should be careful not to confuse "clues" that avoid the need for recomputation with user-specified information ("force a page break here").

I'm not sure I completely understood Bob's question. The widows-and-orphans issue indicates that placement of nodes in frames should not be completely arbitrary, while a paragraph that runs from one page to another may change line-lengths in the process.
Surely the point in both cases is that individual editors may have varying levels of aspiration, not that Interscript should standardize these things. By more explicitly decoupling the geometric and content information in a document, we give high-capability editors considerable freedom in arranging the most satisfactory merger.

Jim H.

*start* 16832 00024 USt
Date: 12 May 1982 4:55 pm PDT (Wednesday)
From: Mitchell.PA
To: Interdoc.PA
Subject: Interdoc syntax and semantics
Categories: Save
Reply-To: Mitchell.PA

-------------------
The syntax has been brought up to date and the semantics have been updated to add some new features:

(1) the meaning of a tag (formerly mark) includes evaluating the tag name as well, so that its default bindings can be obtained simply by writing the tag (e.g., PARA$ places the tag PARA on the node and evaluates PARA%).

(2) the notion of a scope (the "unit" that owns an environment) has been added.
-------------------

GRAMMAR

script ::= versionId node
versionId ::= "Interscript/Interchange/1.0 "
content ::= term | node
term ::= primary | primary op term
op ::= "+" | "-" | "*" | "/"
primary ::= literal | invocation | indirection | application | selection | vector
literal ::= Boolean | integer | intSequence | real | string | universal
universal ::= ucID ( "." ucID )*
name ::= id ( "." id )*
invocation ::= name
indirection ::= name "%"
application ::= ( name | universal ) "[" scope* "]"
selection ::= "(" term "|" item* "|" item* ")"
vector ::= "(" scope* ")"
node ::= "{" nodeItem* "}"
nodeItem ::= label | scope
scope ::= binding* content content*
binding ::= name mode rhs
mode ::= "_" | "=" | ":="
rhs ::= content | op term | "'" item* "'" | "[" [ primary ] "|" binding* "]"
item ::= label | binding | content
label ::= tag | link
tag ::= universal "$"
link ::= id "@!" | name "@" | name "!"
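
[Editorial illustration, not part of the original message.] The label productions at the end of the grammar (tag and the three link forms) can be recognized with a small classifier such as the Python sketch below. Two things are assumptions rather than part of the grammar as given: the lexical shapes of ucID and id (the grammar does not spell them out; letters followed by letters or digits are assumed here), and the names "link-declaration", "link-source" and "link-target", which follow the later description of N@ as a source and N! as a target.

```python
import re

# Assumed lexical forms; the grammar above leaves ucID and id undefined.
UCID = r"[A-Z][A-Z0-9]*"
ID = r"[A-Za-z][A-Za-z0-9]*"
UNIVERSAL = rf"{UCID}(?:\.{UCID})*"  # universal ::= ucID ( "." ucID )*
NAME = rf"{ID}(?:\.{ID})*"           # name ::= id ( "." id )*

# Order matters only for readability; fullmatch keeps the forms disjoint.
LABEL_FORMS = [
    ("tag", re.compile(rf"({UNIVERSAL})\$")),       # tag  ::= universal "$"
    ("link-declaration", re.compile(rf"({ID})@!")),  # link ::= id "@!"
    ("link-source", re.compile(rf"({NAME})@")),      # link ::= name "@"
    ("link-target", re.compile(rf"({NAME})!")),      # link ::= name "!"
]


def classify_label(token):
    """Return (kind, name) for a label token, or None if it is not a label."""
    for kind, pattern in LABEL_FORMS:
        match = pattern.fullmatch(token)
        if match:
            return kind, match.group(1)
    return None
```

For example, classify_label("PARA$") yields a tag named PARA, while "pb@!", "pb@" and "pb!" are classified as the declaration, source and target forms of the link pb; an indirection such as "emphasis%" is not a label and yields None.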
NOTATION FOR ENVIRONMENTS

Environments bind identifiers to expressions, in various modes ("=", ":", ":=", "_"):

Null denotes the "empty" environment
[E | id m e] means "E with id mode m bound to e"

locBinding(id, E) denotes the binding mode of id in E
   locBinding(id, Null) = None
   locBinding(id, [E | id' m e]) = if id=id' then m else locBinding(id, E)

locVal(id, E) denotes the value locally bound to id in E
   locVal(id, Null) = Nil = ""
   locVal(id, [E | id' m e]) = if id=id' then e else locVal(id, E)

SEMANTIC FUNCTIONS

R and B are intended to propagate the effects of the environment into an expression.

R: expression, environment --> expression -- Reduction
   R is used for evaluating right-hand sides: identifiers, expressions, etc.

B: expression, environment --> environment -- Bindings
   B indicates the effect a binding has on an environment.

B and R are mutually recursive functions (e.g., the evaluation of an expression may cause some bindings to occur as well).

The following five functions all apply to expressions independent of environment and are intended to be used on the result of reducing an expression in an environment.

C: expression --> expression -- Contents
   C is basically used to indicate which evaluated expressions become part of the content of a node.

The following four semantic functions occur less frequently in any substantive way in the semantics below. You might wish to skip them until they occur in a nontrivial manner in the semantics.
T: expression --> expression -- Tags T indicates when an identifier is to be included in the tag set for a node L: expression --> expression -- Links L indicates link declarations LF: expression --> expression -- Links From LF indicates a link to the set of nodes having associated target links LT: expression --> expression -- Links To LT indicates that the node is to be included in the target set of all the names which are prefixes of the name to which the expression should evaluate PRESENTATION BY FEATURE [E is used to represent the value of the environment in which the feature occurs.] script ::= versionId node R = C = R(EXTERNAL) B=EXTERNAL T = L = LF = LT = Nil -- a script is evaluated in the pre-existing EXTERNAL environment common to all Interscript/Interchange/1.0 scripts term ::= primary op term op ::= "+" | "-" | "*" | "/" R = C = R(E) op R(E) B = E T = L = LF = LT = Nil -- Both the primary and the term must reduce to numbers; the arithmetic operators are evaluated right-to-left (a la APL, without precedence) and bind less tightly than application. primary ::= literal literal ::= Boolean | integer | intSequence | real | string | universal R = C = literal B = E T = L = LF = LT = Nil -- The basic contents of a document. universal ::= ucID R = C = ucID B = E T = L = LF = LT = Nil -- universals (all upper case) are presumed to be directly meaningful, and are not looked up in the environment. universal ::= universal "." ucID R = C = universal "." 
ucID B = E T = L = LF = LT = Nil -- a qualified universal also just stands for itself invocation ::= id R = R(E) B = B(E) where valOf(id, E) = CASE whereBound(id, E) = Null => MakeUniversal(id) whereBound(id, E) = Nil => Nil True => locVal(id, whereBound(id, E)) and whereBound(id, E) = CASE -- Gets innermost binding locBinding(id, E) ~= None => E locBinding("Outer", E) ~= None => whereBound(id, locVal("Outer", E)) E=EXTERNAL => Null True => Nil -- Makeuniversal(id) produces the universal corresponding to id (in the current version its uppercase equivalent) -- Both attributes and definitions are looked up in the current environment; depending on the current binding of id, this may produce values and/or bindings; if the binding's rhs was quoted, the expression is evaluated at the point of invocation. -- When id is referred to and locBinding(id, E)=None, then the value is sought recursively in locVal("Outer", E). The outermost environment, EXTERNAL, binds each id to an universal which is the uppercase version of the id. Otherwise, the value of the id is assumed to be Nil invocation ::= name "." id R = R(E))>(E) B = B(E))>(E) -- Qualified names are treated as "nested" environments. indirection ::= name "%" R = R(E))>(E) B = B(E))>(E) -- Indirection combines the facility for invocation plus recording the fact that the expansion resulted from evaluating a particular name (recording the indirection is not yet included in these semantics). application ::= name "[" scope* "]" R = apply(name, R(E), E) B = E where apply(name, value*, E) = CASE R(E) OF "EQUAL" => value1 = value2 "GREATER"=> value1 > value2 . . . "SUBSCRIPT"=> value1[value2] -- value1: sequence, value2: int "CONTENTS"=> "(" C ")" "TAGS" => "(" T ")" -- ?? 
this doesn't seem right "LINKS" => "(" L ")" "SOURCES" => "(" LF ")" "TARGETS"=> "(" LT ")" ELSE => R([[Null | "Outer" "=" E] | "Value" "=" value*]) and where inner("{" value* "}") = value* -- If the name does not evaluate to one of the standard external function names, the current environment is augmented with a binding of the value of the argument list to the identifier Value, and the value is the result of the invocation in that environment; this allows function definition within the language. selection ::= "(" term "|" nodeItem1* "|" nodeItem2* ")" R = if R(E) then R(E) else R(E) B = if R(E) then B(E) else B(E) -- The notation for selections (conditionals) is borrowed from Algol 68: ( | | ) This is consistent with our principles of using balanced brackets for compound constructions and avoiding syntactically reserved words; the true part and false part may each contain an arbitrary number of nodeItems (including none). vector ::= "(" scope* ")" R = C = "(" R(E) ")" B = B(E) T = L = LF = LT = Nil -- Parentheses group a sequence of values as a single, vector value; bindings in the sequence of scopes affect the environment of scopes to the right in the containing node, but labels are disallowed. Parentheses may also be used to override the right-to-left evaluation of arithmetic operators; an operand sequence must reduce to a single numeric value. node ::= "{" nodeItem* "}" R = C = "{" R<"Sub$" nodeItem*>([Null | "Outer" "=" E]) "}" B = locVal("Outer", (B<"Sub" nodeItem*>([Null | "Outer" "=" E]))) T = L = LF = LT = Nil -- Nodes have nested environments and affect the containing environment only through global (:=) bindings. The nodeItems of a node are implicitly prefixed with the id Sub, which may be bound to any information intended to be common to all subnodes in a scope. nodeItem* ::= "" R = C = T = L = LF = LT = Nil B = E -- The empty sequence of items has no value and no effect; this is the basis for the following recursive definition. 
nodeItem* ::= binding* content1 content* R = R(R(B(E)) B = B(B(B(E)) C = C(C(B(E)) For F in {T, L, LF, LT}: F = F F nodeItem* ::= label nodeItem* R = R(E) R(B(E)) B = B(B(E)) C = Nil For F in {T, L, LF, LT}: F = F F -- In general, the value of a sequence of nodeItems is just the sequence of nodeItem values; binding items affect the environment of items to their right; Nil does not change the length of a result sequence. nodeItem* ::= scope nodeItem* For F in {R, B, C, T, L, LF, LT} F = F(E) F(B(E)) For F in {C, T, L, LF, LT}: F = F F -- In general, the value of a sequence of nodeItems is just the sequence of nodeItem values; binding items affect the environment of items to their right; Nil does not change the length of a result sequence. item* ::= "" R = C = T = L = LF = LT = Nil B = E -- The empty sequence of items has no value and no effect; this is the basis for the following recursive definition. item* ::= item1 item* R = R(E) R(B(E)) B = B(B(E)) For F in {C, T, L, LF, LT}: F = F F -- In general, the value of a sequence of items is just the sequence of item values; binding items affect the environment of items to their right; Nil does not change the length of a result sequence. binding ::= name mode rhs -- how can we change this to create micro-scopes?? R = Nil B = bind(name, mode, R(E), E) where bind(id, mode, value, E) = CASE bindingOf(id, E) = "=" => E -- Can't rebind constants mode = ":=" => assign(id, value, E) True => [E | id mode value] bind(id "." name, mode, value, E) = [E | id bindingOf(id, E) bind(name, mode, value, valOf(id, E))] bindingOf(id, E) = locBinding(id, whereBound(id, E)) assign(id, value, E) = CASE locBinding(id, E) = ":" => [E | id ":" value] bindingOf(id, E) = ":" => bind("Outer." id, ":=", value, E) True => E -- Can only assign to vars -- This adds a single binding to E; bindings have no other "side effects" and no value. -- Each environment, E, initially contains only its "inherited" environment (bound to the id Outer). 
Most bindings take place directly in E. To allow for "persistent" bindings, the value of a bind(id, ":=", val, E) will change E by rebinding id in the "innermost" environment (following the chain of Outers) in which it is bound, if that binding has the binding ":" (Var). Identifiers bound with binding "=" (Const) may not be rebound in inner environments. binding ::= name mode op term R = Nil B = bind(name, mode, R(E), E) -- This is just a convenient piece of syntactic sugar for the common case of updating a binding. rhs ::= "'" item* "'" R = item* -- If the rhs of a binding is surrounded by single quotes, it will be evaluated in the environments where the name is invoked, rather than the environment in which the binding is made. rhs ::= "[|" binding* "]" R = [B([Null | "Outer" "=" E]) | "Outer" "=" Null] -- This creates a new environment value that may be used much like a record. rhs ::= "[" [ item* ] "|" binding* "]" R =[B([R(E) | "Outer" "=" E]) | "Outer" "=" Null] -- This creates a new environment value that is an extension of an existing one. tag ::= universal "$" R = R<"default" "." universal "%">(E) B = B<"default" "." universal "%">(E) C = C<"default" "." universal "%">(E) T = universal L = LF = LT = Nil -- This gives the containing node the property denoted by the tag named by the universal and also evaluates the indirection "default.universal%". link ::= id "@!" R = id "@!" B = E L = id C = T = LF = LT = Nil -- This defines the scope of the set of links whose "main" component is id. -- A label N! on a node makes that node a "target" of the link N (and its prefixes); a label N@ makes it a "source." The "main" identifier of a link must be declared (using id@!) at the root of a subtree containing all its sources and targets. The link represents a set of directed arcs, one from each of its sources to each of its targets. Multiple target labels make a node the target of multiple links. 
A target label that appears only on a single node places it in a singleton set, i.e., identifies it uniquely. link ::= name "@" R = name "@" -- ?? why isn't R=Nil? B = E LF = name C = T = L = LT = Nil -- This identifies the containing node as a "source" of the link name. link ::= name "!" R = name "!" -- ?? why isn't R=Nil? B = E LT = prefixes(name) C = T = L = LF = Nil where prefixes(id) = id prefixes(name "." id) = name "." id prefixes(name) -- This identifies the containing node as a "target" of each of the links that is a prefix of name. NOTES Each environment, E, initially contains only its "inherited" environment (bound to the id Outer). Most bindings take place directly in E. To allow for "persistent" bindings, the value of a bind(id, ":=", val, E) will change E by rebinding id in the "innermost" environment (following the chain of Outers) in which it is bound, if that binding has the binding ":" (Var). Identifiers bound with binding "=" (Const) may not be rebound in inner environments. If the rhs of a binding is surrounded by single quotes, it will be evaluated in the environments where the name is invoked, rather than the environment in which the binding is made. When an id is referred to and locBinding(id, E)=None, then the value is sought recursively in locVal("Outer"). The (implicit) "outermost" environment binds each id to the "universal" name formed by using the uppercase version of each character of id. Nodes are delimited by brackets. The contents of each node are implicitly prefixed by Sub, which will generally be bound in the containing environment to a quoted expression performing some bindings, and perhaps supplying some labels (tags and links). Parentheses are used to delimit sequence values. Square brackets are used to delimit the argument list of an operator application and to denote environment constructors, which behave much like records. 
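The binding rules in these notes (Var versus Const bindings, and "persistent" := bindings that follow the chain of Outers) can be sketched in Python. This is a minimal illustration under assumptions, not the formal definition: the Env class and the function names are hypothetical, environments are mutated in place whereas the semantics construct new environments, and qualified names (id.name) are left out.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional, Tuple


@dataclass
class Env:
    """Hypothetical environment: name -> (mode, value), plus an Outer link.

    Stored modes are "=" (Const) and ":" (Var).
    """
    bindings: Dict[str, Tuple[str, Any]] = field(default_factory=dict)
    outer: Optional["Env"] = None


def where_bound(ident: str, env: Optional[Env]) -> Optional[Env]:
    """Innermost environment along the Outer chain that binds ident, else None."""
    while env is not None:
        if ident in env.bindings:
            return env
        env = env.outer
    return None


def bind(ident: str, mode: str, value: Any, env: Env) -> None:
    """Apply one binding with mode "=", ":" or ":=" to env."""
    home = where_bound(ident, env)
    if home is not None and home.bindings[ident][0] == "=":
        return  # constants may not be rebound, even in inner environments
    if mode == ":=":
        # Persistent binding: rebind in the innermost environment where the
        # id is bound as a Var; anything else is ignored (can only assign to vars).
        if home is not None:
            home.bindings[ident] = (":", value)
        return
    env.bindings[ident] = (mode, value)  # ordinary binding, local to env


def val_of(ident: str, env: Env) -> Any:
    """Value of ident seen from env, or None if it is unbound."""
    home = where_bound(ident, env)
    return None if home is None else home.bindings[ident][1]


outer = Env()
bind("margin", ":", 10, outer)    # Var
bind("version", "=", 1, outer)    # Const
inner = Env(outer=outer)
bind("margin", ":=", 12, inner)   # persistent: updates the binding in outer
bind("version", ":", 2, inner)    # ignored: version is a Const
bind("indent", ":", 3, inner)     # local to inner only
```

After these calls the margin seen from outer is 12, version is still 1 everywhere, and indent is visible only from inner, which matches the note that nodes affect the containing environment only through := bindings.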
Expressions involving the four infix ops (+, -, *, /) are evaluated right-to-left (a la APL); since we expect expressions to be short, we have not imposed precedence rules. The notation for selections (conditionals) is borrowed from Algol 68: ( | | ) This is consistent with our principles of using balanced brackets for compound constructions and avoiding syntactically reserved words; the true part and false part may each contain an arbitrary number of items (including none). A label N! on a node makes that node a "target" of the link N (and its prefixes); a label N@ makes it a "source." The "main" identifier of a link must be declared (using id@!) at the root of a subtree containing all its sources and targets. The link represents a set of directed arcs, one from each of its sources to each of its targets. Multiple target labels make a node the target of multiple links. A target label that appears only on a single node places it in a singleton set, i.e., identifies it uniquely. ------------------- *start* 01154 00024 USt Date: 14 May 1982 2:58 pm PDT (Friday) From: Mitchell.PA To: Horning.PA Subject: semantics of Interscript without environments Categories: Save cc: Mitchell.PA Here are the contents of my whiteboard. Comments and questions follow them. ------------------- X: (reduced) exp -- context RR: x * e -> (reduced) exp -- with node structure R: x * e -> (reduced) exp L: x * id -> exp -- the Lookup function C(R(x,e)) projects to contents Lab(R(x,e)) projects to labels RR(x, e) = append(x, R(x, e)) RR({item*}, e) = { item* R({item*}, e) } R(x, Nil) = Nil R(x, item item*) = R(RR(x, item), item*) R(x, label) = label R(x, "'" item "'") = item R(x, {item*}) = {R(x, item*)} R(x, id "_" rhs) = id_R(x, rhs) R(x, id.name "_" rhs) = id_{inner(id%) name_R(x, rhs)} R(x, literal) = literal R(x, id) = R(x, L(x, id)) R(x, name "%") = name% L(Nil, id) = Universal(id) L({item* id' "_" rhs}, id) = IF id=id' THEN rhs ELSE L({item*}, id) -- eval ids!! 
L({item* nonbindingItem}, id) = L({item*}, id) L(x, id.name) = L(L(x, id), name) L({item* name' "%"}, name) = L({item* R(item*, name')}, name) C, Lab similarly ------------------- *start* 01322 00024 USt Date: 14-May-82 17:40:56 PDT From: Horning.PA To: Mitchell.PA Subject: Re: semantics of Interscript without environments Categories: Save cc: Horning.PA I think that we want to call R and L something else; maybe Normalize and Value? We can drop the curlies around a context. There is no consistency in quoting terminals. X: (normalized) exp -- context NC: x, e -> (reduced) item* -- the Normalization into context function N: x, e -> (reduced) exp -- the Normalization function V: x, id -> exp -- the Lookup function C(N(x, e)) projects to contents L(N(x, e)) projects to labels NC(item*, e) = item* N(item*, e) N(x, Nil) = Nil N(x, item item*) = N(NC(x, item), item*) N(x, label) = label N(x, {item*}) = {N(x, item*)} N(x, id "_" e) = id "_" N(x, e) N(x, id "_" "'" item* "'") = id "_" item* N(x, id.name "_" rhs) = id "_" {inner(id "%") N(x, name "_" rhs)} N(x, literal) = literal N(x, id) = N(x, V(x, id)) N(x, name "%") = name "%" V(Nil, id) = Universal(id) V(item* id' "_" rhs, id) = IF id=id' THEN rhs ELSE V(item*, id) -- eval ids!! V(item* {, id) = V(item*, id) -- here's where we go to the containing scope V(item* nonbindingItem, id) = V(item*, id) V(x, e.id) = V(V(x, e), id) V(item* name' "%", name) = V(item* N(item*, name'), name) C, L similarly *start* 02363 00024 USt Date: 20-May-82 16:03:15 PDT (Thursday) From: Ayers.PA To: Interdoc.PA Subject: Meeting at Bayhill 100G at 9:00 21 May Categories: Save Jim Mitchell and Jim Horning have been drafted for a two-three week project in CSL/Cedar. I suggest that at this week's meeting we (whoever shows) try to work out how our various prototype implementations (820, Tajo, Star) are going to be developed and how they can lean on one another for parsers etc.
Notes from the meeting of 14 May:

We discussed the "document model" as it has been advanced so far. In particular, we discussed the "extrinsic" vs "intrinsic" characteristics of text. The model separates these by having the extrinsic things (i.e. the metrics of the text's container) be defined in a (say) "LAYOUTSTYLE$" node while the intrinsic things (e.g. font) remain in the (say) "PARA$". The two nodes can be connected by both being children of a new node, say "MARKUP$".

Several people had difficulties with this. Their main point was that the "extrinsic" things are in fact few: say the set width and possibly left margin of a paragraph. Initial-line-indent, for instance, is not a geometrical metric because it depends on the point size of the font in use. The "document model"ers agreed to try baking this a little more.

We also discussed what it means to "understand" a tag [Editor's note: this continues from an earlier discussion sparked by Horning -- several people thought it was fair to claim to "understand", say, a PARA$ without allowing the user to see or edit some attribute of a paragraph -- after all, if the editor correctly remembers the font, whether it lets the user see or change it is an editor issue. Horning pointed out that an editor, by that reasoning, could claim to understand most any node, just stating that it happens to display and allow edits to none of the attributes.]:

To Understand a Tag:
(Human Implementor)
1. Knows set of relevant attributes and contents
2. Knows all invariants that must be maintained
(Editor)
1. Is able to provide some rendition of node.
2. Allows insert/delete of direct subnode.

To Implement a Tag: (Editor)
1. Understands tag
2. Able to render it.
3. Able to change it.

To Fully Implement a Tag: (Editor)
Implement all attributes and content.
*start* 01302 00024 USt Date: 26-May-82 11:15:26 PDT (Wednesday) From: gcurry.ES To: Mitchell.PA Subject: Document Modeling Categories: Save cc: Horning.PA,McGregor.PA,GCurry.ES Jim, Shall we agree to postpone the document modeling meetings until your Cedar commitments dwindle a bit? Or do you have enough time to continue the Friday meetings? I can continue to work on the Star Document Model (document) until you are less busy; that would be one data point. An interesting sidelight of last week's InterDoc meeting seemed to be an agreement to consider chained text in InterScript layer 2 (Bob may remember differently). There is a large class of documents (namely, "running" forms) which are more graphics than text, in which text flows from one layout area to another, and which are important commercial applications for Star. I am also thinking back to the labeled coverings model of text and remember your assertion that that model was not as powerful as the one proposed by InterScript. By that did you mean that, for example, it is awkward to represent multiple levels of section without hierarchy in the (labeled coverings) model? I am not resurrecting that argument again; I am only considering adopting some form of that model in a standard string format to be proposed. Gael