*start*
00888 00024 USt
Date: 19 April 1982 2:11 pm PST (Monday)
From: Horning.pa
Subject: Here is the question
To: Mitchell

---------------------------

Date: 19-Apr-82 11:24:51 PST (Monday)
From: Ayers.pa
Subject: Query on InterDoc Meeting of 16 April ..
To: Reid, Karlton, GCurry.es, Horning
cc: Ayers

In last Friday's meeting, we were discussing "italic _ TRUE" and the law that you can code in assembler in any language, and we wrote on the board, according to my notes, a suggested script fragment:

{emphasis$ } < stuff>

I claim that we didn't mean to write the "$" and the example should be

{emphasis } < stuff>

where "emphasis" is an indirection to a style that does a local binding to "italic", and the node serves to "pop" the binding after the emphatic word.

Bob

------------------------------------------------------------

*start*
00470 00024 USt
Date: 19 April 1982 2:10 pm PST (Monday)
From: Horning.pa
Subject: Re: Query on InterDoc Meeting of 16 April ..
In-reply-to: Ayers' message of 19-Apr-82 11:24:51 PST (Monday)
To: Ayers
cc: Reid, Karlton, GCurry.es, Horning, Mitchell

Bob,

What you suggest is probably better (but you need emphasis% for the indirection). What we discussed could also work, but would mean adding an "emphasis" node type, rather than merely an attribute.

Jim H.

*start*
04834 00024 USt
Date: 12-Mar-82 14:04:03 PST (Friday)
From: gcurry.es
Subject: InterDoc
To: InterDoc.pa
Reply-To: gcurry.es
cc: , gcurry.es

Some general comments on the meeting this morning. I hope not to have misrepresented anyone seriously.

On "Dynamic Extension" - a rephrasing of Ayers' suggestion.

Every InterDoc consumer must understand the syntax completely; we expect the syntax never to change. Every consumer must also understand the semantics of the base language completely; we expect the semantics of the base language never to change.
In the published level 2 standard, one will find a list of marks and, for each mark, a list of its relevant attributes; we expect the list of marks to grow slowly, and we expect the list of attributes for some marks to grow slowly also. Each list will probably want to grow more quickly than is politically possible.

An InterDoc consumer will be able to recognize any mark, even one that is not in the published standard, since marks are syntactically distinguished (by $). However, he cannot easily distinguish attributes which are relevant to a mark from arbitrary identifiers. If he could recognize such attributes, we would have better extensibility at level 2, since two parties could informally "extend" the standard without worrying that another party would discard information they considered important.

One mechanism for recognizing extra attributes of a mark is to list them somewhere in the script. Another is to establish a syntactic convention which allows a consumer to detect whether an identifier is a relevant attribute of some mark and, if so, the mark to which it is relevant. I think Ayers' suggestion is a good one, and we should pick one way of doing it.

On Naming Relevant Attributes so the Parent Mark is Derivable from the Name

This assumes there is a single parent mark for each relevant attribute. As Mitchell points out, it may be desirable for the same attribute to be relevant to different marks, as in the case of margins. It might be nice for a paragraph$ to be able to modify the same "margin" attributes that the column$ or page$ did. If so, then the relation between relevant attributes and marks isn't functional. However, it is also possible for a nested context, such as paragraph$, to see STRUCTURED margins, so that effectiveMargin := column.Margin + paragraph.Margin, rather than margin := margin + 10. I don't know if it is a good idea to PROHIBIT an attribute from being relevant to several different marks - probably not.
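To make the two margin conventions concrete, here is a minimal Python sketch under stated assumptions: each node is modeled as a scope of local bindings that is popped at its closing brace, in the spirit of the nested environments discussed in these messages. `Environment`, `bind`, `lookup`, `node`, and the dotted names `column.margin` and `paragraph.margin` are hypothetical illustrations, not InterDoc syntax or an agreed convention. The first style bumps one shared `margin`; the second keeps one margin per mark and sums them.

```python
from contextlib import contextmanager


class Environment:
    """Hypothetical stack of attribute scopes; each node pushes a scope
    that is popped again at the node's closing brace."""

    def __init__(self):
        self._scopes = [{}]  # outermost (document-level) scope

    def lookup(self, name, default=0):
        # Innermost binding wins, so nested nodes see inherited values.
        for scope in reversed(self._scopes):
            if name in scope:
                return scope[name]
        return default

    def bind(self, name, value):
        # Local binding: visible only until the current node closes.
        self._scopes[-1][name] = value

    @contextmanager
    def node(self):
        self._scopes.append({})
        try:
            yield self
        finally:
            self._scopes.pop()  # "pop" the bindings at the right curly


env = Environment()

# Style 1: a single shared "margin" attribute, bumped by nested marks.
with env.node():                       # like {column$ margin _ 10 ...
    env.bind("margin", 10)
    with env.node():                   # like {paragraph$ margin _ margin + 10 ...}
        env.bind("margin", env.lookup("margin") + 10)
        bumped = env.lookup("margin")
    after_para = env.lookup("margin")  # paragraph$ binding has been popped

# Style 2: STRUCTURED margins, one per mark, combined by the consumer.
with env.node():
    env.bind("column.margin", 10)
    with env.node():
        env.bind("paragraph.margin", 10)
        structured = env.lookup("column.margin") + env.lookup("paragraph.margin")

print(bumped, after_para, structured)  # 20 10 20
```

In the first style several marks touch the same attribute name, so the attribute-to-mark relation is not functional; in the second each dotted name has a single parent mark, at the cost that a consumer must know how the per-mark values combine.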
On InterDoc's Ability to Represent Editing Rules which are a Function of Presentation

This is a recap of the three points I made during the meeting; any of the three is arguable.

Assertion #1. It is desirable for InterDoc to be able to encode behavior/editing rules.

Assertion #2. Reasonable editing rules exist which modify content as a function of PRESENTATION (e.g., hyphenation, when the hyphen is seen as a character rather than a presentation artifact). The set of such rules is open-ended.

Assertion #3. InterDoc has no way of representing such rules except by naming them and enumerating the names in the standard.

Mitchell's position is that it is not clear that a standard with the full capability can exist; he says we should bound the problem definition to include some representative and relevant set of editors (e.g., 820, 860, Star, Scribe). He believes that the set of such rules for that set of editors is not open-ended. I believe that and am inclined to agree.

However, I also believe that editors like Star will find it natural to begin defining editing rules which take account of presentation, especially since the user's primary perception of the document is through its presentation rather than its representation. Suppose CUSP, the Star programming language, had text manipulation primitives like the function "the mth character in the nth line" or the predicate "line l has at least x points of white space left". Suppose also that CUSP programs could be invoked automatically when the user points at a character (in order to apply that program to that character). If such a program can be associated with a given document and LOCALLY modifies the behavior of the Star editor, then it is desirable (believing assertions #1 and #2 above) to be able to represent those programs in InterDoc. Star does not have such a capability yet; it is not inconceivable, though it is probably unlikely.
Something which makes it more unlikely is the acceptance of an InterDoc standard which is adequate now but cannot extend to this sort of document object. Acceptance of InterDoc as it stands might imply a willingness to prohibit this sort of document object from Star documents for all time.

Gael

*start*
02866 00024 USt
Date: 18-Mar-82 20:52:41 PST (Thursday)
From: Ayers.PA
Subject: InterDoc Meeting 9:00 Friday 100G Bayhill
To: InterDoc
Reply-To: Ayers

We will start when two people show up and plan on collecting the non-early-worms as we go along.

My notes from the meeting of 12 March:

There was some discussion on the form of the TWG's standards document output, sparked by a query from Reid.

The question "How does the editor get the '5' in 'see figure 5'?" (the real five; we know about the @!s) began a discussion which lasted most of the meeting. Mitchell discussed the "HIDDEN$" mode and made the important point that the actions of producing the "5" can be bound within the editor-implemented semantics of a node of some type, e.g., FIGREF$.

Jim also pointed out something that we might have missed from the language: a node containing "Foo!" is in the dominant hierarchy; NOT, e.g., a _ {foo!}.

Brian Reid brought up Scribe's ways of dealing with (that is, handling the editor-implemented semantics of) a node like FIGREF$. The TWG really didn't want to talk about implementations [editors' interpretation], but this was a good point at which to discuss general styles and how you express things.

The basic question is this: do the InterDoc language and semantics allow you to describe, within the language, "all" the neat things you would like your editor to do? Consider the FIGREF$ example from above. We could try to write an InterDoc fragment that specifically went out and captured some global (bumpable) variable, converted it to text, and put it into a TEXT$ node as content. Or we could say that such an action was part of the semantics of the FIGREF$ node.
[The editor thinks that this is an important question.]

People can keep bringing in neat things to express -- Brian brought in "This margin is 8 points unless the list reaches two-digit numbers, else 10 points." Treating these as NEW semantic entities has the problem of "creeping grandiosity" [I don't recall whose phrase that is] .. we keep inventing new node marks to handle special cases. But the alternative, showing how to do the weird things within the base language, is difficult.

[The editor has been trying to separate, in his head, "presentation" attributes from other attributes, hoping to claim that we can leave the presentation attributes (advice to the layout person) open-ended without doing violence to the standard ..]

On node deletion. You can delete a node you don't understand; if it contains a global assignment, there may be side effects of the delete; this is an editor issue.

Do we try to keep attributes grouped by putting them in records? E.g., p.prelead, p.postlead, c.italic, c.bold? There are pros and cons.

Mitchell: can we say that all marks precede all content within a node? Can we go further in that direction?

Bob

*start*
01756 00024 USt
Date: 25-Mar-82 19:18:39 PST (Thursday)
From: Ayers.PA
Subject: InterDoc Meeting Friday 26 March Bayhill 100G
To: InterDoc

.. or a little later, depending on when people show up .. I have a wine tasting tonight ..

The meeting of 19 March was mainly about what makes an editor a "correct" InterDoc editor, and about rendering. Those two items came together because Phil offered the strawman proposal: if two InterDoc editors understand a given script, and if they both implement the "print" command (say, converting to Interpress), then the printed output will be the same. After talking about this, the members agreed that the INTERDOC STANDARD IS SILENT ON RENDERING and that therefore the above is not a legit (or worthwhile) test.
Rather, being a "correct" InterDoc editor is based solely on comparing incoming and outgoing InterDoc scripts. Consider an editor which has a "widen letters on CRT for visibility" command. If you hit this command, the letters appear to get wider. Maybe this characteristic lasts if the editor outputs a script and then reads it back; maybe not. If it stays, then it's (clearly) in the InterDoc script. But any editor is free to implement the command either way -- it should just say what it does in its user manual.

Other items:

Jean-Marie is the curator of marks and attributes.

Phil pointed out that, in the node {WELLKNOWN$ PRIVATE$ ..... }, the editor that created the node might like to hide some info. How? He cannot just say

... myinfo1 _ 123 myinfo2 _ 456

because we agreed (earlier) that he cannot depend on the identifier name spellings. He could say {PRIVATE2$ 123 456} inside, but that's a little ugly. Do we want to revisit attaching semantics to identifier spellings?

Bob

*start*
02033 00024 USt
Date: 1-Apr-82 19:15:50 PST (Thursday)
From: Ayers.pa
Subject: InterDoc Meeting Friday 2 April 9:00 Bayhill 100G
To: InterDoc.pa

Notes from the meeting of 26 March:

[There were only three people at this meeting, so we didn't pursue things very far.]

Phil Karlton asked for an agenda item "soon" on deciding whether a putative InterDoc-accepting editor is a "real", "legit" one. Ayers feels that this item is a "biggie."

Jean-Marie will collect marks and attributes from Star, 860, and Scribe, as well as inputs from the committee members.

Phil worried about piecemeal document preparation. Since we do not allow @!s to be unresolved, we can't split a book into chapters and independently edit each chapter.

Suggested issues for Horning and PARC researchers to worry about:

1. Conformability test on a pair of scripts .. did the editor emit the "same" script it was passed?

2. Proper-ness of an editor's script alterations.
[Editor's note: the latter is a big tar bucket, because you end up discussing editing. But it probably needs to be addressed, because we all "know" that an editor that always emits the NIL script if its user made an edit isn't "really" an InterDoc editor, even though it always outputs a legit script.]

Discussed again the "privateinfo _ 17" issue. A private node can't say that to remember information, we claimed, because we could rename all the attributes. We don't like the alternatives:

{PARA$ {PRIV$ 17} ... }

{PRIV$ 17 {PARA$ ... } }

and would like to revisit the attribute-names-do-not-carry-semantic-info decision.

We think that there should be only one "margin" attribute and that a "paragraph margin" value should be represented by bumping the "margin" attribute, e.g.,

{PARA$ margin _ margin + 10 ..... }

On hyphenation and friends. Consider this claim: you need the expressive power of "hyphenStyle _ Knuth27" if and only if some editor has two or more ways of hyphenating (of course counting "I never do" as one way) and needs to remember that in the script.

Bob

*start*
01563 00024 USt
Date: 14-Apr-82 16:24:26 PST (Wednesday)
From: Ayers.pa
Subject: Jean-Marie's Role in Collecting Text/Para Semantics
To: delaBeaujardiere
cc: Interdoc

Jean-Marie, here's my conception of the role/task that we discussed yesterday. I'm putting it "on the table" so the Interdoc community can more easily cooperate, and so they can tell us that we're barking up the wrong tree if they feel so ...

I see, in the eventual standard, a chapter or chapters that describe the semantics of each node type. The flavor of the chapter is standard-ish, in that it tries to be precise and deals often in total enumerations rather than in descriptive phrases.

A first cut at creating this chapter for the node marked TEXT$ and for the node marked PARA$ seems appropriate. We might find, in this chapter, wording like (illustrative only!)

A node marked TEXT$ may contain assignments and content.
It may not contain embedded nodes. Each piece of content must be of type STRING. Although arbitrary assignments may be made within the node, and may be inherited from outside the node, only the values of the bound variables in the following list affect the semantics of the TEXT$ node's content:

italic: BOOLEAN -- indicates .....

It seems to me that such a chapter, prepared over the next few weeks (say between now and 3 May) with the inputs, advice, and examples of the other TWG members, would be a very valuable anchor in subsequent discussions and would also show us how settled we really are.

Comments, everyone?

Bob

*start*
02276 00024 USt
Date: 14-Apr-82 16:49:11 PST (Wednesday)
From: Ayers.pa
Subject: InterDoc Meeting 100G Bayhill Friday at 9:00
To: InterDoc

Notes from the meeting of April second:

Scott suggested that we might produce/obtain a parser that takes a purported script as input and returns whether or not the script is syntactically valid. Phil asked for a parser à la JAM that would put things in dictionaries and make appropriate procedure calls ..

We discussed the notion of "equivalent" InterDoc scripts (in the context that if your editor accepts a script, it has to output an "equivalent" one). Jim Horning felt that equivalence was reasonably decidable via the formal syntax; he agreed that it was harder to decide what "equivalent except for the edits" might be.

In the discussion on (formal) equivalence, we could see a political spectrum, with Jim on the "right wing" -- a narrow view of equivalence which is more restrictive but makes it easier to test for.

[Categories from the editor] We then contrasted ABBREVIATIONS, which are not part of the semantic content and (therefore) do not participate in equivalence (they are an encoding trick), with INDIRECTION (e.g., to styles), which definitely IS part of the content of the script.
After a good deal of talking about "accelerators" and "hints", via insights from Phil and Gael, we made a breakthrough as follows:

An accelerator is always correct; if present, it is a speed-up.

A hint may be correct; there exists a predicate which can be tested to see if it is correct; to be useful, the predicate must be easier to test than recalculating the hint.

A guess may be wrong, and there is no way to see if it is wrong short of recalculation. It is useful in Brian's Scribe example, where the editor would like to note the likely value of a figure reference so that future renderers could output the guess if they would like to do things in one pass.

This can all be summed up by considering each of the above as a triple:

Hint: {expression, value, predicate}
Accelerator: {expression, value, TRUE}
Guess: {expression, value, FALSE}

where the above elements, of course, need not be explicitly stated but may be inferable from a node type or other semantics of the script.

Bob

*start*
00459 00024 USt
Date: 15 April 1982 9:20 am PST (Thursday)
From: Horning.pa
Subject: Re: Jean-Marie's Role in Collecting Text/Para Semantics
In-reply-to: Ayers' message of 14-Apr-82 16:24:26 PST (Wednesday)
To: Ayers, delaBeaujardiere
cc: Interdoc

Bob, Jean-Marie,

Sounds good to me. That fits very well with my model of what belongs in layers 2 and 3 of the standard, and seems like a reasonable way to develop a first cut for discussion.

Jim H.

*start*
01965 00024 USt
Date: 22-Apr-82 19:50:37 PST (Thursday)
From: Ayers.PA
Subject: Meeting 9:00 Friday 23 April at 100G Bayhill
To: InterDoc

An agenda item: I would like to hear a meta-discussion on nodes and attributes -- how should we GO ABOUT THE JOB of codifying the semantics of TEXT$, PARA$, etc.?

Notes from the previous meeting:

Some more on " .. privateIdentifier _ 17 .." Basically no one has a strong opinion on this point. Maybe we'll take a vote.
I asked whether it would be useful to have a distinguished syntactic arrangement which would certify that the node did not have any effects that lasted past its right curly brace, i.e., that it made no global assignments. The consensus was that this is unnecessary.

Phil and Brian believe that merging documents / pasting documents together IS an InterDoc issue. Phil suggested a subgroup to look into this. Volunteers?

Brian raised the issue of what happens if placement algorithms in an editor do not converge, or converge to different results depending on the initial "guess". We viewed this as an editor problem, to be resolved by the editor -- and InterDoc remains silent because it is a rendering issue.

Some discussion of script equivalence. We all basically agree with Jim Horning's "right wing" statements about strict equivalence.

Brian touched off a discussion on DOING DOCUMENTATION, e.g., what the Mesans have been (frustratedly) working on the past few weeks. Your editor rules this an "editor issue" and does not report the discussions/flames.

We discussed the example "italic _ TRUE" and the law that you can code in assembler in any language, and we wrote on the board, according to my notes, a suggested script fragment:

{emphasis$ } < stuff>

What we meant was

{emphasis% } < stuff>

where "emphasis" is an indirection to a style that does a local binding to "italic", and the node serves to "pop" the binding after the emphatic word.

*start*
07905 00024 USt
Date: 26-Apr-82 15:55:33 PDT (Monday)
From: gcurry.ES
Subject: Document Model - framing
To: Mitchell.pa, McGregor.pa
cc: Stepak.pa, Horning.pa, Ayers.pa

To get things going, let me relate some of the argument from last Friday. This is my perception, so speak up if you saw things differently.

Since the InterDoc language issues seemed to be pretty much resolved, or at least no longer controversial to the group, it appeared to be time to look at the particular semantics for the standard.
It was agreed that level two of the standard could look like a list of marks and attributes and a description of their meanings. Bob wondered how the semantics of the marks would be specified. We seemed to be having some difficulty generating an initial list, however.

Jean-Marie distributed a document containing a list of properties of parts of documents. The document included properties of many layout objects (such as page, spread, sheet, margin, column, tab, etc.). There was some surprise that an object like "sheet", with associated attributes "width" and "height", was in the list at all, since "InterDoc is silent on rendering". It became clear before long that we must be careful what we mean by this (catch)phrase.

WHAT IT DOES MEAN:

DoesMean.1. The appearance of an InterDoc script, when rendered by editor E1, bears no relation to the appearance of the same script when rendered by editor E2 (E1 and E2 both InterDoc-conforming editors). An editor can render an InterDoc script any way it wishes.

WHAT IT DOES NOT MEAN:

DoesntMean.1. No rendering-related information is represented in an InterDoc script.

DoesntMean.2. If rendering-related information is represented in an InterDoc script, then it is in a private encoding, particular to a specific editor (or class of editors).

SOME DISCRIMINATING DIMENSIONS for "rendering-related information":

Discriminant.1. Independent vs. derived - Some rendering information is derivable from other components/properties of the document (e.g., columnWidth = f(pagewidth, nColumns), or columnWidth = f(style, content)), while other information is not (pagewidth, style).

Discriminant.2. Permanent vs. temporary - Some rendering information might be a property of the current edit session and might not be preserved with ANY filed representation of the document (e.g., the color of the phosphor at a character, or line breaks if font widths may vary from environment to environment and are not filed), while other information is preserved (Laurel line breaks).

Discriminant.3. User-intended vs. system artifacts - Some rendering-related aspect might be a DIRECT consequence of user actions (paragraph leading), whereas another might be an INDIRECT consequence (widows/orphans).

Discriminant.4.Defn (where Defn is anyone's favorite definition of "content"). isContent vs. NOT isContent - Some people may wish to view certain rendering-related aspects as "content" (e.g., font, tabs, columns, etc.) where others won't.

--**********

A SCENARIO which meets the criterion "InterDoc is silent on rendering" (Gedanken expt - you can skip this if you wish):

The InterDoc committee adopts a construct for level 2 which permits the exact form (à la Interpress) of the document to be expressed. Each conforming editor, nevertheless, may display the document in its own way. In this scenario, ALL the rendering information is transmitted, but NONE of it is binding on an editor when it renders the document. This is similar to font substitution (without a warning) but extended to cover all other aspects of layout and rendering.

This scenario is likely to be controversial. I'll ask some questions myself:

SCENARIO-Question.1. What is the status of embedded layout/rendering information in an InterDoc script? Some possibilities:

A. It can be carried along by the script, but is irrelevant to the "meaning" of the script. (We noted that it was desirable to be able to do this, and moved to incorporate a notion of "guesses" in the language with a concomitant notion of "script equivalence". This is not to say necessarily that ALL layout-related information is irrelevant to the meaning of the script.)

B. It can be carried along as private rendering information, which might not be generally understood (at level 1), but which must be passed along (as it is taken to be part of the meaning of the script).

C. It can be carried along as public rendering information with loosely defined semantics, which are subject to reinterpretation by particular editors, but which does not affect an editor's RENDERING of the document.

D. It can be carried along as public rendering information with tightly defined semantics in terms of some well-understood layout model, and yet which does not affect an editor's RENDERING of the document.

E. It can impose constraints on an editor's RENDERING of a document (we've rejected this).

I don't quite know what to do with the scenario above, so I'll ignore it for now.

--***************

ONE VIEW OF WHAT NEEDS TO BE DONE

We know that InterDoc must be able to represent all the CONTENT of every conforming editor's document (whatever that is). If the perception of such an editor is that InterDoc does not preserve content, then it will be very difficult to argue that InterDoc is a document interchange language (even though the editor may conform in the InterDoc sense), since converting a document to InterDoc and back will lose "content". Not all of that content will be represented in a form that is completely specified by the InterDoc standard (i.e., there is such a thing as private content). All aspects of CONTENT are semantically meaningful (for the purposes of InterDoc) and must be able to be transmitted by InterDoc.

We also know that certain aspects of the FORM of a document will be effectively discarded when that document is cast in InterDoc form by its own editor (e.g., line breaks in some cases). Those are the aspects of form that are semantically meaningless for the purposes of InterDoc, as determined by that editor.
That semantically meaningless form is sometimes represented as a guess, in both public and private variants; sometimes it is not represented at all.

We also know that, for a given editor, certain aspects of form are considered to be meaningful and will be retained by that editor when it casts its documents into InterDoc form. There are public and private variants here also. Typically the editor will encode as much of its meaningful form in public InterDoc as it can (i.e., as the standard permits), although nothing forces this.

InterDoc will specify standard formats for certain kinds of content AND certain kinds of form. Every kind of form and content which is meaningful (for the PURPOSES of InterDoc) to an editor must be representable in InterDoc - at least in private format. The distinction between public and private formats in InterDoc is not the distinction between meaningfulness and non-meaningfulness (which is at least editor-specific); nor is it the distinction between form and content. Rather, it is a utilitarian distinction, encompassing aspects of both form and content.

The question which arose at last week's meeting was whether those aspects of form and content which end up in public InterDoc can be placed in the framework of a "document model" which would be consistent with all conforming editors. There was some disagreement as to the value/possibility of this. A separate question is whether such a document model should itself make a distinction between content and form, and model each of these to some extent separately.

This message was more framing than any concrete attempt at formulation. The preceding paragraph articulates the major decisions which we must make before beginning. I'll take a stab at formulation tomorrow.
Gael

*start*
00969 00024 USt
Date: 29-Apr-82 14:32:31 PDT (Thursday)
From: Ayers.PA
Subject: Meeting 9:00 Friday 30 April 100G Bayhill
To: InterDoc

The subcommittee on "document models" has exchanged some messages, but its results are certainly not well baked at this time. Perhaps one of the members would like to lay out some of the issues.

Thanks to Scott for chairing the meeting of last Friday and covering for my late arrival.

The main discussion of 23 April hinged on the "document model" issue. Jean-Marie presented his list of "attributes". Examination of this list suggested that collecting the attributes is a "one damn thing after another" exercise and that we should first define a "document model" to hang things on (and bounce things off?). The model would certainly be useful in cleaning up our nomenclature.

A subcommittee on "document models" was formed; it consists of McGregor, Curry, and Mitchell, with Stepak, Horning, and Ayers on the sidelines.

Bob

*start*
06113 00024 USt
Date: 28-Apr-82 14:15:16 PDT (Wednesday)
From: gcurry.ES
Subject: Document Model
To: Mitchell.pa, McGregor.pa
cc: Stepak.pa, Horning.pa, Ayers.pa, GCurry.es

Excuse my tardiness. I got wrapped up in something else yesterday.

Let me first play devil's advocate.

CONSIDER THE FOLLOWING DOCUMENT MODEL:

A document is a sequence of "characters" and a family of labeled coverings for that sequence. (A covering of a sequence is a partitioning of the sequence into subsequences such that every element of the sequence falls in exactly one subsequence; a labeled covering also associates a label with each subsequence; two subsequences may be labeled identically.) For every interesting dimension of a document (whether it be fontness, paragraphness, fieldness, pageness, columnness, etc.) there is a labeled covering of the document corresponding to that dimension.
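The central definition of this model, a labeled covering, can be sketched as a small Python data structure. This is an illustration under assumptions, not part of the proposal: it assumes each labeled subsequence is a contiguous run (as in the font runs described next), and `Run`, `is_labeled_covering`, `label_at`, and the font names are hypothetical. The check enforces exactly the stated condition: every position falls in exactly one subsequence, while labels may repeat.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    """One labeled subsequence: positions start..end-1 (end exclusive).
    Assumes contiguous runs; a general covering need not be contiguous."""
    start: int
    end: int
    label: str


def is_labeled_covering(length, runs):
    """Check that the runs partition positions 0..length-1: every position
    falls in exactly one non-empty run. Identical labels are allowed."""
    covered = 0
    for run in sorted(runs, key=lambda r: r.start):
        if run.start != covered or run.end <= run.start:
            return False  # a gap, an overlap, or an empty run
        covered = run.end
    return covered == length


def label_at(runs, position):
    """Return the label of the run containing the given position."""
    for run in runs:
        if run.start <= position < run.end:
            return run.label
    raise IndexError(position)


# A document: one character sequence plus one covering per dimension.
text = "Overview: blah blah"
coverings = {
    "font": [Run(0, 9, "Helvetica-Bold"), Run(9, 19, "TimesRoman")],
    "field": [Run(0, 19, "Nil")],
}

for dimension, runs in coverings.items():
    assert is_labeled_covering(len(text), runs), dimension

print(label_at(coverings["font"], 12))  # TimesRoman
```

Each dimension is checked independently, which reflects the point of the model: font runs and field runs need not nest inside one another, as they would have to under a single dominant hierarchy.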
For example, for font, the document sequence would be broken into subsequences (runs) of constant font, and each of these would be tagged with a font identifier designating that font. For fieldness, subsequences might be tagged with the field name (or Nil).

A "character" can be "simple" (atomic), like the characters we are used to dealing with; alternatively, it can be complex (and have substructure). Simple characters might be like the letters 'A or 'Z; complex characters might be like format blocks, frame anchors, etc. Complex characters are not characters in the sense of having a "character" value, as we usually speak of it. They are characters in the sense that they occupy a position between other characters, are often rendered as if they were characters, and flow along with other characters as the sequence is edited.

To continue with the model: some complex characters have substructure which is itself similar to the structure postulated for the document as a whole. That is, they consist of a sequence of "characters" and a family of labeled coverings (I don't know about the relationship of these coverings to the ones at higher levels, but say they are separate for now). Other complex characters have substructure which is only unnaturally interpreted as a sequence of "characters". For example, it seems clear that the text in a figure in a document is subordinate to the text in the main body of the document (whatever that means). Yet it has the same general structure. On the other hand, the figure itself isn't naturally seen as a sequence of characters with some strange rendering style (that style being interactively, incrementally specified by moving the "characters"/pieces-of-figure and remembering where they went). (Labeled coverings can also be defined for unordered sets.)

Use the term "parts" (of a document) to mean characters (both simple and complex) as well as those components of things that do not look like sequences of characters.
Let us also say that the parts of documents may be arbitrarily interrelated.

IS THIS DIFFERENT FROM THE INTERDOC FRAMEWORK?

If, in the above discussion, we substitute "node" for "parts", we get something which looks pretty much like the InterDoc framework. Are there any differences? We would hope there are no aspects of the model above which can't be handled by InterDoc, since the above is a reasonable model for an editor. And yet the model above talks about "labeled coverings" of sequences of characters.

A SLIGHT DIGRESSION: Why are labeled coverings of sequences of characters interesting? My opinion is that it is because there is no natural, canonical hierarchy on text. Perhaps the best candidate is the book/chapter/section/paragraph hierarchy. Or is it book/section/section...section/paragraph? Or is it book/.../section/paragraph/sentence? There are other interesting hierarchies; we've discussed the layout hierarchy (doc/pages/columns/blocks/lines) many times. There is also field structure, which can be quite independent of the other two. Labeled coverings allow independent aspects of the sequence to be represented.

We have pretty much decided not to use the layout hierarchy as the InterDoc "dominant hierarchy". One question to ask is whether we should use the book/section*/paragraph hierarchy as the dominant hierarchy. We don't have to do so. Certain domains do have a clear hierarchy; those should use the dominant hierarchy in InterDoc. We have never really looked at the possibility of using anything other than the node structure of an InterDoc script to represent the book/section*/paragraph hierarchy. The motivation FOR making that the dominant hierarchy is that it gives us something - inheritance/nested environments, as I recall. How much does inheritance help for those concepts?
One cost of deciding to represent the book/section*/paragraph hierarchy in terms of the InterDoc dominant hierarchy is that we must find a syntax for document content which is consistent with that of all conforming editors.

AN ALTERNATIVE REPRESENTATION FOR THE BOOK/SECTION*/PARAGRAPH HIERARCHY

If book = "Introduction" = "Overview" , =... = "blah blah", =... then we have been talking about representing this as

{BOOK$ "Introduction" {Section$ "Overview" {PARA$ "blah-blah"} {...}}{...}}

We could also represent it as

{TEXT$ section#:1 para#:1 "Introduction" {section# := section#+1} "Overview" {para# :=para#+1} "blah-blah" {para# :=para#+1} ... {section# := section#+1}...}

Excuse my syntax. If there were other dimensions of interest in the text, nodes which persistently rebound the label for that dimension would be introduced at the point where the label changed. This allows the various elements of the book/.../para hierarchy to be treated the same as other dimensions of text. These embedded nodes are like "control characters" (heaven forbid!). Other nodes would correspond to real, complex characters and would also be embedded.

DOES THIS CHOICE OF REPRESENTATION MATTER TO OUR DOCUMENT MODEL?

I'm not sure. I'll mail this now and continue with a subsequent message.

Gael

*start*
07953 00024 USt
Date: 29-Apr-82 18:43:37 PDT (Thursday)
From: gcurry.ES
Subject: Document Model
To: McGregor.pa, Mitchell.pa
cc: Ayers.pa, Horning.pa, Stepak.pa, gcurry

Here are some assertions and questions which might qualify as the "issues" that Bob wanted:

1. There will be many conforming editors and one InterDoc. The "document models" question is whether we can find a framework in terms of which we can define marks and attributes in the language of InterDoc level 1, and which is consistent with each conforming editor's view of itself.

2. Such a model must address document content as well as document rendering/layout.

3.
Every editor sees its subject documents along a "content-to-form" spectrum. That is, every document individually makes its own choice about what aspects of its documents are more content and which aspects are more form. This spectrum is independent of InterDoc.

4. Every editor sees its subject documents along a "meaningful-for-the-purposes-of-interchange" spectrum, with more meaningful things corresponding to content and less meaningful things corresponding to form. There must be a fairly hard discriminating line for each editor between "meaningful" and "not meaningful". The meaningful aspects of its documents are those things it chooses to encode in InterDoc - in either public or private format; the non-meaningful things it chooses to lose upon translation to InterDoc.

5. For each editor, all of what it regards as content is meaningful (by definition), and must be encoded in InterDoc in some form.

6. Many editors will regard some aspects of form as meaningful and will encode those aspects of their documents in InterDoc - in either public or private format.

7. It is a foregone conclusion that all conforming editors will consider that aspect of form given by FONT to be meaningful and will encode that in public (level 2) InterDoc.

8. It is a foregone conclusion that all conforming editors will consider that aspect of form given by PARAGRAPH style to be meaningful to some extent and will encode that in public (level 2) InterDoc.

9. There is a range of aspects of a document which might be considered to constitute "form". Some are:
a. parts of a document which independently designate form. Examples are font designators, paragraph style designators, page format characters. I assume that these parts are not redundant expressions of other parts of the document. Call these "independent form designators".
b. parts of a document which redundantly designate form.
Examples are the paragraph formatting parameters which derive from document style, the sizes of blocks of text immediately after pagination, and so forth. These might be called "rendering artifacts", since they are re-computable by re-running the rendering algorithm.
c. parts of a document which, together with the independent form designators AND the rendering environment, redundantly designate form. An example is the size of blocks of text rendered in the context of a (spatially or temporally changing) fonts.widths. If the InterDoc representation of a document does not include that part of the rendering environment which was used during the rendering process, and we have no guarantee that that part of the environment is unchanged, then form information will be lost by this conversion. Call these "environmental artifacts".
d. parts of the rendered representation of a document which derive from properties of the device upon which it is rendered. Examples of this are that vertices of graphical objects coincide on a discrete device even though internally their coordinates are represented as real numbers. Call these aspects of form "device artifacts".
e. parts of the rendered representation of a document which are random and bear no relation to anything else. Call these aspects of form "random artifacts".

10. An argument can be made that each of the five aspects of form mentioned above could be considered to be meaningful for some particular editor.

SOME STAR-SPECIFIC ASSERTIONS AS CONCRETE EXAMPLES

11. Every aspect of a Star document which can be changed by a user (either directly or via property sheets) and which the document will remember is meaningful and must be encoded in InterDoc.

12. There are aspects of Star documents which cannot be changed by users (e.g., font metrics - environmental artifacts in general) which nevertheless produce meaningful form (e.g., absence of widows, orphans, rivers, hyphenations, etc.).

13.
Here are some concepts of equivalence of Star documents:
a. equivalent iff backing storage is identical.
b. equivalent iff the documents are equivalent as data structures (e.g., pointer values are unimportant).
c. equivalent modulo random artifacts (you can strip out guesses).
d. equivalent iff pagination/rendering in a fixed environment gives the same result. This has been suggested as the working/interim definition of "same document" as it relates to transcription fidelity.
e. equivalent modulo equivalence of parts of a document (i.e., define equivalence of text blocks; then documents are equivalent iff corresponding text blocks are equivalent and corresponding other parts are equal).

14. The equivalence mentioned above is not script-equivalence, but rather is editor-specific; call it "editor-equivalence". It is just as important, however, for explaining what InterDoc does. That is, if E is a conforming editor, E-IN is the transform from InterDoc script to editor format, and E-OUT is the transform from editor format to script, then we must have E-IN(E-OUT(doc)) editor-equivalent to doc, and E-OUT(E-IN(script)) script-equivalent to script. We must warn clients that the mere acts of passing a document/script through InterDoc/an-editor will lose "information" in the sense above.

14. Assuming 13.d. above, Star must encode all aspects of its documents which a user can edit into a document and which will be remembered across pagination in a fixed environment. This puts a lower bound on what Star must encode.

MORE EDITOR-INDEPENDENT ASSERTIONS AND QUESTIONS

15. All meaningful (in the sense of 4 above) aspects of all conforming editors' documents must be preserved under the map E-IN(E-OUT(...)).

16. Is there any substantial argument for or against a view of text as a sequence of characters with labeled coverings versus the book/section*/paragraph hierarchy?

17.
We were motivated to introduce guesses into the base language in order that form may be transmitted as information, but not as value. While we have probably handled the language-level issues of guesses, we still need to handle the level 2 issues. If the main motivation is to be able to express coordinate/size information, then we need a coordinate system. The coordinate system is in terms of some model, isn't it?

18. In either of the above, we think of text as a sequence of characters with some kind of auxiliary structure. In particular, we don't think of it as TWO sequences of characters. What is our position on the French-Canadian documents, which have parallel streams of French and English text?

19. The canonical example of a paragraph property is margin. In order to talk about margin, we need an edge. Is it true that whenever we talk about any margins, they are all with respect to this single edge which (we seem forced to) know about? Or are there two edges, separated by some distance, so that we can specify margin offsets from the right edge as well as offsets from the left edge? Can we express publicly anything that has to do with a position in a column, or a position in a page, so that we would need to introduce notions of top edge and bottom edge as well?

20. Are the interpretations of level 2 marks and attributes subject to interpretation by specific editors, or are the semantics to be concretely (whatever that means) specified by the InterDoc standard?

Gael

*start*
13851 00024 USt
Date: 29 April 1982 10:42 pm PDT (Thursday)
From: Mitchell.PA
Subject: Re: Document Model
In-reply-to: gcurry.ES's message of 29-Apr-82 18:43:37 PDT (Thursday)
To: gcurry.ES
cc: McGregor, Mitchell, Ayers, Horning, Stepak

Gael,

I read your message with interest and I was moved to comment on the points you have raised. I have reproduced your points (including the two versions of 14) below, along with my comments following the points they discuss:

-------------------

1.
There will be many conforming editors and one InterDoc. The "document models" question is whether we can find a framework in terms of which we can define marks and attributes in the language of InterDoc level 1, and which is consistent with each conforming editor's view of itself.

2. Such a model must address document content as well as document rendering/layout.

Comment: Hear! Hear!

-------------------

3. Every editor sees its subject documents along a "content-to-form" spectrum. That is, every document individually makes its own choice about what aspects of its documents are more content and which aspects are more form. This spectrum is independent of InterDoc.

Comment: I have a different view of this. In the "content-to-form" spectrum I can perceive only three points. One extreme point corresponds to some aspect of a document clearly being "form" (I claim that margins are an aspect that is clearly form and not content); the other end of the spectrum corresponds to an aspect of a document clearly being "content" (e.g., the character codes that represent the body of this message are clearly content and not form); the third point involves a (so far small) set of features that have been represented as content in most editing systems but as form in others, viz., TAB, CR, and LF, which have to do with form and presentation but are often found embedded in content.

I definitely don't concur that "every document individually makes its own choice about what aspects of its documents are more content and which aspects are more form". It is just such (arbitrary) choices that the Interdoc standard must make. For instance, whether the coordinates of a Bezier curve are viewed as content or form may well be less important than that we decide which they are.

-------------------

4.
Every editor sees its subject documents along a "meaningful-for-the-purposes-of-interchange" spectrum, with more meaningful things corresponding to content and less meaningful things corresponding to form. There must be a fairly hard discriminating line for each editor between "meaningful" and "not meaningful". The meaningful aspects of its documents are those things it chooses to encode in InterDoc - in either public or private format; the non-meaningful things it chooses to lose upon translation to InterDoc.

Comment: A "thing" will be represented either as form or as content depending on the Interdoc standard, not on whether a given editor views that "thing" as "meaningful". I would prefer that the decision as to whether or not something is to be lost upon translation to InterDoc be based on criteria such as whether that thing is readily derivable from other components of the document, rather than on its "meaningfulness" to the conforming editor that generates the script.

-------------------

5. For each editor, all of what it regards as content is meaningful (by definition), and must be encoded in InterDoc in some form.

6. Many editors will regard some aspects of form as meaningful and will encode those aspects of their documents in InterDoc - in either public or private format.

Comment: I don't understand these two assertions.

-------------------

7. It is a foregone conclusion that all conforming editors will consider that aspect of form given by FONT to be meaningful and will encode that in public (level 2) InterDoc.

8. It is a foregone conclusion that all conforming editors will consider that aspect of form given by PARAGRAPH style to be meaningful to some extent and will encode that in public (level 2) InterDoc.

Comment: Yup.

-------------------

9. There is a range of aspects of a document which might be considered to constitute "form". Some are:
a. parts of a document which independently designate form.
Examples are font designators, paragraph style designators, page format characters. I assume that these parts are not redundant expressions of other parts of the document. Call these "independent form designators".
b. parts of a document which redundantly designate form. Examples are the paragraph formatting parameters which derive from document style, the sizes of blocks of text immediately after pagination, and so forth. These might be called "rendering artifacts", since they are re-computable by re-running the rendering algorithm.
c. parts of a document which, together with the independent form designators AND the rendering environment, redundantly designate form. An example is the size of blocks of text rendered in the context of a (spatially or temporally changing) fonts.widths. If the InterDoc representation of a document does not include that part of the rendering environment which was used during the rendering process, and we have no guarantee that that part of the environment is unchanged, then form information will be lost by this conversion. Call these "environmental artifacts".
d. parts of the rendered representation of a document which derive from properties of the device upon which it is rendered. Examples of this are that vertices of graphical objects coincide on a discrete device even though internally their coordinates are represented as real numbers. Call these aspects of form "device artifacts".
e. parts of the rendered representation of a document which are random and bear no relation to anything else. Call these aspects of form "random artifacts".

Comment:
a. Yes.
b. These are definitely form, and should probably be private because they are not essential.
c. This is getting very close to Interpress issues.
d. Definitely not part of an Interdoc script.
e. I am hopelessly confused as to what this means.

-------------------

10.
An argument can be made that each of the five aspects of form mentioned above could be considered to be meaningful for some particular editor.

Comment: The argument is strong for (a), weak for (b), possible for (c), probably not provable for (d), and I am still confused about (e).

-------------------

SOME STAR-SPECIFIC ASSERTIONS AS CONCRETE EXAMPLES

11. Every aspect of a Star document which can be changed by a user (either directly or via property sheets) and which the document will remember is meaningful and must be encoded in InterDoc.

Comment: Yup.

-------------------

12. There are aspects of Star documents which cannot be changed by users (e.g., font metrics - environmental artifacts in general) which nevertheless produce meaningful form (e.g., absence of widows, orphans, rivers, hyphenations, etc.).

Comment: These aspects do affect Star documents, but I think they probably do not affect an Interdoc standard.

-------------------

13. Here are some concepts of equivalence of Star documents:
a. equivalent iff backing storage is identical.
b. equivalent iff the documents are equivalent as data structures (e.g., pointer values are unimportant).
c. equivalent modulo random artifacts (you can strip out guesses).
d. equivalent iff pagination/rendering in a fixed environment gives the same result. This has been suggested as the working/interim definition of "same document" as it relates to transcription fidelity.
e. equivalent modulo equivalence of parts of a document (i.e., define equivalence of text blocks; then documents are equivalent iff corresponding text blocks are equivalent and corresponding other parts are equal).

Comment: Option (a) has the virtue of simplicity for testing, but I fear it is too strict: it would be rare that two scripts obtained by running a single script through two conforming editors would be equal to each other or to the source script.
Probably we would have to talk about comparing two scripts as a pair of token streams or some such, although this still seems too restrictive to me. I would prefer being able to talk about equivalence of two scripts as equivalence of their "abstract syntax trees decorated with environments, marks, and links". This is very like your options (b) and (e) [I think]. I don't understand option (c). Although equivalence of scripts should imply equivalence of presentation on a given medium (your option (d)), it is clear that the converse is not true: it is possible to have two scripts print exactly the same without being equivalent (e.g., one might use styles to get all the paragraphs to look the same while the other may simply say that each individual paragraph has some set of properties that just happen to be the same for all paragraphs).

-------------------

14. The equivalence mentioned above is not script-equivalence, but rather is editor-specific; call it "editor-equivalence". It is just as important, however, for explaining what InterDoc does. That is, if E is a conforming editor, E-IN is the transform from InterDoc script to editor format, and E-OUT is the transform from editor format to script, then we must have E-IN(E-OUT(doc)) editor-equivalent to doc, and E-OUT(E-IN(script)) script-equivalent to script. We must warn clients that the mere acts of passing a document/script through InterDoc/an-editor will lose "information" in the sense above.

Comment: I guess I agree with this. Your E-IN is what we have been calling rendering and your E-OUT is what we have been calling transcription. I don't think we can talk about editor-equivalence very precisely, but we should definitely understand script-equivalence.

-------------------

14. Assuming 13.d. above, Star must encode all aspects of its documents which a user can edit into a document and which will be remembered across pagination in a fixed environment. This puts a lower bound on what Star must encode.

Comment: Yup.
-------------------

MORE EDITOR-INDEPENDENT ASSERTIONS AND QUESTIONS

15. All meaningful (in the sense of 4 above) aspects of all conforming editors' documents must be preserved under the map E-IN(E-OUT(...)).

Comment: Well, I disagree with the "meaningful" characterization of the problem, and I don't think we can define editor-equivalence, so ...

-------------------

16. Is there any substantial argument for or against a view of text as a sequence of characters with labeled coverings versus the book/section*/paragraph hierarchy?

Comment: We should discuss this at the next meeting. Jim Horning and I believe (a) the labelled covering view can be used with the current Interdoc basis, and (b) the labelled covering view alone cannot represent scripts with sufficient generality.

-------------------

17. We were motivated to introduce guesses into the base language in order that form may be transmitted as information, but not as value. While we have probably handled the language-level issues of guesses, we still need to handle the level 2 issues. If the main motivation is to be able to express coordinate/size information, then we need a coordinate system. The coordinate system is in terms of some model, isn't it?

Comment: We definitely need agreement on a coordinate system model.

-------------------

18. In either of the above, we think of text as a sequence of characters with some kind of auxiliary structure. In particular, we don't think of it as TWO sequences of characters. What is our position on the French-Canadian documents, which have parallel streams of French and English text?

Comment: If I may be permitted to rephrase: "which have parallel pieces of French and English text which are to be formatted so as to be juxtaposed at certain critical points (e.g., at section boundaries such as 1.6.2.1)". This is an issue of specifying formatting constraints, which is important, but not unique to my homeland.

-------------------

19.
The canonical example of a paragraph property is margin. In order to talk about margin, we need an edge. Is it true that whenever we talk about any margins, they are all with respect to this single edge which (we seem forced to) know about? Or are there two edges, separated by some distance, so that we can specify margin offsets from the right edge as well as offsets from the left edge? Can we express publicly anything that has to do with a position in a column, or a position in a page, so that we would need to introduce notions of top edge and bottom edge as well?

Comment: I imagine the external environment containing bindings for margins or possibly page frames that can be accessed in scripts. The specific values of these external bindings might depend on the country in which the workstation resides (e.g., A4 versus 8.5x11) or the use to which it is being put (e.g., legal size pages versus normal 8.5x11 pages). In this model, expressing a position in a page is equivalent to expressing a position in any frame, which we clearly must be able to do.

-------------------

20. Are the interpretations of level 2 marks and attributes subject to interpretation by specific editors, or are the semantics to be concretely (whatever that means) specified by the InterDoc standard?

Comment: Hmmm. Some attributes had better not be subject to interpretation (e.g., that there are standard units for expressing distances), while others, such as Justified_T, are certainly subject to limitations (e.g., typically Knuth's TEX will do a better job than Bravo, but both provide justification).

-------------------

Whew! Those were a lot of points. We need to talk about these in a higher-bandwidth manner.
Jim Mitchell

*start*
14263 00024 USt
Date: 5 March 1982 2:25 pm PST (Friday)
From: Mitchell.PA
Subject: Interdoc syntax and semantics
To: Interdoc.pa
Reply-To: Mitchell.pa

Here are the current versions of the Interdoc syntax and semantics that I promised to distribute to you all at this morning's meeting. I have laced the semantic equations with some explanatory text to help you wade through them, but I am under no illusions that this has made them easy to read. Please feel free to send me questions about them. Such questions will undoubtedly help us to clarify the presentation.

Jim M.

-------------------

GRAMMAR

script ::= versionId item
item ::= value | binding | label
value ::= term | node
term ::= primary | primary op term
op ::= "+" | "-" | "*" | "/"
primary ::= literal | invocation | application | selection | sequence
literal ::= Boolean | integer | hexint | real | string
invocation ::= name | universal
name ::= id ( "." id )*
universal ::= "$" name
application ::= invocation "[" item* "]"
selection ::= "(" term "|" item* "|" item* ")"
sequence ::= "(" ( value | binding )* ")"
node ::= "{" item* "}"
binding ::= name mode rhs
mode ::= "=" | ":" | ":=" | "_"
rhs ::= value | op term | "'" item* "'" | "[" [ invocation ] "|" binding* "]"
label ::= mark | link
mark ::= invocation "#"
link ::= id "@!" | name "@" | name "!"

NOTATION FOR ENVIRONMENTS

Environments bind identifiers to expressions, in various modes ("=", ":", ":=", "_"):

Null denotes the "empty" environment
[E | id m e] means "E with id mode m bound to e"

locBinding(id, E) denotes the binding mode of id in E
  locBinding(id, Null) = None
  locBinding(id, [E | id' m e]) = if id=id' then m else locBinding(id, E)

locVal(id, E) denotes the value locally bound to id in E
  locVal(id, Null) = Nil = ""
  locVal(id, [E | id' m e]) = if id=id' then e else locVal(id, E)

SEMANTIC FUNCTIONS

R: expression, environment --> expression -- Reduction
  R is used for evaluating right-hand sides: identifiers, expressions, etc.
C: expression --> expression -- Contents
  C is basically used to indicate which evaluated expressions become part of the content of a node.
B: expression, environment --> environment -- Bindings
  B indicates the effect a binding has on an environment. B and R are mutually recursive functions (e.g., the evaluation of an expression may cause some bindings to occur as well).

The following four semantic functions occur less frequently in any substantive way in the semantics below. You might wish to skip them until they occur in a nontrivial manner in the semantics.

M: expression --> expression -- Marks
  M indicates when an identifier is to be included in the mark set for a node.
L: expression --> expression -- Links
  L indicates link declarations.
S: expression --> expression -- Link sources
  S indicates a link to the set of nodes having associated target links.
T: expression --> expression -- Link targets
  T indicates that the node is to be included in the target set of all the names which are prefixes of the name to which the expression should evaluate.

PRESENTATION BY FEATURE

[E is used to represent the value of the environment in which the feature occurs.]

term ::= primary op term
op ::= "+" | "-" | "*" | "/"
  R = C = R<primary>(E) op R<term>(E)
  B = E
  M = L = S = T = Nil
  -- Both the primary and the term must reduce to numbers; the arithmetic operators are evaluated right-to-left (a la APL, without precedence) and bind less tightly than application.

primary ::= literal
literal ::= Boolean | integer | hexint | real | string
  R = C = literal
  B = E
  M = L = S = T = Nil
  -- The basic contents of a document.
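The right-to-left, no-precedence rule for term is easy to misread, so here is a minimal Python sketch of it. It assumes a term arrives as a flat token list alternating numbers and operator strings, with no parenthesized sequences or applications; mapping "/" to Python's true division is also an assumption, since the semantics do not say how division of integers behaves.

```python
import operator
from typing import Callable, Dict, List, Union

Token = Union[int, float, str]

OPS: Dict[str, Callable[[float, float], float]] = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,  # assumption: real division
}


def eval_term(tokens: List[Token]) -> float:
    """term ::= primary | primary op term, reduced right-to-left without precedence."""
    primary = tokens[0]
    if len(tokens) == 1:
        return primary
    op, rest = tokens[1], tokens[2:]
    # The whole remaining term is reduced first, then combined with the primary.
    return OPS[op](primary, eval_term(rest))


print(eval_term([2, "*", 3, "+", 4]))   # 14, i.e. 2 * (3 + 4)
print(eval_term([10, "-", 4, "-", 3]))  # 9, i.e. 10 - (4 - 3)
```

Conventional precedence would give 10 and 3 for these two inputs; a script writer who wants a different grouping has to use a parenthesized sequence.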
invocation ::= id
  R = R<valOf(id, E)>(E)
  B = B<valOf(id, E)>(E)
  where
    valOf(id, E) = locVal(id, whereBound(id, E)) -- Gets innermost value
    whereBound(id, E) = CASE -- Gets innermost binding
      locBinding(id, E) ~= None => E
      locBinding("Outer", E) ~= None => whereBound(id, locVal("Outer", E))
      True => Null
  -- Both attributes and definitions are looked up in the current environment; depending on the current binding of id, this may produce values and/or bindings; if the binding's rhs was quoted, the expression is evaluated at the point of invocation.
  -- When an id is referred to and locBinding(id, E)=None, then the value is sought recursively in locVal("Outer"). The (implicit) "outermost" environment binds each id to the "universal" name $id.

invocation ::= name "." id
  R = R<valOf(id, R<name>(E))>(E)
  B = B<valOf(id, R<name>(E))>(E)
  -- Qualified names are treated as "nested" environments.

universal ::= "$" name
  R = C = "$" name
  B = E
  M = L = S = T = Nil
  -- Names prefixed with a $ are presumed to be directly meaningful, and are not looked up in the environment.

application ::= invocation "[" item* "]"
  R = apply(invocation, R<item*>(E), E)
  B = E
  where
    apply(invocation, value*, E) = CASE R<invocation>(E) OF
      "$equal" => value1 = value2
      "$greater" => value1 > value2
      . . .
      "$subscript" => value1[value2] -- value1: sequence, value2: int
      "$contents" => "(" C ")"
      "$marks" => "(" M ")"
      "$links" => "(" L ")"
      "$sources" => "(" S ")"
      "$targets" => "(" T ")"
      ELSE => R<invocation>([[Null | "Outer" "=" E] | "Value" "=" value*])
    inner("{" value* "}") = value*
  -- If the invocation does not evaluate to one of the standard external function names, the current environment is augmented with a binding of the value of the argument list to the identifier Value, and the value is the result of the invocation in that environment; this allows function definition within the language.
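The lookup rules for invocation ::= id can be exercised directly. The following is a minimal Python sketch, not a definitive implementation: it represents Null as None and [E | id m e] as a 4-tuple, and it folds the comment about the implicit outermost environment into valOf by returning "$id" when no binding is found anywhere along the Outer chain. The identifiers margin, font, and italic, and the value "Classic", are only examples.

```python
from typing import Any, Optional, Tuple

# Null is None; [E | id m e] is the tuple (E, id, m, e).
Env = Optional[Tuple[Any, str, str, Any]]


def extend(env: Env, ident: str, mode: str, value: Any) -> Env:
    return (env, ident, mode, value)


def loc_binding(ident: str, env: Env) -> Optional[str]:
    """locBinding: mode of the most recent local binding of ident, or None."""
    while env is not None:
        parent, name, mode, _ = env
        if name == ident:
            return mode
        env = parent
    return None


def loc_val(ident: str, env: Env) -> Any:
    """locVal: value of the most recent local binding of ident, or Nil ("")."""
    while env is not None:
        parent, name, _, value = env
        if name == ident:
            return value
        env = parent
    return ""


def where_bound(ident: str, env: Env) -> Env:
    """whereBound: the innermost environment along the Outer chain that binds ident."""
    if loc_binding(ident, env) is not None:
        return env
    if loc_binding("Outer", env) is not None:
        return where_bound(ident, loc_val("Outer", env))
    return None


def val_of(ident: str, env: Env) -> Any:
    """valOf, plus the implicit outermost environment mapping id to $id."""
    home = where_bound(ident, env)
    if home is None:
        return "$" + ident
    return loc_val(ident, home)


# A containing environment, and a node environment [Null | Outer = outer] extended with margin.
outer = extend(extend(None, "margin", ":", 10), "font", "=", "Classic")
inner = extend(extend(None, "Outer", "=", outer), "margin", ":", 12)
print(val_of("margin", inner), val_of("font", inner), val_of("italic", inner))
```

The inner margin binding shadows the outer one, font is found one step out along Outer, and italic, bound nowhere, falls through to the universal name.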
selection ::= "(" term "|" item1* "|" item2* ")"
  R = if R<term>(E) then R<item1*>(E) else R<item2*>(E)
  B = if R<term>(E) then B<item1*>(E) else B<item2*>(E)
  -- The notation for selections (conditionals) is borrowed from Algol 68: ( term | item1* | item2* ). This is consistent with our principles of using balanced brackets for compound constructions and avoiding syntactically reserved words; the true part and false part may each contain an arbitrary number of items (including none).

sequence ::= "(" item* ")"
  R = C = "(" R<item*>(E) ")"
  B = B<item*>(E)
  M = L = S = T = Nil
  -- Parentheses group a sequence of items as a single value; bindings in the sequence affect the environment of items to the right in the containing node, but labels are disallowed. Parentheses may also be used to override the right-to-left evaluation of arithmetic operators; an operand sequence must reduce to a single numeric value.

node ::= "{" item* "}"
  R = C = "{" R<"Sub" item*>([Null | "Outer" "=" E]) "}"
  B = locVal("Outer", (B<"Sub" item*>([Null | "Outer" "=" E])))
  M = L = S = T = Nil
  -- Nodes have nested environments, and affect the containing environment only through persistent (:=) bindings to ids with outer VAR (:) bindings. The items of a node are implicitly prefixed with the id Sub, which may be bound to any information intended to be common to all subnodes in a scope.

item* ::= ""
  R = C = M = L = S = T = Nil
  B = E
  -- The empty sequence of items has no value and no effect; this is the basis for the following recursive definition.

item* ::= item1 item*
  R = R<item1>(E) R<item*>(B<item1>(E))
  B = B<item*>(B<item1>(E))
  For F in {C, M, L, S, T}: F = F<item1> F<item*>
  -- In general, the value of a sequence of items is just the sequence of item values; binding items affect the environment of items to their right; Nil does not change the length of a result sequence.

binding ::= name mode rhs
  R = Nil
  B = bind(name, mode, R<rhs>(E), E)
  where
    bind(id, mode, value, E) = CASE
      bindingOf(id, E) = "=" => E -- Can't rebind constants
      mode = ":=" => assign(id, value, E)
      True => [E | id mode value]
    bind(id "." name, mode, value, E) = [E | id bindingOf(id, E) bind(name, mode, value, valOf(id, E))]
    bindingOf(id, E) = locBinding(id, whereBound(id, E))
    assign(id, value, E) = CASE
      locBinding(id, E) = ":" => [E | id ":" value]
      bindingOf(id, E) = ":" => bind("Outer." id, ":=", value, E)
      True => E -- Can only assign to vars
  -- This adds a single binding to E; bindings have no other "side effects" and no value.
  -- Each environment, E, initially contains only its "inherited" environment (bound to the id Outer). Most bindings take place directly in E. To allow for "persistent" bindings, the value of a bind(id, ":=", val, E) will change E by rebinding id in the "innermost" environment (following the chain of Outers) in which it is bound, if that binding has mode ":" (Var). Identifiers bound with mode "=" (Const) may not be rebound in inner environments.

binding ::= name mode op term
  <name mode op term> = <name mode name op term>
  -- This is just a convenient piece of syntactic sugar for the common case of updating a binding.

rhs ::= "'" item* "'"
  R = item*
  -- If the rhs of a binding is surrounded by single quotes, it will be evaluated in the environments where the name is invoked, rather than the environment in which the binding is made.

rhs ::= "[|" binding* "]"
  R = [B<binding*>([Null | "Outer" "=" E]) | "Outer" "=" Null]
  -- This creates a new environment value that may be used much like a record.

rhs ::= "[" invocation "|" binding* "]"
  R = [B<binding*>([R<invocation>(E) | "Outer" "=" E]) | "Outer" "=" Null]
  -- This creates a new environment value that is an extension of an existing one.

mark ::= invocation "#"
  R = R<invocation>(E) "#"
  M = invocation
  B = E
  C = L = S = T = Nil
  -- This gives the containing node the property denoted by the mark to which the invocation reduces.

link ::= id "@!"
  R = id "@!"
  L = id
  B = E
  C = M = S = T = Nil
  -- This defines the scope of the set of links whose "main" component is id.
  -- A label N! on a node makes that node a "target" of the link N (and its prefixes); a label N@ makes it a "source."
     The "main" identifier of a link must be declared (using id@!) at the root of a subtree containing all its sources and targets. The link represents a set of directed arcs, one from each of its sources to each of its targets. Multiple target labels make a node the target of multiple links. A target label that appears on only a single node places that node in a singleton set, i.e., identifies it uniquely.

link ::= name "@"
    R = name "@"
    S = name
    B = E
    C = M = L = T = Nil
  -- This identifies the containing node as a "source" of the link name.

link ::= name "!"
    R = name "!"
    T = prefixes(name)
    B = E
    C = M = L = S = Nil
  where
    prefixes(id) = id
    prefixes(name "." id) = name "." id prefixes(name)
  -- This identifies the containing node as a "target" of each of the links that is a prefix of name.

DISCUSSION

Each environment, E, initially contains only its "inherited" environment (bound to the id Outer). Most bindings take place directly in E. To allow for "persistent" bindings, the value of bind(id, ":=", val, E) will change E by rebinding id in the "innermost" environment (following the chain of Outers) in which it is bound, if that binding has mode ":" (Var). Identifiers bound with mode "=" (Const) may not be rebound in inner environments.

If the rhs of a binding is surrounded by single quotes, it will be evaluated in the environments where the name is invoked, rather than in the environment in which the binding is made.

When an id is referred to and locBinding(id, E) = None, its value is sought recursively in locVal("Outer", E). The (implicit) "outermost" environment binds each id to the "universal" name $id.

Nodes are delimited by braces. The contents of each node are implicitly prefixed by Sub, which will generally be bound in the containing environment to a quoted expression performing some bindings and perhaps supplying some labels (marks and links).

Parentheses are used to delimit sequence values.
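The environment rules above (bind, bindingOf, assign, and lookup through Outer) can be made concrete with a small Python sketch. It assumes an environment is a dict mapping each id to a (mode, value) pair, with the inherited environment stored under "Outer" with mode "="; new_env, extend, and val_of are hypothetical helper names, and updates copy rather than mutate, mirroring the [E | id mode value] notation. It is an illustration of the equations, not an InterDoc implementation.

```python
from typing import Any, Dict, Optional, Tuple

# Assumed representation: id -> (mode, value); "Outer" -> ("=", parent or None).
Env = Dict[str, Tuple[str, Any]]


def new_env(outer: Optional[Env] = None) -> Env:
    # [Null | "Outer" "=" E]: a fresh environment inheriting from `outer`.
    return {"Outer": ("=", outer)}


def extend(env: Env, ident: str, mode: str, value: Any) -> Env:
    # [E | id mode value]: a copy of env with one binding added or replaced.
    result = dict(env)
    result[ident] = (mode, value)
    return result


def loc_binding(ident: str, env: Optional[Env]) -> Optional[str]:
    # Mode of ident in env itself, or None if env does not bind it locally.
    if env is None or ident not in env:
        return None
    return env[ident][0]


def where_bound(ident: str, env: Optional[Env]) -> Optional[Env]:
    # Follow the chain of Outers to the innermost environment binding ident.
    while env is not None:
        if ident in env:
            return env
        env = env["Outer"][1]
    return None


def binding_of(ident: str, env: Env) -> Optional[str]:
    # bindingOf(id, E) = locBinding(id, whereBound(id, E))
    return loc_binding(ident, where_bound(ident, env))


def val_of(ident: str, env: Env) -> Any:
    # An unbound id falls through to the outermost environment, which binds it to $id.
    scope = where_bound(ident, env)
    return "$" + ident if scope is None else scope[ident][1]


def bind(ident: str, mode: str, value: Any, env: Env) -> Env:
    if binding_of(ident, env) == "=":
        return env  # Can't rebind constants
    if mode == ":=":
        return assign(ident, value, env)
    return extend(env, ident, mode, value)


def assign(ident: str, value: Any, env: Env) -> Env:
    if loc_binding(ident, env) == ":":
        return extend(env, ident, ":", value)
    if binding_of(ident, env) == ":":
        # bind("Outer." id, ":=", value, E): rebuild E around an updated Outer.
        outer = env["Outer"][1]
        return extend(env, "Outer", "=", bind(ident, ":=", value, outer))
    return env  # Can only assign to vars


# Usage: a node whose items perform italic := TRUE and size : 12.
doc = bind("italic", ":", False, new_env())
inner = new_env(doc)                       # node environment: Outer = doc
inner = bind("italic", ":=", True, inner)  # persistent: updates doc's Var
inner = bind("size", ":", 12, inner)       # local to the node
after = inner["Outer"][1]                  # B of the node: locVal("Outer", ...)
print(val_of("italic", after), val_of("size", after), val_of("italic", doc))
# prints: True $size False
```

Copying on every update keeps the original doc environment intact, which is why the last value printed is still False: only the environment handed back to the containing node sees the persistent assignment.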
Square brackets are used to delimit the argument list of an operator application and to denote environment constructors, which behave much like records.

Expressions involving the four infix ops (+, -, *, /) are evaluated right-to-left (a la APL); since we expect expressions to be short, we have not imposed precedence rules.

The notation for selections (conditionals) is borrowed from Algol 68: ( | | ). This is consistent with our principles of using balanced brackets for compound constructions and avoiding syntactically reserved words; the true part and the false part may each contain an arbitrary number of items (including none).

A label N! on a node makes that node a "target" of the link N (and its prefixes); a label N@ makes it a "source." The "main" identifier of a link must be declared (using id@!) at the root of a subtree containing all its sources and targets. The link represents a set of directed arcs, one from each of its sources to each of its targets. Multiple target labels make a node the target of multiple links. A target label that appears on only a single node places that node in a singleton set, i.e., identifies it uniquely.

GRAMMATICAL FEATURE X SEMANTIC FUNCTION MATRIX

FEATURES:                                         FUNCTIONS:
                                                  R  C  B  M  L  S  T
term ::= primary op term                          +  =  -  -  -  -  -
primary ::= literal                               =  =  -  -  -  -  -
invocation ::= id                                 +  -  +  -  -  -  -
invocation ::= name "." id                        +  -  +  -  -  -  -
universal ::= "$" name                            =  =  -  -  -  -  -
application ::= invocation "[" item* "]"          +  -  -  -  -  -  -
selection ::= "(" term "|" item1* "|" item2* ")"  +  -  +  -  -  -  -
node ::= "{" item* "}"                            +  =  +  -  -  -  -
sequence ::= "(" ( value | binding )* ")"         +  =  +  -  -  -  -
item* ::= item1 item*                             +  +  +  +  +  +  +
binding ::= name mode rhs                         -  -  +  -  -  -  -
rhs ::= "'" item* "'"                             +  -  -  -  -  -  -
rhs ::= "[|" binding* "]"                         +  -  -  -  -  -  -
rhs ::= "[" invocation "|" binding* "]"           +  -  -  -  -  -  -
mark ::= invocation "#"                           +  -  -  +  -  -  -
link ::= id "@!"                                  =  -  -  -  +  -  -
link ::= name "@"                                 =  -  -  -  -  +  -
link ::= name "!"                                 =  -  -  -  -  -  +

  -  Semantic function produces Nil or E, or does not apply.
  +  Non-trivial semantic equation.
  =  For R: passes value unchanged; for C: value same as R.
-------------------
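The link equations (S = name for a source label N@, T = prefixes(name) for a target label N!, and a link as the set of arcs from each source to each target) can be sketched in Python as follows. The input format, a dict from hypothetical node ids to the label strings written on each node, is an assumption made for illustration; declarations (id@!) are recognized but skipped, so the subtree scoping rule is not modeled.

```python
from __future__ import annotations

from collections import defaultdict


def prefixes(name: str) -> list[str]:
    # prefixes(id) = id; prefixes(name "." id) = name "." id prefixes(name)
    parts = name.split(".")
    return [".".join(parts[:i]) for i in range(len(parts), 0, -1)]


def link_arcs(labels: dict[str, list[str]]) -> dict[str, set[tuple[str, str]]]:
    """Map each link name to its set of directed (source, target) arcs.

    `labels` maps a node id to its link labels, e.g. "cite@" (source) or
    "cite.a!" (target). Declarations such as "cite@!" are skipped here.
    """
    sources: dict[str, set[str]] = defaultdict(set)
    targets: dict[str, set[str]] = defaultdict(set)
    for node, node_labels in labels.items():
        for label in node_labels:
            if label.endswith("@!"):
                continue  # scope declaration; not modeled in this sketch
            if label.endswith("@"):
                sources[label[:-1]].add(node)  # S = name
            elif label.endswith("!"):
                for link in prefixes(label[:-1]):  # T = prefixes(name)
                    targets[link].add(node)
    return {
        link: {(s, t) for s in sources[link] for t in targets[link]}
        for link in sources.keys() | targets.keys()
    }


# Usage: one citation source pointing at the whole "cite" set, and one
# source pointing at a single, uniquely labeled target.
doc = {
    "root": ["cite@!"],
    "para1": ["cite@"],
    "ref1": ["cite.a!"],
    "ref2": ["cite.b!"],
    "para2": ["cite.b@"],
}
arcs = link_arcs(doc)
print(sorted(arcs["cite"]))    # [('para1', 'ref1'), ('para1', 'ref2')]
print(sorted(arcs["cite.b"]))  # [('para2', 'ref2')]
```

Because a target label N! registers the node under every prefix of N, a source labeled with the short name reaches the whole set, while a source labeled with the full name reaches only the node carrying that unique label.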