*start*
00926 00024 USt
Date: 1 June 1981 9:15 am EDT (Monday)
From: Lampson.PA
Subject: Re: Some Doc Issues
In-reply-to: Mitchell's message of 31 May 1981 12:27 pm PDT (Sunday)
To: Mitchell
cc: InterDoc↑

I think the question you are asking is: what happens to an Interdoc of level i when it is edited by an Interdoc editor of level j<i. That is certainly a question which doesn't arise with Press. I agree that the proper answer is unclear.

It might be worth noticing that the simplest answer is that the Interdoc is restricted to level j first. This presumably means that anything not expressible at level j is discarded, or mapped in some unimaginative and irreversible fashion into something expressible at level j. This solution seems too draconian, but I have a feeling that it will not be easy to find a less draconian solution which is still reasonably well defined. Still, it is certainly worth some effort.

*start*
00944 00024 USt
Date: 1 June 1981 9:15 am PDT (Monday)
From: McGregor.PA
Subject: Re: Some Doc Issues
In-reply-to: Lampson's message of 1 June 1981 9:15 am EDT (Monday)
To: Lampson
cc: Mitchell, InterDoc↑

I recall a number of proposals in SDD where a level j (like a Cub or something) editor would find itself editing a Star document. The goal was to preserve all of the information, yet simply be unable to display/edit the higher level objects.

Question: Do we believe that any k-level document object that is incomprehensible to a level j editor can be represented and displayed as a "black box" that can be at least deleted and perhaps moved or copied?

I'm unhappy with Butler's "draconian" solution of discarding all the k level objects. At least in the product organisations, the ability to do touch-up editing using a lower class of editor than the one that originally composed the document seems quite important.

Scott.
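[Editorial sketch, not part of the original correspondence.] The "black box" question above can be made concrete with a small model. The following is a minimal sketch in Python, assuming a document is a flat list of nodes and that each node carries the lowest editor level that understands it; the names `Node` and `Editor` and the list representation are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Node:
    level: int    # lowest editor level that understands this node (assumption)
    payload: str  # content; opaque to editors below `level`


class Editor:
    """A level-j editor that treats higher-level nodes as black boxes."""

    def __init__(self, level):
        self.level = level

    def display(self, doc):
        # Nodes above the editor's level are shown only as placeholders.
        return [n.payload if n.level <= self.level else "[black box]" for n in doc]

    def edit(self, doc, index, new_payload):
        node = doc[index]
        if node.level > self.level:
            raise PermissionError("cannot edit a node above this editor's level")
        return doc[:index] + [replace(node, payload=new_payload)] + doc[index + 1:]

    def delete(self, doc, index):
        # Deleting needs no understanding of the node's contents.
        return doc[:index] + doc[index + 1:]

    def move(self, doc, src, dst):
        # The node object is carried over untouched, so nothing is lost.
        rest = doc[:src] + doc[src + 1:]
        return rest[:dst] + [doc[src]] + rest[dst:]


doc = [Node(0, "Hello"), Node(3, "<illustration>"), Node(0, "world")]
cub = Editor(1)
print(cub.display(doc))
print(cub.display(cub.move(doc, 1, 2)))
```

In this sketch the draconian alternative would correspond to dropping every node whose level exceeds the editor's before editing begins; the black-box approach keeps them and merely refuses `edit` on them.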
*start* 01637 00024 USt Date: 1 June 1981 10:44 am PDT (Monday) From: Mitchell.PA Subject: Re: Some Doc Issues In-reply-to: McGregor's message of 1 June 1981 9:15 am PDT (Monday) To: McGregor cc: Mitchell, InterDoc↑ I am sending this message again because Laurel threw in a reply-to field that I didn't want on the previous transmission. I also find Butler's solution a little too Draconian, and I think it is worthwhile trying to find some way to preserve structure in an InterDoc script even when it is edited by a low-level editor. Think of an InterDoc script, S, as a database containing document pieces with attributes, no matter how represented. If level(S)<level(some editor, E), this could be handled by E accessing S through a level filter (i.e., a database view), which would hide the intricacies of pieces of higher level than E's and might well present a different interface, X, to E for the pieces it does understand than a higher level editor might use. E's updates would then not disturb structures that it couldn't touch, and its changes would be reflected back into S via X. If changes were reflected back to S in a batch or only in relatively large chunks, this could even be made relatively efficient. There are certainly difficulties with this approach; e.g., If E doesn't understand paragraph structure but S contains paragraphs, how would X map alterations back to S (some of which cross paragraph boundaries)? A Note: Even if E doesn't understand illustrations, for instance, it might still be able to delete or move entire illustrations in S as long as E does understand about embedded objects. JGM *start* 00536 00024 USt Date: 1 June 1981 3:01 pm EDT (Monday) From: Lampson.PA Subject: Re: Some Doc Issues In-reply-to: Mitchell's message of 1 June 1981 10:44 am PDT (Monday) To: Mitchell cc: InterDoc↑ This idea has promise, but it has some problems. 
The most common situation may be one in which the code for the editor and the machine running the editor don't understand about higher levels at all. This means that the filter must be quite simple and quite generic, so that it can be included in the editor from the beginning.

*start*
01665 00024 USt
Date: 1 June 1981 12:10 pm PDT (Monday)
From: Horning.pa
Subject: Re: Some Doc Issues
In-reply-to: Mitchell's message of 31 May 1981 12:27 pm PDT (Sunday)
To: Mitchell
cc: InterDoc↑

I think that several issues got confused in this conversation.

To Leo's original question: "Do we really expect that when a document is transformed from a representation private to an <editor-formatter> to InterDoc format and back, that there will be no information loss?" I hope that the answer is an unequivocal "Yes."

Most of the discussion focussed on what we expect to happen when an Interdoc script of some level is edited by an editor of lower level. I had thought that one of the "saving graces" of this exercise was the recognition that we were specifying a standard for something STATIC (an editable document), not the dynamics of what various editors do to documents. I doubt that a standard can enforce many guarantees in this area.

I agree that, as a matter of good design, if an editor is applied to a script of higher level, it should generally preserve unchanged elements and structure that it is not prepared to deal with. It would even be a good idea if the standard made that relatively easy to do (as the Bravo format, for example, does not).

For example, I could imagine a simple editor that was prepared to edit any string in a script that had uniform attributes (producing a new script in which the edited string had the same attributes), but forced the user to edit separately each string delimited by attribute changes. Similarly, the user might be restricted to moving entire hierarchical units within the script.

Jim H.
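[Editorial sketch, not part of the original correspondence.] The simple editor imagined above can be illustrated as follows. This is a minimal sketch in Python, assuming a script is a list of (text, attributes) runs, where each run is a maximal string with uniform attributes; the function name `edit_run` and the run representation are hypothetical.

```python
def edit_run(script, run_index, start, end, replacement):
    """Replace text[start:end] inside one run, keeping that run's attributes.

    An edit can never span an attribute change, because the caller must
    name a single run; all other runs are passed through unchanged.
    """
    text, attrs = script[run_index]
    if not (0 <= start <= end <= len(text)):
        raise ValueError("edit must lie within a single run")
    new_run = (text[:start] + replacement + text[end:], attrs)
    return script[:run_index] + [new_run] + script[run_index + 1:]


script = [
    ("Hello ", {"font": "TimesRoman"}),
    ("world", {"font": "TimesRoman", "bold": True}),
]
new_script = edit_run(script, 1, 0, 5, "there")
print(new_script)
```

The editor never inspects the attribute dictionaries, so it need not understand them in order to preserve them.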
*start* 01549 00024 USt Mail-from: Arpanet host SU-SCORE rcvd at 1-JUN-81 1514-PDT Date: 1 Jun 1981 1512-PDT From: Brian K. Reid <CSL.BKR at SU-SCORE> Subject: issues To: InterDoc↑ at PARC-MAXC Either I am missing out on a big piece of the design conversations here or you guys' minds have been rotted by JaM and Interpress. Possibly both. In Interpress/JaM semantics is everything, and the language design is primarily concerned with execution semantics and imaging primitives. A language for the representation of editable documents should have almost diametrically opposite properties. Syntax is everything. If your mental model of the alleged InterDoc (I don't like this name either) is that it is "something like Interpress", then I can see how you are worried about being able to ignore pieces that you don't understand. But since the essence of InterDoc should be syntactic and not semantic, it should be a complete cake walk to parse and ignore some piece of it that you don't understand. At the moment I believe that a useful property of InterDoc would be that it had absolutely no inherent semantics. Something like an S-expression language, for example, although I think that the pure nesting structure is far too restrictive. I think that our initial design energy should be going into determining the kind of structural information that must be representable syntactically and the kind of structural information that it is ok to punt to the (arbitrary) semantics. I look forward to our meeting tomorrow afternoon. Brian ------- *start* 00890 00024 USt Date: 2 June 1981 11:36 am PDT (Tuesday) From: Ayers.PA Subject: Re: Editing at level "j" In-reply-to: McGregor's message of 1 June 1981 9:15 am PDT (Monday) To: McGregor cc: InterDoc↑ Reply-To: Ayers "I recall a number of proposals in SDD where a level j (like a Cub or something) editor would find itself editing a Star document. 
The goal was to preserve all of the information, yet simply be unable to display/edit the higher level objects."

I proposed an arrangement that would, I claimed, allow a Star document's text to be edited at an 860, even though the 860 cannot display graphics, fonts, etc.

The proposal required ancillary smarts -- the 860 was given a construct, edited it, and then the returned construct was carefully compared with the original.

Not really InterDoc, tho I will be happy to circulate the claims at an appropriate time ...

Bob

*start*
01956 00024 USt
Date: 3 June 1981 2:43 pm PDT (Wednesday)
From: Mitchell.PA
Subject: Meeting #1, 2 June 1981
To: Interdoc↑.pa

This message is intended to capture some of the points discussed and questions raised (there are more of the latter) at the first meeting to discuss an interchange standard for editable documents.

The next meeting of the InterDoc group will take place Friday, 5 June at 13:00 in the CSL Commons.

Attendees: John Warnock (JW), Robert Ayers (RA), Scott McGregor (SMcG), Alan Perlis (AJP), Brian Reid (BR), Jim Horning (JJH), and Jim Mitchell (JGM)

The meeting began by our trying to determine what we are doing and what an editable document is, anyway.

BR: Sees a document standard as a syntactic structuring mechanism. Having only hierarchical structuring is insufficient (e.g., linear order for pagination vs. hierarchical chapter, section, paragraph structure)

JGM: Even structure that is not understood by a given editor should be preserved (e.g., if a neanderthal editor was used to change text in a Star document, the illustrations and structuring information should not thereby be deleted).

JW: Defaults are important. How are properties inherited? Perhaps a property should say how it is to be inherited.
RA: Inheritance is a property of the editor, not of the document pieces (e.g., Star does it both ways: typing inherits from the left, but copying text does not cause it to inherit the properties of contiguous text) SMcG: Properties should have properties. JGM: Where does it end? Is there some base set of meta-properties to which a certain amount of semantics can be attached or is everything syntactic? BR: Some properties must not be inherited; e.g., "the paragraph must begin a new page" should not be inherited by an illustration embedded in that same paragraph. BR: What are the set of meta-properties of properties? General: Do we need to handle inter-document references? JGM *start* 03993 00024 USt Date: 8 June 1981 2:19 pm PDT (Monday) From: Mitchell.PA Subject: Meeting #2, 5 June 1981 To: Interdoc↑.pa Attendees: Bill Paxton (WHP), Robert Ayers (RA), Scott McGregor (SMcG), Alan Perlis (AJP), Brian Reid (BR), Jim Horning (JJH), and Jim Mitchell (JGM) AJP: If we do this job right, it ought to be possible to build a parametric editor that could accept any kind of document. BR: some such editors are one by Nievergelt at ETH Zurich, Zog at CMU, and Steve Woods' thesis at Yale. BR will distribute copies of papers by Nievergelt and Woods. JGM: Tioga is also an example of such an editor (based on CedarDocs) SMcG: We ought to be concerned with the efficiency of transforming an InterDoc script into an internal encoding and back again. BR: If I were building an editor and had the InterDoc standard, I would make the internal format be well matched to the standard. There was some general discussion about wanting to be able to represent changes (deltas) to documents in the standard. RA: (wrt to SMcG's comment above) In general, we can't slurp up an entire document, but need to do updates to them. JGM: (in an attempt to side step the issue of deltas and history) View a script as a set of nodes with properties and then store history or deltas as extra properties. 
BR: We should use the InterPress notion of lossless transformations a la Lampson, viz., require that f(g(S))=g(f(S)), where f and g are transformations in the appropriate directions.

RA: We should stay away from determining which set of basic editing primitives should be used as a model (e.g., cut/paste vs. append/delete)

JGM: InterDoc scripts must be able to represent audio documents as well (general agreement)

BR: We need a strong formal idea of what an editor is.

AJP: We need to understand either documents or editors, but we can't let both float free.

RA: I only understand a lot of individual editors.

WHP: Tioga uses properties (as suggested by JGM above)

BR: Is there some kind of distinguished property of each node?

JJH: The only thing that every node needs to have is its own identity, as in the PIE system (e.g., a unique ID)

JGM: This might help with external document references also: an external reference would be a (file name, node UID) pair, which is location independent within a document.

WHP: What happens if you have an external reference to a paragraph and then you move it from one document to another? Does the reference follow it?

JJH: If we used a write-once storage model, the problem might be more tractable.

BR: We need to be able to handle documents like collected bibliographies by reference.

JGM: Perhaps we should distinguish different binding strengths; some external references follow the referees, others (the majority) do not.

JJH: Unique IDs only make sense in immutable files (generally not agreed with)

SMcG: The InterDoc standard should be pretty basic, almost a syntactic convention. We also need a set of recommendations for text documents, graphics documents, etc.

JJH (quoting Tony Hoare): We should only standardize on those things that don't matter.

There followed some general discussion on the fact that document nodes must have types.
Two models were proposed: (1) each node has a list of types (i.e., the ways it can be viewed); and (2) how you view a node depends on how you got there. It seems we need both. RA: As SMcG proposed, we really need two standards, one basic, almost meta-standard, and a second set of specific standards. For the next meeting in two weeks, various people will try to produce initial base-level proposals (BR and JGM said they would). It would be nice if these proposals were distributed at least two days before the next meeting. Proposed next meeting: CSL Commons, Friday, June 19, 13:00 (let me know if you have a conflict). Written proposals to participants by Wednesday, June 17, therefore. JGM ------------------------------------------------------------ *start* 03713 00024 USt Date: 8 June 1981 2:24 pm PDT (Monday) From: Mitchell.PA Subject: Some Meta issues about any InterDoc standard To: InterDoc↑.pa In thinking about the form that an InterDoc standard might take, it occurred to me that (1) If, as was discussed in meeting #2 (81.06.05) we view an InterDoc script as a set of nodes with properties, then, following InterPress, we need to find a "source language" to represent them. Whatever the form of this source language, it must be efficiently processable by programs, which processing must include at least generation (from some private encoding), parsing (into some private encoding), and probably some kind of random access, depending on how many smarts an editor has (e.g., it might form a private encoding by recording that information in the InterDoc script to which it needs frequent access in a private structure while still accessing some of the original information in the script - somewhat as Bravo does for text pieces) (2) Many of the syntactic decisions made in the InterPress standard fit well here, too, and we should just adopt them where there is no obvious advantage in inventing our own. 
For example, an InterPress master is readable by both humans and programs, which, besides the obvious debugging/reading benefits, also has the virtue of thereby staying away from many machine-dependent issues, e.g., word size. Given that almost all InterPress masters and InterDoc scripts will be produced and consumed by software, their encoded forms should be as efficiently generatable and parseable as possible. In addition, InterDoc scripts can be expected to have rather long lifetimes, so minimizing the storage taken by them is a strong secondary priority. I propose the following mechanism to keep us honest about efficiency of script processing and storage: We should build both a generator and a parser for any language we come up with. The generator will take a Bravo file and produce an equivalent InterDoc script for it; the parser will take an InterDoc script and produce an equivalent Bravo file for it. This gives us a large test set (all the Bravo files in the world!) for measuring both processing speed and storage requirements. In particular, the generator and parser times give some idea of the overhead in transforming between private and InterDoc representations, and the ratio of size of InterDoc script to size of Bravo file can give us some idea of the space efficiency. Perhaps just as importantly, it will allow us to take any Bravo file and make it suitable for editing with another editor (e.g., Tioga when Scott's idea of having it be able to ingest InterDoc scripts is realized). If we parsed InterDoc scripts into a parse-tree form as part of generating a Bravo file, the size of the parse tree would represent a loose upper bound on the space that a private encoding ought to take, assuming the parse-tree form would not take advantage of any special knowledge about scripts. These would be good numbers to have in hand and would greatly increase my confidence in our work. 
(If we had the manpower, we might also consider writing a translator to map InterDoc scripts into InterPress masters to provide a further test of the standard(s) as well as a transition mechanism for printing old but valuable Bravo files on InterPress print servers.) Being able to sample a large number of Bravo files (or any other for which an InterDoc generator/parser pair can be made) makes lowering the average entropy of an InterDoc script more of an engineering and less a by-guess-and-by-golly task (it has worked exceedingly well for Mesa byte codes, why not here?). Your comments are earnestly solicited. JGM *start* 00832 00024 USt Date: 8 June 1981 5:09 pm PDT (Monday) From: Ayers.PA Subject: Re: Some Meta issues about any InterDoc standard In-reply-to: Mitchell's message of 8 June 1981 2:24 pm PDT (Monday) To: Mitchell cc: InterDoc↑ Reply-To: Ayers Your proposed mechanism (parser/generator) is an interesting thought. One quickie keep-the-ball-rolling comment: You are postulating, in your discussion on efficiency and size, that the Interdoc standard (some straightforward encoding thereof) will be the format that documents are normally stored in. This is not yet clear to me. I could believe, for example, that Star, in 1984, might file documents in its internal format (perhaps formally equivalent to the Interdoc standard, but in no sense an "encoding") and be willing to supply an interchange version on demand. Bob *start* 01218 00024 USt Date: 8 June 1981 5:41 pm PDT (Monday) From: Mitchell.PA Subject: Re: Some Meta issues about any InterDoc standard In-reply-to: Your message of 8 June 1981 5:09 pm PDT (Monday) To: InterDoc↑.pa If documents are not stored as InterDoc scripts, my arguments on efficiency and size are certainly less important, and your arguments about Star documents sound right. What I also envision in 1984, however, is a cloud of Ethernet-compatible text-editing systems able to send messages. 
I don't expect that cloud to be in the shape of a big X (for Xerox): it will have Wangs, IBMs, and who knows what else in it. Thus, many messages will travel as InterDoc scripts and the time you spend waiting for one to be ingested so you can read it will be important. The same arguments apply to sending messages.

Assume there are N such product lines using InterDoc. By analogy with the argument that the value of a communications system goes up as the square of the number of its subscribers, the value of an InterDoc standard goes up as N↑2. I expect that we will (after much blood and screaming) have a good standard. So efficiency matters because of the possibility of an N↑2 success disaster.

JGM

*start*
00233 00024 USt
Date: 9 June 1981 2:00 pm PDT (Tuesday)
From: Ayers.PA
Subject: Re: Some Meta issues about any InterDoc standard
In-reply-to: Mitchell's message of 8 June 1981 5:41 pm PDT (Monday)
To: Mitchell

Sounds fair.

*start*
01023 00024 USt
Date: 9 June 1981 5:10 pm EDT (Tuesday)
From: Lampson.PA
Subject: Re: Some Meta issues about any InterDoc standard
In-reply-to: Mitchell's message of 8 June 1981 2:24 pm PDT (Monday)
To: Mitchell
cc: InterDoc↑

You should be aware that compactness was not a major consideration in the design of the Interpress interchange encoding. This encoding probably costs at least a factor of two in space over what could be easily achieved, and much more in some cases.

I think your ideas for translating existing documents into a proposed representation are excellent. Sil and Draw files (especially Sil files) might also be candidates.

A document design with styles (which I strongly favor) raises a problem for translation from existing documents which do not have styles. This problem can be solved surprisingly well by deducing the styles in some simple-minded way from observed repetitions of the same properties. This was in fact done about three years ago by Bill Maybury for Bravo documents.
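[Editorial sketch, not part of the original correspondence.] One simple-minded way to deduce styles from repeated properties is to treat every property set that occurs more than once as a style. The following is a minimal sketch in Python under that assumption; it is not a reconstruction of the method actually used for Bravo documents, and the names `deduce_styles`, `min_repeats`, and the generated style names are hypothetical.

```python
from collections import Counter


def deduce_styles(runs, min_repeats=2):
    """runs: list of (text, properties-dict) with hashable property values.

    Returns (styles, restyled_runs): every property set seen at least
    `min_repeats` times becomes a named style, and runs using it are
    rewritten to refer to the style by name.
    """
    keys = [tuple(sorted(props.items())) for _, props in runs]
    counts = Counter(keys)

    names = {}
    for key in keys:  # first-seen order keeps the generated names stable
        if counts[key] >= min_repeats and key not in names:
            names[key] = f"style{len(names) + 1}"

    styles = {name: dict(key) for key, name in names.items()}
    restyled = []
    for (text, props), key in zip(runs, keys):
        if key in names:
            restyled.append((text, {"style": names[key]}))
        else:
            restyled.append((text, dict(props)))  # one-off formatting stays inline
    return styles, restyled


runs = [
    ("Heading", {"font": "TimesRoman", "size": 10, "bold": True}),
    ("Body one", {"font": "TimesRoman", "size": 10}),
    ("Body two", {"font": "TimesRoman", "size": 10}),
]
styles, restyled = deduce_styles(runs)
print(styles)
print(restyled)
```

A more careful version might also factor out properties shared between nearly identical sets, but exact repetition already captures the common case of uniformly formatted body text.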
*start* 00453 00024 USt Date: 9 June 1981 3:09 pm PDT (Tuesday) From: Mitchell.PA Subject: Re: Some Meta issues about any InterDoc standard In-reply-to: Lampson's message of 9 June 1981 5:10 pm EDT (Tuesday) To: Lampson cc: Mitchell Roger on the issue of styles, Sil, and Draw. I am currently building a first draft of a syntax for documents and intend it to include styles. It should be ready in about a week (he says with eternal optimism!). Jim *start* 02907 00024 USt Date: 16 June 1981 3:21 pm PDT (Tuesday) From: Mitchell.PA Subject: Mitchell's whiteboard as of Friday, June 12 To: Horning cc: Mitchell Basically, the interdoc specifications can be viewed as having three levels. At level 0, the lowest and most general level, there is a very general, 1980-style S-expression syntax with no semantics. At level 1, some simple semantics are associated with various of the syntactic constructs (e.g., definitions, meta-properties, internal references, node identity, and node types). Most of the specifications actually lie at level 2, where properties are specified in groups and given semantics (e.g., Star documents, Star text documents, Star graphics, 860 text, Tioga nodes, Tioga paragraphs, etc.). Most of our discussion centered around the parse-tree representation of an InterDoc script as a model. Level 0: In the basic document-expression (D-expr) syntax, the fundamental literals are numbers, strings, type names, records, and sequences. Numbers and strings are taken directly from InterPress. A type name is simply an identifier. A record has the form record ::= [identifier] "[" literal* "]" A sequence has the form sequence ::= [identifier] "{" literal* "}" Conceptually, a sequence is supposed to be homogeneous, although there is no way to specify this in the basic grammar. We think of the components of a record or sequence as an ordered set of properties associated with a node whose type is given by the (optional) identifier prefixing the record or sequence. 
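[Editorial sketch, not part of the original correspondence.] The two level 0 productions above are small enough to parse with a few lines of recursive descent. The following is a minimal sketch in Python, assuming strings are written between angle brackets (as in the examples in this thread) and numbers are plain decimals; the real number and string syntax was to come from InterPress and is not reproduced here. The tuple representation and the function names are hypothetical. The stated grammar does not say whether an identifier followed by a bracket is a type name followed by an anonymous record or a prefixed record; this sketch always treats it as a prefix.

```python
import re

# One token: <string>, number, identifier, or a bracket.
TOKEN = re.compile(r"\s*(<[^>]*>|-?\d+(?:\.\d+)?|[A-Za-z][A-Za-z0-9]*|[\[\]{}])")
CLOSERS = {"[": "]", "{": "}"}


def tokenize(text):
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if not m:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def parse_literal(tokens, i=0):
    """Parse one literal starting at tokens[i]; return (node, next_index)."""
    tok = tokens[i]
    if tok.startswith("<"):
        return ("string", tok[1:-1]), i + 1
    if tok[0].isdigit() or tok[0] == "-":
        return ("number", float(tok) if "." in tok else int(tok)), i + 1
    name = None
    if tok[0].isalpha():
        # An identifier is either a bare type name or a record/sequence prefix.
        if i + 1 < len(tokens) and tokens[i + 1] in CLOSERS:
            name, i = tok, i + 1
            tok = tokens[i]
        else:
            return ("type", tok), i + 1
    if tok not in CLOSERS:
        raise SyntaxError(f"unexpected token {tok!r}")
    kind = "record" if tok == "[" else "sequence"
    close = CLOSERS[tok]
    i += 1
    items = []
    while i < len(tokens) and tokens[i] != close:
        item, i = parse_literal(tokens, i)
        items.append(item)
    if i >= len(tokens):
        raise SyntaxError("missing " + close)
    return (kind, name, items), i + 1


def parse(text):
    tokens = tokenize(text)
    node, i = parse_literal(tokens)
    if i != len(tokens):
        raise SyntaxError("trailing input after literal")
    return node


print(parse("TextDoc[origin[Laurel] margins{85 530} from[<Mitchell>]]"))
```

Because the parser attaches no meaning to any identifier, it can skip over nodes it does not understand simply by parsing them and carrying the resulting subtree along unchanged.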
Level 1: We tried to identify a set of meta-properties of properties. This set must ultimately converge to a small, constant set. Our first candidates are

    inherited | overriding | local   -- no default; pick one
    ordered | unordered   -- default is ordered
    separate | concatenable   -- default is separate
    integral | deletable | ignorable   -- default is integral; a deletable property is one which caches information - it should be deleted if the node to which it is attached is altered; an ignorable property is like a hint - an editor can safely ignore it if it doesn't understand it, but if it does understand it, it must check the hint's validity.

We noticed that one could use the level 0 syntax to describe D-exprs in which the basic editable information was in a fringe of the parse tree or one in which it was buried in the tree and not just in the fringe.

Example (a Laurel message):

    TextDoc[
      origin[Laurel]
      margins{85 530}
      msgFields[
        from[<Mitchell>]
        to[<Horning>]
        date{1981 6 16}
        cc{<Lampson|Mitchell|InterDoc↑.pa>}
        Subject[<Mitchell's whiteboard as of Friday, June 12>]
        body[<Basically, the interdoc ...Tioga paragraphs, etc.).> <...> ... <...>]
        authenticated
      ]
      ~justified
      font[TimesRoman 10]
    ]

Let me know what I have forgotten, and let's get together tomorrow a.m. to discuss this some more.

Jim

*start*
01882 00024 USt
Date: 16 June 1981 4:30 pm PDT (Tuesday)
From: Horning.pa
Subject: Re: Mitchell's whiteboard as of Friday, June 12
In-reply-to: Your message of 16 June 1981 3:21 pm PDT (Tuesday)
To: Mitchell
cc: Horning

Jim,

That summarizes most things pretty well.
Let me just mention a couple of points where our models may not yet have matched: -I tend to think of a script as containing four sorts of information: "content" (e.g., the character string itself) "structure" (e.g., hierarchical + internal links) "properties" (e.g., fonts, constraints) "definitions" (e.g., styles) It feels like you are trying to subsume everything under either "structure" or "properties," primarily by making content a property like anything else. I was trying to make "content" syntactically distinct by calling it the "fringe." We should at least discuss the relative merits of the two forms. -Since Friday I have come to wonder how many of the meta-properties can be handled by "definitions"/styles. In fact, how many properties are to be treated may well be a function of the environment (target editor and purpose for editing), rather than the property, per se. (E.g., for some purposes, it is fine to display footnotes "inline" while editing; for others, they might be suppressed entirely. The choice is surely not intrinsic in the script or the property.) If a Style is a Property -> Property* mapping, then we might imagine for each editor a small library of styles of dealing with foreign documents, or prefixing a document with such a style. This focusses attention on whether there are properties that we expect to be widely understood (e.g., this node contains script text that remains unparsed), rather than on meta-properties. There is a Cedar coordinators meeting at 11, so we should get together enough before that to have time for discussion. Jim H. *start* 05894 00024 USt Date: 17 June 1981 5:18 pm PDT (Wednesday) Sender: Horning.pa Subject: Interdoc Proposal, 1 - Metaconsiderations From: Horning, Mitchell To: Interdoc↑ [Recall that at our last meeting Mitchell/Horning and Reid agreed to produce definite proposals by today, so there would be a tangible basis for discussion and determining areas of agreement and disagreement at our meeting Friday. 
Mitchell and I have decided to present separately our conclusions about requirements and constraints on Interdoc, and a concrete proposal illustrating that it is feasible to satisfy them. This message presents the former. You can expect a companion message from Mitchell with the latter. -- Jim H.]

Although we do not need (thank God) to define a "Standard Editor" nor standards for editors, Interdoc will require us to take a fairly strong position on what constitutes "an editable document." This note tries to explore that question at a fairly abstract level.

If we restrict scripts to ASCII characters (or anything similar) then any editor prepared to deal with arbitrary strings can be used to edit a script (much like ↑Z Unformatted Get in Bravo), but it will not be easy to use an editor in this fashion to transform valid scripts into valid scripts that correspond to intended changes. However, where a script is "mostly" understood by a particular editor, dealing with the remaining parts as an uninterpreted string/D-expression may not be a bad escape hatch.

In line with the requirement that, for each editor, there be an "information lossless" transformation to Interdoc and back, there must be provisions in Interdoc for the representation of all kinds of information maintained by "reasonable" editors. We don't think it is reasonable to be very constraining about what that information will be; at best, we can hope to provide a general and open-ended framework in which it can be placed.

In surveying the editors we know and love/hate, some or all of them maintain information in the following four broad categories, which we will discuss in turn:

    "content" (e.g., a character string, a set of spline coefficients)
    "structure" (e.g., word/sentence/paragraph ... , line/page/signature ...)
    "properties" (e.g., fonts, constraints)
    "definitions" (e.g., styles)

Interdoc should not attempt to replace these examples with complete enumerations.
It must be prepared to accept scripts with, for example, completely different hierarchies (e.g., macro/cell/chip/board ...). Content: Clearly text and graphics are common special cases, and we should cater to them as well as Interpress does. Indeed, in this area the intersection of requirements with those of Interpress is so high that we should strive for consistency. (It's presumably not too late for small additions to Interpress, if we discover a need for them.) It may be sensible to restrict content to literals (although not conversely!). Structure: A general structure, of which all the editors we know use special cases, is the labelled directed graph. (We can quibble about whether the content should be at the nodes or on the arc labels.) However, there are two specializations of general graphs that will be so common in practice that they should be treated specially: trees--most editors that support any structure at all have a "dominant" hierarchy that maps well into trees (although they use these trees to represent different hierarchies). We need a good linear notation for these trees (D-expressions?). In the case of multiple hierarchies, the "dominant" one will certainly be the one used to control the scopes of properties and definitions (i.e., we consider some form of block structure to be a practical necessity). sequences--the most important, and most frequent, relationship between pieces of content is logical adjacency, which should be representable by textual juxtaposition in the script. This implies a less compact notation for sets, where order is insignificant, but they are less heavily used. Structure beyond that contained in the "dominant" hierarchy will need to be represented explicitly. My going-in position is that explicit "labels" in the script will suffice for this purpose. We are undecided whether such labels belong at the syntactic level, or whether they can be pushed off onto "properties." Properties: Not much can be said in general. 
It must be possible to unambiguously parse scripts without knowing the meanings of any of the properties they refer to. There should be standard syntax for referring to properties. To first approximation, property names seem very much like "free variables" in logic and lambda calculus. There will be a small set of properties that every conforming parser/interpreter can be expected to interpret; we may also need to define some standard meta-properties that make it possible for an editor to deal reasonably with properties it does not understand.

Definitions: Either in the basic syntax, or at a very low level, we need to include a mechanism for introducing definitions with restricted scope. This should have approximately the semantics (if not the syntax) of lambda expressions, i.e., it should bind a property name to a property expression. We expect that scope will be (primarily) controlled by the tree structure of the script; it is still a matter for debate whether standard Algol/lambda calculus rules (inner bound variables are not affected by an outer binding) are adequate for all the important cases.

We expect this mechanism to be used by sophisticated editors for such things as styles. Some scripts will come with a prefix containing non-standard property definitions that are global to the document. There may be standard libraries containing definitions that allow complex properties to be edited in terms of properties understood by simpler editors.

*start*
05311 00024 USt
Date: 18 June 1981 4:19 pm PDT (Thursday)
From: Mitchell.PA
Subject: A first cut at a document syntax - for your comment before I send it
To: Horning.PA
cc: Mitchell

I envision an InterDoc standard as having three levels. Level 0, the lowest and most general level, provides a general, 1980s-style S-expression syntax. Level 1 associates some simple semantics with various of the syntactic constructs (e.g., definitions, scope, node identity, node types).
Most of the specifications actually lie at level 2, where document "types" are developed and given semantics (e.g., Star documents, Star text documents, Star graphics, 860 text, Tioga nodes, Laurel messages, etc.). Level 0: In the basic document-expression (D-expr) syntax, the fundamental literals are numbers, strings, identifiers, labels (for internal references), records, and sequences. Numbers and strings are taken directly from InterPress. Rather than work out a set of syntax productions, I have produced an example of an (imaginary) Laurel message embellished with paragraphs, fonts, etc. I hope the commentary to the right of the example will aid understanding. The first version of the document uses a number of abbreviating mechanisms for common cases; these are listed in the notes following the example, and, for comparison, a fully expanded version follows those notes: TextDoc | Message -- can be viewed as TextDoc or Message [ justified←F -- "←" means overridable default font←[face=TimesRoman size=10 style=n] -- keyword notation margins←[2540 19050] -- positional notation leading[x=1 y=1] MsgInfo [ -- keyword notation, "="s elided pieces[hdr=#1 body=#2] -- internal references to labelled parts date{1981 6 18} from=<Mitchell.PA> -- collected by Laurel; could use refs? 
subject=<A Sample Document Syntax> to=<Horning.PA> cc=<Mitchell,InterDoc↑.pa> authenticated -- authenticated=T ] CONTENTS= [ 1: Section -- "1" is the label of this section [ CONTENTS=Paragraph[font[style=bold]] -- distributed over sequence { -- a sequence of paragraphs all in boldface <Date: 18 June 1981 9:18 am PDT (Thursday)> <From: Mitchell.PA> <Subject: A Sample Document Syntax> <cc: Mitchell, InterDoc↑.pa> } 2: Section [ leading[y=6] -- override outer y leading CONTENTS=Paragraph { <text of paragraph> <text of paragraph> <text of paragraph> <text of paragraph> } ] ] ] Notes: x←value means that x has that value in the absence of any surrounding assignment of the form x=value "=" is the default binding and means local value wins (Algol scope). Can elide "=" when followed by "[" or "{". x=T can be replaced by x; x=F can be replaced by ~x The contents field of a node can be stated explicitly as "CONTENTS=" or can be given as the last literal of a node (if last, it makes semantics, inheritance, etc. easier for an editor ingesting a script) When each field of a structured literal has a distinct type, the type names can be used to identify the fields. This has been used heavily in the example. 
Here is the fully expanded version of the above example: DOCUMENT=TextDoc | Message [ justified←F, font←font[face=TimesRoman,size=10,style=n], margins←margins[left=2540,right=19050], MsgInfo=MsgInfo [ pieces=pieces[hdr=#1,body=#2], date=date[yy=1981,mm=6,dd=18], from=from<Mitchell.PA>, subject=subject<A Sample Document Syntax>, to=to<Horning.PA>, cc=cc<Mitchell,InterDoc↑.pa>, authenticated=T ] CONTENTS= [ leading=leading[x=1], 1: Section=Section [ leading=leading[y=1], CONTENTS=Paragraph[font.style=bold] { <Date: 18 June 1981 9:18 am PDT (Thursday)>, <From: Mitchell.PA>, <Subject: A Sample Document Syntax>, <cc: Mitchell, InterDoc↑.pa> } 2: Section=Section [ leading=leading[y=6], CONTENTS=Paragraph { <text of paragraph>, <text of paragraph>, <text of paragraph>, <text of paragraph> } ] ] ] Level 1: We tried to identify a set of meta-properties of properties. This set must ultimately converge to a small, constant set. Our first candidates are inherited | overriding | local -- no default; pick one ordered | unordered -- default is ordered separate | concatenable -- default is separate integral | deletable | ignorable -- default is integral An integral property is one which cannot be thrown away without losing important information (e.g., the textual content of a node). A deletable property is one which caches information - it should be deleted if the node to which it is attached is altered; an ignorable property is like a hint - an editor can safely ignore it if it doesn't understand it, but if it does understand it, it must check the hint's validity. See Jim Horning's message about InterDoc meta-thoughts for a discussion of scope of properties, dominant structure, and definitions. JGM *start* 06333 00024 USt Date: 19 June 1981 9:37 am PDT (Friday) From: Mitchell.PA Subject: A first cut at a document syntax To: InterDoc↑.PA I envision an InterDoc standard as having three levels. 
Level 0, the lowest and most general level, provides a general, 1980s-style S-expression syntax. Level 1 associates some simple semantics with various of the syntactic constructs (e.g., definitions, scope, node identity, node types). Most of the specifications actually lie at level 2, where document "types" are developed and given semantics (e.g., Star documents, Star text documents, Star graphics, 860 text, Tioga nodes, Laurel messages, etc.). Level 0: In the basic document-expression (D-expr) syntax, the fundamental literals are numbers, strings, identifiers, labels (for internal references), records, and sequences. Numbers and strings are taken directly from InterPress. Rather than work out a set of syntax productions, I have produced an example of an (imaginary) Laurel message embellished with paragraphs, fonts, etc. I hope the commentary to the right of the example will aid understanding. The first version of the document uses a number of abbreviating mechanisms for common cases; these are listed in the notes following the example, and, for comparison, a fully expanded version follows those notes: TextDoc | Message -- can be viewed as TextDoc or Message [ justified←F -- "←" means overridable default font←[face=TimesRoman size=10 style=n] -- keyword notation margins←[2540 19050] -- positional notation leading[x=1 y=1] MsgInfo [ -- keyword notation, "="s elided pieces[hdr=#1 body=#2] -- internal references to labelled parts date{1981 6 18} from=<Mitchell.PA> -- collected by Laurel; could use refs? 
subject=<A Sample Document Syntax> to=<Horning.PA> cc=<Mitchell,InterDoc↑.pa> authenticated -- authenticated=T ] CONTENTS= [ 1: Section -- "1" is the label of this section [ CONTENTS=Paragraph[font[style=bold]] -- distributed over sequence { -- a sequence of paragraphs all in boldface <Date: 18 June 1981 9:18 am PDT (Thursday)> <From: Mitchell.PA> <Subject: A Sample Document Syntax> <cc: Mitchell, InterDoc↑.pa> } 2: Section [ leading[y=6] -- override outer y leading CONTENTS=Paragraph { <text of paragraph> <text of paragraph> <text of paragraph> <text of paragraph> } ] ] ] Notes: x←value means that x has that value in the absence of any surrounding assignment of the form x=value "=" is the default binding and means local value wins (Algol scope). Can elide "=" when followed by "[" or "{". x=T can be replaced by x; x=F can be replaced by ~x The contents field of a node can be stated explicitly as "CONTENTS=" or can be given as the last literal of a node (if last, it makes semantics, inheritance, etc. easier for an editor ingesting a script) When each field of a structured literal has a distinct type, the type names can be used to identify the fields. This has been used heavily in the example. 
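The two binding forms in the notes above can be illustrated with a small lookup routine. This is only a sketch under stated assumptions: the name `lookup`, the representation of scopes as a list of dictionaries (outermost first), and the reading that any "=" binding on the path from the root beats a "←" default are choices made for this illustration, not part of the proposal.

```python
def lookup(name, scopes):
    """Resolve a property through nested scopes.

    scopes is a list of dicts ordered from outermost to innermost
    (an assumed representation). Each dict maps a property name to a
    (kind, value) pair, where kind is "=" or "←".
    """
    # "=" is the default binding: the most local "=" wins (Algol scope).
    for scope in reversed(scopes):
        binding = scope.get(name)
        if binding is not None and binding[0] == "=":
            return binding[1]
    # No "=" binding anywhere on the path: fall back to the nearest "←" default.
    for scope in reversed(scopes):
        binding = scope.get(name)
        if binding is not None and binding[0] == "←":
            return binding[1]
    return None


# The document sets "justified←F"; nothing overrides it, so the default applies.
print(lookup("justified", [{"justified": ("←", False)}, {}]))
# Section 2 rebinds leading with "=", so the local value wins.
print(lookup("leading", [{"leading": ("=", 1)}, {"leading": ("=", 6)}]))
```

How defaults should combine field by field (for example, x and y of leading) is left open by the notes and is not modelled here.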
Here is the fully expanded version of the above example: DOCUMENT=TextDoc | Message [ justified←F font←font[face=TimesRoman size=10 style=n] margins←margins[left=2540 right=19050] MsgInfo=MsgInfo [ pieces=pieces[hdr=#1 body=#2] date=date[yy=1981 mm=6 dd=18] from=from<Mitchell.PA> subject=subject<A Sample Document Syntax> to=to<Horning.PA> cc=cc<Mitchell InterDoc↑.pa> authenticated=T ] CONTENTS= [ leading=leading[x=1] 1: Section=Section [ leading=leading[y=1] CONTENTS=Paragraph[font.style=bold] { <Date: 18 June 1981 9:18 am PDT (Thursday)> <From: Mitchell.PA> <Subject: A Sample Document Syntax> <cc: Mitchell InterDoc↑.pa> } 2: Section=Section [ leading=leading[y=6] CONTENTS=Paragraph { <text of paragraph> <text of paragraph> <text of paragraph> <text of paragraph> } ] ] ] Level 1: At this level we must talk about how and when properties are inherited. In general, this is a merging operation since many properties will be acquired from "styles" by a level of indirection. In fact, a style has the same syntax as any node (except that it needs to be defined). One might ask whether styles should be lambda expressions so that a document node could particularize a style to better match constraints of its own. I think the answer is "no", for the following reason: In general, a node will have a number of styles, and we are interested more in combining the sets of properties derived from the styles than by parametrizing each one, because after parametrization and expansion we still have to somehow combine the properties. Let's just have one mechanism, the combining rule, rather than two. We tried to identify a set of meta-properties of properties. This set must ultimately converge to a small, constant set. Our first candidates are inherited | overriding | local -- default is inherited integral | deletable | ignorable -- default is integral Normally, Algol scope rules hold for inheritance, and a local definition of some property overrides a less local one. 
However, it seems desirable also to allow some properties in an outer scope to override a local definition. Finally, some properties are meant to be local only and not to be imported into subnodes in the dominant hierarchy. An integral property is one which cannot be thrown away without losing important information (e.g., the textual content of a node). A deletable property is one which caches information - it should be deleted if the node to which it is attached is altered; an ignorable property is like a hint - an editor can safely ignore it if it doesn't understand it, but if it does understand it, it must check the hint's validity. See Jim Horning's message about InterDoc meta-thoughts for a discussion of scope of properties, dominant structure, and definitions. JGM *start* 07051 00024 USt Date: 19 June 1981 2:52 pm PDT (Friday) From: Mitchell.PA Subject: A first cut at a document syntax - slightly revised To: InterDoc↑.PA I envision an InterDoc standard as having three levels. Level 0, the lowest and most general level, provides a general, 1980s-style S-expression syntax. Level 1 associates some simple semantics with various of the syntactic constructs (e.g., definitions, scope, node identity, node types). Most of the specifications actually lie at level 2, where document "types" are developed and given semantics (e.g., Star documents, Star text documents, Star graphics, 860 text, Tioga nodes, Laurel messages, etc.). Level 0: In the basic document-expression (D-expr) syntax, the fundamental literals are numbers, strings, identifiers, labels (for internal references), records, and sequences. Numbers and strings are taken directly from InterPress. Rather than work out a set of syntax productions, I have produced an example of an (imaginary) Laurel message embellished with paragraphs, fonts, etc. I hope the commentary to the right of the example will aid understanding. 
The first version of the document uses a number of abbreviating mechanisms for common cases; these are listed in the notes following the example, and, for comparison, a fully expanded version follows those notes: TextDoc | Message -- can be viewed as TextDoc or Message ( justified←F -- "←" means overridable default font←(face=TimesRoman size=10 style=n) -- keyword notation margins←(2540 19050) -- positional notation leading(x=1 y=1) MsgInfo ( -- keyword notation, "="s elided pieces(hdr=#1 body=#2) -- internal references to labelled parts date{1981 6 18} from=<Mitchell.PA> -- collected by Laurel; could use refs? subject=<A Sample Document Syntax> to=<Horning.PA> cc=<Mitchell,InterDoc↑.pa> authenticated -- authenticated=T ) CONTENTS= ( 1: Section -- "1" is the label of this section ( CONTENTS=Paragraph(font(style=bold)) -- distributed over sequence { -- a sequence of paragraphs all in boldface <Date: 18 June 1981 9:18 am PDT (Thursday)> <From: Mitchell.PA> <Subject: A Sample Document Syntax> <cc: Mitchell, InterDoc↑.pa> } 2: Section ( leading(y=6) -- override outer y leading CONTENTS=Paragraph { <text of paragraph> <text of paragraph> <text of paragraph> <text of paragraph> } ) ) ) Notes: I assume that an editor that ingests such a script understands all of the syntax and low-level semantics and has definitions available to it for the rest (e.g., TextDoc; see discussion under Level 2, below). x←value means that x has that value in the absence of any surrounding assignment of the form x=value. "=" is the default binding and means local value wins (Algol scope). Can elide "=" when followed by "(" or "{". x=T can be replaced by x; x=F can be replaced by ~x. The contents field of a node can be stated explicitly as "CONTENTS=" or can be given as the last literal of a node (if last, it makes semantics, inheritance, etc. easier for an editor ingesting a script). When each field of a structured literal has a distinct type, the type names can be used to identify the fields. 
This has been used heavily in the example. #n is an internal reference to the node labelled with the UID n (i.e., preceded by "n:") This is the only way to link things within the document other than the linking implicit in the hierarchy. Here is the fully expanded version of the above example: DOCUMENT=TextDoc | Message ( justified←F font←font(face=TimesRoman size=10 style=n) margins←margins(left=2540 right=19050) MsgInfo=MsgInfo ( pieces=pieces(hdr=#1 body=#2) date=date(yy=1981 mm=6 dd=18) from=from<Mitchell.PA> subject=subject<A Sample Document Syntax> to=to<Horning.PA> cc=cc<Mitchell InterDoc↑.pa> authenticated=T ) CONTENTS= ( leading=leading(x=1) 1: Section=Section ( leading=leading(y=1) CONTENTS=Paragraph(font.style=bold) { <Date: 18 June 1981 9:18 am PDT (Thursday)> <From: Mitchell.PA> <Subject: A Sample Document Syntax> <cc: Mitchell InterDoc↑.pa> } 2: Section=Section ( leading=leading(y=6) CONTENTS=Paragraph { <text of paragraph> <text of paragraph> <text of paragraph> <text of paragraph> } ) ) ) Level 1: At this level we must talk about how and when properties are inherited. In general, this is a merging operation since many properties will be acquired from "styles" by a level of indirection. In fact, a style has the same syntax as any node (except that it needs to be defined). One might ask whether styles should be lambda expressions so that a document node could particularize a style to better match constraints of its own. I think the answer is "no", for the following reason: In general, a node will have a number of styles, and we are interested more in combining the sets of properties derived from the styles than by parametrizing each one, because after parametrization and expansion we still have to somehow combine the properties. Let's just have one mechanism, the combining rule, rather than two. We tried to identify a set of meta-properties of properties. This set must ultimately converge to a small, constant set. 
Our first candidates are inherited | overriding | local -- default is inherited integral | deletable | ignorable -- default is integral Normally, Algol scope rules hold for inheritance, and a local definition of some property overrides a less local one. However, it seems desirable also to allow some properties in an outer scope to override a local definition. Finally, some properties are meant to be local only and not to be imported into subnodes in the dominant hierarchy. An integral property is one which cannot be thrown away without losing important information (e.g., the textual content of a node). A deletable property is one which caches information - it should be deleted if the node to which it is attached is altered; an ignorable property is like a hint - an editor can safely ignore it if it doesn't understand it, but if it does understand it, it must check the hint's validity. Level 2: This is where it gets hard. The bulk of the interchange standard will lie in the definitions of node types, i.e., what types there are, what their properties are, and what the meta-properties of those properties are. See Jim Horning's message about InterDoc meta-thoughts for a discussion of scope of properties, dominant structure, and definitions. JGM ------------------------------------------------------------ *start* 15895 00024 USt Mail-from: Arpanet host SU-SCORE rcvd at 22-JUN-81 0740-PDT Mail-from: ARPANET site SU-AI rcvd at 21-Jun-81 2335-PDT Date: 21 Jun 1981 23:35:13-PDT From: reid at Shasta To: InterDoc↑@Parc-Maxc Subject: thoughts on InterDoc Cc: reid, reid@Score Remailed-date: 22 Jun 1981 0739-PDT Remailed-from: Brian K. Reid <CSL.BKR at SU-SCORE> Remailed-to: InterDoc↑ at PARC-MAXC Here is the current draft of my thoughts on InterDoc. I am sorry to tell you that it has been done in Scribe rather than in Laurel or something (I had to work from home) so that you cannot read it easily on your screens. 
In case you would like to read it uneasily on your screens, I have included here a draft Scribed in such a way as to remove all font information. You can get the real thing by printing the file [Maxc]<Reid>IPthoughts.press, making sure that you use Maxc's PRESS command to do it with unless you are an FTP wizard. I might be a few minutes late for tomorrow's meeting, as I am going to be driving back from Sunnyvale with a load of wire for Stanford and I don't know how to predict the rush-hour traffic. Brian THOUGHTS ON INTERDOC BRIAN K. REID 21 JUNE 1981 Table of Contents 1. Principles 2. The lowest Syntactic level 2.1. The basic data types 2.2. Basic structure 2.3. Syntactic representation 2.4. Links 3. Some Semantics 3.1. Default properties and inheritance 3.2. Object utility properties 3.3. Object origination properties 4. Definitions and Macros 5. Higher-level constructs 1. Principles The Interdoc format is a language for the representation of documents. The language itself is an extremely simple syntax for the representation of certain concepts along with some simple rules for attaching semantics. This document summarizes my current thinking about what Interdoc should look like. Remember as you read this that Interdoc is not supposed to be intelligible to humans; it is the job of a text editor (or prettyprinter) to interpret for you. At this stage in the design of Interdoc, we are thinking only in terms of the interchange representation, which is to be entirely in the ISO character set. ISO looks like ASCII, but has no "[", "]", "↑", or " " in it. Xerox character sets also look like ASCII, but bungle the "←", and "`" characters; it is therefore not a good idea to use them either. This outline represents my mental picture of what we have all been designing in committee. Various pieces of that picture were painted by Horning and Mitchell, and I have fitted them into my syntax but left them more or less alone. 
This syntax looks a lot like some dialect of Lisp if you squint your eyes right. Please don't shy away from it on that account. Regardless of how you feel about Lisp as a programming tool, you as a computer scientist must admit that its expression language is unparalleled for the combination of simplicity and precision that it provides. 2. The lowest Syntactic level At the lowest syntactic level we need to represent text characters, integers, reals, and the relationships among them. Although we can represent characters with integers, it will make a more useful vestigial case for simple-minded editors if a character-code representation is used, so we pick ISO. 2.1. The basic data types Let a scalar character be represented by an ISO alphanumeric, or by an equals sign preceding an integer. Thus, "a" and "A" are scalar characters, as are "=97 " and "=65 ". The specific set of characters that can be directly represented by their ISO graphic should be fairly limited, so that quoting and nesting problems are avoided.{The "=" escape convention is borrowed from Interpress, but I don't like the concatenation rule in Interpress because it is not context-free; in the Interpress string <Alpha =24 25 25=>, which can be abbreviated as <Alpha =24 25 25> it is not possible to jump in to a random place in the middle of the string and break it into two strings without searching all the way back to the beginning of the string. So I propose that this string be represented in Interdoc as <Alpha =24 =25 =25 >.} There is exactly one space after the escaped number. Let a scalar integer be represented the same way it is in your favorite programming language: a sequence of decimal digits optionally prefixed by a sign. Let a scalar real be represented the same way it is in your favorite programming language. I won't give a rigorous definition here. We add the restriction not found in your favorite programming language that a number (integer or real) must be followed by a space. 
This is so we can have a simple rule for constructing vectors out of scalars by ordinary concatenation. Let a vector of characters, also called a string, be a group of zero or more adjacent scalar characters surrounded by <pointy brackets>. The pointy bracket character can of course be represented with its numeric code, so there is no problem putting that character (or any other) in a string. Thus <Hi mom!> is a string, and so is <Hi mom=33 >. Let a vector of integers be a group of zero or more adjacent scalar integers surrounded by {braces}. Thus {1 0 3333 67 } is a vector of integers. Let a vector of numbers be a group of zero or more adjacent scalar numbers surrounded by {braces}. Thus {1 0.0 3333 67 } is a vector of reals. (We might be able to get away with the single construct "vector of numbers" and rely on properties to denote integer-ness; must think further.)
2.2. Basic structure
The Interdoc format must be able to represent data and structure. The kinds of structure that we need to be able to represent are:
- Regions. A region is a (possibly empty) piece of the document. All of the document data must be in some region. Regions are the mechanism for representing containment; therefore regions nest. Non-hierarchical nesting of regions is not permitted; e.g. something like the Scribe construct @b[foo@i(baz]frozz) is not permitted. (note: this restriction might be too limiting, but it is certainly a useful one. Must discuss.)
- Points. A point is a marker that has no intrinsic size; it is attached to some datum. Although a point could be represented as a null region with a certain kind of property, I believe that points are important enough to deserve special treatment.
- Links. A link is a specification that some pair of {points or regions or links} has some directed relationship. I don't think we need to be able to link properties, q.v. Although links are directed, it is possible to build nondirectional links out of directional links, but not vice versa.
- Properties. All regions, points, links, and properties can have properties. A basic datum cannot have a property; but all basic data are in some region, and regions can have properties, so that is not a limitation. A property is actually an attribute/value pair, which makes it something like a LISP property list that is one element long. 2.3. Syntactic representation Regions, points, and links must be represented in such a way that they are trivial to parse, and that properties can be unambiguously attached. For the purposes of this representation, we must introduce the concept of an identifier. No semantics at all attached; an identifier is just a list of alphanumeric characters (not necessarily even beginning with a letter, though it won't hurt to add this restriction and it might save our ass later if some sort of syntactic extension becomes needed. Avoid unnecessary generality.) Let the representation of a region be a parenthesized list beginning with the identifier "region", and followed by anything at all: (region collection of regions, points, or basic data) Thus, for example: (region <Hi mom!>) (region <Hi mom!> (region <nested region>) <back to =120 101= outer>) Marks are represented using exactly the same syntax as regions. 
We can put a mark into a region by splitting the data to be marked into two pieces, and placing the mark between them: (region <Hi > (mark zotchmark) <mom!>) We can attach a property to a region by placing a property mark before the first datum in the region: (region (property color green) <Hi mom!>) and in fact we can do the same thing with marks: (region (property spin up) <Hi > (mark (property color 7) zark) <mom!>) We can attach a property to a property in the same way: (region (property (property 287 9) spin up) <Hi mom!>) Syntactically we could represent (property color green) as just (color green), and let the definition of "color" as a property attribute be the key, but we don't want to force the processing program to do a symbol-table lookup on the word "color" to discover that (color green) is just a property. In the interests of brevity, we should probably shorthand this notation as follows, (r (p (p 287) spin-up) <Hi mom!>) though it will help our discussions on these matters if longer words are used in examples. Even the dumbest sort of editor can gracefully skip over pieces of this format that it does not understand. When the editor's import routine finds an open parenthesis and an identifier that it does not understand, it drops into a mode where it ignores all characters except open and close parentheses, and stops when it reaches the closing paren that matches the one it found initially. It is free to do this because the "(" and ")" characters are not permitted in identifiers, numbers, or character strings.
2.4. Links
Links are more complicated than regions, marks, or properties because they make a non-local reference. Part of the lowest-level specification for links must therefore be some scheme for satisfying those references, in order that there be no ambiguity about what they link to.
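The skipping rule described at the end of section 2.3 can be sketched in a few lines of Python. The function name `skip_form` and the choice to raise an error on unbalanced input are assumptions of this illustration. The sketch relies on the stated restriction that "(" and ")" never occur inside identifiers, numbers, or character strings.

```python
def skip_form(script, start):
    """Return the index just past the parenthesized form opening at script[start].

    Only "(" and ")" are examined; every other character is ignored,
    which is safe because parentheses cannot appear inside identifiers,
    numbers, or strings in this syntax.
    """
    if script[start] != "(":
        raise ValueError("a form must start with '('")
    depth = 0
    for i in range(start, len(script)):
        ch = script[i]
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0:
                return i + 1  # position after the matching close paren
    raise ValueError("unbalanced parentheses")


script = "(region (zork 1 (a b)) <Hi mom!>)"
end = skip_form(script, 8)      # index 8 is the "(" of the unknown "zork" form
print(script[8:end])            # the text an importer would pass over
```

An importer that does not recognise "zork" would resume reading at `end`, without needing any knowledge of what the skipped form contains.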
On the other hand, it is sometimes vital to be able to include an unbound link in a piece of a document, so that when it is moved to a particular context its links will bind to the appropriate places in its new context. Unambiguousness is not the same as rigidity. Superficially, though, links are very much like you'd expect them to be. Ignoring for now all of the hard problems of non-local referencing, links look like this: A link has an origin and a destination. The origin of a link looks like this: (region (link dest-name) <Hi mom!>) which links the region containing the character string <Hi mom!> to a symbolic destination named dest-name. The destination part of a link is very similar, and looks like this: (region (hook dest-name) <Hi there, junior.>) In essence, we provide a hook that matches the name referenced in some link. 3. Some Semantics The semantics that are wired in to Interdoc have mostly to do with the rules for handling and inheriting properties, for binding links, and for providing simple rules by which an editor can know which parts of a document it can ignore and which parts it must be able to process. I am assuming the existence of a definition facility, though I haven't worked out its details yet. I am imagining something of the form: (define color (property type property) (property inheritance local)) (define subheading (property type region) (property style SH2)) These define a property named "color" and a certain kind of region called a "subheading". 3.1. Default properties and inheritance (This and the following section are my attempts to expand upon the metaproperty notion in the Horning/Mitchell note.) We don't want to have to re-specify the complete list of properties attached to everything in a document. We get around this with a set of defaults and a set of property inheritance properties. When a property is defined, it is given an inheritance property named "inherit", whose values are in {inherited, overriding, local}. 
Normally a property would be inherited. An overriding property is one that cannot be redeclared in anything contained inside the object given that property; for example, a region might be defined as a figure region, and given the property "keep this all on one page". That property would be given the property "overriding" so that nothing nested inside the figure could break it up accidentally. A value of "local" means that this property value temporarily overrides any inherited values, but that objects nested further inside this one do not inherit that value. In California we spend a lot of time and energy worrying about property values.
3.2. Object utility properties
The property named "utility" can take on values from {integral, deletable, ignorable}. An integral object is one that contains the master copy of some data. If it is deleted the document will be damaged and if it is ignored the document will be incomplete. A deletable object is one that is a cached copy of some derivative of other data. Deletable objects can always be regenerated, but not necessarily without expending non-trivial computation resources. An ignorable object is one that you are free to ignore if you don't understand it, but that you are not permitted to delete. Private henscratches left by some editing program for its later re-use would be ignorable objects; they had better be there, but they are useful only to certain programs. Ignorable non-region objects include properties for color of text (ignored by monochrome printers) and marks indicating line breaks determined by some particular formatting program.
3.3. Object origination properties
Some objects are placed in a document by a specific program, and marked "ignorable" so that all other programs can safely bypass them. Other objects are placed in a document with the intention that they be looked at by all comers.
We want to provide a mechanism whereby some editing agent can attach its special stamp to an object, so that it and the world can know how the object came to be there. This is done with the "creator" property. Values for the "creator" property are arbitrary names, intended to be the name of the program (or person, in the case of a reviewer adding annotations to something) who claims responsibility. No attempt at validation or enforcement is to be made, of course. 4. Definitions and Macros All names except the small set of built-in names (which we are not ready to disclose to you) must be defined either by a (define...) or a (macro...) object. It may be that the two end up being equivalent, except that I think it will be safe for a macro object to have parameters (since we are so restrictive w.r.t. delimiter syntax there will be none of the usual quoting problems associated with macros). I'm not ready to provide details, but the rules for expanding macros must be rigid and trivial, in order that macros be processable by all readers. I'm pretty sure the MacLisp DefMacro function provides exactly the right qualities, but I can't find my MacLisp manual. A (define...) object works exactly like a Scribe @Define command, which is to say that it defines an aggregate property by combining other properties (which can also be aggregated if you like). I can elaborate if people don't know what that means. 5. Higher-level constructs (watch this space) *start* 01315 00024 USt Date: 24 June 1981 4:07 pm PDT (Wednesday) From: Mitchell.PA Subject: Meetings 3a and 3b, 19 June 1981 and 22 June 1981 To: Interdoc.pa Attendees: Bill Paxton (WHP), Robert Ayers (RA), Scott McGregor (SMcG), Alan Perlis (AJP), Brian Reid (BR), Jim Horning (JJH), and Jim Mitchell (JGM) Discussion at this (extended) meeting centered around proposals from Horning, Mitchell, and Reid. WHP: Perhaps an Interdoc script should package all the text, bitmaps, etc. 
in one area and all the structural information in a separate place in a script so that an editor would never have to scan the contents (text, scanned images, etc.) in a script on input or output, but only the structural information. This suggestion was not accepted because even the contents have to be in some ISO-compatible form and have to be "scanned" anyway. There was much discussion of parametrized macros versus combining styles. The only resolution of this was that we all agreed to call them "definitions". JGM: It would be nice if one could have a general script-to-master converter which would work so long as it had access to all the right InterDoc definitions and InterPress dictionaries. (slight worry about creeping grandiosity). The next meeting is Friday, 26 June 1981, at 13:30 in the CSL Commons. JGM *start* 02828 00024 USt Date: 22 Jun 1981 15:42 PDT From: Horning at PARC-MAXC Subject: Re: thoughts on InterDoc In-reply-to: Reid's message of 21 Jun 1981 23:35:13-PDT To: Reid cc: Interdoc, Reid@Score Brian, I find myself largely in agreement with both your new material, and the material you have incorporated from my earlier draft. Let me just record the points about which I have questions, while I still remember them. I don't understand the business about requiring a blank after each number. Surely it is sufficient to require that the next character be non-alphanumeric, and to allow a blank where this lexical requirement would otherwise be violated? I have no strong preference between "point" and "mark"; both have some undesirable graphic connotations. But we should not switch haphazardly between them. I understood your link/hook example better than your abstract explanation as a pair, which suggested to me that links would somehow exist outside the tree, rather than within it. I agree that we must distinguish between the name of a property and its value. Definitions are the mechanism for establishing connections between the two. 
Generally, we will be associating properties with regions "by name," within the scope of some definition that supplies the value. E.g., (region emphatic <Boo!>) rather than (region (property face bold) <Boo!>) . I would rather not allow unbound links in Interdoc scripts. Let us claim that the semantics of merging two scripts is a problem for the text editor, not for Interdoc. E.g., some editors may interpret some properties as references to file names, and perform some further linking on this basis, but each script should, at a low level, be parsable on its own. I think we could enjoin a many-one link-hook association as part of our low-level standard, and require that hook names be unique over entire scripts. I'm still of two minds about inheritance. It and various other "metaproperties" are so context-dependent that I sometimes wonder whether they shouldn't simply be handled by definitions, too. [In California, property is seldom inherited, it is almost always sold at an inflated value.] Definitions will probably require parameters (sigh!). As noted this morning, some names will be "passed through" to Interpress, and interpreted as operators (primitive or composed). This is our ultimate escape hatch that allows editors to traffic in unanticipated properties, as long as the printers have been told how they are to be treated. I think that the notion that a major chunk of the Interdoc semantics (inference rules?) can be given by means of a uniform translation from scripts into Interpress Masters (in the context of a given set of definitions) is a good one, and may convert the 2N problem into a C + N epsilon problem. Jim H. 
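Horning's point above, that regions name a property such as "emphatic" and a definition in scope supplies the value, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the dictionary layout, the function name resolve, and the extra "shouted" definition are hypothetical, and none of it is Interdoc syntax or a proposal from the discussion.

```python
# Hypothetical sketch: property names are bound to values by definitions.
# A definition body is either a dict of primitive properties or a list
# mixing other definition names and dicts (an "aggregate").
DEFINITIONS = {
    "emphatic": {"face": "bold"},
    "shouted": ["emphatic", {"size": "large"}],  # aggregate built on "emphatic"
}


def resolve(name, definitions, _seen=()):
    """Expand a property name into primitive property/value pairs."""
    if name in _seen:
        raise ValueError("circular definition: " + name)
    if name not in definitions:
        raise KeyError("undefined name: " + name)
    body = definitions[name]
    parts = body if isinstance(body, list) else [body]
    props = {}
    for part in parts:
        if isinstance(part, str):
            # A name inside a definition is expanded recursively.
            props.update(resolve(part, definitions, _seen + (name,)))
        else:
            props.update(part)
    return props


# (region emphatic <Boo!>) would then carry the same properties as
# (region (property face bold) <Boo!>).
emphatic_props = resolve("emphatic", DEFINITIONS)
shouted_props = resolve("shouted", DEFINITIONS)
```

Later parts of a definition override earlier ones in this sketch; the discussion does not say how conflicting properties should combine, so that ordering is also an assumption.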
*start* 01139 00024 USt Date: 25 June 1981 10:49 am PDT (Thursday) From: Ayers.PA Subject: On "caches" To: Interdoc Reply-To: Ayers Using the nomenclature of the last meeting, a "cache" is (semi-)private information, placed in the interdoc source by a particular editor for its own purposes; it is "accelerator" information that can be re-created if necessary, but which is "true" if present. A cache might contain computed line breaks. What restrictions should be placed on caches? One requirement is that an editor has to delete the cache if he "invalidates" it, even though he does not know what the data in the cache is based on. Suggestion one: a cache can only reflect local data -- that is data immediately within the region containing the cache. This lets an editor delete the cache iff he alters the region. Suggestion two: a cache can only reflect data "here and below" -- data within the subtree rooted at the region containing the cache. Now an editor has to delete all caches "above" a change. Another thought: in any case, we should prohibit caches from containing, in any guise, logical pointers. Bob *start* 00505 00024 USt Date: 26 June 1981 10:48 am PDT (Friday) From: Horning.pa Subject: Re: On "caches" In-reply-to: Your message of 25 June 1981 10:49 am PDT (Thursday) To: Ayers cc: Interdoc Bob, I think I am in favor of your Suggestion two, modified to prohibit all "non-local" references (i.e., those outside the subtree). This seems safe. Global reference caches will have to be attached to the global node, and invalidated on ANY change by a non-comprehending editor. Life is hard. Jim H. *start* 00260 00024 USt Date: 26 June 1981 10:55 am PDT (Friday) From: Reid.PA Subject: Re: On "caches" In-reply-to: Horning's message of 26 June 1981 10:48 am PDT (Friday) To: Interdoc I believe that it will be too restrictive to ban non-local references. 
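Ayers's Suggestion two (a cache may reflect only data "here and below", so an editor must delete all caches "above" a change) can be sketched roughly as follows. The Region class, its fields, and the function name invalidate_on_change are hypothetical, chosen only to illustrate the rule; the sketch assumes regions form a tree with parent pointers and that the changed region's own cache is also dropped.

```python
# Hypothetical sketch of cache invalidation under "Suggestion two".
class Region:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        self.cache = None  # opaque to editors that did not write it
        if parent is not None:
            parent.children.append(self)


def invalidate_on_change(region):
    """Delete the cache on the changed region and on every ancestor.

    The editor never inspects cache contents; it relies only on the rule
    that a cache depends on nothing outside its own subtree.
    """
    dropped = []
    node = region
    while node is not None:
        if node.cache is not None:
            node.cache = None
            dropped.append(node.name)
        node = node.parent
    return dropped


doc = Region("doc")
para1 = Region("para1", doc)
para2 = Region("para2", doc)
line = Region("line", para1)
doc.cache = "page breaks"
para1.cache = "line breaks"
para2.cache = "line breaks"

dropped = invalidate_on_change(line)  # an edit inside para1
```

Here the caches on para1 and doc are deleted while para2 keeps its cache, since nothing in its subtree changed. Under Suggestion one the walk up the tree would be unnecessary: only the cache in the altered region itself would be deleted.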
*start* 01387 00024 USt Date: 26 June 1981 10:56 am PDT (Friday) From: Mitchell.PA Subject: Re: On "caches" In-reply-to: Your message of 25 June 1981 10:49 am PDT (Thursday) To: Interdoc It seems to me that caching information will almost certainly involve logical pointers, which I don't believe is an insuperable problem because they must stick out syntactically anyway. However, I am beginning to think that even considering a general mechanism for caching information in an Interdoc script is a bad idea. Here are my reasons: If the cached information is only understood by a few editors, it would probably be better off being held in whatever private representations they use (I intend here to distinguish between private encodings of the Interdoc standard, which may have to have some familial resemblance to the standard, and private representations of scripts, which have nothing to do with the standard). If the cached information is widely understood among different editing programs, then it seems much less important that one worry about general invalidation schemes, just as it is unimportant to find general ways of handling the side effects of altering ANY widely understood information. This opinion is uttered as a strawman for you to knock down. I would like to hear why we should bother about caches at all. Yours in the interest of creeping simplicity, JGM *start* 00703 00024 USt Date: 26 June 1981 11:11 am PDT (Friday) From: Ayers.PA Subject: Re: On "caches" In-reply-to: Mitchell's message of 26 June 1981 10:56 am PDT (Friday) To: Interdoc "It seems to me that caching information will almost certainly involve logical pointers, which I don't believe is an insuperable problem because they must stick out syntactically anyway" My point about hiding logical pointers (logical pointers; not "real" syntactic ones) in the cached info was that one could, conceptually HIDE pointers there. 
For example, one could cache an integer n whose semantics is: "Of this region's siblings, exactly n are (rendered) taller than it." That's what I wanted to ban. Bob *start* 01246 00024 USt Date: 26 June 1981 11:15 am PDT (Friday) From: Guibas.PA Subject: "Caching" along To: Interdoc I agree with much of what JGM just said. The case for allowing private caches to creep into InterDoc documents can only be made if it is in fact plausible that several (at least more than one) InterDoc document editors will be able to use the same cached information. With my current understanding of what might go into such a cache, I believe this sharing to be very unlikely. Furthermore, in my view, InterDoc is there mostly to facilitate interchange of documents among different editors. I still expect that almost all documents in the world will be kept around in private representations, and perhaps some in private InterDoc encodings. As long as we are careful to make translations to/from the standard relatively efficient, I see no great drawbacks to this position. Another way to say this is that documents can be in different states of "fluidity". A document in InterPress form is pretty rigid, a document in a private representation of some editor is pretty fluid, and one in the InterDoc format is someplace in between. JGM should appreciate this "degrees of binding" view. Anyone else feel the same way? LJG *start* 00607 00024 USt Mail-from: Arpanet host CMU-10A rcvd at 29-JUN-81 1113-PDT Date: 29 June 1981 1410-EDT (Monday) From: Mary.Shaw at CMU-10A To: Horning at PARC-MAXC, Mitchell at PARC-MAXC Subject: IDL and Diana documents CC: David Lamb at CMU-10A Message-Id: <29Jun81 141005 MS10@CMU-10A> Jim and Jim, I have asked David Lamb to send you the IDL and Diana documents. IDL won't be ready for another month (or so), but Diana is available now, and he'll send that soon. 
Although the Diana document is directed at Ada, it should provide enough of a flavor of IDL for you to see what's going on. Mary