*start* 00932 00024 USt Date: 1 July 1982 8:52 am PDT (Thursday) From: Mitchell.PA Subject: Re: Abbreviation as a LHS ?? In-reply-to: Ayers' message of 30-Jun-82 18:04:52 PDT (Wednesday) To: Ayers cc: Horning, InterDoc Bob, With respect to your example: lineHeight = 'font.size+line.leading' Yes, you'd better not assign to lineHeight because of the "=" binding. This issue is separable from the issue of quoted expressions (I emphasize the word EXPRESSION -- these are not macros). Secondly, no you don't have to say lineHeight = '(font.size+line.leading)' because quoted expressions are not macros. They are evaluated as needed and the resultant value is used. Assuming that font.size+line.leading produces a number when evaluated, that is its value; e.g., if font.size=10 and line.leading=12, then temp_lineHeight*2 evaluates to 22*2=44. Say after me: "Quoted expressions are not macros" fifty times. Jim M. *start* 00628 00024 USt Date: 1-Jul-82 18:19:59 PDT (Thursday) From: Ayers.pa Subject: Random Claim on Editing Styles .. To: InterDoc Suppose that we have a fragment of a script ... ... and that an editor desires to make the word "more" 'emphatic' -- a style. I suggest that the editor will chnage the fragment to ... {emphatic% } < words > ... because to do anything else would be very painful/awkward -- the stack "pop" at the right curly is very nice here. What does this suggest? It suggests that un-tagged nodes may be quite common in scripts. Bob *start* 00730 00024 USt Date: 1-Jul-82 20:01:56 PDT (Thursday) From: gcurry.es Subject: Re: Random Claim on Editing Styles .. In-reply-to: Ayers' message of 1-Jul-82 18:19:59 PDT (Thursday) To: Ayers.pa cc: InterDoc.pa Following that principle would mean that a text segment such as would be represented as {bold% } {bold% italic% } {italic% } , rather than bold% italic% noBold% noItalic% . I thought that brackets meant structure which was meaningful to the externalizer, and which must be preserved by conforming editors. 
Are these "runs" of text really structure?

Gael

*start* 00720 00024 USt
Date: 2 July 1982 12:15 pm PDT (Friday)
From: Horning.pa
Subject: Re: Random Claim on Editing Styles ..
In-reply-to: gcurry.es's message of 1-Jul-82 20:01:56 PDT (Thursday)
To: gcurry.es
cc: InterDoc

A long time ago I suggested "chromatic brackets," whose sole function was to delimit that did not correspond to structure, merely to runs. Let [x ... x] be an x-bracket pair delimiting a scope in which x. Then your example becomes [b[ib]i]

As I said at the time (I think), I'm not sure that the gain in compression justifies the increase in complexity. I do think that this is an encoding issue, that should not intrude on the language design.

Jim H.

*start* 00551 00024 USt
Date: 2 July 1982 12:24 pm PDT (Friday)
From: McGregor.PA
Subject: Re: Random Claim on Editing Styles ..
In-reply-to: Horning's message of 2 July 1982 12:15 pm PDT (Friday)
To: Horning
cc: InterDoc

I don't like "chromatic brackets" for two reasons: the complexity that you mentioned and the low likely use frequency. I believe the overlapping bold and italic that you cited in your example happens fairly infrequently in real documents (or at least at a low enough frequency that the compression issue seems minor).

Scott.

*start* 01148 00024 USt
Date: 1-Jul-82 11:45:34 PDT (Thursday)
From: Ayers.pa
Subject: Meeting in 100G 9:30 Friday 2 July
To: Interdoc

Maybe we should discuss abbreviations a bit. I've gotten Jim&Jim to answer "yes" and "no" on at least one query in that area ...

Notes from the (small -- several people on vacation) meeting of 25 June.

We received the material on font naming from Zack.wbst. It is basically uninteresting -- a historical summary as of 1960. I've messaged Liz Bond and Scott Kim about any DIN numbering schemes that they are aware of.

I decided that our Interscript documentation should be done in Star, and have constructed 'template' document sections.
There are difficulties either in Star or Bravo; Star seems better in the long run; the NS8000 net at Bayhill has recently been fairly dependable.

Jim Horning advised us to maintain a glossary.

Jim Mitchell reported briefly on the implementation activities of {Mitchell, Karlton, Stepak}.

Pamela Dick, summer employee working with Jean-Marie on the 820 implementation of Interscript, was introduced to us and Interscript was introduced to her. Welcome aboard.

Bob

*start* 03773 00024 USt
Date: 8 July 1982 4:04 pm PDT (Thursday)
From: Horning.pa
Subject: Generalized Boxes, Boundaries, and Spaces
To: Interdoc

This is to record the outline of some tentative conclusions that Jim Mitchell and I reached in a discussion of Generalized Boxes this afternoon, triggered by concern about whether Interscript could handle both Tioga's and Star's treatment of paragraph top and bottom leading.

First, we convinced ourselves that the problem is real, and cannot simply be legislated out of existence by deciding that one or the other is wrong. We concluded this after enumerating a number of other similar phenomena that will have to be dealt with, too [see below].

We also concluded that it is not reasonable to expect two editors with arbitrarily different geometric models to exchange scripts so that each would render precisely according to the other's model. Rather, the standard should identify a small set of "standard layout attributes" for cross-editor renderings. Each editor should provide definitions for these properties (or at least, for acceptable approximations) in terms of the values of attributes in which it traffics; conversely, each editor should be prepared to translate any incoming standard layout attributes into acceptable approximations in terms of the attributes it uses.
(Of course, it would be nice if compatible families of editors actually trafficked in standard attributes, but that is a probably an insuperable political problem--as Tony Hoare says, you can only realistically standardize when the choice doesn't matter.) Next, we tried to group the two kinds of behavior, to see whether we could abstract any general principles. Do not put too much weight on the particular terms used; they are nonce identifiers. BOUNDARIES - around SPACES - between ----------------------- ------------------ Page margins Character spacing Columns (relative to margins) (graphics) Frames Line leading Paragraph margins Tioga paragraph leading Paragraph indents (top and bottom) Star paragraph leading Star paragraph leading (relative to other paras) (relative to page margins) Tabs Both refer to geometric properties of the layout, and ensure a certain amount of white space. The distinguishing characteristic seems to be that the white spaces in the first column may not overlap (and hence are additive), while those in the second column can (and hence maximize). For example, character spacing is there because the "last" space on a line can overlap into the paragraph, column, and page margins. Most often, items in both columns will be specified by dimensions relating them to other boundaries (containing and/or contained). We have been using the term "offset" for a dimension in the first column, and "separation" for one in the second. Of course, not everything can be specified independently. Some thought needs to be given to Defaulting unspecified dimensions. Deriving implicit dimensions from explicit dimensions and geometrical constraints. E.g., A pair of boundary lines may be related by up to two offsets and two separations (one of each from each). 
The minimum distance allowed between them is

MAX(offset[1]+offset[2], separation[1], separation[2])

Consider the following example: a paragraph style has top and bottom offsets of 15 points, and separations of 20 points. Two consecutive paragraphs on a page are separated by at least 30 points. At the top of a page, a new paragraph is 15 points down from the page margin. Within a bordered frame with 0 offset and separation, a new paragraph is 20 points down.

More thought is needed. This is sent now partly to transcribe my whiteboard, and partly to get you all thinking about these issues.

Jim H.

*start* 02146 00024 USt
Date: 14-Jul-82 14:53:40 PDT (Wednesday)
From: Ayers.PA
Subject: Upcoming Interdoc TWG Meeting is Friday 16 July at 3:30 (3?) in 100G Bayhill
To: Interdoc

At Jim Horning's suggestion, we are moving the meeting of the full TWG to the afternoon. The plan is to minimize technical discussion -- in particular on-the-fly design -- at the full TWG get-togethers. We hope that the subgroups (document model and parsing) will meet separately and produce solid trial balloons (solid balloons?). The full meeting, occurring at the end of the week, will be in a position to monitor and discuss their activities.

At the meeting of 9 July, we spent a good deal of time discussing Tioga vs. Star's ideas on what paragraph leading is. [And we were really too big a group to brainstorm on this issue -- we demonstrated the merits of Jim's suggestion.]

Since leading is intimately connected with rendering, we revisited the "Interscript is silent on rendering" issue and what that meant. We still understand that you cannot ask an editor to render and then score the editor's interscript-ness based on the paper. (After all, there might not be a 'render' button on the editor!) But defining something like a "representative renderer" would make concrete our discussions about the "meaning" of justification and leading.
I will be on vacation from 16 July through 26 July and thus will be absent from the next two meetings. I would really like to see the "document model" group present a straw-man proposal on the main-document-tree / / content-sub-tree layout-subtree / / / / / / style boxes / / content-sub-tree layout-subtree / / style boxes arrangement that they have been working on. Could this happen by the end of the month? I believe that this guy "works" and agree with Jim H. that it is appropriate to get a concrete proposal before the TWG. Bob *start* 13802 00024 US Date: 18 July 1982 7:42 pm PDT (Sunday) From: Mitchell.PA Subject: The Interscript grammar To: Karlton, Stepak cc: Mitchell Jane and Phil, Here is the current Interscript grammar, including the lexical grammar. Please let me know if you run into any problems so that I can fix them in the document from which this was extracted. Jim Mitchell --------------- 2. The Language Basis: Syntax and Semantics 2.1. Grammar Our notation is basically BNF with terminals quoted and augmented by the following conventions: a sequence enclosed in [ ] brackets may occur zero or one times; a construct followed by * may occur zero or more times; parentheses ( ) are used purely for grouping. script ::= versionId node versionID ::= "Interscript/Interchange/1.0 " item ::= content | binding | label content ::= term | node term ::= primary | primary op term op ::= "+" | "" | "*" | "/" primary ::= literal | invocation | indirection | application | selection | vector literal ::= Boolean | integer | intSequence | real | string | universal invocation ::= name name ::= id ( "." 
id )* indirection ::= name "%" application ::= ( name | universal ) "[" item* "]" universal ::= ucID selection ::= "(" term "|" scope* "|" scope* ")" vector ::= "(" scope* ")" node ::= "{" scope* "}" scope ::= ( binding | label )* content content* binding ::= name mode rhs mode ::= "_" | "=" | ":=" rhs ::= content | op term | "'" scope* "'" | "[" [ item* ] "|" binding* "]" label ::= tag | link tag ::= universal "$" link ::= id "@!" | name "@" | name "!" 2.2. Discussion of Features [Note that we have a formal semantic definition for this language that is every bit as precise as the grammar above. However, we have not yet figured out how to present it in a form that humans find equally palatable, so we postpone it to an appendix.] primary ::= literal literal ::= Boolean | integer | intSequence | real | string The primitive elements by which the value of a document is represented. term ::= primary op term op ::= "+" | "" | "*" | "/" Both the primary and the term must reduce to numbers; the arithmetic operators are evaluated right-to-left (a la APL, without precedence) and bind less tightly than function application. The result is a real if either operand is. invocation ::= id Id is looked up in the current environment; depending on its current binding, this may produce contents, bindings, and/or labels; if the rhs bound to id was quoted, that expression is evaluated in the current environment. In the (implicit) outermost environment, every id is bound to the corresponding universal (ID). invocation ::= name "." id Qualified names represent lookup in "nested" environments; name must have been bound to an environment, in which id is looked up. indirection ::= name "%" This indicates an intentional indirection through name, which should be preserved as part of the structure; replacing the indirection by its value in the current environment is a value-preserving loss of structural fidelity. 
(An invocation that is simply a name is an abbreviation that need not be preserved.) universal ::= ucID Universals are identifiers that are written entirely in upper case letters. They are presumed to be defined externally, so they are not looked up in the environment. application ::= ( name | universal ) "[" item* "]" If the application involves a universal (either explicitly, or because the name is bound to a universal), the corresponding function is applied to the argument list that results from evaluating item*. Part of the definition of Layer 2 will involve the specification of a small set of standard functions, which may be expanded in various Layer 3 extensions. If name is not bound to a universal, the current environment is temporarily augmented with a binding of the value of item* to the identifier value, and the value of the application is the result of evaluating name in that environment; this allows function definition within the language. Neither form of application changes the environment of succeeding expressions. selection ::= "(" term "|" scope1* "|" scope2* ")" This is a standard conditional item sequence, using syntax borrowed from Algol 68. The value and effect are those of item1* if the term evaluates to "T" in the current environment, those of item2* if it evaluates to "F". vector ::= "(" scope* ")" Parentheses group a sequence of items as a single vector; bindings in scope* affect the environment of items to the right in the containing node, but labels have no meaning. node ::= "{" scope* "}" Nodes have nested environments, and affect the containing environment only through global (:=) bindings to ids. Scope* is implicitly prefixed by an invocation of Sub, which may be bound to any sequence of items intended to be common to all subnodes in a scope. item* ::= "" The empty sequence of items has no value and no effect; this is the basis for the following recursive definition. 
item* ::= item1 item* In general, the value of a sequence of items is just the sequence of item values; binding items change the environment of items to their right in the sequence. binding ::= name mode rhs This adds a single binding to the current scope (i.e., to its associated environment); bindings have no other "side effects" and no value (i.e., they do not change the length of a containing vector or node value). binding ::= name mode op term "name mode op term" is just a convenient piece of syntactic shorthand for "name mode name op term". mode ::= "_" | "=" | ":=" A value can be bound to a name either locally ("_") in the environment of the node in which the binding appears, or globally (":=") in the environment of the root node of a script. Using "=" in a binding causes the right-hand side value to be bound locally to the name on the left and prevents any further bindings to that name in the scope of the node in which the "=" binding occurs. rhs ::= "'" scope* "'" A quoted rhs is evaluated in the environment of invocation, rather than the environment current at the point of binding. rhs ::= "[|" binding* "]" This creates a new environment value that may be used much like a record. rhs ::= "[" item* "|" binding* "]" This creates a new environment value that is an extension of the environment that is the value of item*. tag ::= universal "$" This gives the containing node the property denoted by the universal. link ::= id "@!" This introduces the set of links whose main component is id, and defines their scope. link ::= name "@" This identifies the immediately containing node as a source of the link name. link ::= name "!" This identifies the immediately containing node as a target of each of the links that is a prefix of name. 2.X. Lexical Considerations Integer. An integer is represented in radix 10 notation using the characters "0" through "9" as digits, followed by a delimiter. A negative integer is preceded by a minus sign "". 
Thus the decimal number 1234 is encoded as "1234", and -1234 is encoded as "-1234". The delimiter may be empty if the following character is a letter.

A sequence of integer literals in the range 0..255 can be represented in radix 16 notation using the characters "A" through "P" as digits ("A" corresponds to 0, "P" to 15). The entire sequence is enclosed in "#" brackets. For example, the integer 93 is represented as "#FN#", and the sequence of integers 93, 94, 95, 96 as "#FNFOFPGA#". These sequences require only two characters for each integer (plus two characters of overhead). Note that there is no delimiter between the integers in this encoding. Ordinary integer literals, with their delimiters, may be included in the sequence; e.g., 7, 93, 400, 40 could be represented by "#7,FN400CI#".

Booleans are represented by the characters "F" and "T", followed by a delimiter.

Real. A real is represented using Fortran E or F notation, with a trailing delimiter. Thus "12.34" is the same as "1.234E1". Minus signs may precede the mantissa or the exponent: "-12.34E-3 ".

Identifier. An identifier is encoded by its characters (which are limited to letters and digits), followed by a delimiter: "x", "arg1". The first character of an identifier must be a letter, and must be written in lower case to distinguish identifiers from universals. Other letters may be written in either case for readability, since case is not significant in distinguishing identifiers.

Vector. A vector is encoded by surrounding a sequence of values with parentheses, "(" and ")".

String. A text vector usually contains integers that are interpreted as character codes. Often these codes lie in the range 32 to 126 inclusive, which are the numbers assigned to the characters of the interchange set by ISO 646. It is convenient to encode an element of such a vector by the character whose ISO code is the desired value. Such a string can be encoded by surrounding the characters with "<" and ">", thus "".
If the string contains elements outside the allowed range (i.e., if the value is less than 32 or greater than 126) or the value 62 or XX (the ISO codes for the characters ">" and "#"), those elements must be represented as integers inside "#" brackets, as described above. The two-character encoding of small integers is designed to make escape sequences compact. Thus "", "", "", and "" are all equivalent. Universal names. A universal is encoded by giving its name in upper case letters, followed by a delimiter. E.g., "TEXT". Node. A node is encoded by a "{", followed by a sequence of items, followed by a "}". Comment. The beginning and end of a comment are both marked by a double minus sign: the sequence "" "" is a comment and may occur between any two tokens. Comments are ignored in rendering the script. The tokens of the interchange encoding are defined by the following BNF grammar, together with rules about delimiters: The delimiter that terminates an identifier or universal may only be empty if the next character is not an alphanumeric, "", or ".". The delimiter that terminates an integer may only be empty if the next character is not a digit, "E", "F", "", or ".". extra delimiters may be inserted after any token. token ::= literal | id | ucID | op | bracket | punctuation | comment literal ::= Boolean | integer | intSequence | real | string Boolean ::= ( "F" | "T" ) delimiter delimiter ::= " " | "," | empty empty ::= "" integer ::= [ "" ] digit digit* delimiter digit ::= "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" intSequence ::= "#" intOrHex* "#" intOrHex ::= integer | hexChar hexChar hexChar ::= "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H" | "I" | "J" | "K" | "L" | "M" | "N" | "O" | "P" real ::= [ "" ] digit digit* "." 
digit* [ "E" integer ] delimiter
string ::= "<" stringElem* ">"
stringElem ::= stringChar | intSequence
stringChar ::= any character but "#" or ">"
id ::= lowerCase idChar* delimiter
idChar ::= letter | digit
letter ::= lowerCase | upperCase
lowerCase ::= "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" | "i" | "j" | "k" | "l" | "m" | "n" | "o" | "p" | "q" | "r" | "s" | "t" | "u" | "v" | "w" | "x" | "y" | "z"
upperCase ::= hexChar | "Q" | "R" | "S" | "T" | "U" | "V" | "W" | "X" | "Y" | "Z"
ucID ::= upperCase* delimiter
op ::= "+" | "-" | "*" | "/"
bracket ::= "(" | ")" | "{" | "}" | "<" | ">" | "[" | "]" | "'"
punctuation ::= "." | ";" | ":" | "=" | "_" | "!" | "#" | "@" | "|"
comment ::= "--" commentString "--"
commentString ::= any sequence of characters not containing "--"
alphanumeric ::= letter | digit

A simple listing of an interchange script can just print the character sequence, with line breaks every n characters, or perhaps at the nearest convenient delimiter. Such a listing is reasonably easy to read, so that problems can be tracked down simply by studying it. Additional help in reading the file can be furnished by utility programs which format the file for more pleasant reading.

2.4.2. Normalization

Every encoding must define a normalization function N, which maps a script in the encoding into another script in the encoding which generates the same output. N must be idempotent (i.e., N^2 = N); it may not change the fidelity level of the script (see 2.4.3). If a script violates the definition of Interscript, a normalization function may report this fact instead of producing a normalized result. In other words, normalization need not be defined on erroneous scripts.

The purpose of this function is to make possible a precise description of the rules for private encodings in section 2.4.4.
The idea is that when an encoding provides several ways of saying the same thing (typically a basic way, and some more concise ways which work in common special cases), the normalized script will uniformly choose one way of saying it. Note that the normalized script is not intended for any purpose other than precisely defining a notion of equivalent script; it is neither especially compact nor especially readable. The normalization function for the interchange encoding is defined as follows: Comments are omitted. Delimiters are replaced by empty if possible, otherwise with ",". An integer encoded in hex is replaced by the same integer encoded in digits; except in strings, "#" brackets are replaced by parentheses. Leading zeros are dropped from a digits encoding of an integer. Reals are uniformly encoded in E format with a single non-zero digit to the left of the "." and no trailing zeros; 0 is encoded by "0.0". An upper case letter in an identifier is replaced by the corresponding lower case letter. Each direct invocation (abbreviation) is replaced by its binding. ------------- *start* 00361 00024 USt Date: 20-Jul-82 12:03:16 PDT (Tuesday) From: Karlton.PA Subject: Re: The Interscript grammar In-reply-to: Mitchell's message of 18 July 1982 7:42 pm PDT (Sunday) To: Mitchell cc: Karlton, Stepak I took a quick look at the grammar and I noticed rhs ::= ... | "[" [ item* ] "|" binding* "]" Why are there brackets around 'item*'? PK *start* 00722 00024 USt Date: 20 July 1982 5:58 pm PDT (Tuesday) From: Mitchell.PA Subject: Re: The Interscript grammar In-reply-to: Karlton's message of 20-Jul-82 12:03:16 PDT (Tuesday) To: Karlton cc: Mitchell, Stepak rhs ::= ... | "[" [ item* ] "|" binding* "]" The brackets indicate that the item* sequence is optional. 
One can write something like Times _ [font | face.style _ ROMAN size_10*pt] or, something like complexNum _ [| real_1.0 imag_2.0] In the first case, Times is initialized to the value of font (which better be a record/environment) and then the two following bindings are added. In the second, complexNum is bound to a Null environment augmented with the bindings to real and imag. Jim M. *start* 00278 00024 USt Date: 20-Jul-82 18:25:26 PDT (Tuesday) From: Karlton.PA Subject: Re: The Interscript grammar In-reply-to: Mitchell's message of 20 July 1982 5:58 pm PDT (Tuesday) To: Mitchell cc: Karlton, Stepak Yes, but how is "item*" different from "[ item* ]"? PK *start* 00510 00024 USt Date: 21 July 1982 9:51 am PDT (Wednesday) From: Mitchell.PA Subject: Re: The Interscript grammar In-reply-to: Karlton's message of 20-Jul-82 18:25:26 PDT (Tuesday) To: Karlton cc: Mitchell, Stepak The brackets around item* are part of the BNF (note that they are not in quotes) and only indicate that the item* is optional: rhs ::= ... | "[" [ item* ] "|" binding* "]" An alternate way of specifying this is rhs ::= "[" "|" binding* "]" rhs ::= "[" item* "|" binding* "]" Jim M *start* 01871 00024 USt Date: 2-Jul-82 7:22:24 PDT (Friday) From: gcurry.es Subject: Precision of InterScript Standard Imaging Terms To: InterDoc.pa The question arose yesterday as to how "precisely" the semantics of the various InterScript imaging-related terms should be specified in the Standard layer. At one extreme, the answer might be "not at all precisely". That is, the imaging terms might be interpreted wholly by the various conforming editors. This sacrifices constancy of form under interchange (note that no one has proposed that we have that anyway). 
At the other extreme, the answer might be "completely precisely", in the sense that a script is so precise that it COULD be used as a specification for two different conforming editors to produce the same Interpress master (I am talking about the precision of imaging terms now - Jim Mitchell points out that other domains like voice would need the same treatment). This is not to say that different editors RENDER the script the same way - only that the semantics of terms in the script is sufficiently precise that that is possible. The tradeoff seems to be in ease of externalization versus interchangeability of form. That is, to the extent the specification is loose, a given editor's imaging model is more likely to be "isomorphic" to that of the standard. To the extent that the standard's model is very precise, even differences in imaging models can force large amounts of work, and hard questions about the completeness of the standard's model come up. Consider the semantics of inter-line leading in the standard. If it is either the Bravo model (standard printing interpretation) or the Star/Tioga model (distance between baselines), then the other will have problems. If it incudes both possibilities (one damn thing after another?), then what else are we missing? Gael *start* 00992 00024 USt Date: 7-Jul-82 7:53:13 PDT (Wednesday) From: GCurry.ES Subject: A Simpler Example of Artifacts To: InterDoc.pa cc: , GCurry Pretty-printing (e.g. Tajo formatter) is very much like Star pagination in the issues it raises, but it is a simpler example. Both operations discard some of the structure of a document (formatter => non printing characters, Star => page structure) and replace it with an updated structure, according to some "essence" of the document which is preserved by the operation (formatter => token stream, Star => "character" stream). In general, the user does not understand the representation of the artifacts (formatter => tabs and spaces?, Star => pages). 
Are there any qualitative differences between these two cases? Should we tag the spaces inserted by the formatter as artifacts (almost certainly not), and if not, why not? Why is pretty printing never represented as a style, even though it can easily be thought of that way?

Gael

*start* 00377 00024 USt
Date: 8 July 1982 6:46 pm EDT (Thursday)
From: Lampson.PA
Subject: Re: A Simpler Example of Artifacts
In-reply-to: GCurry.ES's message of 7-Jul-82 7:53:13 PDT (Wednesday)
To: GCurry.ES
cc: InterDoc

-------
Should we tag the spaces inserted by the formatter as artifacts (almost certainly not), and if not, why not?
------

Yes, of course we should.

*start* 00300 00024 USt
Date: 8-Jul-82 17:00:27 PDT (Thursday)
From: deLaBeaujardiere.pa
Subject: Re: A Simpler Example of Artifacts
In-reply-to: Lampson's message of 8 July 1982 6:46 pm EDT (Thursday)
To: Lampson
cc: GCurry.ES, InterDoc

Xerox 820 WordStar tags the spaces it inserts.

Jean-Marie

*start* 02810 00024 USt
Date: 15-Jul-82 13:59:30 PDT (Thursday)
From: GCurry.ES
Subject: Interchanging the InterScript Standards Document
To: Ayers.pa
cc: InterDoc.pa, GCurry

I have a number of comments on the preliminary version of your "Concepts and Facilities" document; I'll comment in separate messages, since they aren't too related to each other.

ASSERTION #1: It is difficult to define an interchange standard which will acceptably interchange its own specification (i.e., the Interscript standards document).

CASE IN POINT: In the discussion on the first page of "The Nature of a Document", a distinction is made between what is "part of" a document and what is "read into" a document. An example is given "...Other characteristics of the pages are not...part of the document -- the fact that, in this paragraph, the earlier word "letters" is the last word on a line,...(is) not "part of" the document, even though it is clearly present on the printed page". This is an example of a reference to form from content.
The printed copy I have puts the word "letters" at the BEGINNING of a (justified) line, not the end. In this case, you must have made an edit (in Star) which moved the word from the end to the beginning of a line. This is an example of an avoidable editing error in a WYSIWYG editor. (It is interesting to wonder how one would do the same in a representation-based editor). THE PROBLEM Because we do not interchange formatting information as specific as line-break algorithms, the act of interchanging the document you distributed will cause the word "letters" to appear in various places in the containing paragraph, according to which editor is doing the formatting. The position of that word is an important part of the meaning of the document, but we have no way to transmit it. We may just have to live with that. We better not use Interscript as part of the printing architecture (i.e., don't transmit documents to be printed in Interscript). ALTERNATIVES: a. Find a way to represent such form information (we rejected that approach as too difficult), b. Invent a primitive on content called WordAtEndOfPrecedingLine, or some such. (tarpit?) c. Be able to constrain the formatting so that a given word falls at the end of a line (too difficult and generalize). d. Understand that those constructs can't be properly interchanged and don't use them (can't expect users to know about this). The alternatives above are only listed to show there aren't any good alternatives. CONCLUSION The InterScript standard, or at least its tutorials, will contain examples of document aspects which can't be properly interchanged; those examples will not be able to be properly interchanged. Yes, the effect is real, and no, we can't do anything about it. 
Gael *start* 00721 00024 USt Date: 15-Jul-82 14:07:12 PDT (Thursday) From: Ayers.PA Subject: Re: Interchanging the InterScript Standards Document In-reply-to: GCurry.ES's message of 15-Jul-82 13:59:30 PDT (Thursday) To: GCurry.ES cc: Ayers, InterDoc "CONCLUSION The InterScript standard, or at least its tutorials, will contain examples of document aspects which can't be properly interchanged; those examples will not be able to be properly interchanged. Yes, the effect is real, and no, we can't do anything about it." But we can. We don't want to INTERCHANGE the Interscript standard, just PUBLISH it. So getting things right once and then making an Interpress file which defines the standard is the thing to do. Bob *start* 00505 00024 USt Date: 15-Jul-82 17:26:11 PDT (Thursday) From: GCurry.ES Subject: Re: Interchanging the InterScript Standards Document In-reply-to: Ayers.PA's message of 15-Jul-82 14:07:12 PDT (Thursday) To: Ayers.PA cc: GCurry, InterDoc.PA I only bring this up because a standard test for completeness is whether a thing will process itself. Hence the question, "What happens when you use Interscript to interchange its own description". I agree we don't have a practical problem. Gael *start* 00911 00024 USt Date: 16 July 1982 11:03 am CDT (Friday) From: Johnston.DLOS Subject: Re: Interchanging the InterScript Standards Document In-reply-to: Ayers.PA's message of 15-Jul-82 14:07:12 PDT (Thursday) To: Ayers.PA cc: GCurry.ES, InterDoc.PA "But we can. We don't want to INTERCHANGE the Interscript standard, just PUBLISH it. So getting things right once and then making an Interpress file which defines the standard is the thing to do." I didn't think that was the point. The problem to me was that documents which verbally reference a specific piece of content in the form that results from using a specific editor. I felt the problem stated was that Interscript could not address keeping the form of the original editor, therefore the content would become incorrect. 
Not that we should be responsible for that, I guess; but it is, as I understood Gael to say, a point to ponder. Rick *start* 04013 00024 USt Date: 19-Jul-82 11:39:59 PDT (Monday) From: stepak.pa Subject: Notes on Document Model Mtg: 7/16/82 To: Interdoc cc: , stepak.pa

General overview of events of the meeting. Additions, deletions, and comments are welcome.

Attendees: Gael Curry, Jim Horning, Scott McGregor, Jim Mitchell, Jane Stepak

Purpose of the meeting: Discussion of document model issues

Gael proposed the following description for a document consisting of 2 paragraphs:

[Diagram: a tree rooted at doc (width _ c...) with children para 1 and para 2. Each paragraph has children content and layout, and beneath these a style node (justified...) and a boxes node (para.pre _... para.post _ ...).]

-------------------------
!       para1.pre       !
-------------------------
!                       !
!                       !
!     text of para1     !
!                       !
!                       !
-------------------------
!       para1.post      !
-------------------------
!       para2.pre       !
-------------------------
!                       !
!     text of para2     !
!                       !
-------------------------
!       para2.post      !
-------------------------

The next discussion centered on interpretation capability of editors, etc.

[Diagram: an axis labeled "complexity of model -->" running from "no interpretation" to "full semantics (precise rendering)", with "fixed pitch single size", Star, and Tex marked along it, and arrows pointing from both sides toward a region labeled "idempotent".]

more interpretation: editors must translate the information, decide on abstract syntax of the document, see how the interchange works.

less interpretation: no interpretation --> just look for identifiers that are "spelled the same"; just suggested interpretation for layout parameters is used; pre/post leading taken as absolute values...
The fact that Star and Tioga deal with paragraph leading differently inspired the following:

gap = offset1 + offset2 (Star)
gap = max(offset1,offset2) (Tioga)

[Diagram: a vertical stack of intervals inside a parent that runs from edge ep1 to edge ep2: gap 0, e11, size1, gap 1, size2, size3, ..., size n, gap n. Brackets alongside are labeled parent, sep., offset, and size.]

Document Attributes
-------------------
understand first
  "The Galley Problem"
  character looks
  paragraph properties: line-leading, justification
  1st indent, right indent, rest indent
  intervals: nesting & siblings
  frames & simple graphics
  cross-hierarchy layout (lines in paragraphs, pages in document...)
  general graphics
  global layout
  "Mouse's Tale"
understand later

Etc
----------
1. contents of a paragraph: concatenation of characters
2. rendering of a paragraph: concatenation of lines
3. adjustments must be made when an editor is of a different level of complexity than the script.
4. the "content" of a document should never change. (adding or deleting blank lines...) Important in discussion of representation of pre/post leading in script, and how each editor will interpret it.

*start* 00649 00024 USt Date: 22-Jul-82 11:50:27 PDT (Thursday) From: Karlton.PA Subject: Interscript is LALR To: Mitchell cc: Stepak, Karlton

I went to Satterthwaite and he fixed up my grammar for me. The problem I was having was in my definition

itemlist ::= | item | item

It turns out that this allows an infinite number of s at the front of the list and is ambiguous. The trick (which I don't think I would have guessed in quite some time) is to redefine it to be

itemlist ::= | itemlist'
itemlist' ::= item | itemlist' item

Now my problem is to convert this LALR grammar into an obvious LL[1] grammar.
PK *start* 01341 00024 US Date: 27-Jul-82 15:29:28 PDT (Tuesday) From: Karlton.PA Subject: Interscript grammar To: Mitchell, Horning cc: Karlton I wish to unilaterally change the grammar in minor way (just to make my life a little easier), and I wanted to run the change through you. Currently the following definitions exist string ::= "<" stringElem* ">" stringElem ::= stringChar | intSequence stringChar ::= <" or "#">> intSequence ::= "#" intOrHex* "#" intOrHex ::= integer | hexChar hexChar as well as literal ::= ... | intSequence | ... What I wish to do is to define a different category for a sequence of values that may exist in a string. As long as programs are creating these strings, I want to restrict the escape sequences in strings to be a sequence of hex characters. This would entail making the following changes stringElem ::= stringChar | hexSequence hexSequence ::= "#" hex* "#" hex ::= hexChar hexChar What is the value of having intSequence as a possible literal? (Of course a vector of integers has lots of uses, but that is not what I am wondering about.) It might just be the right thing to eliminate it entirely from the grammar. It just allows a redundant way to say something and adds no function. I agree that an escape sequence is important, but there is one. PK *start* 00577 00024 US Date: 27-Jul-82 16:03:23 PDT From: Horning.pa Subject: Re: Interscript grammar To: Karlton cc: Mitchell, Horning.pa Phil, I believe that section of the grammar was copied pretty slavishly from Interpress, with not too much thought about whether we needed the flexibility. Of course, Interpress uses vectors of integers for lots of things that have nothing to do with strings, and takes the position that a string is just a special case. I certainly don't have strong feelings on this point. Have you asked Bob Ayers his feelings about it? Jim H. 
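A minimal sketch of how a reader for Karlton's proposed string syntax (string ::= "<" stringElem* ">", with hexSequence ::= "#" hex* "#" and hex ::= hexChar hexChar) might look. Several things here are assumptions, not part of the thread: that a stringChar is any character other than ">" and "#", that a hexChar is one of 0-9 or A-F in either case, that each hex pair denotes one character code, and the function name parse_string itself.

```python
# Hypothetical reader for the proposed string syntax; names and the
# character-code interpretation of hex pairs are assumptions.
HEX_DIGITS = set("0123456789ABCDEFabcdef")


def parse_string(text):
    """Decode '<...>' where '#hh..hh#' escapes a run of character codes."""
    if len(text) < 2 or text[0] != "<" or text[-1] != ">":
        raise ValueError("string must be delimited by '<' and '>'")
    body = text[1:-1]
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == ">":
            # Assumed: '>' may only appear inside a string as a hex escape.
            raise ValueError("unescaped '>' inside string")
        if ch != "#":
            out.append(ch)  # ordinary stringChar
            i += 1
            continue
        end = body.find("#", i + 1)
        if end == -1:
            raise ValueError("unterminated hex sequence")
        digits = body[i + 1:end]
        if len(digits) % 2 != 0 or any(d not in HEX_DIGITS for d in digits):
            raise ValueError("hex sequence must be pairs of hex characters")
        for k in range(0, len(digits), 2):
            out.append(chr(int(digits[k:k + 2], 16)))
        i = end + 1
    return "".join(out)


if __name__ == "__main__":
    print(parse_string("<a#3E23#b>"))
```

Under these assumptions the two delimiter characters themselves are written as #3E# and #23#, which is the one escape mechanism Karlton argues is sufficient.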
*start* 01144 00024 USt Date: 27-Jul-82 9:50:39 PDT (Tuesday) From: deLaBeaujardiere.pa Subject: Re: Interscript TWG meeting of Friday, 16 July In-reply-to: Karlton's message of 23-Jul-82 11:33:13 PDT (Friday) To: Karlton cc: Interdoc Amendments to the minutes of the TWG meeting of Friday, 16 July: 1. Bill Duvall's editor already exists. What remains to be done in order to use it is: A) agree on the exact structure of the internalized script. B) modify the editor so that it knows about that structure. 2. We can't tell if it is too messy to interact with WordStar since we haven't seen the source code. It takes too long to get the source code (the legal documents to get it were signed yesterday only!) and Bill already has that handy dandy editor, so let's use it instead. It will be informative, however, to have a look at WordStar source code when we get it and pass on our conclusions to whoever will make it process scripts. 3. "There was a call for example scripts, and Jean-Marie has volunteered to gather some"??? I thought I was going to gather properties of fonts... Jim Mitchell: can you tell me? Jean-Marie *start* 02368 00024 USt Date: 8-Jul-82 16:34:18 PDT (Thursday) From: GCurry.ES Subject: More on artifacts To: InterDoc.pa cc: , GCurry We have been thinking of artifacts as aspects of a document which are introduced by certain operations. For example, Star pagination INTRODUCES page structures and Tajo formatting INTRODUCES blanks. When externalizing a document containing artifacts, we considered marking the artifacts as such, with the understanding that they constituted low-grade information. The thought was that such information could be deleted by foreign conforming editors, and that doing so would do some, but not serious, damage to the document. Artifacts were at one time considered to be aspects of documents introduced when going from one LEVEL of a document to the next (e.g., Appearance, Presentation, Substance, Essence, Understanding).
Jim Horning suggested, as a matter of presentation, that it would be better to think in terms of document equivalence than in terms of artifacts. I believe now it may be more than a matter of presentation. ARTIFACT: If P is a part(type) of a document, and Op is an operation taking documents to documents, then P is an artifact when P1 # P2 => Op[P1] = Op[P2] for some P1, P2 instances of P. That is, an aspect(part) of a document is an artifact with respect to some operation when that aspect doesn't matter to the operation. Thus page structure is an artifact with respect to pagination, and blanks between tokens are an artifact with respect to formatting. Note: Artifacts are not necessarily INTRODUCED by operations. A more accurate phrasing of the second sentence of this message is "For example, Star pagination REPLACES page structures and Tajo formatting REPLACES blanks." Documents are not necessarily well-formed if artifacts are DELETED. For example, a Star document always has a page structure, but there may be just a large single page which contains all the text in the document. Alternatively, there may be 40 spaces (inserted by a user) between tokens in a Mesa program, and the Tajo formatter may reduce that to 0 or 3 or 6. This is unfortunate. The utility of ARTIFACT$ (or equivalent) to us was that the generic editor could REMOVE the value without understanding all about it. It seems impossible for the generic editor to REPLACE an artifact with an equivalent one. Gael *start* 00971 00024 USt Date: 15-Jul-82 18:28:55 PDT (Thursday) From: GCurry.ES Subject: Secondary Goals for an Interchange Standard To: Ayers.pa cc: InterDoc.pa, GCurry In the section entitled "Why an Interchange standard", I think you have captured the main motivation for interchange: information-lossless exchange coupled with (at least) basic editing.
However, I think we should also explicitly mention other goals which affect what we end up with, among which are: UNCONSTRAINED EDITOR EVOLUTION: It is desirable that adherence to the interchange standard minimally constrain an editor's evolution. An editor should not be forced to choose between adherence to the standard and evolution. MAXIMUM INTERSECTION: It is desirable that we be able to interchange as many aspects of documents (e.g., layout, equations, etc.) as accurately as possible. As with the "Encodings" goals, these goals are somewhat antagonistic with each other and other goals. Gael *start* 03465 00024 USt Date: 15-Jul-82 20:03:53 PDT (Thursday) From: GCurry.ES Subject: What are Star Page Structures? To: Ayers.pa cc: InterDoc.pa, GCurry This message wonders whether Star page structures are artifacts or transients (presumably transients), and then argues against the (anticipated) assertion that they are TRANSIENTS in the same sense as other examples given in the "Concepts and Facilities" document. From the Concepts... document (paraphrasing): ARTIFACT: a characteristic of a document not explicitly mentioned in the representation. (note this is not the same notion as that expressed in an earlier message of mine: ARTIFACT (with respect to an operation): A characteristic of a document which doesn't matter to the operation. Examples are page structures under pagination and intertoken blanks under pretty-printing) Presumably Star page structures are not artifacts in the first sense. TRANSIENT: a temporary aspect of a rendition, especially a display, which is due to an editor's inability to correctly render a document -- possibly because of a desire for fast response time. There are two reasons that Star pages have been treated as they have: A. PERFORMANCE - we can't do continuous pagination in the general case with adequate performance; I doubt any editor will ever be able to do so. B.
CONTINUITY - Pagination, regarded as a mapping from representation to image, is "discontinuous" in general. The more complicated documents get, and the more capable editors become with layout facilities, the more likely it is that small changes in representation (e.g., character insert) will cause large changes in image. Page structures, on the other hand, can be made to act continuously (e.g., by letting them stretch). This is very valuable, for example, when trying to edit a document from a marked-up copy - pages on the marked-up copy correspond to editor pages, even after editing. Of the two people that I have mentioned these considerations to, both thought that continuity was the more important. There is a third reason why page structures should not be viewed as unimportant structures which exist only because the editor wasn't clever enough to generate them at display time. C. INCREASED LAYOUT FUNCTION - Page structures can be used as "hooks" to elaborate layout. EVEN IF they are temporary, and discarded at every pagination, they can still be used to get extra layout function. By regarding them as concrete, user-editable structures, one can paginate, tune, and print (much as one would paginate, tune and print the Interscript Standard document). If the printed representation is considered to be the end product, then it is valuable to be able to get as much variety in the printed form as possible (even if some of that structure is discarded by subsequent edits, such as pagination). Star plans call for pages to evolve to graphics frames, containing text frames (linked together) containing the document text. Other structures may then be added to such pages, such as annotations; these MIGHT be discarded by subsequent paginations. Star page structures are neither temporary, nor do they exist solely because of an "inability to correctly render the document".
They are legitimate document components, which happen to be wholly replaceable based on page specifications embedded in the document. Is it problematic if Star page structures are neither artifacts nor transients? Gael *start* 01423 00024 USt Date: 16 July 1982 12:28 pm EDT (Friday) From: Lampson.PA Subject: Re: What are Star Page Structures? In-reply-to: GCurry.ES's message of 15-Jul-82 20:03:53 PDT (Thursday) To: GCurry.ES cc: Ayers, InterDoc I'm quite sure that it is possible to do continuous pagination in almost all cases of practical interest with adequate performance. It has been a source of mild frustration to me that I have never been able to get anyone to try out the scheme for doing it that I thought up three years ago. I certainly agree that continuity is important. But I don't like having it forced down my throat. Your point about increased layout function is a good one. Especially in the context of sophisticated page layout algorithms that do dynamic programming etc, it may be possible to hang onto quite specific page layout information provided by the user for much longer than one might expect. Certainly this is how things are done in the publishing business. But again, in most cases a document's layout is not being caressed, and it is much better to have everything done automatically. I am not sure I understand the question you are asking. There are lots of permanent things that are "really" part of the document (hence, I assume, neither artifacts nor transients) which can be changed by editing operations: the text, the looks, etc. All you are saying is that "paginate" is an editing operation. *start* 01714 00024 USt Date: 16-Jul-82 12:51:05 PDT (Friday) From: gcurry.es Subject: Re: What are Star Page Structures? In-reply-to: Lampson's message of 16 July 1982 12:28 pm EDT (Friday) To: Lampson.pa cc: GCurry.ES, Ayers.pa, InterDoc.pa You are right. I am mostly saying that pagination is an editing function.
I was also trying to object to the anticipated position of Interscript that page structures in Star were like the old Laurel "insert mode" (formatting invariants suspended). In Tioga, page structures are introduced at rendering time. Since they are not in a document's representation, it is not necessary to represent them in an Inter-script. IF ONLY Star pages were treated that way, we wouldn't have to deal with the ugly problem of representing two orthogonal hierarchies (i.e., chapter/section/paragraph, page/column/segment) in a script. I was trying to say that there are GOOD REASONS for Star pages to be treated that way, and that the only reason for not handling them in Interscript is that we can't (rather than shouldn't). If pages can be full-fledged document parts, and Interscript-conforming editors must preserve structure, then Tioga and the 820 cannot discard them (probably substantially complicating those editors). We had proposed considering page structures to be low-grade information - but still information - which could be discarded at the risk of accepting editors. At one time, we had considered tagging document parts with the tag ARTIFACT$, which meant "discardable if necessary". I saw no mention of such a construct in the "Concepts and Facilities" document, and the interpretation of the term "artifact" there was incompatible with such a use. Gael *start* 01347 00024 USt Date: 16-Jul-82 12:51:52 PDT (Friday) From: Karlton.PA Subject: Re: What are Star Page Structures? In-reply-to: Lampson's message of 16 July 1982 12:28 pm EDT (Friday) To: Lampson cc: InterDoc I am interested in knowing about your scheme for doing continuous pagination. If I have time I may try putting it into the simple Tajo-based Interscript-conforming editor that I have signed up to do. I am somewhat confused by your claim (in your message of July 8th) that the spaces inserted by the formatter are artifacts (under the old definition which is closer to what a transient is now).
The formatter gets used in one of two ways: making an Interpress master or editing the document. In an Interpress master the spaces don't matter since the master is not intended to be edited. When the formatter gets used as an editor (It could be viewed as the edit function beneath some large key.), the document is altered; the spaces are part of the document. The spaces could be viewed as artifacts in a different model of use of a formatter. Suppose some paragraph in the document has Look-Mesa attached to it, and the editor is capable of maintaining the formatting as editing by the user takes place. As tokens are changed in, spaces would appear and disappear from the document without explicit intervention by the user. PK *start* 00677 00024 USt Date: 19 July 1982 11:24 am PDT (Monday) From: Horning.pa Subject: Re: What are Star Page Structures? In-reply-to: GCurry.ES's message of 15-Jul-82 20:03:53 PDT (Thursday) To: GCurry.ES cc: InterDoc Gael, "concrete, user-editable structures" are neither artifacts nor transients. They are part of the document, and of course, its externalization as a script. It is harder to give a convincing argument that Star's page-break characters are part of the galley, rather than the layout, but I think I could construct such. Note that there are operations other than pagination that make global edits to a document (consider substitution . . .). Jim H. *start* 03000 00024 USt Date: 15 July 1982 7:39 pm PDT (Thursday) From: Horning.pa Subject: Generalized boxes, cont'd - notes from whiteboard To: Interdoc Last Friday afternoon Gael, Jim, and I had a further discussion about boxes and leading. Since the notes got left on my whiteboard, I agreed to transcription. What follows is filtered somewhat by my memory of what the various pictures and phrases were supposed to mean. First, we recognize that rectangles are "just" a special case of general boundaries. 
However, they are worth considering both because they occur very frequently in practice, and because they seem to pose many of the difficult problems of the general case in a form where we can at least think about them relative to familiar objects.

Second, we started making more rapid progress when we realized that, to a very large extent, a rectangle could be viewed as two "intervals" (one horizontal, the other vertical) that could be considered separately. Of course, there are some second-order interactions: if the lines in a paragraph get shorter, there may be more of them.

Third, we noted that intervals are well-suited to considering the most common form of composition: catenation in a single dimension--characters into lines, lines into paragraphs, paragraphs into galleys, etc. A catenation of N intervals nested in a parent interval can be described either by the positions of 2N+2 "edges," or, almost equivalently, by the position of k of the edges and 2N+2-k "relative positions." The simple geometric constraint that they add up allows us to convert freely; where the models differ is in what is considered to be fixed and what is derived.

|----------------------------parent size-----------------------------|
|--gap0--|---size1---|--gap1--|---size2---|--gap2---| . . . |--gapN--|
eP1      e11         e12      e21         e22       e31     eN2      eP2

Position information may come from one of four sources:
Explicit: "this box is 3" wide"
Synthesized: "this box is high enough to hold the paragraph contents"
Inherited: "this box is half the width of its parent"
Derived: "this gap fills the remaining space"

This seems to cover what we know about sizes. Gaps are a little more complicated.
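The free conversion between edge positions and relative positions described above can be sketched as follows. This is only an illustration under stated assumptions: the function names, the use of integer units, and the choice of eP1 as the single fixed edge (the k = 1 case) are not from the notes.

```python
# Hypothetical helpers: convert between the two descriptions of a
# catenation of N intervals inside a parent interval.

def edges_from_relative(parent_start, gaps, sizes):
    """Given eP1, gap0..gapN and size1..sizeN, return the 2N+2 edges
    [eP1, e11, e12, e21, e22, ..., eN1, eN2, eP2]."""
    if len(gaps) != len(sizes) + 1:
        raise ValueError("need exactly one more gap than sizes")
    edges = [parent_start]
    pos = parent_start
    for gap, size in zip(gaps, sizes):
        pos += gap          # leading edge of this interval
        edges.append(pos)
        pos += size         # trailing edge of this interval
        edges.append(pos)
    pos += gaps[-1]         # final gap up to the parent's far edge
    edges.append(pos)
    return edges


def relative_from_edges(edges):
    """Inverse conversion: recover (gaps, sizes) from the 2N+2 edges."""
    if len(edges) < 2 or len(edges) % 2 != 0:
        raise ValueError("expected 2N+2 edge positions")
    n = (len(edges) - 2) // 2
    gaps = [edges[2 * i + 1] - edges[2 * i] for i in range(n + 1)]
    sizes = [edges[2 * i + 2] - edges[2 * i + 1] for i in range(n)]
    return gaps, sizes


if __name__ == "__main__":
    e = edges_from_relative(0, [10, 5, 20], [30, 40])
    print(e)
    print(relative_from_edges(e))
```

The parent size is not an independent input here: it falls out as the last edge minus the first, which is the "they add up" constraint in the notes.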
As sketched in my message of 8 July, we can model the behaviors with which we are familiar by associating with each edge two constraint "distances" (not really distances, since we must allow for signed values): Offset: "the gap must contain at least 20 pts associated with this edge" Separation: "this interval must be at least 20 points from the adjacent interval" Thus, for internal gaps, we have the constraint gapi >= MAX[offset2i + offset1(i+1), separation2i, separation1(i+1)] For the first and last gaps there will be different, but similar rules that we haven't set down yet. [It seems better to send this off now than to wait til I've finished it. Things will probably change tomorrow, anyhow.] Jim H. *start* 01219 00024 USt Date: 16 July 1982 12:17 pm EDT (Friday) From: Lampson.PA Subject: Re: Interchanging the InterScript Standards Document In-reply-to: GCurry.ES's message of 15-Jul-82 13:59:30 PDT (Thursday) To: GCurry.ES cc: Ayers, InterDoc Actually, I think you should be able to get the word to come out at the end of the line. I have sometimes wanted this capability, and have seen publishing people using Bravo who have also wanted it. The feature you want is to be able to force justification of the last line of a paragraph (or perhaps of any line with an explicit CR), by stretching the white space as much as necessary. This seems to me to be a perfectly reasonable sort of paragraph property, not different in kind from "centered" or "justified". In general, there should be properties which give the user as much control as he wants over the layout of the text. This is somewhat tangential to the facilities our editors emphasize, which make low-level layout decisions automatically based on higher level properties specified by the user. But I don't think there is a conflict. I wonder whether there are examples of the general problem you describe which are not amenable to this kind of solution.
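A rough sketch of the forced justification Lampson describes, where the white space on a line is stretched as much as necessary so the last word ends at the right margin. The function name, the integer width units, and the policy of spreading the slack as evenly as possible across the inter-word gaps are all assumptions made for illustration; the message does not say how the stretch should be distributed.

```python
# Hypothetical illustration: compute inter-word gap widths so that the
# words on a line exactly fill the line, however much stretch that takes.

def forced_justify_gaps(word_widths, line_width):
    """Return one gap width per inter-word space (len(word_widths) - 1)."""
    n_gaps = len(word_widths) - 1
    if n_gaps < 1:
        return []  # a single word has no white space to stretch
    slack = line_width - sum(word_widths)
    if slack < 0:
        raise ValueError("words are wider than the line")
    base, extra = divmod(slack, n_gaps)
    # Spread any remainder one unit at a time over the leftmost gaps.
    return [base + (1 if i < extra else 0) for i in range(n_gaps)]


if __name__ == "__main__":
    print(forced_justify_gaps([30, 20, 10], 101))
```

Treated this way the property is just another per-paragraph layout rule, alongside "centered" and "justified", which is the point the message makes.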