LIMITED DISTRIBUTION: FOR XEROX INTERNAL USE

Towards an Interchange Standard for Editable Documents

by Jim Mitchell (Mitchell.PA) and Jim Horning (Horning.PA)

Revision 1.1/May 10, 1982

The Interscript standard will define a digital representation of editable documents for exchange among different editing systems. An Interscript script can be transmitted from one editor to another over a network, or can be stored for later editing. A script is not limited to any particular editor: if a script contains editable information some of which is not understandable by a particular editor, it is still possible to edit the parts of the document understood by that editor without losing or invalidating the parts it does not understand.

This document is a draft of a proposal for the technical content of the Interscript standard. It defines and explains the proposed standard, gives examples of its use, explains how to externalize documents in an editor's private format into scripts, and how to internalize scripts into an editor's private format. It also indicates a number of issues that must still be resolved to establish a practical standard.

Note: This draft is being circulated to interested parties within Xerox to report preliminary ideas. It should not be interpreted as a definitive proposal, and should not be distributed outside.

XEROX
PALO ALTO RESEARCH CENTER
COMPUTER SCIENCE LABORATORY
3333 Coyote Hill Road / Palo Alto / California 94304

The standard provides for documents with

    a dominant hierarchical structure (e.g., book/chapter/section/paragraph...) while also providing for documents needing a more general structure than a tree (e.g., for graphics or cross-references in a textual document),

    formatting information (e.g., margins, fonts, line widths, etc.),

    definitional structure (such as styles or property sheets), and

    intermixed kinds of editable information (e.g., text with embedded graphics).

This draft deals primarily with the contents of Layers 0 and 1 (the base language) of the proposed standard.

Contents

1. Introduction
2. The Language Basis: Syntax and Semantics
3. Higher-Level Issues
4. Pragmatics
Appendix A: Glossary

1. Introduction

Interscript provides a means of representing editable documents that is independent of any particular editor and can therefore be used to interchange documents among editors.

The basis of Interscript is a language for expressing editable documents, or scripts. Scripts are created by computer programs (usually an editor or associated program); scripts are "compiled" by programs to produce whatever private or file format a particular editor uses to represent editable documents.

1.1. Rationale for an interchange standard

An editing program typically uses a private, highly-encoded representation for documents in order to meet its performance and functionality goals. Generally, this means that different editors use different, incompatible private formats, and the user can conveniently edit a document only with the editor used to create it. This problem can be solved by providing programs to convert between one editor's private (or file) format and another's. However, a set of different editors with N different document representations requires N(N-1) conversion routines to be able to convert a document directly from each format to every other.

This N(N-1) problem can be reduced to 2(N-1) by noticing that we could write N-1 conversion routines to go from F1 (format for editor1) to F2, ..., FN, and another N-1 routines to convert from F2, ..., FN to F1. Except when converting from or to F1, this scheme requires two conversions to go from Fi to Fj (j ≠ i); this is a minor drawback. Choosing which editor should be editor1 is a more critical issue, however, since the capabilities of that editor will determine how general a class of documents can be interchanged among the N editors. This presents a truly difficult problem in the case that there is no single functionally dominant editor.
If the pivotal editor1 doesn't incorporate some of the structures, formats, or content types used by others, then it will not be possible to faithfully convert documents containing them. Even if we had a single editor that was functionally dominant, it would place an upper bound on the functionality of all future compatible editors. Since there are no actual candidates for a totally dominant editor, we have chosen instead to examine in general what information editors need and how that information can be organized to represent general documents.

Since we are not proposing an editor, we do not need to design a private format for its documents; we only need an external representation that is capable of conveying the structure, form, and content of editable documents. That external representation has only one purpose: to enable the interchange of documents among different editors. It must be easy to convert between real editors' formats and this interchange encoding.

Using a standard interchange encoding has the additional advantage that much of the input and output conversion algorithms will be common to all conforming editors. In fact, when adding a new version of a previous editor, the only differences in the new version's conversion routines will be in the areas in which its internal format has changed from its previous form; this represents a significant saving of programming. Finally, no special routines or human procedures would be needed to upgrade documents to a new version of an editor, since each conforming editor will be capable of understanding and producing the interchange representation anyway.

1.2. Properties that any interchange standard must have

An interchange encoding for editable documents must satisfy a number of constraints. Among these are the following:

1.2.1. Universal character set

Scripts must be encoded using the graphic (printable) subset of the ISO 646 character set. Besides the obvious rationale that these characters are guaranteed not to have control significance to any devices meeting the ISO standard, this choice has the additional advantage that a script is humanly readable.

1.2.2. Encoding efficiency

Since editable documents may be stored as scripts, may be transmitted over a network, and must certainly be processed to convert them to various editors' private formats, it is important that the encoding be reasonably space-efficient.

Similarly, the time cost of converting between interchange encoding and private formats must be reasonably low, since it will have a significant effect on how useful the interchange standard is. (If the overheads were small enough, an editor might not even use a private file format for document storage.)

1.2.3. Open-ended representation

Scripts must be capable of describing virtually all editable documents, including those containing formatted text, synthetic graphics, scanned images, etc., and mixtures of these various modes. Nor may the standard foreclose future options for documents that exploit additional media (e.g., audio) or require rich structures (e.g., VLSI circuit diagrams, database views). For the same reasons, the standard must not be tied to particular hardware or to a file format: documents will be stored and transmitted using a variety of media; it would be folly to tie the representation to any particular medium.

1.2.4. Document structure

Many documents have hierarchical structure; e.g., a book is made of chapters containing sections, each of which is a sequence of paragraphs; a figure is embedded in a frame on a page and in turn contains a textual caption and embedded graphics; and the description of an integrated circuit has levels corresponding to modular or repeated subcircuits. The standard should exploit such structure, without imposing any particular hierarchy on all documents.

Hierarchy is not sufficient, however. Parts of documents must often be related in other ways; e.g., graphics components must often be related geometrically, which may defy hierarchical structuring, and it must be possible to indicate a reference from some part of a document to a figure, footnote, or section in a way that cuts across the dominant hierarchy of the document (section 1.6.4).

Documents often contain structure in the form of indirection. For instance, a set of paragraphs may all have a common "style," which must be referred to indirectly so that changing the style alone is sufficient to change the characteristics of all the paragraphs using it. Or a document may be incorporated "by reference" as a part of more than one document and may need to "inherit" many of its properties from the document into which it is being incorporated at a given time.

1.2.5. Document form and content

The complete description of a document component usually requires more than an enumeration of its explicit contents; e.g., paragraphs have margins, leading between lines, default fonts, etc.
Scripts must record the association between attributes and pieces of content.

The contents of a document must be represented by a rich space containing scalar numbers, strings, vectors, and record-like constructs in order to describe items as varied as distances, text, coefficients of curves, graphics constraints, digital audio, scanned images, transistors, etc. Attribute values should also be described in this rich value space.

1.2.6. Transcription fidelity

It must be possible to convert any document from any editor's private format to a script and reconvert it back to the same editor's private format with no observable effect on the document's form, structure, or content. This characteristic is called transcription fidelity, and is a sine qua non for an interchange encoding; if it is not possible to accomplish this, the interchange encoding or the conversion routines (or both) must be defective.

1.2.7. Comprehending scripts

Even complicated documents have simple pieces. A simple editor should be able to display document components that it is capable of displaying, even in the presence of components that it cannot. More precisely, an editor must, in the course of internalizing a script, be able to discover all the information necessary to recognize and to display the parts that it understands. This must work despite the fact that different editors may well use different data structures to represent the structure, form, and content of a document.

At a minimum, this requires that a script contain information by which an editor can easily determine whether or not it understands a component well enough to display or edit it, and that it be able to interpret the effect that components that it does not understand have on the ones it does. For example, if an editor does not understand figures, it should still be possible for it to display their embedded textual captions correctly, even though a figure might well dictate some of its caption's content or attributes such as margins, font, etc.

This constraint requires that an interchange encoding must have a simple syntax and semantics that can be interpreted readily, even by low-capability editors. Along with the desire for open-endedness (section 1.2.3), this suggests a language with some form of "extension by definition" built around a small core.

1.2.8. Regeneration

Processing a script to internalize it correctly is only half the problem. It is equally important that an editor, in externalizing a script from its private document format, be able to regenerate the structure, form, and content carried by the script from which the document originally came.

This problem is much less severe when an editor is transcribing a document that it "understands" completely, e.g., because the entire document was generated using that editor. However, when regenerating a script from an edited document, it should be possible to retain the structure in parts of the original script that were not affected by editing operations. For example, an editor that understands text but not figures should be able to edit the text in a document (although editing a caption may be unsafe without understanding figures) while faithfully retaining and then regenerating the figures when transcribing from its private format.

1.3. What the Interscript standard does not do

There are a number of issues that the Interscript standard specifically does not discuss. Each of these issues is important in its own right, but is separable from the design of an interchange representation.

1.3.1. Interscript is not a file format

The interchange encoding of a script is a sequence of ASCII/ISO 646 characters.
The standard is not concerned with how that representation is held in files on various media (floppy disks, hard disks, tapes, etc.), or with how it is transmitted over communications media (Ethernet, telephone lines, etc.).

1.3.2. Interscript is not a standard for editing

A script is not intended as a directly editable representation. It is not part of its function to make editing of various constructs easier, more efficient, or more compact: those are the purview of editors and their associated private document formats. A script is intended to be internalized before being edited. This rendition might be done by the editor, by a utility program on the editing workstation, or by a completely separate service.

1.3.3. Combining documents is not an interchange function

This exclusion is really a corollary of the statement, "A script is not intended as a directly editable representation." In general, it is no easier to "glue" two arbitrary documents together than it is to edit them.

1.3.4. Interscript does not overlap with other standards

There are a number of standards issues that are closely related to the representation of editable documents, but which are not part of the Interscript standard because they are also closely related to other standards. For example, the issues of specifying encodings for characters in documents, how fonts should be named or described, or how the printing of documents should be specified (i.e., Interpress) are not part of this work.

HISTORY LOG

Edited by Mitchell, September 1, 1981 3:12 PM, added first version of glossary
Edited by Mitchell, September 7, 1981 2:11 PM, wrote parts of introduction
Edited by Mitchell, September 10, 1981 10:14 AM, added Tab def to Star property sheets
Edited by Mitchell, September 14, 1981 9:54 AM, renumbered chapters and did minor edits
Edited by Mitchell, September 16, 1981 8:42 AM, folding in comments from JJH's review and added sections on rendition and transcription fidelity
Edited by Mitchell, September 18, 1981 1:56 PM, folded in comments from JJH's review
Edited by Horning, May 3, 1982 6:02 PM, folded in comments from Truth copy
Edited by Mitchell, May 10, 1982 3:28 PM, changed "Interdoc" to "Interscript", "rendering" to "internalizing", and "transcribing" to "externalizing" plus various edits necessitated by these substitutions.

1.4. Concepts and Guiding Principles

1.4.1. Layers

The Interscript standard is presented as a sequence of layers:

    Layer 0 defines the syntax of scripts; parsing reveals the dominant structure of the documents they represent.

    Layer 1 defines the semantics of the base language, particularly the treatment of bindings and environments.

    Layer 2 defines the semantics of properties and attributes that are expected to have a uniform interpretation across all editors.

    Various Layer 3 extensions will define the semantics of properties and attributes that are expected to be shared by particular groups of editors.

The present document focuses almost exclusively on Layers 0 and 1, although some of the examples illustrate properties and attributes likely to be defined in Layer 2.

1.4.2. Transcription and Rendition

Transcription fidelity requires that any document prepared by any editor can be externalized as a script that will then be internalized by the editor without loss of information. Ease of internalization requires that the Interscript base language contain only relatively few (and simple) constructs. We resolve this apparent paradox by including within the base language a simple, yet powerful, mechanism for abbreviation and extension.

A script may be considered to be a "program" that could be "compiled" to convert the document into the private representation of a particular editor, ready for further editing. The Interscript language has been designed so that internalizing scripts into typical editors' representations can be performed in a single pass over the script by maintaining a few simple data structures.

1.4.3. Content, Form, Value, and Structure

Most editors deal with both the content of a document (or piece of a document), and its form. The former is thought of as "what" is in the document, the latter as "how" it is to be viewed. (E.g., "ABC" has a sequence of character codes as its contents; its format may include font and position information.) Interscript maintains this distinction.

Another useful distinction is between the value and the structure of either form or content within a document. When viewing a document, only the value is of concern, but the structure that leads to that value may be essential to convenient editing. An example of structure in content is the grouping of text into paragraphs; in form, associating a named "style" with a paragraph.

Content: Text and graphics are common special cases. Interscript's treatment of these has been largely modelled on that of Interpress. Other kinds of content may be represented by structures built from character strings, numbers, Booleans, and identifiers.

Form: Interscript provides for open-ended sets of properties and attributes. Properties are associated with content by means of tags. Attributes are name-value pairs that apply throughout a scope, and are placed in the environment by means of bindings. Contents are not always present to be simply displayed as text. The way the contents of a document are to be "viewed" is determined by its properties; Interscript makes it straightforward to determine what these properties are without having to understand them.

Structure: Most editors structure the content of a document somehow: into words, sentences, paragraphs, sections, chapters; or lines, pages, signatures; or .... This assists in obtaining private efficiency, but, more importantly, provides a conceptual structure for the user.

Full transcription fidelity requires that the Interscript language be adequate to record any structure that is maintained by any editor for either form or content. Of course, some editors provide a number of different structures. A general structure, of which all the editors we know use special cases, is the labelled directed graph. Interscript provides this structure, without restricting the purposes for which it may be used. There are also two specializations of general graphs that occur so frequently that Interscript treats them specially:

    Sequences: The most important, and most frequent, relationship between values is logical adjacency (sequentiality), which is represented by simply putting them one after another in the script.

    Ordered trees: Most editors that structure contents have a "dominant" hierarchy that maps well into trees whose arcs are implicitly labelled by order. (Different editors use these trees to represent different hierarchies.) Interscript provides a simple linear notation for such trees, delimiting node values by braces ("{" and "}").
    If an editor maintains multiple hierarchies, the dominant one is the one transcribed into the tree structure and used to control the inheritance of attributes.

Content structure beyond that contained in the dominant hierarchy is represented by explicit links in the script; any node may be labelled as the source and/or the target of any number of links. A link whose target is a single node uniquely identifies that node; links with multiple targets may be used to represent sets of nodes.

Typical structures recorded for form are expressions (indicating intended relations among attribute values) and sharing (representable by indirection). Interscript allows expressions to be composed of literals, identifiers, operators, and function applications, and permits the use of identifiers to represent expressions.

1.4.4. Features of the Base Language

1.4.4.1 Values

Expressions in a script may denote

    Literal values of primitive types
        Booleans: F, T
        Integers: ..., -3, -2, -1, 0, 1, 2, 3, ...
        Reals: 1.2E5, ...
        Strings: character sequences enclosed in <> (see section 1.5.1)
        Universal names: TEXT, XEROX, PARAGRAPH
    Structured values
        Nodes
        Vectors of values
        Environments
    Generic operations
        Invocations
        Applications
        Selections
    Operations specific to particular types
        Arithmetic
        Comparison
        Logical
        Subscript
        ...
    Bindings
    Labels
        Tags
        Targets
        Sources
        Link introductions
    Expressions to be evaluated at the point of invocation

1.4.4.2 Environments and Attributes

Environments bind attribute identifiers to values (or expressions denoting values), in various modes:

    "_" denotes a local binding, which may be freely superseded,

    "=" denotes a constant binding (definition), which may not be superseded within the containing node or any of its subnodes,

        We expect definitions to be used by sophisticated editors for such things as styles.
        Some scripts will come with a prefix containing non-standard property and attribute definitions that are global to the document. There may be standard libraries containing definitions that allow complex documents to be edited in terms of properties and attributes understood by simpler editors.

    ":=" denotes a global binding, which prevents the variable name from being reused for any other purpose.

Null denotes the "empty" environment, containing bindings for no attributes. The (implicit) outermost environment binds each identifier id to the corresponding universal name ID (written with all capital letters).

Each piece of content in a document has its own environment. Editors will use relevant attributes from that environment to control its form. Attributes may also be used in scripts for two purposes:

    abbreviation: an identifier may be bound to a quoted expression; within the scope of the binding, the use of the identifier is equivalent to the use of the full expression;

    indirection: reference through an identifier permits information (such as styles) to be defined in one place and shared throughout its scope; this is an example of structure (which must be preserved) in the form of a document.

1.4.4.3 Inheritance

The dominant hierarchy of a document is represented by grouping its pieces within nodes, which are the most obvious form of content structuring. They also control the scope of bindings.

The environment of a node is initially inherited from its containing node (except for the outermost node, which inherits it from the editor), and may be modified by bindings.
A binding takes effect at the point where it appears, and its scope extends to the end of the innermost node containing it, with two exceptions:

    any binding except a definition may be superseded by a (textually) later binding (if the later binding is in a nested node, the outer binding's scope will resume at the end of the inner node), and

    a global binding extends over the entire document.

Attributes are inherited only via environments following the dominant structure. Thus the choice of a dominant structure to represent scripts from a particular editor will be strongly influenced by expectations about inheritance.

Attributes are "relevant" to a node if they are assumed by any of its tags. In general, a node's environment will also contain bindings for many "latent" attributes that are either relevant to its ancestors (and inherited by default) or are potentially relevant to its descendants.

The interior of each node is implicitly prefixed by Sub, which will generally be bound in the containing environment to a quoted expression performing some bindings, applying some labels, and/or supplying some repeated content.

1.4.4.4 Expressions

Expressions involving the four infix ops (+, -, *, /) are evaluated right-to-left (a la APL); since we expect expressions to be short, we have not imposed precedence rules.

Parentheses are used to delimit vector values. Square brackets are used to delimit the argument list of an operator application and to denote environment constructors, which behave much like records.

The notation for selections (conditionals) is borrowed from Algol 68:

    (test | true part | false part)

This is consistent with our principles of using balanced brackets for compound constructions and avoiding syntactically reserved words; the true part and false part may each contain an arbitrary number of items (including none).
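
The right-to-left rule for infix operators can be illustrated with a small sketch. The Python below is not part of the proposal: the function name eval_right_to_left and the flat token-list input are assumptions made only for illustration. It applies the four operators with no precedence, grouping from the right as in APL.

```python
import operator

# Hypothetical helper for illustration; not part of the Interscript proposal.
OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def eval_right_to_left(tokens):
    """Evaluate [operand, op, operand, ..., operand] right to left, without precedence."""
    if len(tokens) % 2 == 0:
        raise ValueError("expected: operand (operator operand)*")
    result = tokens[-1]
    # Walk leftwards: each step computes `left op (everything to its right)`.
    for i in range(len(tokens) - 2, 0, -2):
        op, left = tokens[i], tokens[i - 1]
        result = OPS[op](left, result)
    return result


# 2 * 3 + 4 groups as 2 * (3 + 4); 10 - 4 - 3 groups as 10 - (4 - 3).
product_first = eval_right_to_left([2, "*", 3, "+", 4])
chained_minus = eval_right_to_left([10, "-", 4, "-", 3])
```

Under this rule the first expression yields 14 rather than the 10 that conventional precedence would give, which is why a script generator has to order its operands with the right-to-left grouping in mind.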

1.4.4.5 Tags and Labels

A tag is written as a universal name followed by "$". A tag, t, labels a node that contains it with its associated properties and also indirectly refers to the component of the environment with the name "defaults.t". Properties are either present in a node or absent, whereas attributes have values that apply throughout a scope.

Layer 2 of the standard will be primarily concerned with the definition of a small set of standard properties that are expected to be shared among all conforming editors. For each standard property, it will describe

    the associated tag that denotes it,

    the assumptions it implies about the contents (values that must/may be present and their intended interpretation, invariant relations that are to be maintained, etc.),

    the assumptions it implies about the environment (attributes that must be present and their intended interpretation).

A label L! on a node makes that node a target of the link L (and its prefixes); a label L@ makes it a source. The "main" identifier of a link must be introduced (using id@!) at the root of a subtree containing all its sources and targets, and textually preceding them. Each link represents a set of directed arcs, one from each of its sources to each of its targets. Multiple target labels make a node the target of multiple links. Labels provide a very general mechanism for recording structure, such as cross-references, not captured by linear order or the dominant hierarchy.

1.4.5. Comprehending scripts

The Interscript standard applies to interchange among editors with widely varying capabilities. It will be important to define some structure to the space of possibilities, just as Interpress has for printable documents.
Dimensions in which we foresee reasonable variations in script comprehension are:

    Abbreviations: from only editor-supplied to defined in the document.
    Dominant structure: from single-layer to arbitrary.
    Other structure: from no links or indirections to links and indirections preserved.
    Bindings: from local only to constant (=) and global (:=).
    Selection: from no conditionals to conditionals.
    Numbers: from integers only to floating point.

See section 2.4 for further details.

1.4.6. Internalizing a Script

The private representations of low-capability editors are not generally adequate to provide a full-fidelity internalization of every script that results from externalizing a document prepared by a high-capability editor. Thus, when internalizing a script, some information may be lost. The Interscript language has been designed to simplify value-faithful internalization, even if structure is lost, and content-faithful internalization, even if form is lost, or the conversion of form to additional content to allow it to be examined (and perhaps even edited) by a low-capability editor.
The standard provides some simple conditions under which a low-capability editor can safely modify parts of a document that it understands fully, without thereby destroying the value or structure of parts that it is not prepared to deal with.

A script may be internalized into an editor's (private or file) representation as follows:

    Parse the entire script from left to right.

    As each literal is encountered in the script, convert it to the editor's representation.

    As each abbreviation (free-standing invocation) is encountered in the script, replace it with the value to which it is bound in the environment.

    As each structure is recognized in the script, represent the corresponding structure in the editor's representation, if possible; if not, use the semantics of Interscript to compute the value to be internalized.

    Update the environment whenever a binding is encountered or a scope is exited, according to the semantics of Interscript.

    Transfer the values of all attributes relevant to each piece of content from the current environment to the editor's representation, if possible; if not, apply an invertible function to convert the attribute-value binding into additional content.

    Determine the properties of each node from its tags; this list will be complete at the end of the node. A node is viewable if any of its tags denotes a property in the set of those the editor is prepared to display; it is understood if they are all in the set of those the editor is prepared to edit.

    Record the sources and targets of all links; for any link, these lists will be complete at the end of the node in which its main identifier was introduced.
    Translate each link to the corresponding editor structure, according to the properties of the node that introduces it.

Of course, any process yielding an equivalent result is equally acceptable.

HISTORY LOG

Edited by Mitchell, May 10, 1982 3:28 PM, changed "Interdoc" to "Interscript", "rendering" to "internalizing", and "transcribing" to "externalizing" plus various edits necessitated by these substitutions.

Towards an Interchange Standard for Editable Documents
by Jim Mitchell and Jim Horning
May 10, 1982 4:13 PM
File: Interdoc-1.5.bravo

1.5. Introduction to the Interscript Base Language

This section is intended to lead the reader through a set of examples, to show what the language looks like and how it is used to represent a number of commonly occurring features of editable documents. The examples purposely use rather long identifiers and lots of white space to make them more readable. In actual use, programs, not people, will generate and read scripts; names will tend to be short, and logically unneeded spaces and carriage returns will tend to be omitted.

1.5.1. Simple text as a document

The following script defines a document consisting of the string "The text of the main node of example 1.5.1"; no font, paragraph structure, or formatting information is supplied. This example will gradually be expanded to represent accurately figure 1.5.1, below. The numbers at the left margin do not form part of the script; they are used to refer to the various lines in the discussion below.

    0  Interscript/Interchange/1.0
    1  {<The text of the main node of example 1.5.1>}

Line 0 is the header denoting version 1.0 of the interchange encoding.
Line 1 is the entire body of this script: it contains a single node enclosed in {} which in turn contains a single string value enclosed in <>.

        The text of the main node of
        example 1.5.1
            The text of the first subnode of example 1.5.1

    Example 1.5.1: A simple document

The next version of the example adds the tag TEXT$ to the node. The identifier TEXT is called a universal name (or atom), which is indicated by its being composed of all uppercase letters. Universal names have no definition within the base language (they are expected to be defined in Layers 2 and 3).

    0  Interscript/Interchange/1.0
    1  {TEXT$
    2  <The text of the main node of example 1.5.1>
    3  }

A tag is denoted by placing "$" after a universal name. A node's tags are strictly local (they are not inherited by other nodes in the script) and serve as "type information" about the node. The tag TEXT$ labels this node as one that can be viewed as textual data. Tags also create implicit indirections; see section 1.6.5.

    0  Interscript/Interchange/1.0
    1  {PARAGRAPH$
    2  leftMargin_3.25*inch rightMargin_5.0*inch
    3  <The text of the main node of example 1.5.1>
    4  }

This example shows how auxiliary information, such as margins, may be associated with a node of a script. The binding leftMargin_3.25*inch adds the attribute leftMargin to the node's environment and binds the value of the expression 3.25*inch to it (inch is a constant whose dimensions are meters/inch; meters are the standard Interscript units of distance). The bindings to leftMargin and rightMargin convey the fact that this node has margins for display. To denote the change in character of the node, we have tagged it as PARAGRAPH instead of TEXT. Figure 1.5.1 uses these margins for its first line of text.
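
The margin bindings just described can be pictured with a small sketch. This is only an illustration in Python, not part of the proposed standard: the dictionary representation of a node and the helper name bind are assumptions made for the example. The only fact relied on is that one inch is 0.0254 meters.

```python
# Hypothetical model: a node is a dict holding tags, an environment, and contents.
INCH = 0.0254  # meters per inch; meters are the standard unit of distance


def bind(env, name, value):
    """Return a copy of env in which name is bound to value (the "_" binding)."""
    new_env = dict(env)
    new_env[name] = value
    return new_env


env = {}
env = bind(env, "leftMargin", 3.25 * INCH)   # leftMargin_3.25*inch
env = bind(env, "rightMargin", 5.0 * INCH)   # rightMargin_5.0*inch

node = {
    "tags": {"PARAGRAPH"},                   # PARAGRAPH$
    "env": env,
    "contents": ["The text of the main node of example 1.5.1"],
}
```

Under these assumptions the node's leftMargin is held as 0.08255 meters; the tag and the bindings are kept apart from the content, which is the separation the example script expresses.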
    0  Interscript/Interchange/1.0
    1  {PARAGRAPH$
    2  leftMargin_3.25*inch rightMargin_5.0*inch
    3  <The text of the main node of example 1.5.1>
    4  {PARAGRAPH$ leftMargin_+0.5*inch
    5  <The text of the first subnode of example 1.5.1>
    6  }
    7  }

We have further elaborated the example by nesting another text node in the primary one, with its text following the primary node's text and with an indented leftMargin. The binding leftMargin_+0.5*inch is a contraction of leftMargin_leftMargin+0.5*inch. The right side of the binding is evaluated, and since there is as yet no binding in the inner node's (lines 4-6) environment for leftMargin, it is looked up in the environment of the containing node (lines 1-3). The value of the right hand side expression is thus 3.75*inch. This value is then bound to the identifier leftMargin in the inner node's environment. Since no value is bound to rightMargin in the inner node's environment, it will have the same rightMargin as its parent node.

    0  Interscript/Interchange/1.0
    1  p='PARAGRAPH$ leftMargin_3.25*inch rightMargin_6.0*inch'
    2  {p rightMargin_5.0*inch
    3  <The text of the main node of example 1.5.1>
    4  {p leftMargin_+0.5*inch
    5  <The text of the first subnode of example 1.5.1>
    6  }
    7  }

One can also define an abbreviation by binding a sequence of unevaluated expressions to an identifier and subsequently using the identifier to cause those expressions to be evaluated at the point of invocation. This example binds the quoted expression 'PARAGRAPH$ leftMargin_3.25*inch rightMargin_6.0*inch' to the identifier p. The binding operator is = instead of _ to denote the fact that this binding may not be superseded in this node or any of its subnodes; for this reason such a binding is called a definition. When p is invoked in lines 2 and 4, the quoted expression replaces the invocation and is evaluated there.

Invoking p places the tag PARAGRAPH$ on the node, sets the leftMargin to 3.25*inch and the rightMargin to 6.0*inch. In line 2, the rightMargin is then rebound to 5.0*inch, overriding the default binding created by invoking p.
Similarly, the binding for leftMargin in line 4 overrides the one resulting from invoking p, resulting in its leftMargin being 3.75*inch and its rightMargin being 6.0*inch.

An identifier can also be bound to an environment value as a convenient record-like manner of naming a set of related bindings. For example, a font might be defined as follows (a more complete definition is given later in section 1.6.3):

    font = [ | family_TIMES size_10*pt face_[ | weight_NORMAL style_ROMAN slant_NIL] ]

This defines font to be the environment formed by taking the empty or Null environment and altering it according to the series of bindings following the initial "[ |". In this case font is an environment having bindings for three attributes, family, size, and face. face is itself bound to an environment (with attributes weight, style, and slant). Since font is bound using "=", it cannot directly be changed in its scope, although its components can be since they are bound using "_". The set of default bindings in font specify a normal weight (non-bold), non-italic Times Roman 10-point font.

We can incorporate this font definition in the example and then use it to indicate that the word "first" in the subnode should be in italics:

    0   Interscript/Interchange/1.0
    1   p='PARAGRAPH$ leftMargin_3.25*inch rightMargin_6.0*inch'
    2   font = [ | family_TIMES size_10*pt face_[ | weight_NORMAL style_ROMAN slant_NIL] ]
    3   {p rightMargin_5.0*inch
    4   <The text of the main node of example 1.5.1>
    5   {p leftMargin_+.5*inch
    6   <The text of the >
    7   font.face.slant_ITALIC <first> font.face.slant_NIL
    8   < subnode of example 1.5.1>
    9   }
    10  }

Bindings affect node contents to their right: so, "first" will be italic, while "subnode of example 1.5.1" will be non-italic due to the binding immediately preceding it.
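
The environment behaviour used in the examples above (lookup that falls through to the containing node, the "_+" contraction, and bindings that affect only the content to their right) can be sketched as follows. This is a hedged illustration in Python rather than anything defined by Interscript: the use of ChainMap to model nested environments and the function name styled_runs are assumptions made for the sketch.

```python
from collections import ChainMap

INCH = 0.0254  # meters per inch; meters are the standard unit of distance

# Outer node of the nested example: explicit margins.
outer = ChainMap({"leftMargin": 3.25 * INCH, "rightMargin": 5.0 * INCH})

# Inner node: a fresh, empty environment nested inside the outer one.
inner = outer.new_child()

# leftMargin_+0.5*inch is shorthand for leftMargin_leftMargin+0.5*inch.
# The right side is evaluated first; the inner map is still empty, so the
# lookup falls through to the containing node. The result is then bound
# in the inner environment only.
inner["leftMargin"] = inner["leftMargin"] + 0.5 * INCH


def styled_runs(items, slant="NIL"):
    """Walk a node's items left to right; each binding affects only the
    content to its right. Items are ("bind", value) or ("text", string)."""
    runs = []
    for kind, value in items:
        if kind == "bind":
            slant = value
        else:
            runs.append((value, slant))
    return runs


runs = styled_runs([
    ("text", "The text of the "),
    ("bind", "ITALIC"),                      # font.face.slant_ITALIC
    ("text", "first"),
    ("bind", "NIL"),                         # font.face.slant_NIL
    ("text", " subnode of example 1.5.1"),
])
```

In this model the inner node ends up with a leftMargin of 3.75 inches while still seeing its parent's rightMargin, and the outer binding is left untouched. Only the run "first" carries the ITALIC slant.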
If we expected to switch between italics and non-italics frequently, it might be profitable to introduce abbreviations to shorten what must appear. For example, in the scope of the definition

    l=[ | i='font.face.slant_ITALIC' nI='font.face.slant_NIL']

line 7 could be abbreviated

    l.i <first> l.nI

1.6. Further Examples

This section gives some more realistic examples of the use of the Interscript language and explores the issues of making sets of standard definitions for use in scripts.

1.6.1. A Laurel Message

Here is a possible Interscript transcription of a Laurel message:

    0   Interscript/Interchange/1.0    -- standard heading --
    1   {LAURELMSG$    -- tag for a Laurel document --
    2   Sub='PARAGRAPH$ leftMargin_1.0*inch rightMargin_7.5*inch'
    3   justified_F    -- "_" means overridable default --
    4   font.family_TIMES font.size_10
    5   leading.x_1
    6   leading.y_1    -- overridable default leadings --
    7   heading@!    -- declare a label --
    8   laurelInfo =    -- Laurel information for easy access; none is changeable --
    9   (Heading.time@ Heading.from@ Heading.subject@ Heading.to@ Heading.cc@)
    10  { {Heading.time! <18 June 1981 9:18 am PDT (Thursday)>}
    11    {Heading.from! AUTHENTICATED$}
    12    {Heading.subject! }
    13    {Heading.to! }
    14    {Heading.cc! }}
    15  leading.y_6    -- override outer y leading --
    16  {}    -- node which is a paragraph --
    17  {}
    18  {}
    19  }

Line 1 tags this document (by tagging its root node) as a Laurel message, and line 2 tags its subnodes (starting on lines 10, 16, 17, and 18) as paragraphs with default margins. Lines 3-6 bind some other attributes likely to be relevant to paragraphs. Line 7 declares the main link identifier heading, and lines 8-9 bind to laurelInfo a vector of source links whose targets are the parts of the document of interest for mail transport.
Lines 10-14 have similar structures: each consists of a string followed by a node containing a target link for the label heading and text for that Laurel "field." Line 11 is additionally tagged as AUTHENTICATED. Lines 16-18 contain paragraphs constituting the body of the message.

Alternatively, the external environment might well contain a definition of laurel60 that establishes a suitable environment for a Laurel 6.0 document:

    1   laurel60= '
    2   time@! from@! subject@! to@! bodyNodes@! cc@!
    3   LAURELMSG$
    4   cr = <#13#> tab = <#9#>
    5   p='PARAGRAPH$ leftMargin_1.0*inch rightMargin_7.5*inch'
    6   justified_F
    7   font.family=TIMES font.size=10
    8   margins.left_2540 margins.right_19050
    9   leading.x_1 leading.y_1    -- overridable default leadings --
    10  printForm=
    11    '{p time@ tab
    12    from@ cr
    13    subject@ cr
    14    to@
    15    leading.y_6
    16    bodyNodes@
    17    cc@
    18    }'
    19  heading = 'LAURELHEADING$ Sub_'TEXT$ LAURELFIELD$' '
    20  body = 'Sub_'p bodyNodes!' '
    21  '

One advantage of using source labels for the "bodies" of the To:, From:, etc. fields (lines 11-14, 17) is that they can represent sets of nodes as well as single nodes.

Now the Laurel document would be described by the following script:

    22  Interscript/Interchange/1.0    -- standard heading --
    23  {laurel60%    -- invoke Laurel 6.0 definitions --
    24    {heading%    -- invoke heading style --
    25      {time! <18 June 1981 9:18 am PDT (Thursday)>}
    26      {from! AUTHENTICATED$ }
    27      {subject! }
    28      {to! }
    29      {cc! }
    30    }
    31    {body%    -- invoke body style --
    32      {}
    33      {}
    34      {}
    35    }
    36  }

Invoking laurel60 in line 23 introduces the quoted expressions heading and body into the root node's environment, tags it as LAURELMSG and declares the labels time, from, etc. It also acquires a definition for a print form, which could be used to format the message for sending to a printer.
The "%" (indirection) operator indicates that this is intentional structure, to be preserved by each internalization, rather than merely an abbreviation. Thus the message heading and body should "see" the effects of any future changes made to laurel60 by editing its definition. By contrast, p is used as an abbreviation; when the script is rendered, its value may safely be copied at each use.

Look at the definition of heading (line 19): the right side is a quoted expression sequence. The first expression of the sequence produces the tag LAURELHEADING$ and the second binds the quoted expression 'TEXT$ LAURELFIELD$' to Sub. As a result, each subnode of the one beginning on line 24 will be initialized by invoking Sub from its containing node, which gives each the tags TEXT$ and LAURELFIELD$.

Similarly, the definition of body (line 20) defines Sub, and the nodes on lines 32-34 will be initialized by invoking p and having the target link bodyNodes placed on it. Labelling the set of body nodes this way means that the source link, bodyNodes@, in printForm (line 16) denotes the entire sequence of body nodes, in left-to-right depth-first tree order.

1.6.2. A page of a Star document

This example is taken from page 71 of the Star Functional Specification and shows one page of a paginated document with a diagram and a footnote (we recommend that you have that page in front of you when analyzing this transcription):

    -- pages 1 .. 6 supposedly precede this one --
    {pg.a7!
    Sub_'PARAGRAPH$'
    {
      {fn.n1!    -- just a unique label: fn! introduced somewhere earlier --
      FOOTNOTE$
      }
      < which has shown our techniques to be valid. Other data can be collected by future changes to your accounting and billing packages, which will allow us to perform even better analyses and lead to better problem discovery and correction.>
    }
    {}
    Sub_'FRAME$'    -- change to subnode tag FRAME --
    {Alignment.horizontally_FlushLeft Alignment.vertically_Floating
      height_2.8*inch width_3.67*inch
      edges.expandingRightEdge_T
      border_dots1
      -- change to default subnode environment Rectangle with solid, double width outline --
      Sub_'RECTANGLE$ lineType.width_2 lineType.style_solid'
      rect@!    -- declare label class to be used below --
      {rect.a1! UpperLeft_(.0254 .07)
        shading_7 height_.01 width_.027
        {Title } }
      {rect.a2! UpperLeft_(.073 .015)
        height_.01 width_.018
        {Title } }
      height_.013    -- attribute value shared by following subnodes --
      {rect.a3! UpperLeft_(.02 .03)
        width_.025
        {Title } }
      {rect.a4! UpperLeft_(.02 .03)
        width_.028
        {Title } }
      {rect.a5! UpperLeft_(.042 .055)
        width_.016
        {Title } }
      {rect.a6! UpperLeft_(.067 .055)
        width_.016
        {Title } }
      -- default subnode environment is LINE with solid, double width outline --
      Sub_'LINE$ lineType.width_2 lineType.style_solid'
      ln@!
      {ln.out1! rect.a1@ ln.in34@}
      {ln.out2! rect.a2@ ln.out1@}
      {ln.in3! ln.in34@ rect.a3@}
      {ln.in4! ln.in34@ rect.a4@}
      {ln.in34! ln.in3@ ln.in4@}
      {ln.out4! rect.a4@ ln.in56@}
      {ln.in56! ln.in5@ ln.in6@}
      {ln.in5! ln.in56@ rect.a5@}
      {ln.in6! ln.in56@ rect.a6@}
    }    -- end of Frame1 --
    Sub_'PARAGRAPH$'    -- restore default subnode initialization to PARAGRAPH --
    {}
    {}
    }    -- end of page --

1.6.3. Some Star property sheets

Here are a few of the definitions invoked in the above example (these were derived from page 148 of the Star Functional Specification). Some of them simply give default values for various attributes; some, like default.font, define a collection of related attributes as an environment; and most are quoted expression sequences for providing abbreviations or "decorating" nodes with tags and their environments with relevant attributes. These definitions would exist in the external environment for Star-produced scripts. They would be made accessible to other editors as part of the definition of XEROX.Star.Version1.

1.6.3.1.
Font-related defaults and definitions

    baseline_0      -- the base line for characters --
    underlined_F    -- whether or not text in node is to be underlined --
    strikeOut_F     -- whether or not text in node is to have strike-out line through it --

    -- there is no rhyme and little reason behind the names of type fonts. The following
       definition is intended to provide enough choice, using standard "terms" to name any
       existing font in an arbitrary font catalog (of course, it doesn't, but perhaps it is
       close enough) --
    default.font = [ |    -- Definition --
      family_Times    -- a font family name --
      face_[ |    -- Definition --
        weight_NORMAL         -- In (EXTRALIGHT, LIGHT, BOOK, NORMAL, MEDIUM, DEMIBOLD, SEMIBOLD,
                                 BOLD, EXTRABOLD, ULTRABOLD, HEAVY, EXTRAHEAVY, BLACK, GROTESQUE) --
        lineType_SOLID        -- In (SOLID, INLINE, OPEN, OUTLINE, DISPLAY, SHADED) --
        proportions_NORMAL    -- In (NORMAL, CONDENSED, EXPANDED, EXTENDED, WIDE, BROAD, ELONGATED) --
        style_ROMAN           -- In (ROMAN, GOTHIC, EGYPTIAN, CURSIVE, SCRIPT) --
        slant_NIL             -- In (NIL, ITALIC, OBLIQUE) --
        swash_F               -- T => use swash capitals --
        lowercase_T           -- T => use lowercase letters --
        uppercase_T           -- T => use uppercase letters --
        smallCaps_F           -- T => use small capitals --
        ]
      size_10*pt    -- distance --
      ]

    -- some useful font shorthands: --
    Helvetica = 'font _ [default.font% | family_HELVETICA]'
    Italic = 'font.face.slant_ITALIC'
    Bold = 'font.face.weight_BOLD'
    Helvetica10BI = 'Helvetica font.size_10*pt Bold Italic'

1.6.3.2. Footnote-related definitions

    fnCount:=0    -- global variable for counting footnotes --
    FOOTNOTE = 'fnCount:=+1 font.size_8*pt FootnoteRef%'
    FootnoteRef = '{FOOTREF$ baseline_+5*pt fnCount}'    -- raise 5 pts --

1.6.3.3.
Paragraph-related definitions

    Tab = [ |
      position_0
      type_LEFT    -- In (LEFT, CENTERED, RIGHT, DECIMAL) --
      ]
    MakeTabs='n_0 tabs_(RecursiveMakeTab[Value])'
    RecursiveMakeTab='(EQ[Value 0] | NIL | n_+.25*inch [Tab | position_n ] RecursiveMakeTab[Value-1])'
    Default.PARAGRAPH = '
      Indent = [ | Left_0.0 Right_0.0]    -- distance --
      Alignment_FLUSHLEFT    -- In (FLUSHLEFT, FLUSHRIGHT, BOTH, CENTERED) --
      Justified_F
      leading_[leading | between_1*pt above_12*pt below_0]
      charStyle_[|
        Normal_'font_default.font'
        Emphasis1_'font_default.font Italic'
        Emphasis2_'font_default.font Bold'
        ]
      Hyphenation_F
      KeepOn_NIL          -- In (NIL, SamePageAsNextParagraph) --
      MakeTabs[8]         -- binds tabs to a sequence of 8 tabs (0, .25 inch, .50 inch, . . .) --
      charStyle.Normal    -- initializes to normal style --
      '

1.6.3.4. frame, rectangle, and line definitions

    Def.UpperLeft = 'UpperLeft_(0.0 0.0)'
    -- Def is just a convenient environment in which to put useful auxiliary definitions --
    Def.lineType = 'lineType_[ |
      Visible_T
      Width_1
      Style_SOLID]    -- IN (SOLID, DOT, DASH, DOTDASH, DOUBLE, . . .) --
      '
    Def.Shading = 'Shading_0'
    Def.Box = 'Def.UpperLeft Def.lineType Def.Shading'
    Frame = 'FRAME$ Def.Box'
    Rectangle = 'RECTANGLE$ Def.Box
      Constraint_MagnifyOnly    -- IN (NIL MagnifyOnly) --
      '
    Def.LineEnd = 'LineEnd_(LeftUpper_Flush RightLower_Flush)
      -- IN (Flush Round Square arrow1 arrow2 arrow3) --
      '
    Line = 'LINE$ constraint_FixedAngle Def.lineType Def.LineEnd'
    Title = 'CAPTION$ Paragraph'

1.6.4.
Using links

Links are intended to provide the means for associating nodes in non-hierarchical ways. They can be used for referring to figures, examples, tables, etc., for describing tables of contents, for denoting index items, keeping lists, etc.

1.6.4.1. References to figures

The following outlines how the labelling facilities and global bindings can be used to generate references to (source links for) a figure whose number may not be known at the point of reference. The identifier n5 is assumed to have been generated by the program that produced the script and is assumed to be unique over the target labels with naming prefix "figures." in the script.

    figures@! figCount:= 0    -- should appear in a script's root node --
    makeFigureNum = 'HIDDEN$ figCount:=+1 figCount'
    {. . . figures.n5@ . . .}    -- ref to node with label figures.n5! --
    { . . . {figures.n5! makeFigureNum} . . .}    -- a hidden node holding the figure number --

The node in which the figure number for figure n5 is defined contains a tag, HIDDEN, which means that the node is not to be considered a part of the dominant structure for display purposes even though it is part of it. The node's sole content is the value of figCount after it has been (persistently) incremented by 1. Because figCount is bound with ":=", the scope of the binding is global.

1.6.4.2. Collections of index items

Assume that the word "framble" is to be considered an index item in certain places where it occurs in a document. The link class indexable@! should be introduced at the root of the document, and each to-be-indexed occurrence of "framble" in a string should be replaced by a sequence that splits the string at that point and inserts the indirection framble%, e.g., a sequence ending in framble% < is found, it . . .>.

Somewhere in the script within the scope of the declaration of indexable, at the root of a subtree containing all the uses of framble, should be the following definition:
    framble='{HIDDEN$ indexable.framble! pageNumber} '

Invoking framble results in the appearance of a hidden node containing the current page number (assumed to be held in the attribute pageNumber) and labelled as being in the set of target links indexable and indexable.framble. The index for the document might then contain the following entry for "framble":

    {INDEXENTRY$ indexable.framble@}

This entry contains the minimal information needed to generate the sequence of page numbers corresponding to indexable occurrences of framble. If some occurrences are considered primary and some secondary, then these mechanisms can be generalized to have framble defined as

    framble=[ | primary='{HIDDEN$ indexable.framble.primary! pageNumber} '
                secondary='{HIDDEN$ indexable.framble.secondary! pageNumber} ']

Primary references are denoted in the script as framble.primary% and secondary ones as framble.secondary%. Similarly, the index entry takes the form:

    {INDEXENTRY$ indexable.framble.primary@ indexable.framble.secondary@}

1.6.5. Using indirections

Indirections provide a way to centralize (and delay) the binding of information within a document. They can be used to share information that is intended to be consistent.

1.6.5.1 Styles and style sheets

Documents generally follow stylistic conventions for presenting different kinds of content. E.g., major headings may be in bold face with twelve points of extra leading, minor headings in italic with six points of extra leading. If this information is explicitly bound for each piece of content, then a stylistic change may require locating and changing all the relevant bindings (note that italic is likely to be also used for other purposes, such as emphasis). If, however, the binding is done indirectly, through a style, a single change will be effective for all places where the style is referenced.
Note that each occurrence of a tag implicitly establishes an indirection through the same identifier; this is convenient in associating styles with semantically meaningful tags. For example:

    MajorHeading = 'PARAGRAPH$ Bold leading_+12'
    MinorHeading = 'PARAGRAPH$ Italic leading_+6'

1.6.5.2 Technical terms

Terminology may be undergoing change while a document is in production. For example, the previous version of this document used "mark" for what is now called "tag." One way to defer decisions on terminology, while ensuring that each version of the document is self-consistent, is to use an indirect reference for each occurrence of a term that may have to be rebound later.

HISTORY LOG

Edited by Mitchell, September 1, 1981 3:12 PM, added first version of glossary
Edited by Mitchell, September 7, 1981 2:11 PM, wrote parts of introduction
Edited by Mitchell, September 10, 1981 10:14 AM, added Tab def to Star property sheets
Edited by Mitchell, September 14, 1981 9:54 AM, renumbered chapters and did minor edits
Edited by Mitchell, September 17, 1981 1:37 PM, folding in JJH's edits.
Edited by Mitchell, September 18, 1981 12:45 AM, added considerable annotation of examples.
Edited by Horning, May 4, 1982 12:30 PM, Fold in Truth Copy edits
Edited by Horning, May 10, 1982 4:12 PM, changed "Interdoc" to "Interscript", "rendering" to "internalizing", and "transcribing" to "externalizing" plus various edits necessitated by these substitutions.

2. The Language Basis: Syntax and Semantics

2.1.
Grammar

Our notation is basically BNF with terminals quoted and augmented by the following conventions:

    a sequence enclosed in [ ] brackets may occur zero or one times;
    a construct followed by * may occur zero or more times;
    parentheses ( ) are used purely for grouping.

    script       ::= versionId node
    versionId    ::= "Interscript/Interchange/1.0 "
    item         ::= content | binding | label
    content      ::= term | node
    term         ::= primary | primary op term
    op           ::= "+" | "-" | "*" | "/"
    primary      ::= literal | invocation | indirection | application | selection | vector
    literal      ::= Boolean | integer | intSequence | real | string | universal
    invocation   ::= name
    name         ::= id ( "." id )*
    indirection  ::= name "%"
    application  ::= ( name | universal ) "[" item* "]"
    universal    ::= ucID ( "." ucID )*
    selection    ::= "(" term "|" scope* "|" scope* ")"
    vector       ::= "(" scope* ")"
    node         ::= "{" scope* "}"
    scope        ::= ( binding | label )* content content*
    binding      ::= name mode rhs
    mode         ::= "_" | "=" | ":" | ":="
    rhs          ::= content | op term | "'" scope* "'" | "[" [ item* ] "|" binding* "]"
    label        ::= tag | link
    tag          ::= universal "$"
    link         ::= id "@!" | name "@" | name "!"

2.2. Discussion of Features

[Note that we have a formal semantic definition for this language that is every bit as precise as the grammar above. However, we have not yet figured out how to present it in a form that humans find equally palatable, so we postpone it to an appendix.]

    primary ::= literal
    literal ::= Boolean | integer | intSequence | real | string

The primitive elements by which the value of a document is represented.

    term ::= primary op term
    op   ::= "+" | "-" | "*" | "/"

Both the primary and the term must reduce to numbers; the arithmetic operators are evaluated right-to-left (a la APL, without precedence) and bind less tightly than function application.
The result is a real if either operand is.

    invocation ::= id

Id is looked up in the current environment; depending on its current binding, this may produce contents, bindings, and/or labels; if the rhs bound to id was quoted, that expression is evaluated in the current environment. In the (implicit) outermost environment, every id is bound to the corresponding universal (ID).

    invocation ::= name "." id

Qualified names represent lookup in "nested" environments; name must have been bound to an environment, in which id is looked up.

    indirection ::= name "%"

This indicates an intentional indirection through name, which should be preserved as part of the structure; replacing the indirection by its value in the current environment is a value-preserving loss of structural fidelity. (An invocation that is simply a name is an abbreviation that need not be preserved.)

    universal ::= ucID ( "." ucID )*

Universals are like names, but written entirely in upper case letters. They are presumed to be defined externally, so they are not looked up in the environment.

    application ::= ( name | universal ) "[" item* "]"

If the application involves a universal (either explicitly, or because the name is bound to a universal), the corresponding function is applied to the argument list that results from evaluating item*. Part of the definition of Layer 2 will involve the specification of a small set of standard functions, which may be expanded in various Layer 3 extensions.

If name is not bound to a universal, the current environment is temporarily augmented with a binding of the value of item* to the identifier value, and the value of the application is the result of evaluating name in that environment; this allows function definition within the language.

Neither form of application changes the environment of succeeding expressions.

    selection ::= "(" term "|" scope1* "|" scope2* ")"

This is a standard conditional item sequence, using syntax borrowed from Algol 68.
The value and effect are those of scope1* if the term evaluates to "T" in the current environment, those of scope2* if it evaluates to "F".

    vector ::= "(" scope* ")"

Parentheses group a sequence of items as a single vector; bindings in scope* affect the environment of items to the right in the containing node, but labels have no meaning.

    node ::= "{" scope* "}"

Nodes have nested environments, and affect the containing environment only through global (:=) bindings to ids. Scope* is implicitly prefixed by an invocation of Sub, which may be bound to any sequence of items intended to be common to all subnodes in a scope.

    item* ::= ""

The empty sequence of items has no value and no effect; this is the basis for the following recursive definition.

    item* ::= item1 item*

In general, the value of a sequence of items is just the sequence of item values; binding items change the environment of items to their right in the sequence.

    binding ::= name mode rhs

This adds a single binding to the current scope (i.e., to its associated environment); bindings have no other "side effects" and no value (i.e., they do not change the length of a containing vector or node value).

    binding ::= name mode op term

"name mode op term" is just a convenient piece of syntactic shorthand for "name mode name op term".

    rhs ::= "'" scope* "'"

A quoted rhs is evaluated in the environment of invocation, rather than the environment current at the point of binding.

    rhs ::= "[|" binding* "]"

This creates a new environment value that may be used much like a record.

    rhs ::= "[" item* "|" binding* "]"

This creates a new environment value that is an extension of the environment that is the value of item*.

    tag ::= universal "$"
This gives the containing node the property denoted by the universal.

    link ::= id "@!"

This introduces the set of links whose main component is id, and defines their scope.

    link ::= name "@"

This identifies the immediately containing node as a source of the link name.

    link ::= name "!"

This identifies the immediately containing node as a target of each of the links that is a prefix of name.

2.3. Safety Rules for Low-capability Editors

Conservative rules for editor treatment of script nodes created by other editors:

    It's OK to display a node if
        you understand at least one of its properties.

    It's OK to edit (the items in) a node if
        you understand all of its (local) properties, and either
            you don't remove any of them, or
            you also understand all properties of its parent.

    It's OK to copy a node if
        you understand all properties of its new parent,
        no labels are moved outside their scope, and
        the two environments have the same bindings for all attributes that you don't either
            understand, or
            know can't be relevant, and anywhere in the node or its subnodes.

    It's OK to delete a node if
        you understand all properties of its parent.

[Less stringent rules will suffice if the document is merely to be viewed, rather than edited, using the original editor.]

2.4. Encodings

[Any resemblance between the following material and the corresponding section of the Interpress standard is purely an intentional consequence of plagiarism.]

The script for a document can be encoded in many different ways. This section gives the rules for designing encodings. The purpose of these rules is to ensure that information is not
lost or added by conversions from one encoding to another. There are two types of encodings: a single interchange encoding and many possible private encodings.

The interchange encoding is used to transmit a script from one site to another when the two sites must be assumed to be arbitrarily different. A private encoding is used to transmit scripts from one site to another when the two sites share the private encoding conventions. For example, a line of document-preparation products made by the same manufacturer might share a private encoding, which can be used to transmit documents from one editor in the product line to another; presumably this encoding is designed to make these transfers simpler or more efficient. However, when one of these editors transmits a document to an unknown editor, the interchange encoding must be used. The interchange encoding is designed to allow easy generation, transmission, and interpretation by many different editors, possibly at the expense of compactness and speed of encoding and decoding.

2.4.1. The interchange encoding

The interchange encoding is designed to simplify creation, communication and interpretation of scripts for the widest possible range of editors and systems. For this reason, a script in the interchange encoding is represented as a sequence of graphic (printable) characters taken from the ASCII set; the subset of ASCII used is also a subset of ISO 646. Communication of a script in the interchange encoding requires only the ability to communicate a sequence of ASCII characters; Interscript does not specify how the characters are encoded. In effect, we define a text representation of the commands to be executed.

The choice of a text format for the interchange encoding leads to rather lengthy scripts in some cases. The bulk of an interchange script presents no great problem for document storage, since a document need not be stored in this form.
Rather, as it is transmitted, the sending editor can translate its own private encoding into the interchange encoding. Similarly, the receiving editor can translate the interchange encoding into its own, usually different, private encoding for storage. However, a bulky interchange script may be more expensive to transmit. If a document consists mostly of text, the interchange encoding is quite efficient: very few characters are required in addition to those appearing in the document itself.

Character set. The character set used in the interchange encoding is described by the ISO 646 7-bit Coded Character Set For Information Processing Interchange. The interchange encoding interprets the 94 characters of the G1 set defined in the International Reference Version (ISO 646, Table 2) and the space character (2/0). This set of 95 characters is called the interchange set. Note that except for the concise "string" encoding of vectors described below, the interchange encoding has nothing to do with the integers corresponding to the characters, but depends only on the character set itself.

It is extremely important to understand that the choice of the ISO standard for the interchange format has nothing to do with character mappings in Interscript fonts. Although these mappings must adhere to a character set standard that is shared by interchanging editors, that standard is not part of Interscript. It is expected that Xerox will develop a separate corporate standard in this area.
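As an illustration, the interchange set can be tabulated by position. The following Python sketch assumes that ASCII code points coincide with the ISO 646 column/row positions 2/0 through 7/14; the names INTERCHANGE_SET, position, and in_interchange_set are hypothetical and are not part of the proposal.

```python
# Assumption: ASCII code points stand in for ISO 646 column/row positions.
SPACE = chr(0x20)                                # position 2/0
G1_SET = [chr(c) for c in range(0x21, 0x7F)]     # positions 2/1 through 7/14
INTERCHANGE_SET = frozenset([SPACE] + G1_SET)    # the 95-character interchange set


def position(ch: str) -> tuple:
    """Return the (column, row) position of a 7-bit character, e.g. (2, 0) for space."""
    code = ord(ch)
    return (code >> 4, code & 0xF)


def in_interchange_set(ch: str) -> bool:
    """True if ch is one of the 95 characters of the interchange set."""
    return ch in INTERCHANGE_SET
```

Keeping the set as data rather than as a numeric range test reflects the point made above: apart from the string encoding, the interchange encoding depends on the characters themselves and not on their integer codes.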
If the underlying encoding of the ISO character set can also encode other characters (e.g., the control characters (0/0 through 1/15) and del (7/15), or another group of 128 characters if eight bits are being used to encode each character), these are ignored in interpreting an interchange script. This does not mean that these characters are converted to spaces, but that they are treated as if they were not present. There are several reasons for this choice:

    Control characters may be inserted freely by software that generates the interchange encoding. For example, carriage returns (0/13), line feeds (0/10), and form feeds (0/12) may be inserted at will to conform to limitations that may be imposed by an operating system. Restrictions on line length or the use of fixed-length records thus become straightforward.

    Control characters may be removed or inserted freely by software that receives the interchange encoding. In this way, the receiving software can adhere to any restrictions imposed by its operating system.

    The absence of control characters allows certain kinds of "non-transparent" data communication methods (such as binary synchronous communication) to be used freely.

A minor disadvantage of these conventions is that if a script is typed in, care must be taken not to omit a significant space at the end of a line. Since scripts are normally generated by programs, this is not important. A system for manually generating (and perhaps interactively debugging) Interscript should provide for various convenience features on input, and for prettyprinting the script on output.

Any number of space characters may also be added after any token without changing the meaning. Throughout the following, a delimiter is a space or comma, which may be omitted if the next character is not an alphanumeric, "-" or ".".

VersionId.
The first characters of an interchange script conforming to this version of the Interscript standard must be "Interscript/Interchange/1.0 ". Note that the VersionId is of variable length, and ends with a space. These conventions simplify the design of systems that must deal with more than one kind of encoding.

    If a privately encoded script can be interpreted as a sequence of characters, its first characters must be "Interscript/private/i.j", where private is replaced by an appropriately chosen hierarchical name that identifies the encoding, e.g., "Xerox/860", and i.j is replaced by an appropriate version identification, e.g., "2.4"; the resulting header would be "Interscript/Xerox/860/2.4".

    A private encoding that cannot be interpreted as a sequence of characters (e.g., a binary, word-oriented encoding on a 36-bit machine which packs five 7-bit characters into a word) should use any available convention to make its scripts self-identifying.

Following the versionId is a node constituting the body of the script, with values encoded as follows.

Integer. An integer is represented in radix 10 notation using the characters "0" through "9" as digits, followed by a delimiter. A negative integer is preceded by a minus sign "-". Thus the decimal number 1234 is encoded as "1234", and -1234 is encoded as "-1234". The delimiter may be empty if the following character is a letter.

A sequence of integer literals in the range 0..255 can be represented in radix 16 notation using the characters "A" through "P" as digits ("A" corresponds to 0, "P" to 15). The entire sequence is enclosed in "#" brackets. For example, the integer 93 is represented as "#FN#", and the sequence of integers 93, 94, 95, 96 as "#FNFOFPGA#".
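The radix 16 notation can be sketched in a few lines of Python. This is only an illustration of the rules just stated; the function names are hypothetical, and the sketch handles only sequences written entirely as letter pairs.

```python
HEX_DIGITS = "ABCDEFGHIJKLMNOP"   # "A" stands for 0, "P" for 15


def encode_int_sequence(values):
    """Encode integers in the range 0..255 as letter pairs inside "#" brackets."""
    pairs = []
    for v in values:
        if not 0 <= v <= 255:
            raise ValueError(f"{v} is outside the range 0..255")
        pairs.append(HEX_DIGITS[v >> 4] + HEX_DIGITS[v & 0xF])
    return "#" + "".join(pairs) + "#"


def decode_int_sequence(text):
    """Decode a "#"-bracketed run of letter pairs back into a list of integers."""
    if len(text) < 2 or text[0] != "#" or text[-1] != "#":
        raise ValueError("sequence must be enclosed in '#' brackets")
    body = text[1:-1]
    if len(body) % 2 != 0:
        raise ValueError("letters must come in pairs")
    # str.index raises ValueError for any character outside "A".."P".
    return [HEX_DIGITS.index(body[i]) * 16 + HEX_DIGITS.index(body[i + 1])
            for i in range(0, len(body), 2)]
```

For example, encode_int_sequence([93, 94, 95, 96]) yields "#FNFOFPGA#", matching the example in the text, and decode_int_sequence reverses it.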
These sequences require only two characters for each integer (plus two characters of overhead). Note that there is no delimiter between the integers in this encoding. Ordinary integer literals, with their delimiters, may be included in the sequence; e.g., 7, 93, 400, 40 could be represented by "#7,FN400CI#".

Booleans are represented by the characters "F" and "T", followed by a delimiter.

Real. A real is represented using Fortran E or F notation, with a trailing delimiter. Thus "12.34" is the same as "1.234E1". Minus signs may precede the mantissa or the exponent: "-12.34E-3 ".

Identifier. An identifier is encoded by its characters (which are limited to letters and digits), followed by a delimiter: "x", "arg1". The first character of an identifier must be a letter, and must be written in lower case to distinguish identifiers from universals. Other letters may be written in either case for readability, since case is not significant in distinguishing identifiers.

Vector. A vector is encoded by surrounding a sequence of values with parentheses, "(" and ")".

String. A text vector usually contains integers that are interpreted as character codes. Often these codes lie in the range 32 to 126 inclusive, which are the numbers assigned to the characters of the interchange set by ISO 646. It is convenient to encode an element of such a vector by the character whose ISO code is the desired value. Such a string can be encoded by surrounding the characters with "<" and ">". If the string contains elements outside the allowed range (i.e., if the value is less than 32 or greater than 126) or the value 62 or 35 (the ISO codes for the characters ">" and "#"), those elements must be represented as integers inside "#" brackets, as described above. The two-character encoding of small integers is designed to make escape sequences compact. Thus the same string may have several equivalent encodings.

Universal names. A universal is encoded by giving its name in upper case letters, followed by a delimiter. E.g., "TEXT".

Node.
A node is encoded by a "{", followed by a sequence of items, followed by a "}".

Comment. The beginning and end of a comment are both marked by a double minus sign: the sequence "--", followed by any sequence of characters not containing "--", followed by "--", is a comment and may occur between any two tokens. Comments are ignored in rendering the script.

The tokens of the interchange encoding are defined by the following BNF grammar, together with rules about delimiters:

    The delimiter that terminates an identifier or universal may only be empty if the next character is not an alphanumeric, "-", or ".".
    The delimiter that terminates an integer may only be empty if the next character is not a digit, "E", "F", "-", or ".".
    Extra delimiters may be inserted after any token.

token ::= literal | id | ucID | op | bracket | punctuation | comment
literal ::= Boolean | integer | intSequence | real | string
Boolean ::= ( "F" | "T" ) delimiter
delimiter ::= " " | "," | empty
empty ::= ""
integer ::= [ "-" ] digit digit* delimiter
digit ::= "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
intSequence ::= "#" intOrHex* "#"
intOrHex ::= integer | hexChar hexChar
hexChar ::= "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H" | "I" | "J" | "K" | "L" | "M" | "N" | "O" | "P"
real ::= [ "-" ] digit digit* "." digit* [ "E" integer ] delimiter
string ::= "<" stringElem* ">"
stringElem ::= stringChar | intSequence
stringChar ::= any character but "#" or ">"
id ::= lowerCase idChar* delimiter
idChar ::= letter | digit
letter ::= lowerCase | upperCase
lowerCase ::= "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" | "i" | "j" | "k" | "l" | "m" | "n" | "o" | "p" | "q" | "r" | "s" | "t" | "u" | "v" | "w" | "x" | "y" | "z"
upperCase ::= hexChar | "Q" | "R" | "S" | "T" | "U" | "V" | "W" | "X" | "Y" | "Z"
ucID ::= upperCase* delimiter
op ::= "+" | "-" | "*" | "/"
bracket ::= "(" | ")" | "{" | "}" | "<" | ">" | "[" | "]" | "'"
punctuation ::= "." | ";" | ":" | "=" | "_" | "!" | "#" | "@" | "|"
comment ::= "--" commentString "--"
commentString ::= any sequence of characters not containing "--"
alphanumeric ::= letter | digit

A simple listing of an interchange script can just print the character sequence, with line breaks every n characters, or perhaps at the nearest convenient delimiter. Such a listing is reasonably easy to read, so that problems can be tracked down simply by studying it. Additional help in reading the file can be furnished by utility programs which format the file for more pleasant reading.

2.4.2. Normalization

Every encoding must define a normalization function N, which maps a script in the encoding into another script in the encoding which generates the same output. N must be idempotent (i.e., N² = N); it may not change the fidelity level of the script (see 2.4.3). If a script violates the definition of Interscript, a normalization function may report this fact instead of producing a normalized result.
In other words, normalization need not be defined on erroneous scripts.

The purpose of this function is to make possible a precise description of the rules for private encodings in section 2.4.4. The idea is that when an encoding provides several ways of saying the same thing (typically a basic way, and some more concise ways which work in common special cases), the normalized script will uniformly choose one way of saying it. Note that the normalized script is not intended for any purpose other than precisely defining a notion of equivalent script; it is neither especially compact nor especially readable.

The normalization function for the interchange encoding is defined as follows:

    Comments are omitted.
    Delimiters are replaced by empty if possible, otherwise with ",".
    An integer encoded in hex is replaced by the same integer encoded in digits; except in strings, "#" brackets are replaced by parentheses.
    Leading zeros are dropped from a digits encoding of an integer.
    Reals are uniformly encoded in E format with a single non-zero digit to the left of the "." and no trailing zeros; 0 is encoded by "0.0".
    An upper case letter in an identifier is replaced by the corresponding lower case letter.
    Each direct invocation (abbreviation) is replaced by its binding.

2.4.3. Level restriction

For each rendition fidelity level L of Interscript, there is an (idempotent) level restriction function R_IL which converts an arbitrary interchange script into an interchange script of level L. An interchange script is of level L if R_IL applied to it is the identity. A restriction function replaces an excluded structure with its value according to the semantics of Interscript, converts excluded form information into additional content with a special property, and removes excluded tags.

2.4.4. Private encodings

A private encoding may use any scheme for expressing the content of a script.
Certain requirements are imposed on private Interscript encodings to ensure that they can express the entire content of a script at a given level, and no more. Since no general statements can be made about the bits, characters or other low level constituents of a private encoding, these constraints are stated in terms of the existence of certain functions that convert private encodings to interchange encodings and vice versa. An encoding for which these functions do not exist is not an Interscript encoding. The recommended way of demonstrating that the functions exist is to exhibit them as executable programs. This makes it easy to run test cases.

A particular private encoding has a fixed fidelity level. Informally, this means that it can encode any script of that level.

For any private Interscript encoding P of fidelity level L, the following functions must exist:

    N_P, the normalization function for P; see 2.4.2.
    C_PI, a conversion function from a script in P to an interchange script of level L.
    C_IP, a conversion function from an interchange script of level L to a script in P.

If a script violates the definition of Interscript, a conversion function may report this fact instead of producing a converted result. In other words, conversion need not be defined on erroneous scripts.

Given these functions, we can define functions which convert normalized private scripts to normalized interchange scripts of level L and conversely:

    N_PI = N_I o C_PI
    N_IP = N_P o C_IP

In other words, first convert to the other encoding, and then normalize. These functions must be inverses of each other.

This means that after normalization (which does not change the output), a private script can be converted to an interchange script and then back to the same private script, and vice versa.
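The inverse requirement can be illustrated with a deliberately tiny pair of encodings. In this Python sketch, which is a toy under stated assumptions and not part of the proposal, a script is just a sequence of integers: the toy interchange form separates them with spaces or commas, and the hypothetical private form P is a tuple of integers. All function names are invented for the illustration.

```python
import re


def n_i(script: str) -> str:
    """Toy N_I: drop leading zeros and use "," as the only delimiter."""
    tokens = [t for t in re.split(r"[ ,]+", script) if t]
    return ",".join(str(int(t)) for t in tokens)


def n_p(script: tuple) -> tuple:
    """Toy N_P: the private form is a tuple of ints, already canonical."""
    return tuple(int(x) for x in script)


def c_pi(script: tuple) -> str:
    """Toy C_PI: private tuple to an (unnormalized) interchange string."""
    return " ".join(str(x) for x in script)


def c_ip(script: str) -> tuple:
    """Toy C_IP: interchange string to a private tuple."""
    return tuple(int(t) for t in re.split(r"[ ,]+", script) if t)


def n_pi(p: tuple) -> str:
    """N_PI = N_I o C_PI: convert, then normalize."""
    return n_i(c_pi(p))


def n_ip(i: str) -> tuple:
    """N_IP = N_P o C_IP: convert, then normalize."""
    return n_p(c_ip(i))
```

With these definitions, n_ip(n_pi(p)) returns p for any normalized private script p, and n_pi(n_ip(i)) returns i for any normalized interchange script i, which is the round-trip property stated above.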
Hence it seems reasonable to say that the private encoding can express exactly the same information.

[We need to say similar things about editor representations, transcription fidelity, and rendition fidelity.]

Many tricks are available for designing private encodings with desirable properties. With some knowledge of the statistics of actual scripts, encodings can minimize the number of bits required to represent the average script, by Huffman or conditional coding of the primitives. For example, if strings consist primarily of ordinary written English text, an encoding with five bits per character might be attractive: lower case letters except "q", "x", and "z" (23); space; comma space; semicolon space; colon space; dot space space followed by one upper case character; escape to upper case; one upper case character; escape to digits; one digit character (32 total). The upper case and digits sets would be analogous. A more complex, but perhaps even more compact encoding would take account of the letter frequencies in English text. Similarly, the most common labels can be encoded compactly.

There are other useful ideas for private encodings. The bracketing constructs may be replaced by constructs with explicit length fields; these can be shorter, it is easy for the decoder to skip the bracketed constructs, and if the script is damaged it is easier to recover than from the loss of a closing bracket. Hints can be associated with nodes that will speed translation to a particular editor's representation.

In designing a private encoding, it is advisable to handle all the constructs of Interscript reasonably compactly, rather than allowing some "unpopular" ones to be encoded very clumsily.
Otherwise scripts originally generated in another encoding may cause terrible performance.

File: Interdoc-3ff.bravo, May 10, 1982 5:40 PM

3. Higher-Level Issues

3.1. Standard and Editor-Specific Transcriptions

We need a two-level structure so that documents expressed in the base language (a) are interchangeable among different editors and (b) retain information of special significance to a specific editor. We call (a) the interchange standard information, or standard information, and (b) editor-specific information.

Basically, an editor X is free to couch properties in its own terms, which can make it easy for it to consume a script produced by itself, but it must provide a set of mappings which will transform properties into the interchange standard. The recommended method for doing this is to invoke its name as the very first item in the root node of any X-specific subtree. The rules for inheritance of properties mean that often only the root node of a document will need to have this property, but there is nothing wrong with nodes being in different editor-specific terms provided they invoke the appropriate editor properties.

Now, to be a valid standard script, the document must have the definition of the name X placed in the script itself. (There is nothing wrong with keeping editor-specific-to-standard mappings in a library of some sort to avoid having copies of them in each script.)

When X parses an X-specific script, it will use its X-specific attributes and never invoke the mappings from X-specific information to standard terms; i.e., it can use a null definition for the name X.
However, when such a document is interpreted by some other editor Y, any time it tries to access a standard name, the mapping from that name to the corresponding expression in terms of the X-specific values in the script will have been provided by the definition of X. What guarantee is there that this can always be done?

It is worth noting first that we are speaking here of a script being rendered for an editor, rather than produced. Consequently, it will never be necessary to access standard names in left-hand contexts; i.e., to do bindings that are not part of the script in order to interpret it. It may, however, need to access the components of environments in order to render the script into its private format. These are always values in right-hand side contexts, and must be computed in terms of the X-specific information that X put in the script. We can examine this issue on a case-by-case basis. Below is a list of examples of possible editor-specific uses of the base language and the mappings that would allow another editor to treat the document in standard terms:

Symbolic values used instead of numbers: supply standard values for the symbolic values:

    Standard:
        leading.between _ 1*pt    -- some numeric value --
    Editor-specific:
        leading.betweenLines _ single
        leading.above _ double
    mapping:
        single = 2*pt
        double = 4*pt

Different names used for standard names: supply a binding to the standard name from the editor-specific name using a quoted expression so that it is only evaluated when needed in a right-hand context:

    Standard:
        leading _ [above_10*pt between_2*pt below_0*pt]
    Editor-specific:
        Space _ [BetweenLines_single BeforePara_double AfterPara_single]
    mapping:
        leading.between _ 'Space.BetweenLines'
        leading.above _ 'Space.BeforePara'
        leading.below _ 'Space.AfterPara'

Different concepts used for standard ones: supply a binding to the standard attribute names from the editor-specific concepts using quoted expressions so that they are only evaluated when needed in right-hand contexts:

    Standard:
        leading _ [above_10*pt between_2*pt below_0*pt]
    Editor-specific:
        Spacing _ [fontSize_10 on_14 leading_1]    -- all units assumed to be pts --
    mapping:
        leading.between _ 'pt*Spacing.on - Spacing.fontSize'    -- value is in pts --
        leading.above _ 'pt * Spacing.leading+Spacing.on - Spacing.fontSize'
        leading.below _ 0

In general, one can use the facilities of the base language to write essentially arbitrary programs that can, by being quoted, be bound to a standard identifier to cause the appropriate value to be computed based on editor-specific information put in the document by the editor that produced it. Moreover, since the mappings provided by editor X can be overridden in any subtree of the document, an editor that does not "understand" some subtree of a document produced by another editor Y can simply leave that subtree intact when producing an edited version of the original script, except to ensure that the first expression in that subtree's root node is an invocation of "Y", which will cause Y's editor-specific mappings to obtain in that subtree.

3.2. Standard External Environment

It will be important to provide for a standard external environment for rendering scripts so that standard definitions need not be carried along with every script that uses them. The external environment will contain definitions for units (inch, pt, etc.), various "styles" (para, figure, etc.), and useful abbreviations (italic, bold, etc.).

3.2.1. Units

The Interscript standard assumes that distances are in meters and angles are in degrees. Using the language and the following constants defined in the standard external environment, a script can readily express distances and angles in other, possibly more convenient units:

    meter = 1.0                 -- IN TERMS OF METERS --
    mica = 1.E-5*meter          -- mica = 1.E-5 --
    inch = .0254*meter          -- inch = .0254 --
    pt = .013836*inch           -- pt = .00035143 --
    pica = 12*pt                -- pica = .00421721 --
    tenPitch = inch/10          -- tenPitch = .00254 --
    twelvePitch = inch/12       -- twelvePitch = .00211667 --
    degree = 1.0                -- ANGLES ARE IN DEGREES --
    pi = 3.14159265
    radian = 180*degree/pi      -- = 57.29577951 --

4. Pragmatics

    Private encodings and private representations
    Conversion efficiency
    Implementation considerations

APPENDIX A
GLOSSARY

An italicized word in a definition is defined in this glossary.

abbreviation
    An invocation used to shorten a script, rather than to indicate structure
attribute
    A component of an environment, identified by its name, which is bound to a value
base language
    The part of the Interscript language that is independent of the semantics of particular properties and attributes
base semantics
    The semantic rules that govern how scripts in the base language are elaborated to determine their contents, environments, and labels
binding
    The operation of associating a value with a name to add an attribute to an environment; also the resulting association
binding mode
    A value may be bound to an identifier as const, var, local, or persistent
Boolean
    An enumerated primitive type (F, T) used to control selection and as primitive values
const binding
    A binding of an attribute that prevents its being rebound in any contained scope
contents
    The vector of values denoted by a node of a script
definition
    Another name for a const binding
document
    The rendition of a script in a representation suitable for some editor
dominant structure
    The tree structure of a document corresponding to the node structure of its script
editor-specific name
    A non-standard name used by a specific editor in scripts it generates; an editor may use editor-specific terms without interfering with the interchangeability of a script if it provides definitions of the standard names in terms of its editor-specific names
elaborate
    (verb) To develop the semantics of a script or a node of a script according to the Interscript semantic rules. This is a left-to-right, depth-first processing of the script
encoding
    A particular representation of scripts
environment
    A value consisting of a set of attributes
expression
    A syntactic form denoting a value
external environment
    A standard environment relative to which an entire script is elaborated
fidelity
    The extent to which a transcription or rendition preserves contents, form, and structure
hexInt
    A component of an intSequence formed from a pair of letters in the set {A,B,...,O,P}, representing an integer 0 .. 255
hierarchical name
    A name containing at least one period, whose prefix unambiguously denotes the naming authority that assigned its meaning
identifier
    A sequence of letters used to identify an attribute
integer
    A mathematical integer in a limited range; one of the primitive types
interchange encoding
    A standard encoding of scripts
Interscript
    The current name of this basis for an editable document standard
intSequence
    An abbreviated notation for sequences of small integers
invocation
    The appearance of a name in an expression, except as the attribute of a binding
label
    A tag, or a source, a target, or a link introduction placed in a node
link
    The cross product of a source and a target; in general, a link is a set of (source, target) pairs; in the special case when there is exactly one source and one target, a link behaves like a directed arc between a pair of nodes
link introduction
    The appearance of id@! in a node, where id is the main identifier of a link
literal
    A representation of a value of a primitive type in a script
local binding
    A binding of a value to a name, causing the current environment to be updated with the new attribute; any outer binding's scope will resume at the end of the innermost containing node
name
    A sequence of identifiers internally separated by periods; e.g., a.b.c
nested environment
    The initial environment of a node contained in another node
NIL
    A name for the empty value; it does not lengthen a vector or node in which it appears
node
    Everything between a matched pair of {}s in a script; this generally represents a branch point in a document's dominant structure
Null
    Identifies the empty environment; the value it associates with any identifier is NIL
Outer
    A standard attribute of every environment, whose value is the environment just prior to the start of the current node
Outermost
    The standard outer environment for an entire script; the value of an identifier in Outermost is the universal consisting of the same letters in upper case
persistent binding
    A kind of binding within the scope of a var binding that acquires the scope of the var binding, and hence may endure beyond the end of the innermost containing node
primitive type
    Boolean, Integer, Real, String, or Universal
primitive value
    A literal or a node, vector, or environment containing only primitive values
private encoding
    One of a number of non-standard encodings of a script
property
    Each tag on a node labels it with a property; the properties of a node determine how it may be viewed and edited
quoted expression
    A value which is an expression bracketed by single quotes ("'"); the expression is evaluated in each environment in which the identifier to which it is bound is invoked
real
    A floating point number
rendition
    The process of converting from a script to a document; also the result of that process
scope
    The region of the script in which invocations of the attribute named in a binding yield its value; the scope starts textually at the end of the binding, and generally terminates at the end of the innermost containing node
script
    An Interscript program; the interchangeable result of transcribing a document
selection
    A conditional form in a script that denotes one of two expressions, depending on the value of a Boolean expression in the current environment
source
    The set of nodes labelled with link@
string
    A literal which is a vector of characters bracketed by "<>"
style
    A quoted expression to be invoked in a node to modify the node's environment, labels, or contents
Sub
    A standard component of each environment, which is invoked to initialize nested environments
SUBSCRIPT
    A function that can be used to extract a value from a vector, e.g., SUBSCRIPT[(a b ), 3]
tag
    A universal name labelling a node using the syntax universal$; the properties of a node correspond to the set of tags labelling it
target
    The set of nodes labelled with link!
transcription
    The process of converting from a document to a script; also the result of that process
transparency
    A characteristic of scripts that allows an editor to identify the nodes of a script that it understands and thereby enables it to operate on those nodes without disturbing the ones that it doesn't understand
Units
    A set of definitions relating various typographical and scientific units to the Interscript standard units, meters; e.g., inch=.0254 pt=.013836*inch

universal
    A name whose first identifier is all uppercase; a universal name can be used at the top level in the external environment, e.g., XEROX.fonts.Helvetica

value
    A primitive value, node, vector, environment, universal, or quoted expression

var binding
    A binding that is intended to be superseded by persistent bindings within its scope; useful for maintaining such things as running figure numbers

vector
    An ordered sequence of values that may be subscripted


APPENDIX B
ARBITRARY CHOICES

"One of the primary purposes of a standard is to be definitive about otherwise arbitrary choices."

There are many places in this proposal where we have made an arbitrary choice for definiteness. It will be important that the ultimate standard make some choice on these points; it matters little whether it is the same as ours. To forestall profitless debate on these points, we have tried to list some of the choices that we believe can be easily changed at a later date:

Encoding choices:
    The choice of representations for literals (we generally followed Interpress here).
    The selection of particular characters for particular kinds of bracketing, and for particular operators.
    The choice of infix and functional notation for the interchange encoding (as opposed, e.g., to Polish postfix).
    The choice of particular identifiers for basic concepts.

Linguistic choices:
    The choice of a particular set of basic operators for the language.
    The particular set of primitive data types (we followed Interpress; its set seems about as small as will suffice).
    The choice of particular syntactic sugars for common linguistic forms.


APPENDIX C
RELATION TO OTHER STANDARDS


APPENDIX D
HISTORY LOG

Edited by Mitchell, September 1, 1981 3:12 PM, added first version of glossary
Edited by Mitchell, September 7, 1981 2:11 PM, wrote parts of introduction
Edited by Mitchell, September 10, 1981 10:14 AM, added Tab def to Star property sheets
Edited by Mitchell, September 14, 1981 9:54 AM, renumbered chapters and did minor edits
Edited by Horning, May 4, 1982 5:16 PM, Fold in Truth Copy changes, add Appendix B
Edited by Mitchell, May 10, 1982 5:40 PM, changed "Interdoc" to "Interscript", "rendering" to "internalizing", and "transcribing" to "externalizing" plus various edits necessitated by these substitutions.