[_CD6_]<interscript>DraftStd>Interscript-Standard.tioga!31

Note: authors' notes and open questions are given in the form <<description of the issue>>.

Filed as [Indigo]<Interscript>DraftStd>Interscript-Standard.tioga, .press
Last edited By Mitchell on January 3, 1984 4:07 pm

Remove This Cover Sheet

Interscript

A Proposal for a
Standard for the Interchange of Editable Documents

January 1984

XEROX Robert M. Ayers
Xerox Corporation
Office Systems Division
Systems Development Department
3450 Hillview Drive
Palo Alto, California 94304

James J. Horning
Butler W. Lampson
James G. Mitchell
Xerox Palo Alto Research Center
3333 Coyote Hill Road
Palo Alto, California 94304

Status of Document: Private Contribution

Purpose and Overview:

Interscript defines a digital representation of editable documents for interchange among editing systems. A script is the representation of a document in the Interscript format; it can be transmitted from one editor to another over a network, or can be stored for later editing. A script is not limited to any particular editor: if a script contains editable information some of which is not understandable by a particular editor, it is still possible to edit the parts of the document understood by that editor without losing or invalidating the parts it does not understand.

Because this report is intended as a nascent standard, it necessarily has a great deal of precision, and we have employed various formal notations to achieve this. Such formal notation requires careful in-depth reading, but it has the advantage of being able to give precise answers to questions about the standard. To help with reading it, commentary has been included in a number of places in fine print, like this. However, the small print should be viewed as hints about the standard: in any conflict between the small print and the normally sized text, the normally sized text is to be considered the correct version.

The more casual reader may find the Introduction and Glossary helpful in understanding some of the concepts of Interscript.

Contents

1. Introduction

2. The Language Basis: Syntax and Semantics

3. Tag Declaration, Node Invariants, and Safety Rules for Editors

4. Standard Document Constructs

5. Layout

6. Other Constructs

Appendix A: Glossary

Appendix B: Example of Script Evaluation

Appendix P: Paragraph% ReducesTo

Appendix Q: Line% ReducesTo

Interscript 83 Xerox System Integration Standard

Introduction

1. Introduction

Interscript provides a means of representing editable documents. This representation is independent of any particular editor and can therefore be used to interchange documents among editors.

1.1. Rationale for an interchange standard

As office systems proliferate, being able to interchange documents among different editing systems is becoming more and more important. Customers need document compatibility to avoid being trapped in evolutionary cul-de-sacs and having to pay the awful price of converting documents from one product's format to another's.

Typically, an editing program uses a private, highly-encoded representation when operating on a document to enable it to provide good performance. Generally, this means that different editors use different, incompatible private formats, and a user can conveniently edit a document only with the editor used to create it. This problem can be solved by providing programs to convert between one editor's private (or file) format and another's. However, a set of different editors with N different document representations requires N(N-1) conversion routines to be able to convert directly from each format to every other.

This N(N-1) problem can be reduced to 2(N-1) by noticing that we could write N-1 conversion routines to go from F1 (format for editor1) to F2,. . .,FN, and another N-1 routines to convert from F2,. . .,FN to F1. Except when converting from or to F1, this scheme requires two conversions to go from Fi to Fj (j`i). This is a minor drawback. Choosing which editor should be editor1 is the critical issue, however, since the capabilities of that editor will determine how general a class of documents can be interchanged among the editors.

This presents a truly difficult problem in the case that there is no single functionally dominant editor1 in the set. If the pivotal editor1 doesn't incorporate all of the structures, formats, and content types used by editor2,. . .,editorN, then it will not be possible to faithfully convert documents containing them. Even if there were a single, functionally dominant editor, it would place an upper bound on the functionality of all future compatible editors.

Since there are no actual candidates for a totally dominant editor, this standard has been developed by examining, in general, what information editors need and how that information can be organized to represent general documents. It provides an external representation that is capable of conveying the content, form, and structure of editable documents. That external representation has only one purpose: to enable the interchange of documents among different editors. It must be easy to convert between real editors' formats and this interchange encoding.

When represented by this interchange encoding, we call a document a script and reserve the term document for the representation that an editing system uses to enable editing it.

Using a standard interchange encoding has the additional advantage that much of the input and output conversion algorithms will be common to all conforming editors. For example, when a new version of an existing editor is released, the only differences in the new version's conversion routines will be in the areas in which its internal document format has changed from its previous form; this represents a significant saving of programming.

1.2. Properties that any interchange standard must have

An interchange encoding for editable documents must satisfy a number of constraints. Among these are the following:

1.2.1. Encoding efficiency

Since editable documents may be stored as scripts, may be transmitted over a network, and must certainly be processed to convert them to various editors' private formats, it is important that the encoding be space-efficient.

Similarly, the cost in time of converting between interchange encoding and private formats must be reasonably low, since it will have a significant effect on how useful the interchange standard is.

1.2.2. Open-ended representation

Scripts must be capable of describing virtually all editable documents, including those containing formatted text, synthetic graphics, scanned images, animated images, etc., and mixtures of these various modes. Nor may the standard foreclose future options for documents that exploit additional media (e.g., audio) or require rich structures (e.g., VLSI circuit diagrams, database views). Thus, a standard must be capable of incremental extension and any extension must have the same guarantees and be able to employ the same mechanisms as the most basic parts of the standard.

For the same reasons, the standard must not be tied to particular hardware or to a file format since documents will be stored and transmitted using a variety of media.

1.2.3. Document content and form

The complete description of a component of a document usually requires more than a list of its explicit contents; e.g., paragraphs have margins, leading between lines, default fonts, etc. Scripts must record the association between attributes (e.g., margins) and pieces of content.

Both the contents and attributes of typical documents require a rich value space containing scalar numbers, strings, vectors, and record-like constructs in order to describe items as varied as distances, text, coefficients of curves, graphical constraints, digital audio, scanned images, transistors, etc.

1.2.4. Document structure

Many documents have hierarchical structure; e.g., a book is made of chapters containing sections, each of which is a sequence of paragraphs; a figure is embedded in a frame on a page and in turn contains a textual caption and imbedded graphics; and the description of an integrated circuit has levels corresponding to modular or repeated subcircuits. This standard exploits such structure, without imposing any particular hierarchy on all documents.

Hierarchy is not sufficient, however. Parts of documents must often be related in other ways; e.g., graphics components must often be related geometrically, which may defy hierarchical structuring, and it must be possible to indicate a reference from some part of a document to a figure, footnote, or section in way a that cuts across the dominant hierarchy of the document.

Documents often contain structure in the form of indirection. For instance, a set of paragraphs may all have a common "style," which must be referred to indirectly so that changing the style alone is sufficient to change the characteristics of all the paragraphs using it. Or a document may be incorporated "by reference" as a part of more than one document and may need to "inherit" many of its properties from the document into which it is being incorporated at a given time.

1.2.5. Transcription fidelity

It must be possible to convert any document from any editor's private format to a script and reconvert it back to the same editor's private format with no observable effect on the document's content, form, or structure. This characteristic is called transcription fidelity, and is a sine qua non for an interchange encoding; if it is not possible to accomplish this, the interchange encoding or the conversion routines (or both) must be defective. It must, of course, also be possible to test that an editor does transcribe scripts faithfully, which in turn requires that it be possible to test if two scripts are equivalent (section 2.3.4).

1.2.6. Script comprehension

Even complicated documents have simple pieces. A simple editor should be able to display parts of documents that it is capable of displaying, even in the presence of parts that it cannot. More precisely, an editor must, in the course of internalizing a script (converting it from a script to its private, editable format), be able to discover all the information necessary to recognize and to display the parts that it understands. This must work despite the fact that different editors may well use different data structures to represent the content, form, and structure of a document.

At a minimum, this requires that a script contain information by which an editor can easily determine whether or not it understands a component well enough to edit it, and that it be able to interpret the effect that components which it does not understand have on the ones it does. For example, if an editor does not understand figures, it might still be possible for it to display their embedded textual captions correctly, even though a figure might well dictate some of its caption's content or attributes such as margins, font, etc.

This constraint requires that an interchange encoding must have a simple syntax and semantics that can be interpreted readily, even by low-capability editors.

1.2.7. Regeneration

Processing a script to internalize it correctly is only half the problem. It is equally important that an editor, in externalizing a script from its private internal format be able to regenerate the content, form, and structure carried by the script from which the document originally came. In particular, when regenerating a script from an edited document, it should be possible to retain the structure in parts of the original script that were not affected by editing operations. For example, an editor that understands text but not figures should be able to edit the text in a document (although editing a caption may be unsafe without understanding figures) while faithfully retaining and then regenerating the figures when externalizing it.

This problem is much less severe when an editor is transcribing a document that it "understands" completely, e.g., because the entire document was generated using that same editor.

1.3. What the Interscript standard does not do

There are a number of issues that the Interscript standard specifically does not discuss. Each of these issues is important in its own right, but is separable from the design of an interchange representation

1.3.1. Interscript is not a file format

This standard is not concerned with how scripts are held in files on various media (floppy disks, hard disks, tapes, etc.), or with how they are transmitted over communications media (local area network, telephone lines, etc.).

1.3.2. Interscript is not a standard for editing

A script is not intended as a directly editable representation. It is not part of its function to make editing of various constructs easier, more efficient, or more compact: that is the purview of editors and their associated private document formats. A script is intended to be internalized before being edited. This might be done by the editor, by a utility program on the editing workstation, or by a completely separate service.

1.3.3. Combining documents is not an interchange function

This exclusion is really a corollary of the statement, "A script is not intended as a directly editable representation." In general, it is no easier to "glue" two arbitrary documents together than it is to edit them.

1.3.4. Interscript does not overlap with other standards

There are a number of standards issues that are closely related to the representation of editable documents, but which are not part of the Interscript standard because they are also closely related to other standards. For example, the issues of specifying encodings for characters in documents, or how fonts should be named or described are not part of this work.

1.4. Concepts and Guiding Principles

1.4.1. Layers

The Interscript standard is presented in layers:

Layer 0 defines the syntax of the base language for scripts; parsing reveals the dominant structure of the documents they represent (sections 2.1-2.2).

Layer 1 defines the semantics of the base language, particularly the treatment of bindings and environments (section 2.3, chapter 3).

Layer 2 defines the semantics of properties and attributes that are expected to have a uniform interpretation across all editors (chapters 4-5).

1.4.2. Externalization and Internalization

A script represents a document in the Interscript format. Its sole purpose is to enable the interchange of documents among editors in a manner that is independent of any one editor.

A script is not the editable form of a document. The editable form is created by an editor by internalizing a script according to the rules (semantics) of Interscript. The reverse operation of converting a document in an editor's internal, editable format to a valid script is called externalization.

It is important that any document prepared by any editor can be externalized as a script that will then be (re)internalized by the editor without "loss of information". Ease of internalization requires that the Interscript base language contain only relatively few (and simple) constructs. This apparent paradox has been resolved by including within the base language a simple, yet powerful, mechanism for abbreviation and extension.

A script may be considered to be a "program" that is "compiled" to convert the script to the private representation of a particular editor, ready for further editing. The Interscript language has been designed so that internalizing scripts into typical editors' representations can be performed in a single pass over the script by maintaining a few simple data structures.

1.4.3. Content, Form, Value, and Structure

Most editors deal with both the content of a document (or piece of a document), and its form. The former is thought of as "what" is in the document, the latter as "how" it is to be viewed; e.g., "ABC" has a sequence of character codes as its contents; its format may include font and position information. Interscript maintains this distinction.

The distinction between the value and the structure of both content and form within a document is also important. When viewing a document, only the value is of concern, but the structure that leads to that value may be essential to convenient editing. An example of structure in content is the grouping of text into paragraphs. An example of structure in form is associating a named "style" with a paragraph.

Content: may be represented by structures built from character strings, numbers, Booleans, identifiers, and nodes, which are structured objects containing simpler ones.

Form: Interscript provides for open-ended sets of properties and attributes. Properties are associated with content by means of tags. Attributes are bindings between names and values that apply over some scope. The way the contents of a document are to be "understood" is determined by its properties; Interscript makes it straightforward to determine what these properties are without having to understand them.

Structure: Most editors structure the content of a document somehowinto paragraphs, sections, chapters; or lines, pages, signatures, for example. This assists in obtaining private efficiency, but, more importantly, provides a conceptual structure for the user.

The most important, and most frequent, structuring mechanism between values is logical adjacency (sequentiality), which is represented by simply putting them one after another in the script.

Most editors that structure contents have a "dominant" hierarchy that maps well into trees whose arcs are implicitly labelled by order. (Different editors use these trees to represent different hierarchies). Interscript provides a simple linear notation for such trees, delimiting node values by braces ("{" and "}"). If an editor maintains multiple hierarchies, the dominant one is the one transcribed into the primary tree structure and used to control the inheritance of attributes.

Structures recorded for form use explicit indirection by means of names. Interscript allows expressions composed of literals, identifiers, and operators, and permits the use of identifiers to represent expressions.

2. The Language Basis: Syntax and Semantics

The Language Basis: Syntax and Semantics

This chapter defines the Interscript Base Language. Two versions of its syntax, a concrete one and an abstract one, are supplied. The concrete grammar defines the publication encoding for Interscript, which is solely intended for communicating Interscript concepts among people and is used for all examples of Interscript in this standard (chapter 7 defines the actual encoding to be used for representing scripts for editing systems rather than humans).

Section 2.3 defines the semantics of the base language. These semantics have two primary functions: (1) to provide a rigorous definition for equivalence of scripts, and (2) to show conceptually what information in any script an Interscript implementation must represent in order to be able to internalize and externalize a script correctly.

2.1. Grammar

Our notation is basically BNF with terminals quoted and parentheses used for grouping.

script ::= header node trailer

header ::= "INTERSCRIPT/INTERCHANGE/1.0 "

trailer ::= "ENDSCRIPT"

node ::= "{" items "}"

items ::= empty | items item

tag ::= primary "$"

openedNode ::= (term | name "%") "|"

scope ::= "[" items "]"

binding ::= name "←" term

sBinding ::= name "%←" ( term | indirection | "'" term "'" )

term ::= primary | term op primary

op ::= "+" | "—" | "*" | "/" | "!" | "LT" | "EQ"

primary ::= literal | invocation | node | "(" term ")"

literal ::= name | number | string

name ::= id | name "." id

invocation ::= primary "^"

indirection ::= name "%"

This small grammar defines a representation language for scripts that is small enough to admit of a precise denotational semantics (section 2.3), yet powerful enough to include facilities for strong typing and data-type extensibility.

This surface syntax is intended only for use as a means of communications about Interscript. It is a publication encoding and is not necessarily intended as the actual encoding to be used for interchanging documents among editing systems. Any such encoding should pay attention both to the structure of this language and to the requirements for an easily decoded, robust, and space-efficient representation.

2.2. Abstract Grammar for Interscript

To associate a precise semantics with this language, it is easier to work from an abstract syntax for it, ignoring surface punctuation and treating the parsed form of some external encoding of a script. The following grammar serves this purpose as well as indicating the reason for each of the parts of the language (in small print, like this). However, the small print should be viewed as hints about the semantics; in any conflict between the small print and the actual semantics, the semantics is to be considered the correct version:

script ::= node

An entire script is represented as a single high level node with subnodes to carry the various parts of the logical and layout structures.

node ::= items

items ::= empty | items item

A node is the primary structured data type of the language and is used to represent both logical and layout structures. The items that make up a node may be tags, values, bindings, and expressions giving rise to tags, values, and bindings. A node with values but no bindings behaves like a vector; a node with bindings but no values behaves like a record; values and bindings may be intermixed to capture both the content of a piece of a document and the bindings associated with that content (such as what font a character is to be displayed in).

tag ::= primary

A node can have zero or more tags. Each tag is associated with an invariant that all nodes with that tag must satisfy. This invariant includes what attributes (bindings) are relevant to the node, what types of contents are allowed for the node, and any relations that must hold among the attributes of the node. For example, a TEXT node is only allowed to have character strings or PSEUDOCHAR nodes as contents, and font is a relevant attribute of a TEXT node. An editor can tell whether or not it has code to deal explicitly with a node simply by looking at the node's tags.

binding ::= name term

A binding associates a value (the result of elaborating the term) with a name. As bindings are encountered during internalization, they are appended to the current environment. Whenever the value associated with a name is needed, the environment is scanned from most-recent to least-recent for a binding to that name (of course, this is a logical description, not necessarily how one would implement environments).

sBinding ::= name sRhs

A structured binding is just like a normal binding except that the binding must be kept with the node in which it occurs, even if it is not a relevant attribute of (a tag of) that node. Only names that are structurally bound can be used in an indirection (see below) or can be opened (see openedNode below). An intended use of sBindings is for associating style information with a name so that it can be applied to nodes while still maintaining the inherent indirection that would allow the style to be changed and have the effect of the change propagate to all the nodes using it.

sRhs ::= term | indirection | quoted

A structured binding can associate a single value, another name, or a quoted expression with a name. to get the effect of structurally binding a set of items to a name, one can bind a node value to the name and then access the items using the openedNode mechanism; e.g., list %← {1 1 2 3 5 8} list%| will result in the sequence 1 1 2 3 5 8 becoming items at the same level as the list%|.

quoted ::= term

A quoted expression must be a syntactically correct expression (this is not a macro facility!). It will be evaluated in the environment that obtains when used in an indirection (see below).

term ::= primary | term op primary

op ::= ADD | SUB | MUL | DIV | SUBSCRIPT | LT | EQ

primary ::= literal | invocation | node | term

Interscript expressions are primitive with a minimal spanning set of operators. Since scripts are expected to be written and read by editing systems rather than by people, this is adequate. An invocation replaces an occurrence of a name by its value. A node is a value like any other.

literal ::= name | number | string

A name standing by itself is atomic: it simply stands for itself, nothing more (an invocation is used if the name is to be evaluated to produce a value). Numbers may be either Fortran-like reals or integers. Strings are characters in the ISO 646 subset bracketted by the double quote character (").

name ::= id | name id

A name is either a simple identifier (e.g., a, b, simpleId) or it is a qualified name (e.g., a.position.x, font.face.style). If the name is to be evaluated (for an invocation, an indirection, or an openedNode), then it is evaluated from left to right. All the identifiers except the last in a name must evaluate to node values, which will then be treated as environments in which to continue the name evaluation process. For example, in a.position.x, a must be bound to a node value and must contain a binding for position, which must also be a node value containing in turn a binding for x.

invocation ::= primary

When a name is invoked, it is evaluated as described under the name rule above, and the resultant value is used in place of the invocation.

indirection ::= name

An indirection behaves just like an invocation for the purposes of evaluation. In addition, the fact that the resulting value came from an indirection must be remembered in case of changes to the binding for the name or for correct externalization of the document as a script.

openedNode ::= primary | name

A node value can be bound to a name and then later "opened" in some context to place the items of the node into the environment in which it is opened. This can be used to introduce default values for tags, values, or bindings anywhere in a script. If the form name%| is used instead of name^|, then the semantics demand that the name of the node opened be maintained as well as the result of opening it. Such a structured open can be used to associate styles with parts of a document while guaranteeing that any change to the style can be reflected in all the places that the style is used.

scope ::= items

A list of items can be put in scope brackets "[ ... ]" to ensure that any computation in the scope cannot affect the surrounding environment except by maintaining any bindings that are structural (see sBinding). Any contents that result from evaluating the scope are considered part of the surrounding environment (just as if they had not been surrounded by scope brackets).

2.3. Formal Semantics for Interscript

The semantics for the base language are defined by a function, S, whose domain is the set of parse trees corresponding to the abstract grammar of section 2.2. S is written in a variant of the Mesa programming language [Mesa] which has had added to it the ability to define and manipulate lists of values (provided all the elements of a given list have the same type in the Mesa sense).

A type describing a list of items of type T can be defined as "LIST OF T" (in these semantics, T will always take the form "REF T'" for some type T'). A list element can be created by the operation

CONS: PROC[first, rest: LIST OF T] RETURNS[LIST OF T]

where T may be any type. Given a list L, the first element can be accessed as L.first, and the tail of the list as L.rest.

2.3.1. Semantics organized according to the abstract grammar

The semantics of the base language are described by the function S:

S: PROC[e: Env, nt: NonTerminal] RETURNS[vs: Vs]

where a NonTerminal represents a node of an abstract parse tree (not defined here).

Vs: TYPE=LIST OF V

V: TYPE = REF VRec

Conceptually, a VRec is a triple consisting of a mark, m, some contents, c, and sometimes extra information, x, which is primarily for recording a name (e.g., for a binding). To get the maximum benefit of type checking the definition of S without encumbering the definition with a lot of extraneous type coercions, a Vrec is defined to contain a set of fields, ca, one for each type of content (ca is one of cV, cVs, cNum, cStr, cId, cTerm). For similar reasons, the x component may be one of xId, xName, or xEnv. In any given Vrec at most one of the c fields and at most one of the x fields will actually be used; e.g., a tuple representing the number 3.14 has the form VRec[m: num, cNum: 3.14], and a tuple representing the name a.b.c as an atom has the form VRec[m: atom, xName: CONS[c, CONS[b, CONS[a, NIL]]] ]. Here are the complete definitions of VRec and Mark:

VRec: TYPE = RECORD[m: Mark,

cVs: Vs←NIL, cV: V←NIL, cNum: Number𡤀.0, cStr: STRING←NIL, cId: Id←NIL,

cTerm: NonTerminal←NIL,

xId: Id←NIL, xName: Name←NIL, xEnv: Env←NIL];

Mark: TYPE= {atom, num, string, node, -- base values
tag, bind, scope, bindStruc, evalStruc, onodeStruc, quotedTerm, vOfQ, evalSentinel};

S is defined in terms of various subsidiary semantic functions, which divide naturally into two groups: those that require an environment as a parameter and those that do not. The type signatures of the functions in the two groups are

(1) functions with an environment parameter:

MkNode: PROC[e: Env, vs: Vs] RETURNS [V -- node--]

MkBinding: PROC[e: Env, v: V, n: Name, kind: Mark] RETURNS [V]

Lookup: PROC[e: Env, n: Name] RETURNS [V]

EvalName: PROC[e: Env, n: Name] RETURNS [V]

RelBindings: PROC[e: Env, vs: Vs --tag items only--] RETURNS [Vs]

(2) functions independent of environment (dealing only with tuples or lists of tuples):

Apply: PROC[op: Op, v1: V, v2: V] RETURNS [result: V]

Bindings: PROC[v: V] RETURNS [Env]

Contents: PROC[vs: Vs] RETURNS [Vs]

Tags: PROC[vs: Vs] RETURNS [Vs --tag items only--]

Items: PROC[v: V] RETURNS [Vs]

MkScope: PROC[vs: Vs] RETURNS [Vs]

S is written in a "distributed" manner in the table below. It associates the parts of S with non-terminals of the abstract grammar. If S were written as a single function, it would look like a large case statement with one arm for each non-terminal of the abstract grammar and the lines under the column heading S[e, Alternative] as the code for the arms of the case statement.

e denotes the current environment argument of S. An environment is simply a sequence of bindings just like those that are relevant to a node. Some of the semantic rules pass a modified e to recursive application of S; see, e.g., the rule defining items. An environment can be augmented by one or more bindings, bind*, denoted by ConcatVs[bind*, e].

Only the semantics for "interesting" alternatives of the grammar are given below. For any alternative lhs ::= rhs whose semantics are not presented, its value semantics are S[e, lhs]=S[e, rhs].

When S maps a nonterminal of the abstract grammar into its tuple form, it is said to be elaborated.

Lefthand Side Alternative S[e, Alternative]

script ::= node S[X, node]

The environment for the root node of an abstracted script is the external environment X.

node ::= items {t: Vs=S[e, items]; RETURN[
MkNode[ConcatVs[Bindings[t], e], t]

Make a node from the list of items using the environment obtained by appending bindings in the node with the current environment.

items ::= empty empty
| items item {t: VS=S[e, items]; RETURN[
ConcatVs[t, S[ConcatVs[Bindings[t], e], item]}

Recursively elaborate items and then elaborate the last item using the environment formed by appending the bindings in the preceding items to the current environment. Thus bindings affect things to their right in a list of items.

tag ::= primary MkTag[e, S[e, primary]]

Elaborate the primary. That should produce a (possibly qualified) name to use as the tag.

scope ::= items MkScope[S[e, items]] -- see MkScope below

openedNode ::= term Items[S[e, term]]
| name MkV[m: onodeStruc,
cVs: Items[EvalName[e, S[NIL, name]]], name]

If a term (non-structural openedNode), then use the Items function to pull all the necessary elaborated items from the node value that the term must elaborate to. If a structural openedNode, embed the elaborated items of the named node in a tuple marked as an onodeStruc in order to remember the name from which the items were obtained.

binding ::= name term MkBinding[e, S[e, term], S[NIL, name], bind]

Elaborate the term in the current environment. and make a (nonstructural) binding between the name and the resultant value. See MkBinding for details, especially for how values are bound to qualified names.

sBinding ::= name sRhs MkBinding[e, S[e, sRhs], S[NIL, name], bindStruc]

Elaborate the term in the current environment. and make a structural binding between the name and the resultant value. See MkBinding for details, especially for how values are bound to qualified names.

quoted ::= term MkV[m: quotedTerm, cTerm: term]

Simply make a quotedTerm tuple to hold the term for later elaboration.

term ::= term op primary apply[op, S[e, term], S[e, primary]]

Elaborate the term, then the primary, and then apply the op to the result.

literal ::= name MkV[m: atom, xName: S[NIL, name]]
| number MkV[m: num, cNum: number]
| string MkV[m: string, cStr: string]

A name as a literal is just made into an atom tuple. A numeric literal is stored as a num tuple. And, a string literal is stored as a string tuple.

name ::= id MkAtom[id]
| name id CONS[MkAtom[id], S[NIL, name]]

A qualified name is mapped into a list with the most highly qualified identifier first and the head, unqualified identifier last.

invocation ::= primary EvalName[e, S[e, primary].xName]

Elaborate the primary to obtain a name to invoke (stored as the xName component of the resultant tuple). Then use EvalName to lookup and evaluate the name.

indirection ::= name MkV[m: evalStruc, cV: EvalName[e, S[NIL, name]],
xName: S[NIL, name]]

Use EvalName to lookup and evaluate the name and embed the resultant value in an evalStruc tuple along with the name from which it came.

2.3.2. Definitions of semantics as a Mesa function

Groundrule: Once constructed, no values or lists of values are ever modified (except in Lookup). Therefore, they can be handed around without regard for being changed in situ later.

S: PROC[e: Env, nt: NonTerminal] RETURNS[vs: Vs] = {--TBD--};

NonTerminal: TYPE = REF --ParseNode--;

Error: ERROR[ec: {BoundsFault, InvalidTag, NotSingleton, UnboundId, WrongType}] = CODE;

*******VALUES*******

V: TYPE = REF VRec;

Number: TYPE = REAL;

VRec: TYPE = RECORD[m: Mark,

cVs: Vs←NIL, cV: V←NIL, cNum: Number𡤀.0, cStr: STRING←NIL, cId: Id←NIL, -- Contents

cTerm: NonTerminal←NIL, -- used for representing quoted terms

xId: Id←NIL, xName: Name←NIL, xEnv: Env←NIL]; -- eXtra information

Mark: TYPE= {atom, num, string, node, -- base values
tag, bind, scope, bindStruc, evalStruc, onodeStruc, quotedTerm, vOfQ, evalSentinel};

MarkAsAtom: ARRAY Mark OF ATOM = [$atom, $num, $string, $node, $tag, $bind, $scope, $bindStruc, $evalStruc, $onodeStruc, $quotedTerm, $vOfQ, $evalSentinel];

Vs, Env: TYPE=LIST OF V; -- for an Env, all items have mark bind or bindStruc

X: Env ← NIL; -- stand-in for the eXternal environment

MkV: PROC[m: Mark, cVs: Vs←NIL, cV: V←NIL, cNum: Number𡤀.0, cStr: STRING←NIL,
cId: Id←NIL, cTerm: NonTerminal←NIL, xId: Id←NIL, xName: Name←NIL, xEnv: Env←NIL] RETURNS[V] = { -- The basic function for making a tuple

RETURN[NEW[VRec ← [m: m, cVs: cVs, cV: cV, cNum: cNum, cStr: cStr, cId: cId, cTerm: cTerm, xId: xId, xName: xName, xEnv: xEnv]]]};

True: V=MkV[m: num, cNum: 1.0]; False: V=MkV[m: num, cNum: 0.0];

TRUE and FALSE as tuples.

MkVs: PROC[v1, v2, v3: V←NIL] RETURNS [vs: Vs] = {

Make a list of Vs from individual Vs.

vs←NIL; IF v3#NIL THEN vs←CONS[v3, vs];

IF v2#NIL THEN vs←CONS[v2, vs]; IF v1#NIL THEN vs←CONS[v1, vs] };

ConcatVs: PROC[v1, v2, v3: Vs←NIL] RETURNS [vs: Vs] = {

Concatenate a number of individual lists into one list (in order v1, v2, v3).

AppendToT: PROC[list: Vs] = {

UNTIL list=NIL DO t.rest ← CONS[list.first, NIL]; t ← t.rest; list ← list.rest ENDLOOP};

t: Vs ← (vs←CONS[NIL, NIL]);

AppendToT[v1]; AppendToT[v2]; AppendToT[v3]};

NthItem: PROC[vs: Vs, n: INT] RETURNS [V] = { -- indexing into a list

IF vs=NIL OR n<0 THEN Error[BoundsFault];

RETURN[ IF n=0 THEN vs.first ELSE NthItem[vs.rest, n-1] ]};

Name, Ids: TYPE=LIST OF Id; Id: TYPE = ATOM;

A Name is never empty; it always comes from the parser

MkName: PROC[a: ATOM, hd: Name←NIL] RETURNS[Name] = {RETURN[CONS[a, hd]]};

EqName: PROC[n1, n2: Name] RETURNS[BOOL] = {

UNTIL n1=NIL OR n2=NIL DO

IF n1.first#n2.first THEN RETURN[FALSE];

n1←n1.rest; n2←n2.rest;

ENDLOOP;

RETURN[n1=n2]};

*******REAL WORK*******

MkScope: PROC[vs: Vs] RETURNS [Vs] = {RETURN[

Unless a scope contains some structural items, its contents simply become contents of the surrounding node.

(IF HasStruc[vs] THEN MkVs[MkV[m: scope, cVs: Contents[vs]]] ELSE Contents[vs])]};

HasStruc: PROC[vs: Vs] RETURNS [BOOL] = {

Tests if a list of Vs has any structural items (binds, indirections, or structurally opened node).

One: PROC[v: V] RETURNS [BOOL] = {

RETURN[v.m=bindStruc OR v.m=evalStruc OR v.m=onodeStruc OR v.m=quotedTerm]};

RETURN[ IF vs=NIL THEN FALSE ELSE (One[vs.first] OR HasStruc[vs.rest]) ]};

MkTag: PROC[e: Env, n: Name] RETURNS [V --tag--] = {

For every valid tag in a script there must be a tag definition (section 3.1.1). That definition is given as a (structural) binding of a TAG$ node to the tag's name. capture that definition in the tuple representing the tag along with its name (remember these are value semantics, not an implementation: a real implementation would avoid copying the definition and point to it somehow).

tagDef: --node--V = EvalName[e, n];

IF NOT HasTag[tagDef.cVs, $TAG] THEN Error[InvalidTag];

RETURN[MkV[m: tag, cV: tagDef, xName: n]]};

MkNode: PROC[e: Env, vs: Vs] RETURNS [V -- node--] = { --[note 1]--

The canonical form for a node is a list of all its tags followed by its contents, followed by a sequence of the relevant bindings for each of the tags.

RETURN[MkV[node, ConcatVs[Tags[vs], Contents[vs], RelBindings[e, Tags[vs]]]]]};

The semantic procedures MkBinding, Lookup, EvalName, and Eval are a closely related set and provide the central definition of how names are used in Interscript and how bindings are handled.

MkBinding: PROC[e: Env, v: V, n: Name, kind: Mark --bind|bindStruc--]
RETURNS [--bind|bindStruc-- V] = {

The only difference between a ← and a %← is that the tuple is marked as such.

IF n.rest=NIL THEN RETURN[MkV[m: kind, cV: v, xId: n.first]]

A binding to a simple id results in a simple tuple being made.

ELSE {t: V=EvalName[e, n.rest]; IF t.m#node THEN Error[WrongType];

RETURN[MkBinding[e, MkNode[e, ConcatVs[t.cVs, MkVs[MkV[m: kind, cV: v, xId: n.first]]]], n.rest, kind]]};

A binding to a qualified name is handled by copying the binding corresponding to the prefix of the name (it will be bound to a node value) with a new binding tuple added to the end of the node value. Any future use of EvalName for the name will find that binding first.

};

Lookup: PROC[e: Env, n: Name] RETURNS [V] = {

looking up a simple identifier in an environment is a simple process of marching down the environment list until a binding to that identifier is reached or the end of the list is encountered.

One: PROC[e: Env, id: Id] RETURNS [V] = { -- local procedure to handle simple ids

b: V ← IF e=NIL THEN NIL ELSE IF e.first.xId=id THEN e.first ELSE One[e.rest, id];

IF e.first.m=evalSentinel AND b#NIL THEN e.first.xEnv ← CONS[b, e.first.xEnv];

see note in Eval to explain why the above statement hangs bindings off sentinels put into environments

RETURN[b.cV]};

Looking up a qualified name requires that we first evaluate the prefix of the name without its last identifier using EvalName (which must yield a node value), then make an environment from the bindings of that prefix in which to look up the suffix identifier.

RETURN[ IF n=NIL THEN NIL ELSE One[Bindings[EvalName[e, n.rest]], n.first] ]};

EvalName: PROC[e: Env, n: Name] RETURNS [V] = {

Lookup the name and evaluate it (it might be a quoted term)

RETURN[Eval[e, Lookup[e, n]]]};

Eval: PROC[e: Env, v: V] RETURNS [V] = {

IF v.m=quotedTerm THEN {

ePlus: Env = CONS[MkV[m: evalSentinel, xId: NIL], e];

We push an evalSentinel tuple on the top of the environment so that lookup can collect all the bindings needed to evaluate this quotedTerm. The environment that ends up hanging off the evalSentinel tuple is then tucked away in the vOfQ tuple that results from evaluating the quoted term. This information is needed to correctly externalize an indirection that caused a quoted term to be evaluated. If the bindings in the xEnv component of the vOfQ tuple match those extant when the document is externalized, then the editor can simply output the indirection that gave rise to the vOfQ tuple safe in the assurance that its value is unchanged.

RETURN MkV[m: vOfQ, cV: S[ePlus, v.cTerm].first, xEnv: ePlus.first.xEnv]}

ELSE RETURN[BaseVal[v]]};

BaseVal: PROC[v: V] RETURNS [V] = { RETURN[

(IF v.m=evalStruc THEN BaseVal[v.cV.cV -- it must be a vOfQ--] ELSE v)]};

Op: TYPE={ADD, SUB, MUL, DIV, LT, EQ, SUBSCRIPT};

IsInteger: PROC[v: V] RETURNS[BOOL] = { -- check that a number is actually an integer

IF v.m#num THEN Error[WrongType];

RETURN[Real.Float[Real.Fix[v.cNum]]=v.cNum]};

Apply: PROC[op: Op, v1: V, v2: V] RETURNS [result: V] = {

apply the operator to the base values for v1 and v2 (i.e., evaluate any quoted terms that you encounter)

vv1: V = BaseVal[v1]; vv2: V = BaseVal[v2];

SELECT op FROM

ADD, SUB, MUL, DIV, LT => {IF vv1.m#num OR vv2.m#num THEN Error[WrongType];

RETURN[(SELECT op FROM

ADD => MkV[m: num, cNum: (vv1.cNum+vv2.cNum)],

SUB => MkV[m: num, cNum: (vv1.cNum-vv2.cNum)],

MUL => MkV[m: num, cNum: (vv1.cNum*vv2.cNum)],

DIV => MkV[m: num, cNum: (vv1.cNum/vv2.cNum)],

LT => (IF vv1.cNum<vv2.cNum THEN True ELSE False),

ENDCASE => NIL)]};

EQ => {IF v1.m#v2.m THEN RETURN[False];

Tuple marks have to be the same if they are to be equal

SELECT vv1.m FROM

node => RETURN[False]; -- two separate node values are never equal

atom => RETURN[IF (vv1.cId=vv2.cId) THEN True ELSE False];

num => RETURN[IF (vv1.cNum=vv2.cNum) THEN True ELSE False];

string => RETURN[IF String.EqualString[vv1.cStr, vv2.cStr] THEN True ELSE False];

ENDCASE => Error[WrongType]};

SUBSCRIPT => -- use v2 as an index into the node v1

IF NOT IsInteger[vv2] OR vv1.m#node THEN Error[WrongType]

ELSE RETURN[NthItem[Contents[Items[vv1]], Real.Fix[vv2.cNum]]];

ENDCASE};

Tags: PROC[vs: Vs] RETURNS [Vs --tag items only--] = {RETURN[(Sort[GetTags[vs]])]};

GetTags: PROC[vs: Vs] RETURNS [Vs --tag items only--] = {

One: PROC[v: V] RETURNS[Vs] = {RETURN[(IF v.m=tag THEN MkVs[v] ELSE NIL)]};

RETURN[ IF vs=NIL THEN NIL ELSE ConcatVs[One[vs.first], GetTags[vs.rest]] ]};

Sort: PROC[vs: Vs --tag items only--] RETURNS [sVs: Vs --tag items only--] = {--[note 2]--};

Items: PROC[v: V] RETURNS [Vs] = {

IF v.m=node THEN RETURN[RawItems[v.cVs]] ELSE Error[WrongType]};

RawItems: PROC[vs: Vs] RETURNS[Vs] = {

Pull all the basic items out of a list, burrowing into evalStruc, onodeStruc, or scope tuples to extract their contained items

One: PROC[v: V] RETURNS[Vs] = {

Local procedure to look at a single tuple and pull items out of it if it is an evalStruc, onodeStruc, or a scope. Otherwise the value is returned.

RETURN[(SELECT v.m FROM

evalStruc, onodeStruc => v.cVs,

scope => RawItems[v.cVs],

ENDCASE => MkVs[v])]};

RETURN[IF vs=NIL THEN NIL ELSE ConcatVs[One[vs.first], RawItems[vs.rest]]]};

Bindings: PROC[v: V] RETURNS [Env] = {

Make an environment from the bindings in the node v.

IF v.m=node THEN RETURN[GetBindings[Items[v]]] ELSE Error[WrongType]};

GetBindings: PROC[vs: Vs] RETURNS[e: Env] = {

One: PROC[v: V] RETURNS[e: Env] = { -- local procedure for catching bindings

RETURN[IF v.m=bind OR v.m=bindStruc THEN MkVs[v] ELSE NIL]};

RETURN[IF vs=NIL THEN NIL ELSE ConcatVs[GetBindings[vs.rest], One[vs.first]]]};

RelBindings: PROC[e: Env, vs: Vs --tag items only--] RETURNS [Vs] = {

Find all the relevant bindings for the tags in vs and make an environment from them.

One: PROC[e: Env, v: V] RETURNS [Vs] = {

IF v.m#tag THEN Error[WrongType];

RETURN[RefineBindings[e, PullDefaults[Bindings[
EvalName[e, MkName[$attributes, v.xName]]]]]]};

RETURN[IF vs=NIL THEN NIL ELSE ConcatVs[One[e, vs.first], RelBindings[e, vs.rest]]]};

RefineBindings: PROC[e: Env, basis: Vs] RETURNS [Vs] = {

For every identifier bound non-structurally in basis, place a binding in the result Vs. If there is a binding in the environment for the identifier use that; if not, use the one for it in basis (which provides a default value).

One: PROC[e: Env, b: V] RETURNS [Vs] = {

IF b.m#bind AND b.m#bindStruc THEN Error[WrongType];

RETURN[IF b.m=bindStruc THEN NIL

ELSE MkVs[MkV[
m: bind, cV: Lookup[ConcatVs[e, MkVs[b]], MkName[b.xId]], xId: b.xId]]]};

RETURN[IF basis=NIL THEN NIL

ELSE ConcatVs[RefineBindings[e, basis.rest], One[e, basis.first]]]};

PullDefaults: PROC[e: Env] RETURNS [Vs] = {

For every binding in the environment return a binding that gives a default value for that identifier. Every identifier in the incoming environment is bound to a TYPE$ node (because relBindings reaches into the attributes attribute for a TAG$ node to get these type definitions). The result Vs is not an environment, but is in the same left-to-right order as the bindings occuring in the attributes attribute for the TAG$ node from which they came.

One: PROC[b: V] RETURNS [Env] = {

IF b.cV.m=node AND HasTag[Items[b.cV], $TYPE] THEN

RETURN[MkVs[MkV
[m: bind, cV: Lookup[Bindings[b.cV], MkName[$default]], xId: b.xId]]]

ELSE Error[WrongType]};

RETURN[IF e=NIL THEN NIL ELSE ConcatVs[PullDefaults[e.rest], One[e.first]]]};

HasTag: PROC[vs: Vs, id: Id] RETURNS[BOOL] = {

Checks whether a given id occurs in the list of tags.

RETURN[ IF vs.first.m=tag AND vs.first.cId=id THEN TRUE ELSE HasTag[vs.rest, id] ]};

Contents: PROC[vs: Vs] RETURNS [Vs] = {

The contents of a node are all the tuples in it except tags and non-structural bindings (it thus includes all other values, structural bindings, structured openedNodes, and indirections).

One: PROC[v: V] RETURNS [Vs] = {

RETURN[(IF v.m=tag OR v.m=bind THEN NIL ELSE MkVs[v])]};

RETURN[ IF vs=NIL THEN NIL ELSE ConcatVs[One[vs.first], Contents[vs.rest]] ]};

2.3.3. Notes for semantics

The semantics deal almost exclusively in immutable values, i.e., values which, once created are never changed. As a result, it copies values prodigiously. Any implementation of Interscript semantics should attempt to use pointers wherever possible to avoid extraneous copies and copying of values.

[1]: All the relevant attributes for each tag are listed at the end of a node, even if the node's tags' relevant attribute sets are not disjoint.

To place some bindings in a node without regard for using a specific node type indicated by tags, each binding must be a structural binding; e.g.,

abbrev ← {a%𡤅 b%𡤇}

...

{ . . . abbrev^| . . .}

[2]: the only requirements on SORT are that it sort names correctly and eliminate any duplicates. More precisely, if N* is a list of names to be sorted and S* is the result of sorting N* and removing duplicates, then

(i[0..card(S*)-1) si<si+1) ' (i[0..card(N*)) sS* ni=s)

2.3.4 Equivalence of scripts

A script s1 is equivalent to a script s2 if and only if S[X,s1]=S[X,s2].

One way to test Interscript-conforming editors might be to develop a set of test scripts, and, for each script in the set, check it for equivalence with a version of the script obtained by internalizing and then externalizing the script using the editor in question.

3. Tags, Node Invariants, and Safe Editing

Tags, Node Invariants, and Safe Editing

Interscript makes it possible for editors to manipulate the parts of documents they understand without altering parts they do not. This section describes the facilities that make this possible and a set of safety rules that will enable editors to preserve this property.

The first part of this section gives the details of how node tags are "defined" in the standard. The second part describes the invariant associated with a node and how it is specified. And the third part shows how a tag's definition can be used by an editor to treat nodes with that tag safely even if the editor does not specifically implement the tag.

3.1. Tags, types, and node invariants

This section uses the Interscript base language syntax and semantics to define precisely the properties of tags, which is accomplished by binding a TAG node to the tag's name. To make a tag definition precise, we will introduce the notion of type, which is not embedded in the base language, but rather is defined using it. Then it will actually be possible to define (recursively) the properties of the tag TAG, including the notion of invariants for nodes, which can be used by an editor to test whether an edit is legal, i.e., invariant-preserving.

3.1.1. Defining tags

For every tag T used in a script there must be a corresponding definition of the important properties of that tag. This is accomplished by (structurally) binding it to a TAG node, which is used to capture some of T's properties so as to enable editors to operate on T nodes without necessarily having knowledge of T built into the editor. A TAG node defining T will have the following relevant attributes:

attributes: a TYPE node (see definition of TYPE below) with each relevant attribute of T bound to a TYPE node containing the default value for the relevant attribute, and its type, which is a predicate on the attribute's value in a given node. If no attributes are specified, the default is that the tag has no relevant attributes.

contentType: a TYPE node value specifying the type(s) of the contents allowed in a T node. The default type is Any (see definition of TYPE below).

nodeInvariant: a predicate expressible as an Interscript (quoted) expression that must be true of any T node. Its default value is True.

hasMoreInv: a Boolean indicating whether there is more invariant for T than can be expressed in T's TAG node; if hasMoreInv is False then T's invariant is completely captured by the information in its TAG definition. Its default value is True.

requiredTags: an untagged node listing as atoms all the tags other than T that are required to be on a T node (this is a way of coupling T's invariant to the invariants for other tags). The default is that no other tags are required.

reducesTo: either a quoted expression of the form '{...}' that can be used to reduce a tag to an equivalent (but more basic) tag by treating the appearance of T$ as T.reducesTo%|, or NIL if T does not reduce to any other tag. The default is NIL.

tagOnly: a Boolean indicating that a node with this tag need only be viewed by its parent as a node consisting of just its tags for the purpose of checking the parent's invariant. The default is True.

3.1.2 Defining types

In order to define the attributes and contentType attributes of a TAG definition, we must be able to define types. This is done by having a standard tag, TYPE, which can be used to construct TYPE nodes to define types. The definition of the tag TYPE can be "described" in Interscript (even though such a definition would actually be circular since TAG uses it):

TYPE%←{TAG$

attributes ←{

code %←node

tags %←AtomList^

union %← NIL -- or a node with a list of TYPE$ nodes as contents

predicate %←{Predicate^ | default%←'1'}

default %←{Any^ | default←NIL --Minimal automatic default--} }

contentType ←None^}

That is, a TYPE node has the relevant attributes code, tags, union, predicate, and default, with the following interpretations:

code must be an atom and one of atom, evalStruc, node, num, quotedTerm, string, corresponding to a subset of the marks on VRecs in section 2.3.1. Or, code can have the value Any, meaning that the type being declared may correspond to any of the marks above.

tags is a list defining what tags a node of this type must possess.

union, if not NIL, is a node value containing as contents a list of TYPE nodes; in this case, the type being defined is viewed as having to satisfy one or more of the type definitions given in the union.

predicate gives an Interscript quoted expression which a value must satisfy in order to be a valid instance of the type. The default value ('1') always yields True.

default provides a convenient way of specifying a legal default value for a node of the type being defined.

A TYPE node has no contents (since it is basically a record with its relevant attributes as components), so its contentType has the type None (see below).

Note that the above definition of the tag TYPE is circular in that it "uses itself" to define the types of the relevant attributes code, tags, union, predicate, and default.

Here are a set of standard TYPE definitions for the basic Interscript data types and for types needed for defining tags and types:

Bool% ← {TYPE$ code←num predicate%←'(A^ EQ 0) + (A^ EQ 1)'}

Number% ← {TYPE$ code←num}

String% ← {TYPE$ code←string}

Atom% ← {TYPE$ code←atom default𡤀 --no default--}

AtomList% ← {TYPE$ tags←{ATOMLIST} default←{}} -- no default

ATOMLIST% ← {TAG$ contentType←Atom^}

Any% ← {TYPE$ code←Any}

None% ← {TYPE$ predicate%←'0'} --ever False, forbids any content

Predicate% ← {TYPE$ code←quotedTerm}

Type% ← {TYPE$ tags←{TYPE} default𡤊ny^}

To test if a given value v can be considered to satisfy a type with definition T, one can simply invoke the following HasType procedure with the V representations of v and T. HasType requires that

(1) either T.code is Any or T.code matches the mark of the value v; and

(2) if T.code is node, then v's tags must be a subset of the tags given in T.tags; and

(3) if T is a union type (T.union#NIL), then HasType[v, subtype] must be true for at least one of the subtypes given in the list T.union; and

(4) v must satisfy T.predicate.

To help in understanding how HasType works, try testing the value of MkV[m: num, cNum: 2] against the type definition Bool above. HasType should return FALSE.

HasType: PROC[v: V, T: --node--V] RETURNS[r: BOOL ← FALSE] = {

bt: Env = Bindings[T];

IF NOT (EqName[Lookup[bt, MkName[$code]].xName, MkName[$Any]] OR EqName[Lookup[bt, MkName[$code]].xName, MkName[MarkAsAtom[v.m]]])

THEN RETURN[FALSE];

IF EqName[Lookup[bt, MkName[$code]].xName, MkName[$node]] THEN

IF NOT Subset[Tags[v.cVs], Lookup[bt, MkName[$tags]].cVs] THEN RETURN[FALSE];

FOR tt: Vs ← Lookup[bt, MkName[$union]].cVs, tt.rest UNTIL tt=NIL DO

IF HasType[v, tt.first] THEN EXIT; -- can stop checking unions as soon as one works

REPEAT

FINISHED => RETURN[FALSE]; -- v didn't satisfy any of the types in the union

ENDLOOP;

RETURN[ ApplyPred[arg: v, pred: Lookup[bt, MkName[$predicate]]] ]};

ApplyPred: PROC[arg: V, pred: V] RETURNS[t: BOOL] = {

predEnv: Env = CONS[MkBinding[X, arg, MkName[$A], bind], X]; -- just the external environment plus A𡤊rg

r: V = Eval[e: predEnv, v: pred];

RETURN[IF r.m#num THEN FALSE ELSE (r.cNum#0) ]};

Subset: PROC[sub, set: Vs--all atoms--] RETURNS[BOOL] = {

One: PROC[a: V, set: Vs] RETURNS [BOOL] = { RETURN[

(IF set=NIL THEN FALSE

ELSE IF EqName[a.xName, set.first.xName] THEN TRUE ELSE One[a, set.rest])]};

RETURN[IF sub=NIL THEN TRUE ELSE (One[sub.first, set] AND Subset[sub.rest, set])]};

3.2. The invariant associated with a node

A node in an internalized script is valid if it satisfies the complete invariant for the node. A node's complete invariant depends on what tags are on the node, the TAG definitions for those tags, and the actual contents of the node and bindings of the relevant attributes of the node's tags. The following function NodeInvariant evaluates as much of a node's complete invariant as is possible without recourse to any external invariants associated with the node's tags. NodeInvariant returns the value no if a node is invalid, yes if it is valid, and checkExternalInvariant if it is valid modulo the external invariants of any of its tags that have such.

NodeValidity: TYPE = {yes, no, checkExternalInvariant};

NodeInvariant: PROC[n: --node--V] RETURNS[r: NodeValidity←yes] = {

IF n.m#node THEN Error[WrongType];

FOR t: Vs ← Tags[n.cVs], t.rest UNTIL t=NIL DO

SELECT TagCorrect[n, t.first.cV] FROM

no => RETURN[no];

checkExternalInvariant => r ← checkExternalInvariant;

ENDCASE;

ENDLOOP;

RETURN[r]};

TagCorrect: PROC[n, tagType: --node--V] RETURNS[NodeValidity] = {

bt: Env = Bindings[tagType];

RETURN[ IF

CheckAttributes[bs: Bindings[n],
types: Bindings[Lookup[bt, MkName[$attributes]]]] AND

CheckContents[Contents[n.cVs], Lookup[bt, MkName[$contentType]].cV] AND

Subset[Lookup[bt, MkName[$requiredTags]].cVs, Tags[n.cVs]] AND

ApplyPred[arg: Stripped[n], pred: Lookup[bt, MkName[$nodeInvariant]]]

THEN (IF Lookup[bt, MkName[$hasMoreInv]].cNum=1 THEN yes

ELSE checkExternalInvariant)

ELSE no]};

CheckAttributes: PROC[bs, types: Env] RETURNS[BOOL] = {

One: PROC[b: --bind|bindStruc-- V] RETURNS[BOOL] = {

t: --bind|bindStruc-- V = Lookup[types, MkName[b.xId]];

RETURN[IF t=NIL THEN TRUE ELSE HasType[b.cV, t.cV]]};

RETURN[IF bs=NIL THEN TRUE
ELSE (One[bs.first] AND CheckAttributes[bs.rest, types])]};

CheckContents: PROC[vs: Vs, type: V] RETURNS[BOOL] = {

One: PROC[c: V] RETURNS[BOOL] = {

RETURN[IF c.m=bindStruc THEN TRUE ELSE HasType[c, type]]};

RETURN[IF vs=NIL THEN TRUE ELSE (One[vs.first] AND CheckContents[vs.rest, type])]};

Stripped: PROC[n: --node--V] RETURNS[--node--V] = {

RETURN[MkV[m: node, cVs: Strip[Items[n]]]]};

Strip: PROC[vs: Vs] RETURNS[svs: Vs] = {

One: PROC[v: V] RETURNS[Vs] = {

SELECT v.m FROM

bind => RETURN[MkVs[MkV[m: bind, cV: One[v.cV].first, xId: v.xId]]];

node => RETURN[IF ShouldStrip[v] THEN MkVs[MkV[m: node, cVs: Tags[v.cVs]]]
ELSE MkVs[Stripped[v]]];

ENDCASE => RETURN[MkVs[v]]};

RETURN[IF vs=NIL THEN NIL ELSE ConcatVs[One[vs.first], Strip[vs.rest]]]};

ShouldStrip: PROC[v: --node--V] RETURNS[BOOL] = {

all of v's tags must agree on tagOnly for it to be true

tags: Vs = Tags[v.cVs];

FOR tags: Vs ← Tags[v.cVs], tags.rest UNTIL tags=NIL DO

tDef: V = tags.first.cV;

btDef: Env = Bindings[tDef];

IF Lookup[btDef, MkName[$tagOnly]].cNum#1 THEN RETURN[FALSE];

ENDLOOP;

RETURN[TRUE]};

3.3 Safety rules for editors

Using TAG definitions and the concept of NodeInvariant, we can give some conservative rules for editors in treating parts of documents that correspond to nodes in a script. These are simply observations about how a node's complete invariant can be affected by various editing operations.

Rendering a node

An editor may display any node at all since this is not an editing operation. If it does not implement any of the tags on a given node, the editor could still display the external form of the node in the Interscript publication format (since it has to be able to externalize the document as a script anyway). Of course, if an editor implements one or more of the tags on a node, it can use those mechanisms to render the node and use the more basic, generic display mechanism of script externalization for the attributes or content that it does not directly implement.

Editing a node

All editing operations within a node can be composed from the following atomic operations:

(1) replacing a value (whether of an attribute, an item of content, or the value of a structured binding) by another value of the correct type;

(2) removing an item of content (includes structured bindings and subnodes)

(3) adding an item of content (includes structured bindings and subnodes)

(4) adding a tag to a node

(5) removing a tag from a node

In general, an atomic operation affects a subset of the tags on a node in the sense that it could potentially affect the invariants of those tags (e.g., if it changed the value of an attribute relevant to them). A single-node sequence of operations is one that affects exactly one node.

A sequence of atomic operations that maintains the invariants of all nodes affected by the edit is called valid.

What valid single-node operations can an editor perform on nodes that it does not directly implement? Here is a general, safe mechanism:

save a copy of the node somewhere;

perform the operation sequence;

if the node's invariant and its parent's invariant are true allow the edit, otherwise restore the original state.

6. Other Constructs

Other Constructs

6.1. LABELs and references

Nodes with tag LABEL$ provide "references" from one part of a script to another. It has one relevant attribute, labels, which is a list containing a mixed sequence of atoms (names) or tagged nodes. The atoms are the actual labels on the LABEL node. A tagged node in the sequence provides both a label name (its tag name) and parameters (its relevant attributes) A LABEL tag can coexist with any other tag on a node.

To denote a reference from one part of a script to another, you (the editing system that is externalizing some document) label the node to be referred to with some atom. Then you bind that same atom to a relevant attribute of the node that needs to "refer" to the labeled node.

When multiple nodes have the same label, it can be used to represent a relationship among those nodes (colors for matching the logical and layout structures use labels this way). It is assumed that an editor that implements labeled nodes will choose an efficient implementation of them with some form of fast lookup to find a list of all the node representatives in the editable form of the document. The TAG definition for LABEL is

LABEL %← {TAG$

attributes ← {TYPE$

labels %← {TYPE$ code←node union←{Atom^ NodeList^} default ← {}}

}

Appendix A

GLOSSARY

GLOSSARY

Italics indicate words defined in this glossary.

abbreviation An invocation used to shorten a script, rather than to indicate structure

attribute A (name, value) pair, identified by its name, which is bound to a value

atom A name denoting only itself. Occurrences of the same atom in different parts of a script all denote the same primitive value.

base language The part of the Interscript language that is independent of the semantics of particular tags

base semantics The semantic rules that govern how scripts in the base language are elaborated to determine their contents and attributes

binding The operation of associating a value with a name; also the resulting association

contents The vector of values denoted by a node of a script

document The internalization of a script in a representation suitable for some editor

dominant structure The tree structure of a document corresponding to the node structure of its script

elaborate (verb) To develop the semantics of a script or a node of a script according to the Interscript semantic rules. This is a left-to-right, depth-first processing of the script

encoding A particular representation of scripts. The encoding presented in this standard is the publication encoding for writing about Interscript.

environment A value consisting of a sequence of attributes. An environment may be either free-standing or nodal. A free-standing environment is a structured value much like a record, with the components being the attributes of the environment. A nodal environment is associated with a node of a script and represents the attributes bound in that node.

expression A syntactic form denoting a value

external environment A standard environment in whose scope an entire script is elaborated. This environment is named X.

externalization The process of converting from a document to a script.

fidelity The extent to which an externalization or internalization preserves contents, form, and structure

hierarchical name A name containing at least one period, whose prefix unambiguously denotes the naming authority that assigned its meaning

identifier A sequence of letters used to identify an attribute

integer A mathematical integer in a limited range

indirection The appearance of a name%, denoting the evaluation of name plus the necessity of remembering that the result came from evaluating name.

internalization The process of converting from a script to a document; also the result of that process

Interscript The name of this editable document standard

invocation The appearance of a name^ (see also indirection)

literal A representation of a value of a primitive type in a script

local binding A binding of a value to a name, causing the current environment to be updated with the new attribute; any outer binding's scope will resume at the end of the innermost containing node

name A sequence of identifiers internally separated by periods; e.g., a.b.c

nested environment The initial environment of a node contained in another node

node Everything between a matched pair of {}s in a script; this may represent a branch point in a document's dominant structure, or it may simply be a structured value acting as a vector, a record-like value, or a mixture of the two.

number The Interscript primitive type for representing numeric values.

primitive type Boolean, number, string, or atom

primitive value A literal or a node, vector, or environment containing only primitive values

property Each tag on a node labels it with a property; the properties of a node determine how it may be viewed and edited

publication encoding The encoding for scripts used for examples to be read by people rather than editing systems.

qualified name see hierarchical name.

quoted expression A value which is an expression bracketed by single quotes ("'"); the expression is evaluated by invoking the name (with or without indirection).

real A floating point number

scope The region of the script in which invocations of the attribute named in a binding yield its value; the scope starts textually at the end of the binding, and generally terminates at the end of the innermost containing node

script An Interscript program; the interchangeable result of externalizing a document

string A literal which is a sequence of characters bracketed by double quote marks, e.g., "This is a string!"

style A quoted expression to be invoked in a node to modify the node's environment, labels, or contents

tag An atom labelling a node using the syntax atom$; the properties of a node correspond to the set of tags labelling it

value A primitive value, node, or quoted expression

X The standard outer environment for an entire script

End of Glossary

Appendix B

Example of Script Evaluation

Example of Script Evaluation

This appendix contains a simple example of a script as it is

(a) written in the publication encoding,

(b) represented in terms of the abstract grammar, and

In the process of going from (a) to (b) the script is parsed; in going from (b) to (c) the parse tree is evaluated using the semantics defined in section 2.3.

The source script

{aTag$

relV1 ← 0 -- relV1 is a relevant attribute of an X node

v ← relV1^+5 -- v is not a relevant attribute of an X node

q%← -- bind a quoted expression to q, which is not a relevant attribute

'{"FalseString" "TrueString"}!(relV1^ LT v^)' -- conditional expression

"content" -- simple string as content

q% -- evaluate q (remembering that the result came from q)

}

For purposes of this example, assume that the tag X is defined as follows:

aTag%←{TAG$

attributes ← {

relV1%← Number^

relV2%← {String^| default←"relV2 default value"}}

contentType ← String^

reducesTo ← NIL}

Abstract version of the source script

script(

node( items(items(items(items(items(items(

item( tag(name(id(X)))) -- aTag$

item( binding(name(id(relV1)) term(primary(number(0))))) -- relV1 ← 0

item( binding( -- v ← relV1^+5

name(id(v))

term(

term(primary(invocation(name(id(relV1)))))

op(ADD)

primary( number(1))

)

))

item( sBinding( -- q%←

name(id(q))

sRhs(quoted(term(

term(primary(

node(items(items( -- '{"FalseString" "TrueString"}!(relV1^ LT v^)'

item(term(primary(literal(string("FalseString")))))

item(term(primary(literal(string("TrueString")))))

)))

))

op(SUBSCRIPT)

primary(term(

term(primary(invocation(name(id(relV1)))))

op(LT)

primary(invocation(name(id(v))))

))

)))

))

item( term(primary(literal(string("content"))))) -- "content"

item( indirection(name(id(q)))) -- q%

)))))))

)

Elaboration of abstract script's semantics

[m: node,

cVs: LIST[

[m: tag, xId: aTag],

[m: bindStruc,

cV: [m: quotedTerm,

term(

term(primary(

node(items(items(

item(term(primary(literal(string("FalseString")))))

item(term(primary(literal(string("TrueString")))))

)))

))

op(SUBSCRIPT)

primary(term(

term(primary(invocation(name(id(relV1)))))

op(LT)

primary(invocation(name(id(v))))

))

)],

xId: q],

[m: string, cStr: "content"],

[m: evalStruc,

[m: vOfQ, -- note the carried-along environment

[m: string, cStr: "TrueString"],

xEnv: LIST[[m: bind, cV: [m: num,0], xId: relV1] [m: bind, cV: [m: num,5], xId: v]]

xName: LIST[$q, NIL],

[m: bind, cV: [m: num, 0], xId: relV1], -- an explicit binding

[m: bind, cV: [m: string, "relV2 default value"], xId: relV2] -- from aTag's definition

] -- end of cVs LIST

] -- end of the node