[_CD6_]<interscript>DraftStd>Interscript-Archive.tioga!4

Release as [Indigo]<Interscript>Std>Interscript-Archive.tioga, .press
Draft [Indigo]<Interscript>DraftStd>Interscript-Archive.tioga, .press
Last edited By Mitchell on February 28, 1983 9:54 pm

LIMITED DISTRIBUTION: FOR XEROX INTERNAL USE

Interscript Archival Text

March, 1983

XEROX
PALO ALTO RESEARCH CENTER
COMPUTER SCIENCE LABORATORY
3333 Coyote Hill Road / Palo Alto / California 94304

February 28, 1983 10:03 pm

2.4.1. The interchange encoding

2.4.2. Normalization

Every encoding must define a normalization function N, which maps a script in the encoding into another script in the encoding which generates the same output. N must be idempotent (i.e., N(N(s)) = N(s) for every script s); it must not change the fidelity level of the script (see 2.4.3). If a script violates the definition of Interscript, a normalization function may report this fact instead of producing a normalized result. In other words, normalization need not be defined on erroneous scripts.

The purpose of this function is to make possible a precise description of the rules for private encodings in section 2.4.4. The idea is that when an encoding provides several ways of saying the same thing (typically a basic way, and some more concise ways which work in common special cases), the normalized script will uniformly choose one way of saying it. Note that the normalized script is not intended for any purpose other than precisely defining a notion of equivalent script; it is neither especially compact nor especially readable.

The normalization function for the interchange encoding is defined as follows:

Comments are omitted.

Delimiters are replaced by empty if possible, otherwise with ",".

Leading zeros are dropped from a digits encoding of an integer.

Reals are uniformly encoded in E format with a single non-zero digit to the left of the "." and no trailing zeros; 0 is encoded by "0.0".

An upper case letter in an identifier is replaced by the corresponding lower case letter.

Each direct invocation (abbreviation) is replaced by its binding.
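
Two of the rules above (leading zeros and identifier case) can be sketched on single tokens as follows. This is an illustrative Python sketch only: the function names are hypothetical, the minus sign is assumed to be the ASCII "-", and comments, delimiters, reals, and abbreviations are not handled.

```python
def normalize_integer(token: str) -> str:
    """Drop leading zeros from a decimal integer token such as '-0012'."""
    sign = "-" if token.startswith("-") else ""
    digits = token[len(sign):]
    if not (digits.isascii() and digits.isdigit()):
        raise ValueError(f"not an integer token: {token!r}")
    # Keep a single "0" when every digit was a zero.
    return sign + (digits.lstrip("0") or "0")


def normalize_identifier(token: str) -> str:
    """Replace each upper case letter by the corresponding lower case letter."""
    return token.lower()
```

Applying either function twice gives the same result as applying it once, which is the idempotence property required of N.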

2.4.3. Level restriction

3.1. Standard and Editor-Specific Transcriptions:

APPENDIX B

ARBITRARY CHOICES

"One of the primary purposes of a standard is to be definitive about otherwise arbitrary choices."

There are many places in this proposal where we have made an arbitrary choice for definiteness. It will be important that the ultimate standard make some choice on these points; it matters little whether it is the same as ours. To forestall profitless debate on these points, we have tried to list some of the choices that we believe can be easily changed at a later date:

Encoding choices:

The choice of representations for literals (we generally followed Interpress here).

The selection of particular characters for particular kinds of bracketing, and for particular operators.

The choice of infix and functional notation for the interchange encoding (as opposed, e.g., to Polish postfix).

The choice of particular identifiers for basic concepts.

Linguistic choices:

The choice of a particular set of basic operators for the language.

The particular set of primitive data types (we followed Interpress; its set seems about as small as will suffice).

The choice of particular syntactic sugars for common linguistic forms.

We need a two-level structure so that documents expressed in the base language can both (a) be interchanged among different editors and (b) retain information of special significance to a specific editor. We call (a) the interchange standard information, or standard information, and (b) the editor-specific information.

Basically, an editor X is free to couch properties in its own terms, which can make it easy for it to consume a script produced by itself, but it must provide a set of mappings which will transform properties into the interchange standard. The recommended method for doing this is to invoke its name as the very first item in the root span of any X-specific subtree. The rules for inheritance of properties mean that often only the root span of a document will need to have this property, but there is nothing wrong with spans being in different editor-specific terms provided they invoke the appropriate editor properties.

Now, to be a valid standard script, the document must have the definition of the name X placed in the script itself. (There is nothing wrong with keeping editor-specific → standard mappings in a library of some sort to avoid having copies of them in each script.)

When X parses an X-specific script, it will use its X-specific attributes and never invoke the mappings from X-specific information to standard terms; i.e., it can use a null definition for the name X. However, when such a document is interpreted by some other editor Y, any time it tries to access a standard name, the mapping from that name to the corresponding expression in terms of the X-specific values in the script will have been provided by the definition of X. What guarantee is there that this can always be done?

It is worth noting first that we are speaking here of a script being internalized by an editor, Y, rather than being externalized. Consequently, it is never necessary to access standard names in left-hand contexts; i.e., to do bindings that are not part of the script in order to interpret it. Y may, however, need to access components of environments in order to internalize the script for itself. These are always values in right-hand side contexts, and must be computed in terms of the X-specific information that X put in the script. We can examine this issue on a case-by-case basis. Below is a list of examples of possible editor-specific uses of the base language and the mappings that would allow another editor to treat the document in standard terms:

Symbolic values used instead of numbers:
supply standard values for the symbolic values:

Standard: lineLeading ← 2*pt -- some numeric value --
Editor-specific: lineLeading ← single
mapping: single = 2*pt

Different names used for standard names:
supply a binding to the standard name from the editor-specific name using a quoted expression so that it is only evaluated when needed in a righthand context:

Standard: lineLeading ← 2*pt
Editor-specific: lineSpace ← single
mapping: lineLeading ← 'lineSpace'

Different concepts used for standard ones:
supply a binding to the standard attribute names from the editor-specific concepts using quoted expressions so that they are only evaluated when needed in righthand contexts:

Standard: lineLeading ← 2*pt
Editor-specific: lineSpacing ← [fontSize←… on←… leading←1] -- lineSpacing units assumed to be pts --
mapping: lineLeading ← 'pt*Spacing.onSpacing.fontSize' -- compute result in standard units --

In general, one can use the facilities of the base language to write essentially arbitrary programs that can be bound as quoted expressions to a standard identifier to cause the appropriate value to be computed based on editor-specific information put in the document by the editor that externalized it. Moreover, since the mappings provided by editor X can be overridden in any subtree of the document, an editor that does not "understand" some subtree of a document produced by another editor Y can simply leave that subtree intact when producing an edited version of the original script. It need only ensure that the first expression in that subtree's root span is an invocation of "Y", which causes Y's editor-specific mappings to apply in that subtree.

For each internalization fidelity level L of Interscript, there is an (idempotent) level restriction function RIL which converts an arbitrary interchange script into an interchange script of level L. An interchange script is of level L if RIL applied to it is the identity. A restriction function replaces an excluded structure with its value according to the semantics of Interscript, converts excluded form information into additional content with a special property, and removes excluded tags.

The interchange encoding is designed to simplify creation, communication and interpretation of scripts for the widest possible range of editors and systems. For this reason, a script in the interchange encoding is represented as a sequence of graphic (printable) characters taken from the ASCII set; the subset of ASCII used is also a subset of ISO 646. Communication of a script in the interchange encoding requires only the ability to communicate a sequence of ASCII characters; Interscript does not specify how the characters are encoded. In effect, we define a text representation of the commands to be executed.

The choice of a text format for the interchange encoding leads to rather lengthy scripts in some cases. The bulk of an interchange script presents no great problem for document storage, since a document need not be stored in this form. Rather, as it is transmitted, the sending editor can translate its own private encoding into the interchange encoding. Similarly, the receiving editor can translate the interchange encoding into its own, usually different, private encoding for storage. However, a bulky interchange script may be more expensive to transmit. If a document consists mostly of text, the interchange encoding is quite efficient—very few characters are required in addition to those appearing in the document itself.

Character set. The character set used in the interchange encoding is described by the ISO 646 7-bit Coded Character Set For Information Processing Interchange. The interchange encoding interprets the 94 characters of the G1 set defined in the International Reference Version (ISO 646, Table 2) and the space character (2/0). This set of 95 characters is called the interchange set. Note that except for the concise "string" encoding of vectors described below, the interchange encoding has nothing to do with the integers corresponding to the characters, but depends only on the character set itself.

It is extremely important to understand that the choice of the ISO standard for the interchange format has nothing to do with character mappings in Interscript fonts. Although these mappings must adhere to a character set standard that is shared by interchanging editors, that standard is not part of Interscript. It is expected that Xerox will develop a separate corporate standard in this area.

If the underlying encoding of the ISO character set can also encode other characters (e.g., the control characters (0/0 through 1/15) and del (7/15), or another group of 128 characters if eight bits are being used to encode each character), these are ignored in interpreting an interchange script. This does not mean that these characters are converted to spaces, but that they are treated as if they were not present.

There are several reasons for this choice:

Control characters may be inserted freely by software that generates the interchange encoding. For example, carriage returns (0/13), line feeds (0/10), and form feeds (0/12) may be inserted at will to conform to limitations that may be imposed by an operating system. Restrictions on line length or the use of fixed-length records thus become straightforward.

Control characters may be removed or inserted freely by software that receives the interchange encoding. In this way, the receiving software can adhere to any restrictions imposed by its operating system.

The absence of control characters allows certain kinds of "non-transparent" data communication methods (such as binary synchronous communication) to be used freely.
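
The rule that characters outside the interchange set are treated as if they were not present can be sketched as a filter. This Python sketch assumes the script arrives as a byte string with one byte per character; the function name is hypothetical.

```python
def interchange_chars(raw: bytes) -> str:
    """Keep only the 95 interchange characters (ISO 646 codes 32..126).

    Control characters, del (127), and any codes above 127 are dropped,
    not converted to spaces.
    """
    return "".join(chr(b) for b in raw if 32 <= b <= 126)
```

For example, a carriage return and line feed inserted by the sender to satisfy a line-length limit disappear on input, while a significant space is kept.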

A minor disadvantage of these conventions is that if a script is typed in, care must be taken not to omit a significant space at the end of a line. Since scripts are normally generated by programs, this is not important. A system for manually generating (and perhaps interactively debugging) Interscript should provide for various convenience features on input, and for prettyprinting the script on output.

Any number of space characters may also be added after any token without changing the meaning. Throughout the following, a delimiter is a space or comma, which may be omitted if the next character is not an alphanumeric, "-" or ".".

VersionId. The first characters of an interchange script conforming to this version of the Interscript standard must be "INTERSCRIPT/INTERCHANGE/1.0#". Note that the VersionId is of variable length, and ends with a space. These conventions simplify the design of systems that must deal with more than one kind of encoding.

If a privately encoded script can be interpreted as a sequence of characters, its first characters must be "Interscript/private/i.j", where private is replaced by an appropriately chosen hierarchical name that identifies the encoding, e.g., "Xerox/860", and i.j is replaced by an appropriate version identification, e.g., "2.4"; the resulting header would be "Interscript/Xerox/860/2.4".

A private encoding that cannot be interpreted as a sequence of characters (e.g., a binary, word-oriented encoding on a 36-bit machine which packs five 7-bit characters into a word) should use any available convention to make its scripts self-identifying.

Following the versionId is a span constituting the body of the script which is in turn followed by the trailer of a script, "ENDSCRIPT". The body of the script contains values encoded as follows.

Integer. An integer is represented in radix 10 notation using the characters "0" through "9" as digits, followed by a delimiter. A negative integer is preceded by a minus sign "-". Thus the decimal number 1234 is encoded as "1234", and -1234 is encoded as "-1234". The trailing delimiter may be empty if the following character is a letter.

A sequence of integer literals in the range 0..255 can be represented in radix 16 notation using the characters "A" through "P" as digits ("A" corresponds to 0, "P" to 15). The entire sequence is enclosed in "#" brackets. For example, the integer 93 is represented as "#FN#", and the sequence of integers 93, 94, 95, 96 as "#FNFOFPGA#". These sequences require only two characters for each integer (plus two characters of overhead). Note that there is no delimiter between the integers in this encoding.
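
The radix 16 encoding just described can be sketched in Python as follows. The function names are hypothetical; only the digit alphabet "A" through "P" and the "#" brackets come from the text.

```python
HEX_DIGITS = "ABCDEFGHIJKLMNOP"  # "A" corresponds to 0, "P" to 15


def encode_hex_sequence(values):
    """Encode integers in 0..255 as two radix 16 digits each, inside '#' brackets."""
    parts = []
    for v in values:
        if not 0 <= v <= 255:
            raise ValueError(f"value out of range 0..255: {v}")
        parts.append(HEX_DIGITS[v >> 4] + HEX_DIGITS[v & 15])
    return "#" + "".join(parts) + "#"


def decode_hex_sequence(text):
    """Decode a '#'-bracketed sequence back into a list of integers."""
    if len(text) < 2 or text[0] != "#" or text[-1] != "#":
        raise ValueError("sequence must be enclosed in '#' brackets")
    body = text[1:-1]
    if len(body) % 2:
        raise ValueError("each integer needs exactly two digits")
    # str.index raises ValueError for any character outside "A".."P".
    return [HEX_DIGITS.index(body[i]) * 16 + HEX_DIGITS.index(body[i + 1])
            for i in range(0, len(body), 2)]
```

With these definitions, encode_hex_sequence([93, 94, 95, 96]) yields the "#FNFOFPGA#" of the example above.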

Booleans are represented by the characters "F" and "T", followed by a delimiter.

Real. A real is represented using Fortran E or F notation, with a trailing delimiter. Thus "12.34" is the same as "1.234E1". Minus signs may precede the mantissa or the exponent: "-12.34E-3 ".

Identifier. An identifier is encoded by its characters (which are limited to letters and digits), followed by a delimiter: "x", "arg1". The first character of an identifier must be a letter, and must be written in lower case to distinguish identifiers from universals. Other letters may be written in either case for readability, since case is not significant in distinguishing identifiers.

Vector. A vector is encoded by surrounding a sequence of values with parentheses, "(" and ")".

String. A text vector usually contains integers that are interpreted as character codes. Often these codes lie in the range 32 to 126 inclusive, which are the numbers assigned to the characters of the interchange set by ISO 646. It is convenient to encode an element of such a vector by the character whose ISO code is the desired value. Such a string can be encoded by surrounding the characters with "<" and ">", thus "<Hello!>". If the string contains elements outside the allowed range (i.e., if the value is less than 32 or greater than 126) or the value 62 or 35 (the ISO codes for the characters ">" and "#"), those elements must be represented as integers inside "#" brackets, as described above. The two-character encoding of small integers is designed to make escape sequences compact. Thus "<Hello!>", "<Hello#CB#>", and "<Hel#GMGP#!>" are all equivalent.
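
A decoder for the string encoding can be sketched in Python as follows. The function name is hypothetical, and the sketch returns the vector of character codes rather than any particular internal representation.

```python
HEX_DIGITS = "ABCDEFGHIJKLMNOP"  # "A" corresponds to 0, "P" to 15


def decode_string(text):
    """Decode a '<...>' string, with optional '#..#' escapes, into integer codes."""
    if len(text) < 2 or text[0] != "<" or text[-1] != ">":
        raise ValueError("string must be enclosed in '<' and '>'")
    body = text[1:-1]
    codes = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "#":
            end = body.index("#", i + 1)  # ValueError if the escape is unterminated
            seq = body[i + 1:end]
            if len(seq) % 2:
                raise ValueError("each escaped integer needs two digits")
            for j in range(0, len(seq), 2):
                codes.append(HEX_DIGITS.index(seq[j]) * 16
                             + HEX_DIGITS.index(seq[j + 1]))
            i = end + 1
        elif ch == ">":
            raise ValueError("'>' must be escaped inside a string")
        else:
            codes.append(ord(ch))
            i += 1
    return codes
```

The three equivalent spellings given above all decode to the same vector of six codes.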

Universal names. A universal is encoded by giving a name that begins with an uppercase letter followed by zero or more uppercase letters or digits, followed by a delimiter. E.g., "TEXT", "XEROX860 ".

Span. A span is encoded by a "{", followed by a sequence of items, followed by a "}".

Comment. The beginning and end of a comment are both marked by a double minus sign: the sequence "--" <any characters other than "--"> "--" is a comment and may occur between any two tokens. Comments are ignored in rendering the script.

The tokens of the interchange encoding are defined by the following BNF grammar, together with rules about delimiters:

The delimiter that terminates an identifier or universal may only be empty if the next character is not an alphanumeric or "-".

The delimiter that terminates an integer may only be empty if the next character is not a digit, "E", "F", "-", or ".".

Extra delimiters may be inserted after any token.

token ::= literal | id | ucID | op | bracket | punctuation | comment
literal ::= Boolean | integer | real | string
Boolean ::= ( "F" | "T" ) delimiter
delimiter ::= " " | "," | empty
empty ::= ""
integer ::= [ "-" ] digit digit* delimiter
digit ::= "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
real ::= [ "-" ] digit digit* "." digit* [ "E" integer ] delimiter
string ::= "<" stringElem* ">"
stringElem ::= stringChar | hexSequence
stringChar ::= -- any character but "#" or ">" --
hexSequence ::= "#" hex* "#"
hex ::= hexChar hexChar
id ::= lowerCase idChar* delimiter
idChar ::= letter | digit
letter ::= lowerCase | upperCase
lowerCase ::= "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" | "i" | "j" | "k" | "l" | "m" | "n" |
"o" | "p" | "q" | "r" | "s" | "t" | "u" | "v" | "w" | "x" | "y" | "z"
upperCase ::= hexChar | "Q" | "R" | "S" | "T" | "U" | "V" | "W" | "X" | "Y" | "Z"
hexChar ::= "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H" | "I" | "J" | "K" | "L" | "M" |
"N" | "O" | "P"
ucID ::= upperCase ucIDchar* delimiter
ucIDchar ::= upperCase | digit
op ::= "+" | "-" | "*" | "/"
bracket ::= "(" | ")" | "{" | "}" | "<" | ">" | "[" | "]" | "'"
punctuation ::= "." | ";" | ":" | "=" | "←" | "!" | "%" | "|"
comment ::= "--" commentString "--"
commentString ::= -- any sequence of characters not containing "--" --

A simple listing of an interchange script can just print the character sequence, with line breaks every n characters, or perhaps at the nearest convenient delimiter. Such a listing is reasonably easy to read, so that problems can be tracked down simply by studying it. Additional help in reading the file can be furnished by utility programs which format the file for more pleasant reading.

March 9, 1983 11:58 am - from interscript-standard.tioga

2.1.3.1. Notation for environments

Environments bind identifiers to expressions, in various modes ("=", ":=", "←"):

NULL denotes the "empty" environment

[E | id ← e] means "E with id bound to e"

locVal(id, E) denotes the value locally bound to id in E

locVal(id, NULL) = NIL = ""

locVal(id, [E | id' m e]) = if id=id' then e else locVal(id, E)
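
The two locVal equations can be read as a search along a chain of bindings. In this Python sketch an environment is either NULL or a triple (E, id, e); that representation and the names bind and loc_val are assumptions made for illustration, and the binding mode m is not modeled.

```python
NULL = None  # the "empty" environment
NIL = ""     # the value of an unbound identifier


def bind(env, ident, expr):
    """[E | id <- e]: E extended with id bound to e."""
    return (env, ident, expr)


def loc_val(ident, env):
    """Value locally bound to ident in env, or NIL if there is none."""
    while env is not NULL:
        outer, bound_id, expr = env
        if bound_id == ident:
            return expr  # the most recent binding wins
        env = outer
    return NIL
```

A later binding of the same identifier hides an earlier one, since the search starts from the most recently added binding.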

2.1.3.2. Semantic functions

R: expression, environment --> expression -- Reduction

R is used for evaluating right-hand sides: identifiers, expressions, etc.

C: expression --> expression -- Contents

C is basically used to indicate which evaluated expressions become part of the content of a node

B: expression, environment --> environment -- Bindings

B indicates the effect a binding has on an environment. B and R are mutually recursive functions (e.g., the evaluation of an expression may cause some bindings to occur as well)

The following four semantic functions occur less frequently in any substantive way in the semantics below. You might wish to skip them until they occur in a nontrivial manner in the semantics.

T: expression --> expression -- Tags

T indicates when an identifier is to be included in the tag set for a node

L: expression --> expression -- Links

L indicates link declarations

Ls: expression --> expression -- Link sources

Ls indicates a link to the set of nodes having associated target links

Lt: expression --> expression -- Link targets

Lt indicates that the node is to be included in the target set of all the names which are prefixes of the name to which the expression should evaluate

2.1.3.3. Presentation by feature

[E is used to represent the value of the environment in which the feature occurs.]

script ::= header node trailer
header ::= "INTERSCRIPT/INTERCHANGE/1.0#"
trailer ::= "ENDSCRIPT"

The semantics of the root node of a script are equivalent to the following general semantics for a node with the initial environment being the outermost, external environment X instead of E:

node ::= "{" item* "}"
R = C = "{" R<"Sub" item*>([NULL | "OUTER" "=" E]) "}"
B = locVal("OUTER", (B<"Sub" item*>([NULL | "OUTER" "=" E])))
T = L = Ls = Lt = NIL

<<Note that bindings are evaluated left to right and take effect immediately; this is in contrast to thinking of all the bindings as happening first and then evaluating the contents using only the "ultimate" bindings, a method which has too high a price in implementation cost, primarily SPACE>>

Nodes have nested environments, and can have more global effects only through global (:=) bindings. The items of a node are implicitly prefixed with the identifier Sub, which may be bound to any information intended to be common to all subnodes in a scope.

item* ::= ""
R = C = T = L = Ls = Lt = NIL
B = E

The empty sequence of items has no value and no effect; this is the basis for the following recursive definition.

item* ::= item1 item*
R = R<item1>(E) R<item*>(B<item1>(E))
B = B<item*>(B<item1>(E))
For F in {C, T, L, Ls, Lt}:
F = F<item1> F<item*>

In general, the value of a sequence of items is just the sequence of item values; binding items affect the environment of items to their right; NIL does not change the length of a result sequence.

term ::= primary op term
op ::= "+" | "-" | "*" | "/"
R = C = R<primary>(E) op R<term>(E)
B = E
T = L = Ls = Lt = NIL

Both the primary and the term must reduce to numbers; the arithmetic operators are evaluated right-to-left (a la APL, without precedence) and bind less tightly than application.
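
Right-to-left evaluation without precedence can be sketched as follows. The Python sketch assumes the primaries have already been reduced to numbers and that the term is given as a flat list alternating numbers and operator characters; eval_term is a hypothetical name.

```python
import operator

OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def eval_term(tokens):
    """Evaluate [primary, op, primary, ...] right-to-left, with no precedence."""
    if len(tokens) % 2 == 0:
        raise ValueError("expected primary (op primary)*")
    result = tokens[-1]
    # Walk leftward: each step computes  primary op <everything to the right>.
    for i in range(len(tokens) - 2, 0, -2):
        result = OPS[tokens[i]](tokens[i - 1], result)
    return result
```

Under this rule 2*3+4 is 2*(3+4) = 14 and 10-2-3 is 10-(2-3) = 11, which differs from conventional left-to-right evaluation with precedence.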

primary ::= literal
literal ::= Boolean | integer | hexint | real | string
R = C = literal
B = E
T = L = Ls = Lt = NIL

The basic contents of a document.

invocation ::= id
R = R<valOf(id, E)>(E)
B = B<valOf(id, E)>(E)
where
valOf(id, E) = locVal(id, whereBound(id, E)) -- Gets innermost value
whereBound(id, E) = CASE -- Gets innermost binding
locBinding(id, E) ~= NONE => E
locBinding("OUTER", E) ~= NONE =>
whereBound(id, locVal("OUTER", E))
True => NULL

Both attributes and definitions are looked up in the current environment; depending on the current binding of id, this may produce values and/or bindings; if the binding's rhs was quoted, the expression is evaluated at the point of invocation.

When an id is referred to and locBinding(id, E)=NONE, then the value is sought recursively in locVal("OUTER"). The outermost environment, X, binds each id to the "universal" name which is the uppercase equivalent of id.
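
The lookup defined by valOf and whereBound can be sketched with dictionaries. In this Python sketch an environment is a dict whose optional "OUTER" entry holds the enclosing environment; that representation is an assumption, and the rule that the outermost environment X binds each id to its uppercase universal name is not modeled (an unbound id simply yields NIL).

```python
NIL = ""
NULL = {}  # the empty environment


def where_bound(ident, env):
    """Innermost environment on the OUTER chain that binds ident, else NULL."""
    while True:
        if ident in env:
            return env
        if "OUTER" in env:
            env = env["OUTER"]
        else:
            return NULL


def val_of(ident, env):
    """Innermost value of ident, found by following OUTER outward."""
    return where_bound(ident, env).get(ident, NIL)
```

An inner binding hides an outer binding of the same identifier, and identifiers bound only in an enclosing node remain visible.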

invocation ::= name "." id
R = R<valOf(id, R<name>(E))>(E)
B = B<valOf(id, R<name>(E))>(E)

Qualified names are treated as "nested" environments.

universal ::= ucID
R = C = ucID
B = E
T = L = Ls = Lt = NIL

Uppercase-only identifiers are presumed to be directly meaningful and are not looked up in the environment.

application ::= invocation "[" item* "]"
R = apply(invocation, R<item*>(E), E)
B = E
where
apply(invocation, value*, E) =
CASE R<invocation>(E) OF
"EQUAL" => value1 = value2
"GREATER" => value1 > value2
. . .
"SUBSCRIPT" => value1[value2] -- value1: sequence, value2: int
"CONTENTS" => "(" C<inner(value1)> ")"
"TAGS" => "(" T<inner(value1)> ")"
"LINKS" => "(" L<inner(value1)> ")"
"SOURCES" => "(" Ls<inner(value1)> ")"
"TARGETS" => "(" Lt<inner(value1)> ")"
ELSE => R<invocation>([[NULL | "OUTER" "=" E] | "Value" "=" value*])
inner("{" value* "}") = value*

If the invocation does not evaluate to one of the standard external function names, the current environment is augmented with a binding of the value of the argument list to the identifier Value, and the value is the result of the invocation in that environment; this allows function definition within the language.

selection ::= "(" term "|" item1* "|" item2* ")"
R = if R<term>(E) then R<item1*>(E) else R<item2*>(E)
B = if R<term>(E) then B<item1*>(E) else B<item2*>(E)

The notation for selections (conditionals) is borrowed from Algol 68:
( <test> | <true part> | <false part> )
This is consistent with our principles of using balanced brackets for compound constructions and avoiding syntactically reserved words; the true part and false part may each contain an arbitrary number of items (including none).

sequence ::= "(" item* ")"
R = C = "(" R<item*>(E) ")"
B = B<item*>(E)
T = L = Ls = Lt = NIL

Parentheses group a sequence of items as a single value; bindings in the sequence affect the environment of items to the right in the containing node, but labels are disallowed. Parentheses may also be used to override the right-to-left evaluation of arithmetic operators; an operand sequence must reduce to a single numeric value.

binding ::= name "←" rhs
R = NIL
B = localBind(name, R<rhs>(E), E)
where
localBind(id, value, E) = [E | id ← value]
localBind(id "." name, value, E) = [E | id ← localBind(name, value, valOf(id, E))]

This adds a single binding to E; bindings have no other "side effects" and no value.

binding ::= universal ":=" rhs
binding ::= name ":=" rhs
R = NIL
B = globalBind(name, R<rhs>(E), E)
where
globalBind(name, value, E) = if
locVal("OUTER", E)=NIL then localBind(name, value, E)
else [E | "OUTER" ← globalBind(name, value, locVal("OUTER", E))]

Each environment, E, initially contains only its "inherited" environment (bound to OUTER). Most bindings take place directly in E. To allow for "global" bindings, the value of a globalBind(name, R<rhs>(E), E) will change E by rebinding id in the outermost environment X (reached in the semantics by following the OUTER path from E until the outermost one is reached; if we started in a nodal environment, this will be X).

Note that a global binding to some variable b does not guarantee that using b in a rhs context will result in accessing the global b because a local binding to b may intervene.

Note that in a context such as [ | a := 7], the effect of the above semantics is the same as [ | a ← 7].
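
The recursion in globalBind can be sketched with dictionaries in a functional style, returning new environments rather than updating in place. The dict representation is an assumption, and only simple (unqualified) names are handled.

```python
NIL = ""


def local_bind(name, value, env):
    """[E | name <- value], returned as a new environment."""
    new_env = dict(env)
    new_env[name] = value
    return new_env


def global_bind(name, value, env):
    """Bind name in the outermost environment reached by following OUTER."""
    outer = env.get("OUTER", NIL)
    if outer == NIL:
        # No enclosing environment: this is the outermost one.
        return local_bind(name, value, env)
    # Rebuild this level around the updated enclosing environment.
    return local_bind("OUTER", global_bind(name, value, outer), env)
```

In the sketch, a local binding of the same name at an inner level is left untouched by a global binding, so it continues to hide the global value, as the note above observes.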

binding ::= name mode op term
= <name mode name op term>

This is just a convenient piece of syntactic sugar for the common case of updating a binding.

rhs ::= "'" item* "'"
R = item*

If the rhs of a binding is surrounded by single quotes, it will be evaluated in the environments where the name is invoked, rather than the environment in which the binding is made.
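
The difference between a quoted and an unquoted rhs can be sketched by modeling a quoted expression as a function of the invoking environment. This Python sketch uses a dict as the environment and hypothetical names throughout; it is not the evaluation mechanism of any particular editor.

```python
def invoke(name, env):
    """Look up name; a quoted rhs (modeled as a callable) is evaluated now."""
    value = env[name]
    return value(env) if callable(value) else value


env = {"lineSpace": 2}

# Unquoted rhs: evaluated once, in the environment where the binding is made.
env["eager"] = invoke("lineSpace", env)

# Quoted rhs: evaluation is deferred until the name is invoked.
env["lineLeading"] = lambda e: invoke("lineSpace", e)

env["lineSpace"] = 3  # a later rebinding

eager_result = invoke("eager", env)         # still the value seen at binding time
quoted_result = invoke("lineLeading", env)  # sees the later rebinding
```

Here the unquoted binding keeps the value 2, while the quoted binding yields 3 because it is evaluated at the point of invocation.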

rhs ::= "[|" binding* "]"
R = [B<binding*>([NULL | "OUTER" "=" E]) | "OUTER" "=" NULL]

This creates a new environment value that may be used much like a record.

rhs ::= "[" invocation "|" binding* "]"
R =[B<binding*>([R<invocation>(E) | "OUTER" "=" E]) | "OUTER" "=" NULL]

This creates a new environment value that is an extension of an existing one.

tag ::= universal "$"
R = R<valOf(universal, E)>(E)
B = B<valOf(universal, E)>(E)
T = universal
C = L = Ls = Lt = NIL

This gives the containing node the property denoted by the universal and also invokes the universal in the outermost environment (if it is not bound there, NIL will be produced, which contributes nothing to R).

link ::= "LINKS" id
R = "LINKS" id
L = id
B = E
C = T = Ls = Lt = NIL

This defines the scope of the set of links whose "main" component is id.

A link N: on a node makes that node a "target" of the link N (and its prefixes); a reference ^N makes it a "source." The "main" identifier of a link must be declared (using LINKS id) at the root of a subtree containing all its sources and targets. The link represents a set of directed arcs, one from each of its sources to each of its targets. Multiple target labels make a node the target of multiple links. A target label that appears only on a single node places it in a singleton set, i.e., identifies it uniquely.

link ::= name "^"
R = name "^"
Ls = name
B = E
C = T = L = Lt = NIL

This identifies the containing node as a "source" of the link name.

link ::= name ":"
R = name ":"
Lt = prefixes(name)
B = E
C = T = L = Ls = NIL

where

prefixes(id) = id
prefixes(name "." id) = name "." id prefixes(name)

This identifies the containing node as a "target" of each of the links that is a prefix of name.
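
The prefixes function can be sketched in Python as follows, assuming a link name is given as a dotted string and the result as a list ordered from the full name down to its main identifier.

```python
def prefixes(name):
    """All dotted prefixes of name, longest first: 'a.b.c' -> a.b.c, a.b, a."""
    parts = name.split(".")
    return [".".join(parts[:n]) for n in range(len(parts), 0, -1)]
```

A node labeled with the target link a.b.c is therefore a target of a.b.c, a.b, and a.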

2.1.3.4. Discussion

2.1.3.5. Grammatical feature X Semantic function matrix

March 11, 1983 5:41 pm - from interscript-standard.tioga

2.1.3.4. Discussion

Each script is evaluated in the context of an initial environment, X, which can contain attributes global to all scripts, attributes that specify values for system-specific identifiers, and in which all global bindings are made.

If the right-hand side of a binding is surrounded by single quotes, it will be evaluated in the environment where the name is invoked, rather than the environment in which the binding is made.

Expressions involving the four infix ops (+, -, *, /) are evaluated right-to-left (a la APL); since we expect expressions to be short, we have not imposed precedence rules.

September 12, 1983 1:47 pm - from interscript-standard.tioga

3.1. Tags, types, and node invariants

TAG% ← {TAG$

attributes←{ -- here are the types and defaults of the attributes of a tag node

attributes%←{TYPE$

predicate%←'NodeInvariant' -- <<Explain this>>

default←{}}

requiredTags%←AtomList^ -- list of atoms giving other tags required

contentType%←{Type^| default←Any^} -- of type TYPE to specify kind of contents

nodeInvariant%←{Predicate^| default%←'1'}

hasMoreInv%←{Bool^| default←0}

tagOnly %← {Bool^| default←1}

reducesTo%←{Any^| default←NIL
predicate%←'A^=NIL OR TYPECODE(A^)=TypeCode.Quoted^'}}

contentType←None^ -- a TAG definition node has only attributes, no contents

requiredTags←{} -- no other tags are required on a TAG node

nodeInvariant%←'1' -- a TAG definition's invariant is always True

hasMoreInv←0 -- and there isn't any other "outside" invariant

reducesTo←NIL

tagOnly ← 1} -- the invariant of any node containing a tag node is not dependent on the tag node's internals