[_CD6_]<interscript>DraftStd>Interscript-Intro.tioga!2

Release as [Indigo]<Interscript>Std>Interscript-Intro.tioga, .press
Draft [Indigo]<Interscript>DraftStd>Interscript-Intro.tioga, .press
Last edited By Mitchell on February 28, 1983 10:42 pm

LIMITED DISTRIBUTION: FOR XEROX INTERNAL USE

Introduction to Interscript

Version 1.0/March, 1983

Abstract

XEROX
PALO ALTO RESEARCH CENTER
COMPUTER SCIENCE LABORATORY
3333 Coyote Hill Road / Palo Alto / California 94304

Contents

1. Introduction

2. Explanation

3. How to achieve various effects

Appendix A: Glossary

1. Introduction

Interscript provides a means of representing editable documents. This representation is independent of any particular editor and can therefore be used to interchange documents among editors.

The basis of Interscript is a language for expressing editable documents as scripts. Scripts are created by computer programs (usually an editor or associated program); scripts are "compiled" by programs to produce whatever private format a particular editor uses to represent documents.

1.1. Rationale for an interchange standard

As office systems proliferate, being able to interchange documents among different editing systems is becoming more and more important. Customers need document compatibility to avoid being trapped in evolutionary cul-de-sacs and having to pay the awful price of converting documents from one product's format to another's (even within one company's product line sometimes).

Now, an editing program typically uses a private, highly-encoded representation for documents in order to meet goals of performance and functionality. Generally, this means that different editors use different, incompatible private formats, and the user can conveniently edit a document only with the editor used to create it. This problem can be solved by providing programs to convert between one editor's private (or file) format and another's. However, a set of different editors with N different document representations requires N(N-1) conversion routines to be able to convert directly from each format to every other.

This N(N-1) problem can be reduced to 2(N-1) by noticing that we could write N-1 conversion routines to go from F1 (format for editor1) to F2,. . .,FN, and another N-1 routines to convert from F2,. . .,FN to F1. Except when converting from or to F1, this scheme requires two conversions to go from Fi to Fj (j≠i); this is a minor drawback. Choosing which editor should be editor1 is a more critical issue, however, since the capabilities of that editor will determine how general a class of documents can be interchanged among the editors.
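The counts in this argument can be checked with a short sketch. The Python fragment below is illustrative only; the function names are ours and are not part of Interscript, and formats are simply numbered 1..N with format 1 assumed to be the pivot.

```python
def direct_converters(n: int) -> int:
    """Routines needed to convert directly between every ordered pair of n formats."""
    return n * (n - 1)


def pivot_converters(n: int) -> int:
    """Routines needed when every conversion goes through one pivot format."""
    return 2 * (n - 1)


def conversions_needed(i: int, j: int, pivot: int = 1) -> int:
    """Number of conversion steps from format i to format j under the pivot scheme."""
    if i == j:
        return 0
    # One step if the pivot is an endpoint, otherwise Fi -> pivot -> Fj.
    return 1 if pivot in (i, j) else 2


for n in (2, 5, 10):
    print(n, direct_converters(n), pivot_converters(n))
```

For ten editors this is 90 routines against 18, at the cost of a second conversion step whenever neither format is the pivot.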

This presents a truly difficult problem in the case that there is no single functionally dominant editor. If the pivotal editor1 doesn't incorporate all of the structures, formats, and content types used by all of the others, then it will not be possible to faithfully convert documents containing them. Even if we had a single editor that was functionally dominant, it would place an upper bound on the functionality of all future compatible editors. Since there are no actual candidates for a totally dominant editor, we have chosen instead to examine in general what information editors need and how that information can be organized to represent general documents.

Since we are not proposing an editor, we do not need to design a private format for its documents; we only need an external representation that is capable of conveying the content, form, and structure of editable documents. That external representation has only one purpose: to enable the interchange of documents among different editors. It must be easy to convert between real editors' formats and this interchange encoding.

Using a standard interchange encoding has the additional advantage that much of the input and output conversion algorithms will be common to all conforming editors. For example, when a new version of an existing editor is released, the only differences in the new version's conversion routines will be in the areas in which its internal document format has changed from its previous form; this represents a significant saving of programming.

1.2. Properties that any interchange standard must have

An interchange encoding for editable documents must satisfy a number of constraints. Among these are the following:

1.2.1. Universal character set

Scripts must be encoded using the graphic (printable) subset of the ISO 646 character set. Besides the obvious rationale that these characters are guaranteed not to have control significance to any device meeting the ISO standard, this has the additional advantage that a script is humanly readable.

1.2.2. Encoding efficiency

Since editable documents may be stored as scripts, may be transmitted over a network, and must certainly be processed to convert them to various editors' private formats, it is important that the encoding be reasonably space-efficient.

Similarly, the time cost of converting between interchange encoding and private formats must be reasonably low, since it will have a significant effect on how useful the interchange standard is. (If the overheads were small enough, an editor might not even use a private file format for document storage.)

1.2.3. Open-ended representation

Scripts must be capable of describing virtually all editable documents, including those containing formatted text, synthetic graphics, scanned images, etc., and mixtures of these various modes. Nor may the standard foreclose future options for documents that exploit additional media (e.g., audio) or require rich structures (e.g., VLSI circuit diagrams, database views). For the same reasons, the standard must not be tied to particular hardware or to a file format: documents will be stored and transmitted using a variety of media; it would be folly to tie the representation to any particular medium.

1.2.4. Document content and form

The complete description of a document component usually requires more than an enumeration of its explicit contents; e.g., paragraphs have margins, leading between lines, default fonts, etc. Scripts must record the association between attributes (e.g., margins) and pieces of content.

Both the contents and attributes of typical documents require a rich value space containing scalar numbers, strings, vectors, and record-like constructs in order to describe items as varied as distances, text, coefficients of curves, graphical constraints, digital audio, scanned images, transistors, etc.

1.2.5. Document structure

Many documents have hierarchical structure; e.g., a book is made of chapters containing sections, each of which is a sequence of paragraphs; a figure is embedded in a frame on a page and in turn contains a textual caption and imbedded graphics; and the description of an integrated circuit has levels corresponding to modular or repeated subcircuits. The standard should exploit such structure, without imposing any particular hierarchy on all documents.

Hierarchy is not sufficient, however. Parts of documents must often be related in other ways; e.g., graphics components must often be related geometrically, which may defy hierarchical structuring, and it must be possible to indicate a reference from some part of a document to a figure, footnote, or section in a way that cuts across the dominant hierarchy of the document (section 1.6.4).

Documents often contain structure in the form of indirection. For instance, a set of paragraphs may all have a common "style," which must be referred to indirectly so that changing the style alone is sufficient to change the characteristics of all the paragraphs using it. Or a document may be incorporated "by reference" as a part of more than one document and may need to "inherit" many of its properties from the document into which it is being incorporated at a given time.

1.2.6. Transcription fidelity

It must be possible to convert any document from any editor's private format to a script and reconvert it back to the same editor's private format with no observable effect on the document's content, form, or structure. This characteristic is called transcription fidelity, and is a sine qua non for an interchange encoding; if it is not possible to accomplish this, the interchange encoding or the conversion routines (or both) must be defective.

1.2.7. Script comprehension

Even complicated documents have simple pieces. A simple editor should be able to display parts of documents that it is capable of displaying, even in the presence of parts that it cannot. More precisely, an editor must, in the course of internalizing a script (converting it from a script to its private, editable format), be able to discover all the information necessary to recognize and to display the parts that it understands. This must work despite the fact that different editors may well use different data structures to represent the content, form, and structure of a document.

At a minimum, this requires that a script contain information by which an editor can easily determine whether or not it understands a component well enough to display or edit it, and that it be able to interpret the effect that components which it does not understand have on the ones it does. For example, if an editor does not understand figures, it should still be possible for it to display their embedded textual captions correctly, even though a figure might well dictate some of its caption's content or attributes such as margins, font, etc.

This constraint requires that an interchange encoding must have a simple syntax and semantics that can be interpreted readily, even by low-capability editors. Along with the desire for openendedness (section 1.2.3), this suggests a language with some form of "extension by definition" built around a small core.

1.2.8. Regeneration

Processing a script to internalize it correctly is only half the problem. It is equally important that an editor, in externalizing a script from its private document format, be able to regenerate the content, form, and structure carried by the script from which the document originally came. In particular, when regenerating a script from an edited document, it should be possible to retain the structure in parts of the original script that were not affected by editing operations. For example, an editor that understands text but not figures should be able to edit the text in a document (although editing a caption may be unsafe without understanding figures) while faithfully retaining and then regenerating the figures when externalizing it.

This problem is much less severe when an editor is transcribing a document that it "understands" completely, e.g., because the entire document was generated using that editor.

1.3. What the Interscript standard does not do

There are a number of issues that the Interscript standard specifically does not discuss. Each of these issues is important in its own right, but is separable from the design of an interchange representation.

1.3.1. Interscript is not a file format

The interchange encoding of a script is a sequence of ASCII/ISO 646 characters. The standard is not concerned with how that representation is held in files on various media (floppy disks, hard disks, tapes, etc.), or with how it is transmitted over communications media (Ethernet, telephone lines, etc.).

1.3.2. Interscript is not a standard for editing

A script is not intended as a directly editable representation. It is not part of its function to make editing of various constructs easier, more efficient, or more compact: those are the purview of editors and their associated private document formats. A script is intended to be internalized before being edited. This might be done by the editor, by a utility program on the editing workstation, or by a completely separate service.

1.3.3. Combining documents is not an interchange function

This exclusion is really a corollary of the statement, "A script is not intended as a directly editable representation." In general, it is no easier to "glue" two arbitrary documents together than it is to edit them.

1.3.4. Interscript does not overlap with other standards

There are a number of standards issues that are closely related to the representation of editable documents, but which are not part of the Interscript standard because they are also closely related to other standards. For example, the issues of specifying encodings for characters in documents, how fonts should be named or described, or how the printing of documents should be specified (i.e., Interpress) are not part of this work.

1.4.4. Features of the Base Language

1.4.4.1 Values

Expressions in a script may denote

Literal values of primitive types

Booleans: F, T

Integers: . . . -3, -2, -1, 0, 1, 2, 3, . . .

Reals: 1.2E5, . . .

Strings: <this is a string>

Universal names: TEXT, XEROX, PARAGRAPH

Structured values

Spans, which can also be used as "records" or "vectors" of values

Generic operations

Invocations

Applications

Selections

Operations specific to particular types

Arithmetic

Comparison

Logical

Subscript

. . .

Bindings

Labels

Tags

Targets

Sources

Link introductions

Expressions to be evaluated at the point of invocation

1.4.4.2 Environments and Attributes

Environments bind attribute identifiers to values (or expressions denoting values), in various modes:

"←" denotes a local binding, which may be freely superseded,

":=" denotes a global binding, which creates or modifies an attribute in the outermost environment.

NULL denotes the "empty" environment, containing bindings for no attributes. The (implicit) outermost environment binds each identifier id to the corresponding universal name ID (written with all capital letters).

Each piece of content in a document has its own environment. Editors will use relevant attributes from that environment to control its form.

Attributes may also be used in scripts for two structuring purposes:

abbreviation: an identifier may be bound to a quoted expression; within the scope of the binding, the use of the identifier is equivalent to the use of the full expression;

indirection: reference through an identifier permits information (such as styles) to be defined in one place and shared throughout its scope; this is an example of structure (which must be preserved) in the form of a document.

1.4.4.3 Inheritance

The dominant hierarchy of a document is represented by grouping its pieces within spans, which are the most obvious form of content structuring. They also control the scope of bindings.

The environment of a span is initially inherited from its containing span (except for the outermost span, which inherits it from the editor), and may be modified by bindings. A binding takes effect at the point where it appears, and its scope extends to the end of the innermost span containing it, with two exceptions:

any binding except a definition may be superseded by a (textually) later binding (if the later binding is in a nested span, the outer binding's scope will resume at the end of the inner span), and

a global binding extends over all of the document lexically to the right of the binding.
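The scope rules above can be modelled roughly as a chain of environments, one per span. The following Python sketch is only an illustration under our own assumptions: the class and method names are hypothetical, exiting a span is modelled by simply discarding its environment, and we assume (the text does not say) that a local binding in an inner span shadows a global binding of the same name.

```python
class Environment:
    """A span's environment: its own bindings layered over the containing span's."""

    def __init__(self, parent=None):
        self.parent = parent   # environment of the containing span, if any
        self.local = {}        # bindings made inside this span

    def outermost(self):
        env = self
        while env.parent is not None:
            env = env.parent
        return env

    def bind_local(self, name, value):
        """Models a local binding: visible until the end of this span."""
        self.local[name] = value

    def bind_global(self, name, value):
        """Models a global binding: creates or modifies an attribute in the outermost environment."""
        self.outermost().local[name] = value

    def lookup(self, name):
        env = self
        while env is not None:
            if name in env.local:
                return env.local[name]
            env = env.parent
        raise KeyError(name)


root = Environment()
root.bind_local("leading", 1)

inner = Environment(root)          # entering a nested span
inner.bind_local("leading", 6)     # supersedes the outer binding inside the span
inner.bind_global("title", "Intro")

print(inner.lookup("leading"))     # inner span sees its own binding
print(root.lookup("leading"))      # outer binding resumes once the inner span ends
print(root.lookup("title"))        # global binding is visible from the outermost span
```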

Attributes are inherited only via environments following the dominant structure. Thus the choice of a dominant structure to represent scripts from a particular editor will be strongly influenced by expectations about inheritance.

Attributes are "relevant" to a span if they are assumed by any of its tags. In general, a span's environment will also contain bindings for many "latent" attributes that are either relevant to its ancestors (and inherited by default) or are potentially relevant to its descendants.

The interior of each span is implicitly prefixed by Sub, which will generally be bound in the containing environment to a quoted expression performing some bindings, applying some labels, and/or supplying some initial content.

1.4.4.4 Expressions

Expressions involving the four infix operators (+, -, *, /) are evaluated right-to-left (a la APL); since we expect expressions to be short, we have not imposed precedence rules.
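To make the right-to-left rule concrete, here is a minimal Python sketch of an evaluator for a flat, already-tokenized expression. It is an illustration only: the token-list representation and the function name are our assumptions, and parentheses, identifiers, and contracted bindings such as "←+" are not handled.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}


def eval_right_to_left(tokens):
    """Evaluate [operand, op, operand, op, operand, ...] with no precedence,
    grouping from the right: a op1 b op2 c means a op1 (b op2 c)."""
    if not tokens or len(tokens) % 2 == 0:
        raise ValueError("expected operand (op operand)*")
    result = tokens[-1]
    # Walk leftwards over the operators; each applies to its left operand
    # and to everything already evaluated on its right.
    for i in range(len(tokens) - 2, 0, -2):
        result = OPS[tokens[i]](tokens[i - 1], result)
    return result


print(eval_right_to_left([2, "*", 3, "+", 4]))    # 2 * (3 + 4)
print(eval_right_to_left([10, "-", 4, "-", 3]))   # 10 - (4 - 3)
```

Under conventional precedence the first expression would be 10 and the second 3; under the right-to-left rule they are 14 and 9.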

Parentheses are used to delimit vector values. Square brackets are used to delimit the argument list of an operator application and to denote environment constructors, which behave much like records.

The notation for selections (conditionals) follows Algol 68:

( <test> | <true part> | <false part> )

This is consistent with our principles of using balanced brackets for compound constructions and avoiding syntactically reserved words; the true part and false part may each contain an arbitrary number of items (including none).

1.4.4.5 Tags and Links

A tag is written as a universal name followed by "$". A tag, U, labels a span that contains it with its associated properties and also invokes the component of the outermost environment X with the name U. Tags are either present in a span or absent, whereas attributes have values that apply throughout a scope.

Layer 2 of the standard will be primarily concerned with the definition of a (small) set of standard properties that are expected to be shared among all conforming editors. For each standard property, it will describe

the associated tag that denotes it,

the assumptions it implies about the contents (values that must/may be present and their intended interpretation, invariant relations that are to be maintained, etc.),

the assumptions it makes about the environment (attributes that must be present and their intended interpretation).

Links enable a script to model associations that cut across its dominant structure: a link set denotes a set of directed arcs from each of its source spans to all its target spans. There are several ways this facility can be used:

(ST) A link set with a single source span and a single target span models a simple reference from one span in a document to another.

(S*T) For a link set with a single target span and multiple source spans, each source span can be viewed as "pointing to" that target span.

(ST*) The symmetrical extreme case of a single source span and multiple target spans corresponds closely to an entry in an index, which refers to all the places where some term is used (section 1.6 contains an example).

(S*T*) Finally, multiple source and target spans in a link set can be used for all the cross references within a document of the form "see sections 1.6, 1.7, 2.3".

To use links, a script must declare the "main" identifier of a link set ("LINKS" id) at the root of a subtree containing all its sources and targets, and textually preceding them. Once this main identifier has been introduced, spans can be labelled as sources or targets for subsets of this link set. For example, the label "id.a.b:" would make a span a target for source spans containing references of the sort "^id", "^id.a", or "^id.a.b".
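The relationship between qualified target labels and source references can be sketched as a prefix match on dotted names. The Python fragment below is a hedged illustration: the function names and the (label, span) pair representation are our assumptions, and it covers only the matching rule implied by the example above, not link declaration or scoping.

```python
def reference_matches(reference: str, label: str) -> bool:
    """True if a source reference such as 'id.a' selects a target labelled 'id.a.b'.

    Matching is done on whole dotted components, so 'id.a' does not match 'id.ab'.
    """
    ref_parts = reference.split(".")
    label_parts = label.split(".")
    return label_parts[:len(ref_parts)] == ref_parts


def resolve(reference, targets):
    """Return the spans whose target label is selected by the reference.

    `targets` is a list of (label, span) pairs in document order.
    """
    return [span for label, span in targets if reference_matches(reference, label)]


targets = [("id.a.b", "span1"), ("id.a.c", "span2"), ("id.d", "span3")]
print(resolve("id", targets))      # every target in the link set
print(resolve("id.a", targets))    # the subset labelled under id.a
print(resolve("id.a.b", targets))  # a single target
```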

1.4.5. Script comprehension

The Interscript standard applies to interchange among editors with widely varying capabilities. It will be important to define some structure to the space of possible scripts, just as Interpress has for printable documents. Dimensions in which we foresee reasonable variations in script comprehension are:

Abbreviations: from only editor-supplied to defined in the document.

Dominant structure: from single-layer to arbitrary.

Other structure: from no links or indirections to links and indirections preserved.

Bindings: from local only to local and global (:=).

Selection: from no conditionals to conditionals.

Numbers: from integers only to floating point.

See section 2.4 for further details.

1.4.6. Internalizing a Script

The private representations of low-capability editors are not generally adequate to provide a full-fidelity internalization of every script produced by a high-capability editor. Thus, when internalizing a script, some information may not be viewable or editable. The Interscript language has been designed to simplify value-faithful internalization, even if structure is lost, and content-faithful internalization, even if form is lost, or the conversion of form to additional content to allow it to be examined (and perhaps even edited) by a low-capability editor. The standard provides some simple conditions under which a low-capability editor can safely modify parts of a document that it understands fully, without thereby destroying the value or structure of parts that it is not prepared to deal with.

A script may be internalized into an editor's (private or file) representation as follows:

Parse the entire script from left to right.

As each literal is encountered in the script, convert it to the editor's representation.

As each abbreviation (free-standing invocation) is encountered in the script, replace it with the value to which it is bound in the environment.

As each structure is recognized in the script, represent the corresponding structure in the editor's representation, if possible; if not, use the semantics of Interscript to compute the value to be internalized.

Update the environment whenever a binding is encountered or a scope is exited, according to the semantics of Interscript.

Transfer the values of all attributes relevant to each piece of content from the current environment to the editor's representation, if possible; if not, apply an invertible function to convert the attribute-value binding into additional content.

Determine the properties of each span from its tags; this list will be complete at the end of the span. A span is viewable if any of its tags denotes a property in the set of those the editor is prepared to display; it is understood if they are all in the set of those the editor is prepared to edit.

Record the sources and targets of all links; for any link, these lists will be complete at the end of the span in which its main identifier was introduced. Translate each link to the corresponding editor structure, according to the properties of the span that introduces it.

Of course, any process yielding an equivalent result is equally acceptable.
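The step that determines whether a span is viewable or understood can be written down directly. The following Python sketch assumes, for illustration only, that tags are plain strings and that an editor exposes two sets of properties, one it can display and one it can edit; none of these names come from the standard.

```python
def classify_span(tags, displayable, editable):
    """Return (viewable, understood) for a span, given its complete list of tags.

    viewable:   at least one tag denotes a property the editor can display.
    understood: every tag denotes a property the editor can edit.
    """
    tags = set(tags)
    viewable = bool(tags & set(displayable))
    understood = tags <= set(editable)
    return viewable, understood


# A hypothetical text-only editor.
displayable = {"TEXT", "PARAGRAPH"}
editable = {"TEXT", "PARAGRAPH"}

print(classify_span(["PARAGRAPH"], displayable, editable))
print(classify_span(["TEXT", "FIGURE"], displayable, editable))
print(classify_span(["FIGURE"], displayable, editable))
```

Because the tag list is complete only at the end of the span, a classification like this can be made only once the closing brace has been reached. Note that a span with no tags at all is vacuously "understood" in this sketch; whether that is the intended reading is not settled by the text above.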

1.5. Introduction to the Interscript Base Language

This section is intended to lead the reader through a set of examples, to show what the language looks like and how it is used to represent a number of commonly occurring features of editable documents. The examples purposely use rather long identifiers and lots of white space to make them more readable. In actual use, programs, not people, will generate and read scripts; names will tend to be short; and logically unneeded spaces and carriage returns will tend to be omitted.

1.5.1. Simple text as a document

The following script defines a document consisting of the string "The text of the main span of example 1.5.1"; no font, paragraph structure, or formatting information is supplied. This example will gradually be expanded to represent accurately figure 1.5.1, below. The numbers at the left margin do not form part of the script; they are used to refer to the various lines in the discussion below.

0 INTERSCRIPT/INTERCHANGE/1.0
1 {<The text of the main span of example 1.5.1>}
2 ENDSCRIPT

Line 0 is the header denoting version 1.0 of the interchange encoding. Line 1 is the entire body of this script: it contains a single span enclosed in {} which in turn contains a single string value enclosed in <>. Line 2, with the keyword "ENDSCRIPT", marks the end of the script.
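As a rough illustration of this framing, the Python sketch below reads a script consisting only of the header, nested spans, strings, and the ENDSCRIPT keyword. It is not a conforming parser: the function names are ours, spans are represented as Python lists and strings as Python strings, and tags, bindings, comments, and any escape notation inside strings are deliberately not handled.

```python
HEADER = "INTERSCRIPT/INTERCHANGE/1.0"
TRAILER = "ENDSCRIPT"


def parse_script(text):
    """Return the top-level items of a script as nested lists of strings."""
    body = text.strip()
    if not body.startswith(HEADER) or not body.endswith(TRAILER):
        raise ValueError("missing header or ENDSCRIPT")
    body = body[len(HEADER):len(body) - len(TRAILER)]
    items, pos = _parse_items(body, 0)
    if pos != len(body):
        raise ValueError("unbalanced '}'")
    return items


def _parse_items(s, pos):
    """Parse items until the end of input or an unmatched '}' is reached."""
    items = []
    while pos < len(s):
        ch = s[pos]
        if ch.isspace():
            pos += 1
        elif ch == "<":                      # string value: <...>
            end = s.index(">", pos)
            items.append(s[pos + 1:end])
            pos = end + 1
        elif ch == "{":                      # nested span: {...}
            inner, pos = _parse_items(s, pos + 1)
            if pos >= len(s) or s[pos] != "}":
                raise ValueError("unterminated span")
            items.append(inner)
            pos += 1
        elif ch == "}":                      # end of the enclosing span
            return items, pos
        else:
            raise ValueError(f"unsupported construct at position {pos}")
    return items, pos


script = """INTERSCRIPT/INTERCHANGE/1.0
{<The text of the main span of example 1.5.1>}
ENDSCRIPT"""
print(parse_script(script))
```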

The text of the main span of example 1.5.1

The text of the first subspan of example 1.5.1

Example 1.5.1: A simple document

The next version of the example adds the tag, TEXT$ to the span. The identifier TEXT is called a universal name (or atom), which is indicated by its being composed of all uppercase letters. Universal names have no definition within the base language (they are expected to be defined in Layers 2 and 3).

0 INTERSCRIPT/INTERCHANGE/1.0
1 {TEXT$
2 <The text of the main span of example 1.5.1>
3 }
4 ENDSCRIPT

A tag is denoted by placing "$" after a universal name. A span's tags are strictly local (they are not inherited by other spans in the script) and serve as "type information" about the span. The tag TEXT$ labels this span as one that can be viewed as textual data. Tags can also create implicit indirections; see section 1.6.5.

0 INTERSCRIPT/INTERCHANGE/1.0
1 {PARAGRAPH$
2 leftMargin←3.25*inch rightMargin←5.0*inch
3 <The text of the main span of example 1.5.1>
4 }
5 ENDSCRIPT

This example shows how auxiliary information, such as margins, may be associated with a span of a script. The binding leftMargin←3.25*inch adds the attribute leftMargin to the span's environment and binds the value of the expression 3.25*inch to it (inch is a value whose dimensions are inches/meters; meters are the standard Interscript units of distance). The bindings to leftMargin and rightMargin convey the fact that this span has margins for display. To denote the change in character of the span, we have tagged it as PARAGRAPH instead of TEXT. Figure 1.5.1 uses these margins for its first line of text.

0 INTERSCRIPT/INTERCHANGE/1.0
1 {PARAGRAPH$
2 leftMargin←3.25*inch rightMargin←5.0*inch
3 <The text of the main span of example 1.5.1>
4 {PARAGRAPH$ leftMargin←+0.5*inch
5 <The text of the first subspan of example 1.5.1>
6 }
7 }
8 ENDSCRIPT

We have further elaborated the example by nesting another text span in the primary one, with its text following the primary span's text and with an indented leftMargin. The binding leftMargin←+0.5*inch is a contraction of leftMargin←leftMargin+0.5*inch. The right side of the binding is evaluated, and since there is as yet no binding in the inner span's (lines 4-6) environment for leftMargin, it is looked up in the environment of the containing span (lines 1-3). The value of the right hand side expression is thus 3.75*inch. This value is then bound to the identifier leftMargin in the inner span's environment. Since no value is bound to rightMargin in the inner span's environment, it will have the same rightMargin as its parent span.
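The lookup described here behaves much like a chained dictionary. As a small sketch, Python's collections.ChainMap can stand in for the two environments; margins are held as plain numbers of inches, which is our simplification rather than Interscript's treatment of units.

```python
from collections import ChainMap

# Environment of the containing span, in inches.
outer = ChainMap({"leftMargin": 3.25, "rightMargin": 5.0})

# Entering the nested span: a fresh, empty layer in front of the outer bindings.
inner = outer.new_child()

# The contracted binding: the right side is evaluated first, so leftMargin is
# looked up through the chain (finding 3.25 in the containing span), and the
# result is then bound in the inner span only.
inner["leftMargin"] = inner["leftMargin"] + 0.5

print(inner["leftMargin"])   # inner span's own binding
print(inner["rightMargin"])  # inherited from the containing span
print(outer["leftMargin"])   # containing span is unaffected
```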

0 INTERSCRIPT/INTERCHANGE/1.0
1 p ← 'PARAGRAPH$ leftMargin←3.25*inch rightMargin←6.0*inch'
2 {p rightMargin←5.0*inch
3 <The text of the main span of example 1.5.1>
4 {p leftMargin←+0.5*inch
5 <The text of the first subspan of example 1.5.1>
6 }
7 }
8 ENDSCRIPT

One can also define an abbreviation by binding a sequence of unevaluated expressions to an identifier and subsequently using the identifier to cause those expressions to be evaluated at the point of invocation. This example binds the quoted expression 'PARAGRAPH$ leftMargin←3.25*inch rightMargin←6.0*inch' to the identifier p. When p is invoked in lines 2 and 4, the quoted expression replaces the invocation and is evaluated there.

Invoking p places the tag PARAGRAPH$ on the span, sets the leftMargin to 3.25*inch and the rightMargin to 6.0*inch. In line 2, the rightMargin is then rebound to 5.0*inch, overriding the default binding created by invoking p. Similarly, the binding for leftMargin in line 4 overrides the one resulting from invoking p, resulting in its leftMargin being 3.75*inch and its rightMargin being 6.0*inch.
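The effect of invoking p and then overriding one of its bindings can be traced with a small Python sketch. Here the quoted expression is modelled as a function applied at the point of invocation; the span representation (a dictionary with an environment and a set of tags), the helper names, and the use of inches as plain numbers are all our assumptions.

```python
def open_span(parent_env):
    """Start a span: it inherits a copy of the containing environment; tags are local."""
    return {"env": dict(parent_env), "tags": set()}


def invoke_p(span):
    """Stand-in for the quoted expression bound to p, evaluated where it is invoked."""
    span["tags"].add("PARAGRAPH")
    span["env"]["leftMargin"] = 3.25
    span["env"]["rightMargin"] = 6.0


main = open_span({})
invoke_p(main)
main["env"]["rightMargin"] = 5.0      # line 2: rebinding after the invocation

sub = open_span(main["env"])
invoke_p(sub)                          # line 4: p resets both margins
sub["env"]["leftMargin"] += 0.5       # then the relative binding is applied

print(main["env"])
print(sub["env"])
```

The order matters: because the override is written after the invocation, it supersedes the default that p supplies.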

An identifier can also be bound to an environment value as a convenient record-like manner of naming a set of related bindings. For example, a font might be defined as follows (a more complete definition is given later in section 1.6.3):

font ← [ | family←TIMES size←10*pt face←[ | weight←NORMAL style←ROMAN slant←NIL] ]

This defines font to be the environment formed by taking the empty or NULL environment and altering it according to the series of bindings following the initial "[ |." In this case font is an environment having bindings for three attributes, family, size, and face. face is itself bound to an environment (with attributes weight, style, and slant). The set of default bindings in font specify a normal weight (non-bold), non-italic Times Roman 10-point font.
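An environment value of this kind resembles a nested record. The Python sketch below models font as nested dictionaries and shows one way a qualified binding such as font.face.slant might be applied without disturbing the original value. The helper name bind_path, the use of strings for universal names, and the treatment of size as a bare number of points are assumptions made for illustration.

```python
font = {
    "family": "TIMES",
    "size": 10,  # points
    "face": {"weight": "NORMAL", "style": "ROMAN", "slant": "NIL"},
}


def bind_path(env, path, value):
    """Return a copy of env with the dotted path rebound; env itself is unchanged."""
    head, _, rest = path.partition(".")
    new_env = dict(env)
    if rest:
        new_env[head] = bind_path(env.get(head, {}), rest, value)
    else:
        new_env[head] = value
    return new_env


italic = bind_path(font, "face.slant", "ITALIC")
print(italic["face"])
print(font["face"])   # the original environment still has slant NIL
```

Returning a new value rather than mutating in place mirrors the idea that a binding affects only what follows it within its scope.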

We can incorporate this font definition in the example and then use it to indicate that the word "first" in the subspan should be in italics:

0 INTERSCRIPT/INTERCHANGE/1.0
1 p ← 'PARAGRAPH$ leftMargin←3.25*inch rightMargin←6.0*inch'
2 font ← [ | family←Times size←10*pt face←[ | weight←NORMAL style←ROMAN slant←NIL] ]
3 {p rightMargin←5.0*inch
4 <The text of the main span of example 1.5.1>
5 {p leftMargin←+.5*inch
6 <The text of the >
7 font.face.slant←ITALIC <first> font.face.slant←NIL
8 < subspan of example 1.5.1>
9 }
10 }
11 ENDSCRIPT

Bindings affect span contents to their right: so, "first" will be italic, while "subspan of example 1.5.1" will be non-italic due to the binding immediately preceding it. If we expected to switch between italics and non-italics frequently, it might be profitable to introduce abbreviations to shorten what must appear. For example, in the scope of the definition

l ← [ | i ← 'font.face.slant←ITALIC' nI ← 'font.face.slant←NIL']

line 7 could be abbreviated

l.i<first>l.nI

1.6. Further Examples

This section gives some more realistic examples of the use of the Interscript language and explores the issues of making sets of standard definitions for use in scripts.

1.6.1. A Laurel Message

Here is a possible Interscript transcription of a Laurel message:

0 INTERSCRIPT/INTERCHANGE/1.0 -- standard heading --
1 {LAURELMSG$ -- tag for a Laurel document --
2 Sub ← 'PARAGRAPH$ leftMargin←1.0*inch rightMargin←7.5*inch' --standard span prelude for spans below--
3 justified←F
4 font.family←TIMES font.size←10
5 leading.x←1
6 leading.y←1 -- overridable default leadings --
7 LINKS heading -- declare main identifier of link set --
8 laurelInfo ← -- Laurel information for easy access --
9 (^Heading.time ^Heading.from ^Heading.subject ^Heading.to ^Heading.cc)
10 {<Date: > {Heading.time: <18 June 1981 9:18 am PDT (Thursday)>}
11 <From: > {Heading.from: <Mitchell.PA> AUTHENTICATED$}
12 <Subject: > {Heading.subject: <A Sample Document Syntax>}
13 <To: > {Heading.to: <Horning.PA>}
14 <cc: > {Heading.cc: <Mitchell, Interscript.PA>}}
15 leading.y←6 -- override outer y leading --
16 {<text of paragraph1>} -- span which is a paragraph --
17 {<text of paragraph2>}
18 {<text of paragraph3>}
19 } ENDSCRIPT

Line 1 tags this document (by tagging its root span) as a Laurel message, and line 2 tags its subspans (starting on lines 10, 16, 17, and 18) as paragraphs with default margins. Lines 3-6 bind some other attributes, likely to be relevant to paragraphs. Line 7 declares the main link identifier heading, and lines 8-9 bind to laurelInfo a vector of source links whose targets are the parts of the document of interest for mail transport. Lines 10-14 have similar structures: each consists of a string followed by a span containing a target link for the label heading and text for that Laurel "field." Line 11 is additionally tagged as AUTHENTICATED. Lines 16-18 contain paragraphs constituting the body of the message.

Alternatively, the external environment might well contain a definition of laurel60 that establishes a suitable environment for a Laurel 6.0 document:

1 laurel60 ← '
2 LINKS time LINKS from LINKS subject LINKS to LINKS bodySpans LINKS cc
3 LAURELMSG$
4 cr ← <#13#> tab ← <#9#>
5 p ← 'PARAGRAPH$ leftMargin←1.0*inch rightMargin←7.5*inch'
6 justified←F
7 font.family ← TIMES font.size ← 10
8 margins.left� margins.right�
9 leading.x←1 leading.y←1 -- overridable default leadings --
10 printForm ←
11 '{p <Date: > ^time tab
12 <From: > ^from cr
13 <Subject: > ^subject cr
14 <To: > ^to
15 leading.y←6
16 ^bodySpans
17 <cc: > ^cc
18 }'
19 heading ← 'LAURELHEADING$ Sub←'TEXT$ LAURELFIELD$' '
20 body ← 'Sub←'p bodySpans:' '
21 '

One advantage of using source labels for the "bodies" of the To:, From:, etc. fields (lines 11-14, 17) is that they can represent sets of spans as well as single spans.

Now the Laurel document would be described by the following script:

22 INTERSCRIPT/INTERCHANGE/1.0 -- standard heading --
23 {laurel60% -- invoke Laurel 6.0 definitions --
24 {heading% -- invoke heading style --
25 {time: <18 June 1981 9:18 am PDT (Thursday)>}
26 {from: AUTHENTICATED$ <Mitchell.PA>}
27 {subject: <A Sample Document Syntax>}
28 {to: <Horning.PA>}
29 {cc: <Mitchell, Interscript.PA>}
30 }
31 {body% -- Invoke body style --
32 {<text of paragraph1>}
33 {<text of paragraph2>}
34 {<text of paragraph3>}
35 }
36 } ENDSCRIPT

Invoking laurel60 in line 23 introduces the quoted expressions heading and body into the root span's environment, tags it as LAURELMSG and declares the labels time, from, etc. It also acquires a definition for a print form, which could be used to format the message for sending to a printer. The "%" (indirection) operator indicates that this is intentional structure, to be preserved by each internalization, rather than merely an abbreviation. Thus the message heading and body should "see" the effects of any future changes made to laurel60, by editing its definition. By contrast, p is used as an abbreviation; when the script is rendered, its value may safely be copied at each use.

Look at the definition of heading (line 19): the right side is a quoted expression sequence. The first expression of the sequence produces the tag LAURELHEADING$ and the second binds the quoted expression 'TEXT$ LAURELFIELD$' to Sub. As a result, each subspan of the one beginning on line 24 will be initialized by invoking Sub implicitly from its containing span, which gives each the tags TEXT$ and LAURELFIELD$.

Similarly, the definition of body (line 20) defines Sub, and the spans on lines 32-34 will be initialized by invoking p and having the target link bodySpans placed on them. Labelling the set of body spans this way means that the source link, ^bodySpans, in printForm (line 16) denotes the entire sequence of body spans, in left-to-right depth-first tree order.

1.6.2. A page of a Star document

This example is taken from page 71 of the Star Functional Specification and shows one page of a paginated document with a diagram and a footnote (we recommend that you have that page in front of you when analyzing this transcription):

-- pages 1 .. 6 supposedly precede this one --
{pg.a7:
Sub←'PARAGRAPH$'
{<Many of these conclusions are based on prior experience>

{fn.n1: -- just a unique label: fn: introduced somewhere earlier --
FOOTNOTE$
<See the 1970 report titled "Organizational Changes and Sales Margin" and other documents referenced in that document. Further reports are available if you need them.>
}

< which has shown our techniques to be valid. Other data can be collected by future changes to your accounting and billing packages, which will allow us to perform even better analyses and lead to better problem discovery and correction.>
}

{<The results of the sales analysis suggest that certain organizational changes can improve the overall efficiency of the operation. The March figures, in particular, bear this out. You will note below a suggested change that we feel will correct the problems noted in the analysis above.>
}

Sub←'FRAME$' -- change to subspan tag FRAME --

{Alignment.horizonally←flushLeft Alignment.vertically←floating

height←2.8*inch width←3.67*inch
edges.expandingRightEdge←T
border←dots1
-- change to default subspan environment Rectangle with solid, double width outline --
Sub←'RECTANGLE$ lineType.width←2 lineType.style←solid Sub←'Title''
LINKS rect -- declare label class to be used below --
{rect.a1: UpperLeft←(.0254 .07) shading←7 height←.01 width←.027 {<Headquarters>} }

{rect.a2: UpperLeft←(.073 .015) height←.01 width←.018 {<Staff Support>} }

height←.013 -- attribute value shared by following subspans

{rect.a3: UpperLeft←(.02 .03) width←.025 {<Development>} }

{rect.a4: UpperLeft←(.02 .03) width←.028 {<Manufacturing>} }

{rect.a5: UpperLeft←(.042 .055) width←.016 {<West Coast>} }

{rect.a6: UpperLeft←(.067 .055) width←.016 {<East Coast>} }

-- default subspan environment is LINE with solid, double width outline --
Sub←'LINE lineType.width←2 lineType.style←solid'
LINKS ln

{ln.out1: ^rect.a1 ^ln.in34}
{ln.out2: ^rect.a2 ^ln.out1}
{ln.in3: ^ln.in34 ^rect.a3}
{ln.in4: ^ln.in34 ^rect.a4}
{ln.in34: ^ln.in3 ^ln.in4}
{ln.out4: ^rect.a4 ^ln.in56}
{ln.in56: ^ln.in5 ^ln.in6}
{ln.in5: ^ln.in56 ^rect.a5}
{ln.in6: ^ln.in56 ^rect.a6}

} -- end of Frame1 --
Sub←'PARAGRAPH$' -- restore default subspan initialization to PARAGRAPH --
{<The process of switching to this new organization will not be an easy one. However, the reports seem to suggest many reasons why it should not be postponed. In particular, the separation of Manufacturing from Development should have significant impact.>}

{<Also, we feel strongly that merging East and West Coast Development will help. As we have suggested in past reports, there has always been considerable replication of effort due to this geographic separation. You will recall the events leading up to the initial contract with our firm.>}

} -- end of page --

1.6.3. Some Star property sheets

Here are a few of the definitions invoked in the above example (these were derived from page 148 of the Star Functional Specification). Some of them simply give default values for various attributes; some, like default.font, define a collection of related attributes as an environment; and most are quoted expression sequences for providing abbreviations or "decorating" spans with tags and their environments with relevant attributes.

1.6.3.1. Font-related defaults and definitions

baseline←0 -- the base line for characters --

underlined←F -- whether or not text in span is to be underlined --

strikeOut←F -- whether or not text in span is to have strike-out line through it --

-- there is no rhyme and little reason behind the names of type fonts. The following definition is intended to provide enough choice, using standard "terms" to name any existing font in an arbitrary font catalog (of course, it doesn't, but perhaps it is close enough) --
default.font ← [ | -- Definition --
family←Times -- a font family name --
face←[ | -- Definition --
weight←NORMAL -- In (EXTRALIGHT, LIGHT, BOOK, NORMAL, MEDIUM,
DEMIBOLD, SEMIBOLD, BOLD, EXTRABOLD, ULTRABOLD,
HEAVY, EXTRAHEAVY, BLACK, GROTESQUE) --
lineType←SOLID -- In (SOLID, INLINE, OPEN, OUTLINE, DISPLAY, SHADED) --
proportions←NORMAL -- In (NORMAL, CONDENSED, EXPANDED, EXTENDED,
WIDE, BROAD, ELONGATED) --
style←ROMAN -- In (ROMAN, GOTHIC, EGYPTIAN, CURSIVE, SCRIPT) --
slant←NIL -- In (NIL, ITALIC, OBLIQUE) --
swash←F -- T => use swash capitals --
lowercase←T -- T => use lowercase letters --
uppercase←T -- T => use uppercase letters --
smallCaps←F -- T => use small capitals --
]
size�*pt -- distance --
]

-- some useful font shorthands: --
Helvetica ← 'font ← [default.font% | family←HELVETICA]'
Italic ← 'font.face.slant←ITALIC'
Bold ← 'font.face.weight←BOLD'
Helvetica10BI ← 'Helvetica font.size←10*pt Bold Italic'

1.6.3.2. Footnote-related definitions

fnCount:=0 -- global variable for counting footnotes
FOOTNOTE ← 'fnCount:=+1 font.size←8*pt FootnoteRef%'

FootnoteRef ← '{FOOTREF$ baseline←+5*pt fnCount}' -- raise 5 pts --

1.6.3.3. Paragraph-related definitions

Tab ← [ |
position←0
type←LEFT -- In (LEFT, CENTERED, RIGHT, DECIMAL) --
]

MakeTabs ← 'n←0 tabs←(RecursiveMakeTab[Value])'
RecursiveMakeTab ← '(EQ[Value 0] | NIL | n←+.25*inch [Tab | position←n ] RecursiveMakeTab[Value-1])'

Default.PARAGRAPH ← 'Indent ← [ | Left←0.0 Right←0.0] -- distance --
Alignment←FLUSHLEFT -- In (FLUSHLEFT, FLUSHRIGHT, BOTH, CENTERED) --
Justified←F
leading←[leading | between←1*pt above�*pt below←0]
charStyle←[|
Normal←'font←default.font'
Emphasis1←'font←default.font Italic'
Emphasis2←'font←default.font Bold'
]
Hyphenation←F
KeepOn←NIL -- In (NIL, SamePageAsNextParagraph) --
MakeTabs[8] -- binds tabs to a sequence of 8 tabs (0, .25 inch, .50 inch, . . .) --
charStyle.Normal -- initializes to normal style

1.6.3.4. frame, rectangle, and line definitions

Def.UpperLeft ← 'UpperLeft←(0.0 0.0)' -- Def is just a convenient place to put useful auxiliary definitions --

Def.lineType ← '
lineType←[ |
Visible←T
Width←1
Style←SOLID] -- IN (SOLID, DOT, DASH, DOTDASH, DOUBLE, . . .) --
'

Def.Shading ← 'Shading←0'

Def.Box ← 'Def.UpperLeft Def.lineType Def.Shading'

Frame ← 'FRAME$ Def.Box'

Rectangle ← 'RECTANGLE$ Def.Box
Constraint←MagnifyOnly -- IN (NIL MagnifyOnly) --
'

Def.LineEnd ← '
LineEnd←(LeftUpper←Flush RightLower←Flush) -- IN (Flush Round Square arrow1 arrow2 arrow3) --
'

Line ← 'LINE$ constraint←FixedAngle Def.lineType Def.LineEnd'

Title ← 'CAPTION$ Paragraph'

1.6.4. Using links

Links are intended to provide the means for associating spans in non-hierarchical ways. They can be used for referring to figures, examples, tables, etc., for describing tables of contents, for denoting index items, keeping lists, etc.

1.6.4.1. References to figures

The following outlines how the labelling facilities and global bindings can be used to generate references to (source links for) a figure whose number may not be known at the point of reference. The identifier n5 is assumed to have been generated by the program that produced the script and is assumed to be unique over the target labels with naming prefix "figures." in the script.

LINKS figures figCount:= 0 -- should appear in a script's root span --
makeFigureNum ← 'HIDDEN$ figCount:=+1 figCount'

{. . . ^figures.n5 . . .} -- ref to span with label figures.n5: --

{ . . . {figures.n5: makeFigureNum} . . .} -- a hidden span holding the figure number --

The span in which the figure number for figure n5 is defined contains a tag, HIDDEN$, which means that the span is not to be considered a part of the dominant structure for display purposes even though it is part of it. The span's sole content is the value of figCount after it has been incremented by 1. Because figCount is bound with ":=", the scope of the binding is global.
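
The numbering scheme above can be mimicked outside Interscript. The following Python sketch is only an illustration under stated assumptions: the function names number_figures and resolve are hypothetical, and the script is reduced to the list of figure labels in the order in which they are elaborated. It shows why a reference may appear before the figure it names: numbers are assigned in one left-to-right pass, and references are resolved against the finished table.

```python
def number_figures(labels_in_order):
    """Assign figure numbers in script order, as figCount:=+1 would."""
    fig_count = 0                      # plays the role of the global figCount
    numbers = {}
    for label in labels_in_order:
        fig_count += 1                 # figCount:=+1
        numbers[label] = fig_count     # content of the hidden span
    return numbers


def resolve(reference, numbers):
    """Return the number that a source link such as ^figures.n5 refers to."""
    return numbers[reference.lstrip("^")]


# Hypothetical script with two figures; n5 is the second one laid down.
numbers = number_figures(["figures.n2", "figures.n5"])
fig_n5 = resolve("^figures.n5", numbers)
```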

1.6.4.2. Collections of index items

Assume that the word "diarchy" is to be considered an index item in certain places where it occurs in a document. The link class indexable should be introduced at the root of the document, and each to-be-indexed occurrence of "diarchy" in a string, e.g., <When a diarchy is established, it . . .>, should be replaced by the sequence <When a > diarchy% < is established, it . . .>. Somewhere in the script, within the scope of the declaration of indexable and at the root of a subtree containing all the uses of diarchy, should be the following definition:

diarchy ← '{HIDDEN$ indexable.diarchy: pageNumber} <diarchy>'

Invoking diarchy results in the appearance of a hidden span containing the current page number (assumed to be held in the attribute pageNumber) and labelled as being in the set of target links indexable and indexable.diarchy. The index for the document might then contain the following entry for "diarchy":

{INDEXENTRY$ <diarchy> ^indexable.diarchy}

This entry contains the minimal information needed to generate the sequence of page numbers corresponding to indexable occurrences of diarchy. If some occurrences are considered primary and some secondary, then these mechanisms can be generalized to have diarchy defined as

diarchy ← [ | primary ← '{HIDDEN$ indexable.diarchy.primary: pageNumber} <diarchy>'
secondary ← '{HIDDEN$ indexable.diarchy.secondary: pageNumber} <diarchy>']

Primary references are denoted in the script as diarchy.primary% and secondary ones as diarchy.secondary%. Similarly, the index entry takes the form:

{INDEXENTRY$ <diarchy> ^indexable.diarchy.primary ^indexable.diarchy.secondary}

1.6.5. Using indirections

Indirections provide a way to centralize (and delay) the binding of information within a document. They can be used to share information that is intended to be consistent.

1.6.5.1. Styles and style sheets

Documents generally follow stylistic conventions for presenting different kinds of content. E.g., major headings may be in bold face with twelve points of extra leading, minor headings in italic with six points of extra leading. If this information is explicitly bound for each piece of content, then a stylistic change may require locating and changing all the relevant bindings (note that italic is likely to be also used for other purposes, such as emphasis). If, however, the binding is done indirectly, through a style, a single change will be effective for all places where the style is referenced. Note that each occurrence of a tag implicitly establishes an indirection through the same identifier; this is convenient in associating styles with semantically meaningful tags. For example:

MajorHeading ← 'PARAGRAPH$ Bold leading←+12'
MinorHeading ← 'PARAGRAPH$ Italic leading←+6'

2. The Language Basis: Syntax and Semantics

2.0. Notes and possible additions/changes

January 13, 1983 10:52 pm Consider defining a set of standard names that can safely be used in a "self-contained" scope without danger of ever conflicting with a standard name; e.g., something like v, v1, v2, ...

January 13, 1983 11:17 pm, consider the following definition:

MakeList ← '

IF LE!{value.n 0} THEN NIL; -- or some conditional --

value.item -- lay down an item --

value.n ←- 1 -- count off one item placed --

MakeList -- "recur" --

' -- end of MakeList --

It can be applied (invoked), e.g., as

MakeList!{n←10 item←'10-value.n'} = {0 1 2 3 4 5 6 7 8 9}

MakeList!{n←3 item←'MakeList!{n←value.n item←'value.n'}' } =
{ {3 2 1} {2 1} {1} }
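
The recursion in MakeList can be imitated in an ordinary programming language. The Python sketch below is only an illustration, not part of Interscript: make_list is a hypothetical name, a span is modelled as a Python list, and the quoted item expression is modelled as a function of the current count value.n.

```python
def make_list(n, item):
    """Imitate MakeList: lay down item(n), count down by one, and recur."""
    if n <= 0:
        return []                      # NIL adds nothing to the enclosing span
    return [item(n)] + make_list(n - 1, item)


# The two applications shown above, with value.n passed as the argument.
flat = make_list(10, lambda n: 10 - n)
nested = make_list(3, lambda n: make_list(n, lambda m: m))
```

Under these assumptions flat reproduces the span {0 1 2 3 4 5 6 7 8 9}, and nested reproduces { {3 2 1} {2 1} {1} }.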

2.2. Discussion of Features

[Note that we have a formal semantic definition for this language that is every bit as precise as the grammar above. However, we have not yet figured out how to present it in a form that humans find equally palatable, so we have placed it in Appendix C.]

Literals are the primitive elements by which the values in a document are represented. There are two constants of type Boolean, T (true) and F (false). Integer, real, and string literals are denoted as in InterPress <<REF>>. Universals are like Lisp atoms; a universal is denoted by an identifier formed from uppercase letters and digits only. A universal denotes only itself, nothing more. As well as its normal uses to denote null elements, NIL has the additional property that when used as a piece of content in a span, it disappears, as if nothing had been written there, i.e., {1 NIL 2} is the same as {1 2}, which is the same as {NIL 1 NIL 2 NIL NIL}, etc.
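
The disappearing behaviour of NIL can be pictured with a small Python sketch. It is an illustration only: the function name contents is hypothetical, a span is modelled as a Python list, and Python's None stands in for the universal NIL.

```python
NIL = None  # stand-in for the universal NIL (an assumption of this sketch)


def contents(*items):
    """Return the contents of a span, dropping every NIL item."""
    return [x for x in items if x is not NIL]


a = contents(1, NIL, 2)                  # {1 NIL 2}
b = contents(1, 2)                       # {1 2}
c = contents(NIL, 1, NIL, 2, NIL, NIL)   # {NIL 1 NIL 2 NIL NIL}
```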

term ::= primary op term
op ::= "+" | "-" | "*" | "/"

Both the primary and the term must reduce to numbers; the arithmetic operators are evaluated right-to-left (a la APL, without precedence) and bind less tightly than function application. The result is a real if either operand is real.
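
Because the grammar is right-recursive and there is no precedence, 2*3+4 groups as 2*(3+4). The Python sketch below illustrates this grouping under stated assumptions: eval_term is a hypothetical name, a term is given as an already-tokenized list of numbers and operator strings, and division is left out because the text does not say how it behaves on two integers.

```python
import operator

# Only the operators whose integer/real behaviour is stated in the text.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def eval_term(tokens):
    """Evaluate 'primary op term', grouping to the right with no precedence."""
    primary = tokens[0]
    if len(tokens) == 1:
        return primary
    op = tokens[1]
    # The term to the right of the operator is evaluated first.
    return OPS[op](primary, eval_term(tokens[2:]))


grouped = eval_term([2, "*", 3, "+", 4])      # 2*(3+4)
chained = eval_term([10, "-", 2, "-", 3])     # 10-(2-3)
mixed = eval_term([2, "*", 1.5])              # real, since one operand is real
```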

application ::= id

Id is looked up in the context of the current span (section <<LookupRule>> describes the lookup rule in detail, but it is basically dynamic scoping following the span nesting structure, with deeper and more-to-the-right bindings masking those higher and more-to-the-left in the tree corresponding to span nesting). Depending on its current binding, the value of id could be a piece of content, a binding, or a label; if the rhs bound to id was quoted, that expression is evaluated in the current context. In the outermost context X, every identifier is bound to the universal formed by replacing each letter of the identifier by its uppercase equivalent and each digit in the identifier by itself.
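
The essence of this lookup rule can be sketched in Python. The sketch is a simplification under stated assumptions: the class name Context and its methods are hypothetical, each span's context is a dictionary with a pointer to the enclosing span's context, and the evaluation of quoted expressions is not modelled.

```python
class Context:
    """One span's bindings plus a pointer to the enclosing span's context."""

    def __init__(self, outer=None):
        self.bindings = {}
        self.outer = outer

    def bind(self, name, value):
        self.bindings[name] = value    # a later binding masks an earlier one

    def lookup(self, name):
        ctx = self
        while ctx is not None:         # walk outward through the span nesting
            if name in ctx.bindings:
                return ctx.bindings[name]
            ctx = ctx.outer
        # Outermost context X: letters become uppercase, digits are unchanged.
        return name.upper()


root = Context()
root.bind("family", "TIMES")
child = Context(outer=root)
child.bind("family", "HELVETICA")      # masks the binding in the parent span
```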

application ::= name "." id

A qualified name represents lookup in the structured context of a span value; name must be bound to a span, in which context id is looked up.

application ::= universal

A value may be bound to a universal; this value is only used when the universal appears as a tag in a span. This mechanism can be used to provide some global default value(s) or sequence of bindings, or whatever for that universal, or nothing if its global value is NIL.

application ::= ( name | universal ) "!" content

The dyadic operator "!" is used to pass arguments in an application. The value of the content is bound to the identifier value in a new, empty context nested within the current context, and then the name or universal is invoked. This nesting ensures that any bindings or side effects obtained while evaluating the content (with the exception of global bindings) will not be visible in the context in which the application appears. If a quoted expression is bound to the name, it can access its arguments through the name value.

If the application involves a universal (either explicitly, or because the name is bound to a universal), the (builtin) function corresponding to that universal is applied to the argument value. Part of the definition of Layer 2 will involve the specification of a small set of standard functions, which may be expanded in various Layer 3 extensions.

In any application, the name may be followed by "%" to indicate that the indirection reflects structure that must be maintained in the document. Replacing the indirection by its value in the current context is a value-preserving loss of structural fidelity. (An invocation that is simply a name is an abbreviation that need not be preserved.)
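
The difference between an abbreviation and an indirection can be pictured as the difference between copying a value and keeping a reference to its name. The Python sketch below is an analogy only: the dictionary definitions, the attribute leftMargin, and its values are hypothetical.

```python
definitions = {"p": {"leftMargin": 1.0}}

# Abbreviation: the value is copied at the point of use and the name is forgotten.
abbreviated = dict(definitions["p"])


# Indirection (p%): the name is kept, and the value is found when it is needed.
def indirect():
    return definitions["p"]


# A later edit to the definition of p.
definitions["p"] = {"leftMargin": 2.0}

copied_margin = abbreviated["leftMargin"]   # still the old value
shared_margin = indirect()["leftMargin"]    # sees the edit
```

In this analogy, replacing indirect by a copy of its current value keeps the value but loses the information that the spans were meant to follow later changes to p.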

conditional ::= "IF" term "THEN" item1* "ELSE" item2* ";"

This is a standard conditional item sequence. The value and effect are those of item1* if the term evaluates to T in the current context, those of item2* if it evaluates to F. Only one of item1* or item2* will be evaluated, depending on the value of term.

<<EDIT POINT>>

span ::= "{" item* "}"

Spans form nested contexts, and affect the containing context only through global (:=) bindings to ids.

item* ::= ""

The empty sequence of items has no value and no effect; this is the basis for the following recursive definition.

item* ::= item1 item*

In general, the value of a sequence of items is just the sequence of item values; bindings change the context of items to their right in the sequence.

subscription ::= primary "(" term ")"

Primary must evaluate to a span for subscription to be meaningful. The value of term must be a non-negative integer. Then the term'th piece of content in the span denoted by primary is extracted as the value of the subscription.

local ::= name "←" rhs

This adds a single binding to the current scope (i.e., to its associated context); bindings have no other "side effects" and no value (i.e., they do not change the length of a containing span value).

global ::= ( name | universal ) ":=" rhs

This adds a single binding to the outermost environment X. It makes sense to bind something to a universal only if the universal is a tag name (see tag below).

binding ::= name mode op term

"name mode op term" is just a convenient piece of syntactic shorthand for
"name mode name op term".

mode ::= "←" | ":="

A value can be bound to a name either locally ("←") in the environment of the span in which the binding appears, or globally (":=") in the environment of the root span of a script.

rhs ::= "'" item* "'"

A quoted rhs is evaluated in the environment of invocation, rather than the environment current at the point of binding.

rhs ::= "{" binding* "}"

This creates a new span value that may be used much like a record.

openSpan ::= "\" item

This opens the span value designated by item, essentially stripping away its bounding braces. For example,

os←{a←3 b←T} { \os c←B}

is equivalent to

{a←3 b←T c←B}.

tag ::= universal "$"

This gives the containing span the property denoted by the universal. It also looks for a binding to the universal in X, the outermost environment; if a binding exists, it is invoked in the current context. This gives an easy way to attach a tag to a span and provide a set of defaults associated with the tag.

link ::= "LINKS" id

This introduces the link set whose main name component is id, and defines its scope.

link ::= name "^"

This identifies the immediately containing span as a source of the link name (like a reference to the set of spans which are link targets).

link ::= name ":"

This identifies the immediately containing span as a target of each of the links that is a prefix of name. For example, the link target "id1.id2...idn:" would make the span containing it a target in the link sets for id1, id1.id2, ..., id1.id2...idn.
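
The set of link sets joined by a target label can be computed mechanically. The Python sketch below illustrates the prefix rule; the function name link_sets is hypothetical, and the label is given as a plain string ending in ":".

```python
def link_sets(target):
    """Return every link set that a target label such as 'a.b.c:' joins."""
    parts = target.rstrip(":").split(".")
    # Each non-empty prefix of the name is a link set in its own right.
    return [".".join(parts[:i]) for i in range(1, len(parts) + 1)]


sets = link_sets("indexable.diarchy.primary:")
```

Here sets holds indexable, indexable.diarchy, and indexable.diarchy.primary, so a source link to any of the three reaches the span.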

APPENDIX A

GLOSSARY

Italics indicate words defined in this glossary.

abbreviation An invocation used to shorten a script, rather than to indicate structure

attribute A component of an environment, identified by its name, which is bound to a value

base language The part of the Interscript language that is independent of the semantics of particular properties and attributes

base semantics The semantic rules that govern how scripts in the base language are elaborated to determine their contents, environments, and labels

binding The operation of associating a value with a name to add an attribute to an environment; also the resulting association

binding mode A value may be bound to an identifier as local, const or global

Boolean An enumerated primitive type (F, T) used to control selection and as primitive values

const binding A binding of an attribute that prevents its being rebound in any contained scope

contents The vector of values denoted by a span of a script

definition Another name for a const binding

document The internalization of a script in a representation suitable for some editor

dominant structure The tree structure of a document corresponding to the span structure of its script

editor-specific name A non-standard name used by a specific editor in scripts it generates; an editor may use editor-specific terms without interfering with the interchangeability of a script if it provides definitions of the standard names in terms of its editor-specific names

elaborate (verb) To develop the semantics of a script or a span of a script according to the Interscript semantic rules. This is a left-to-right, depth-first processing of the script

encoding A particular representation of scripts

environment A value consisting of a set of attributes. An environment may be either free-standing or nodal. A free-standing environment is a structured value much like a record, with the components being the attributes of the environment. A nodal environment is associated with a span of a script and represents the attributes bound in that span.

expression A syntactic form denoting a value

external environment A standard environment relative to which an entire script is elaborated

externalization The process of converting from a document to a script; also the result of that process

fidelity The extent to which an externalization or internalization preserves contents, form, and structure

hexInt A component of a hexSequence formed from a pair of letters in the set {A,B,...,O,P}, and representing an integer in the range [0..256)

hexSequence A sequence of hexInt pairs enclosed between "#" pairs and used to encode characters in string literals, e.g., #ENCODE#

hierarchical name A name containing at least one period, whose prefix unambiguously denotes the naming authority that assigned its meaning

identifier A sequence of letters used to identify an attribute

integer A mathematical integer in a limited range; one of the primitive types

interchange encoding The standard encoding for scripts

internalization The process of converting from a script to a document; also the result of that process

Interscript The current name of this basis for an editable document standard

invocation The appearance of a name in an expression, except as the attribute of a binding

label A tag, or a source, a target, or a link introduction placed in a span

link The cross product of a source and a target; in general, a link is a set of (source, target) pairs; in the special case when there is exactly one source and one target, a link behaves like a directed arc between a pair of spans

link introduction The appearance of LINKS id in a span, where id is the main identifier of a link

literal A representation of a value of a primitive type in a script

local binding A binding of a value to a name, causing the current environment to be updated with the new attribute; any outer binding's scope will resume at the end of the innermost containing span

name A sequence of identifiers internally separated by periods; e.g., a.b.c

nested environment The initial environment of a span contained in another span

NIL A name for the empty value; it does not lengthen a vector or span in which it appears

span Everything between a matched pair of {}s in a script; this generally represents a branch point in a document's dominant structure

NULL Identifies the empty environment; the value it associates with any identifier is NIL

OUTER A standard attribute of every environment:

For a free-standing environment (i.e., a record-like, structured value), OUTER=NULL

For a nodal environment, OUTER's value is the environment of the current span's parent just prior to the start of the current span.

For the root span of a document, OUTER=X.

For X, OUTER=NULL

global binding A kind of binding (indicated by ":=") that modifies the environment of the root span of a document only, and hence may endure beyond the end of the current span and may be seen by spans to the right of the current span, even those not hierarchically descended from the current span.

primitive type Boolean, Integer, Real, String, or Universal

primitive value A literal or a span, vector, or environment containing only primitive values

private encoding One of a number of non-standard encodings of a script

property Each tag on a span labels it with a property; the properties of a span determine how it may be viewed and edited

quoted expression A value which is an expression bracketed by single quotes ("'"); the expression is evaluated in each environment in which the identifier to which it is bound is invoked

real A floating point number

scope The region of the script in which invocations of the attribute named in a binding yield its value; the scope starts textually at the end of the binding, and generally terminates at the end of the innermost containing span

script An Interscript program; the interchangeable result of externalizing a document

selection A conditional form in a script that denotes one of two expressions, depending on the value of a Boolean expression in the current environment

source The set of spans with REF link, which thereby refer to the set of target links.

string A literal which is a vector of characters bracketed by "<>", e.g., <This is a string!>

style A quoted expression to be invoked in a span to modify the span's environment, labels, or contents

Sub A standard component of each environment, which is implicitly invoked to initialize nested environments

SUBSCRIPT A function that can be used to extract a value from a vector,
e.g. SUBSCRIPT[(a b <str>), 3] is the value <str>

tag A universal name labelling a span using the syntax universal$; the properties of a span correspond to the set of tags labelling it

target The set of spans labelled with link:

transparency A characteristic of scripts that allows an editor to identify the spans of a script that it understands and thereby enables it to operate on those spans without disturbing the ones that it doesn't understand

Units A set of definitions relating various typographical and scientific units to the Interscript standard units, meters; e.g., inch=2.54E-2*meter, pt=.013836*inch

universal An identifier formed entirely of uppercase letters and digits

value A primitive value, span, vector, environment, universal, or quoted expression

vector An ordered sequence of values that may be subscripted

X The standard outer environment for an entire script; the value of an unbound identifier in X is the universal consisting of the same letters in upper case