[phylum]<3-LISP>documentation>old-3-manual-02.bravo!1

Page Numbers: Yes X: 306 Y: 1.0" First Page: 7
Margins: Top: 1.0" Bottom: 1.3"
Heading:

CS-370 (FALL 1982)3-LISP REFERENCE MANUALOctober 30, 1982

——————

4. Notation

——————

The 3-LISP internalisation function (the notational interpretation function Q that maps notations into internal structures) is not, strictly speaking, a primitive part of the language definition, since it is not used in internal processing (i.e., discarding it will not topple the tower). There is, however, what is called a standard notation that is used in all documentation (including this reference guide), and which is provided with a 3-LISP system upon initialisation. (A user may, however, completely replace it with his/her own version, if desired). This section explains that notation, in two passes: section 4.a gives an informal overview; section 4.b provides a detailed but precise characterization.

4.a. Overview

The lexical notation is designed to satisfy three goals:

1.In so far as possible, to resemble standard LISP notational practice;
2.To maintain category alignment with the field (one lexical type per structural type);
3.To be convenient.

The goal of category alignment is met by having the standard notation for each type be identifiable in the first character, as indicated in the following chart:

TypeLeading CharacterExamples

Numeraldigit0, 1, -24, ...
Boolean$$T, $F
Pair((A . B), (PLUS 2 3), ...
Rail[[1 2 3], []
Handle’’A, ’(+ 2 3), ’’[]
Atomletter‘A’, ‘REDUCE’, ...
Closure{CLOSURE{CLOSURE IF}
Streamer{STREAMER{STREAMER Primary}
Charat‘#’#A, #/, ...

Some subtleties complicate this clean correspondence. Specifically:

1.Numerals can have a leading ‘+’ or ‘-’ (i.e., ‘-24’);

2.An atom label may begin with a numeral providing it contains at least one non-digit (i.e., ‘6N237E’ and ‘1+’ are valid atom labels);

3.Left brace (‘{’) is used as a general notational escape, not only for closures, but also for un-labelled atoms, errors, and other notational commentary. Its only current use on input is for closures (although it is unlikely that a user will ever want to read closure notation directly).

4.Case is ignored in atom labels (converted to upper case on input).

5.Some lexical abbreviations (notational sugar) are supported:

‘(exp1 exp2 ... expk)’abbreviates‘(exp1 . [exp2 ... expk])’

‘"char1 char2 ... chark"’abbreviates‘’[#char1 #char2 ... #chark]’

There are many detailed questions (such as how to embed quotation marks within a string) that are answered in section 4.b. In addition, the internalisation function is described in that section. The following grammar, however, presents the essence of the standard 3-LISP notation (ignoring such issues as just how tokens must be separated, and so forth):

A Summary of the Standard 3-LISP Grammar

1.Expression::=Regular | Escape | Abbreviation

3.Escape::=‘{’ Escape-type additional-information ‘}’
Escape-type::=‘CLOSURE’ | ‘STREAMER’ | ‘ERROR’ | ‘ATOM’ | ...

5.Sign-character::=‘+’ | ‘-’
Digit-character::=‘0’ | ‘1’ | ‘2’ | ‘3’ | ‘4’ | ‘5’ | ‘6’ | ‘7’ | ‘8’ | ‘9’
Any-character::=Any character in the character set
Atom-character::=Any-character except Space, End-of-line, ‘$’, ‘#’, ‘"’, ‘;’, or Special
String-character::=Any-character except ‘"’
Normal-character::=Any-character except End-of-line
Special::=‘(’ | ‘)’ | ‘[’ | ‘]’ | ‘↑’ | ‘↑’ | ‘.’ | ‘,’ | ‘{’ | ‘}’ | ‘’’ | ‘̀ ’

6.Token-sequence::=(Separator* Token Separator* )*
Separator::=Space | End-of-line | Tab | Comment
Comment::=‘;’ Normal-character* End-of-line

For convenience, the following table lists all "special" (i.e., non alpha-numeric) characters that are used for some special purpose:

Character and UseCharacter and Use

(—starts pairs↑—abbreviation for UP
)—ends pairs↑—abbreviation for DOWN
.—separates CAR and CDR̀ —back-quote
[—starts rails,—normalised expression within back-quote
]—ends rails;—starts comments (to CRLF)
’—handles-—starts negative numerals
#—charats+—starts some positive numerals
$—booleans ($T and $F){—starts notational escapes
"—starts and ends strings}—ends notational escapes

In additional to the foregoing notational protocols embodied in the internaliser and externaliser, we adopt a set of additional notational conventions on identifiers. The 3-LISP system pays no attention to these, but all of the code presented in this manual honours them, and they are recommended for users as well. Specifically:

1.A suffix exclamation point is used on variables (atoms) that are intended always to designate a normal-form structure. For example:

(NORMALISE ARGS ENV (LAMBDA SIMPLE [ARGS!] ... )).

2.A suffix asterisk is used on variants of procedures that take an indefinite number of arguments, where the standard version accepts only one (for example: ID* is a multi-argument version of ID).

3.Prefix and suffix asterisks are used to identify global variables used for non-functional referents. For example, the global environment is standardly bound to the atom *GLOBAL*.

4.A prefix at-sign is used to identify atoms that are used as constants or flags (i.e., that are not bound). For example, the primitive TYPE function maps entities onto the atoms @NUMBER, @HANDLE, and so forth.

4.b. Detailed Specification

The meta-language used to describe the standard notation is a variant of BNF:

‘?’—terminal symbol
Type—non-terminal symbol
[x]—optionality
x | y—alternation (lowest precedence)
x y—concatenation (medium precedence)
x*—indefinite repetition , zero or more (highest precedence)
x+—indefinite repetition, one or more times (highest precedence)
(x)—grouping

In the standard notation, structures are notated with sequences of lexical tokens, each of which is composed of a sequence of one or more characters chosen from a collection of characters called the character set. Although the exact composition of the character set is unimportant, we assume that it includes all of the ASCII characters. There are six basic kinds of lexical tokens:

Charat-tokens are used to notate charats (i.e., character designators):

Charat-token::=‘#’ Any-character

where Any-character is any character in the character set.

Boolean-tokens are used to notate booleans (i.e. truth-value designators):

Boolean-token::=‘$’ (‘T’ | ‘F’)

‘$T’ (and ‘$t’) notate the truth-designating boolean; ‘$F’ (and ‘$f’) notate the one that designates falsity.

Numeral-tokens are used to notate (decimal) numerals in the conventional way:

Numeral-token::=[Sign-character] Digit-character+
Sign-character::=‘+’ | ‘-’
Digit-character::=‘0’ | ‘1’ | ‘2’ | ‘3’ | ‘4’ | ‘5’ | ‘6’ | ‘7’ | ‘8’ | ‘9’

Examples: ‘100’, ‘+100’, and ‘00100’ all notate the numeral that designates the abstract number one hundred; ‘-100’ notates its additive inverse.

Atom-tokens are used to notate atoms:

Atom-token::=Atom-character+

where Atom-character excludes Space, End-of-line, ‘$’, ‘#’, ‘"’, ‘;’, and all characters used as special tokens. Two notes:

1.Any atom token that also satisfies the rules for numeral tokens will be taken to be the latter. For example, ‘1+’ and ‘+1+’ notate atoms, whereas ‘+1’ notates a numeral.

2.Case is ignored for atom tokens. For example, ‘HELLO’, ‘UP’, ‘1+’, ‘=’, and ‘A-very-long-name’ notate distinct atoms, whereas ‘Zaphod’, ‘ZAPHOD’, ‘zaphod’, and ‘ZaPhOd’ all notate the same atom.

String-tokens have the following form:

String-token::=‘"’ (String-character | ‘"’ ‘"’)* ‘"’

where the legal String-characters exclude only the ‘"’ character. String-tokens notate character sequence designators (i.e. rails of charats) in the conventional way. Examples: ‘"A"’, ‘"Good day"’, ‘""’, ‘""""’, and ‘"--> Your message here <--"’.

Special-tokens do not notate structures by themselves; rather, they are used to punctuate the notation for composite structures.

Special-token::=‘(’ | ‘)’ | ‘[’ | ‘]’ | ‘↑’ | ‘↑’ | ‘.’ | ‘,’ | ‘{’ | ‘}’ | ‘’’ | ‘̀ ’

Sequences of characters are broken down into tokens in the conventional way, with the rule that there must always be at least one token separator between adjacent non-special tokens.

Token-sequence::=(Separator* Token Separator*)*
Separator::=Space-character | End-of-line-character | Comment
Comment::=‘;’ Normal-character* End-of-line-character

where Normal-character includes all characters except End-of-line-Character. For example, the character stream ‘(foo [1 $T ↑#’])’ consists of the nine tokens: ‘(’, ‘foo’, ‘[’, ‘1’, ‘$T’, ‘↑’, ‘#’’, ‘]’, and ‘)’.

The main grammar for expressions can now be given in terms of a sequence of tokens:

Examples: ‘(A . B)’ notates a pair whose CAR is notated ‘A’ and whose CDR is notated ‘$T’; ‘[]’ notates an empty rail; ‘[1 2 3 4 5 6]’ notates a rail whose FIRST and REST would be notated ‘1’ and ‘[2 3 4 5 6]’, respectively; ‘’100’ notates the handle for the numeral ‘100’.

Several abbreviations are provided:

The notation ‘↑s’ is shorthand for ‘(UP s)’, ‘↑s’ abbreviates ‘(DOWN s)’, and ‘(s1 s2 ... sn)’ is notationally equivalent to ‘(s1 . [s2...sn])’. String-tokens are actually abbreviations for rails of charats; e.g., ‘"Ford"’ is short for ‘[#F #o #r #d]’.

A general comment: The internalization function described above is not onto; in other words, there are structures that are not the result of internalizing any lexical expression, for two reasons. First, upon internalization, all pairs and rails notated are created from previously-inaccessible cells. Hence, any structure with a shared sub-structure will have no lexical counterpart. Second, closures and un-named atoms (those created by ACONS) have no standard lexical notation also have no lexical counterparts.

The standard version of PRINT, approximately the inverse of READ, currently makes no attempt to deal in a sophisticated way with either of these problems. In particular, no attempt is made to show shared substructure, and un-notatable structures — closures, nameless atoms, and circular structures — are marked with a standard lexical escape: a note enclosed in braces (e.g., ‘{CLOSURE}’, ‘{CIRCULAR RAIL}’, etc.).

4.c. Back-Quote

The back-quote feature is useful when defining macros, since it allows one to conveniently notate expressions for constructing structures that resemble the lexical expression notated. For example, ‘̀ (A . B)’ is notationally equivalent to ‘(PCONS ’A ’B)’, which notates a structure that normalises to a structure that would be notated ‘’(A . B)’. The comma notation, meaningful only within the scope of a back-quote, gives one a fill-in-the-blanks-like capability; for example, the notation ‘̀ (A . ,X)’ is shorthand for ‘(PCONS ’A X)’, which notates a structure that would normalize to a structure that would be notated ‘(A . HELLO)’ in any environment where X was bound to ’HELLO. The following examples may help to make the workings of this feature clear (assume that B is bound to ’2 and C to ’’3):

Notationabbreviatesand normalises to

̀ 1’1’1
̀ (A . B)(PCONS ’A ’B)’(A . B)
̀ [A B](RCONS ’A ’B)’[A B]
̀ [A ,B C](RCONS ’A B ’C)’[A 2 C]
̀ [[A B][,C D]](RCONS ’[A B] (RCONS C ’D))’[[A B][’3 D]]
̀ ̀ [A ,B ,,C](PCONS (’RCONS ̀ [’A B ,C]))’(RCONS [’A B ’3])

An intensional note: the back-quote expander will not use a token tail of a rail if any part of that rail has a comma’ed expression within it. Specifically, we have:

1> (DEFINE TEST
(LAMBDA SIMPLE [A]
̀ [,A 2 3]))
=> TEST
1> (LET [[X (TEST ’1)]
[Y (TEST ’2)]]
(= (REST X) (REST Y)))
=> $F

since ̀ [,A 2 3] abbreviates (RCONS A ’2 ’3), not (PREP A ’[2 3]).

4.d. Graphical Notation

It is sometimes useful to have a graphical notation that is one-to-one and complete (the lexical notation is of course both ambiguous and incomplete). LISP 1.5 used boxes and arrows for CONS cells; for 3-LISP we adopt essentially the same style, although we have to distinguish pairs from rails and so forth. Specifically:

[ ... Get from pages 182 – 183 of the thesis ... ]