[phylum]<3-LISP>documentation>2-lisp>2-manual-1.bravo!5

Page Numbers: Yes X: 310 Y: 10.44" First Page: 1
Margins: Top: 1.0" Bottom: 1.3"
Heading:

CS-370 (FALL 1982)2-LISP REFERENCE GUIDE

————————————————————————————————————————————

2-LISP REFERENCE GUIDE

————————————————————————————————————————————

NOTE BENE: 2-LISP is an intermediate dialect — merely a rest stop on the road to 3-LISP. We do not intend to implement it, although it could be implemented in very much the same style as SCHEME. As currrently defined, 2-LISP contains lots of rough edges, some of which represent incurable problems with any meta-structural but non-reflective language. We will be using it in class because it is markedly simpler to understand than 3-LISP, and because 3-LISP is essentially a conservative extension of it.

————————————————————————————————————————————

0. Contents

1.Overview
2.The 2-LISP Structural Field
3.Semantics
4.Notation
5.Processor
6.Meta-Circular Processor
7.2-LISP Primitives
8.Standard Utilities Reference Manual

————————————————————————————————————————————

1. Overview

2-LISP is a statically scoped, higher order, untyped and unsorted, semantically rationalised, category aligned, meta-structural, dialect of LISP. Unlike 3-LISP, 2-LISP does not support any form of reflection. More specifically:

a.Statically Scoped and Higher Order: Like SCHEME and the untyped l-calculus, functions of any degree may be used as arbitrary arguments. Free variables are statically (lexically) scoped. Consequently, flat (non meta-structural) 2-LISP is on its own an omega-order functional calculus.

b.Untyped and Unsorted: Though the semantic domain and primitive functions are strongly typed (in the computer scientist’s sense — what logicians call sorted), user-defined procedures need not be typed, and no typing information is explicitly stated. In addition, there are no type restrictions (in the logician’s sense).

c.Meta-structural: Like all LISPs, quotation is provided primitively, allowing the ability to designate program structures. Because naming and normalisation are orthogonal, quotation is a structural primitive, not a functional primitive (i.e. there is no primitive QUOTE procedure). Handles (normal-form structure designators) are unique.

d.Semantically Rationalised: Traditional evaluation is rejected in favour of independent notions of simplification and designation. The 2-LISP processor is instead based on a form of simplification called normalisation that takes each structure into a co-designating structure in normal form. As a consequence, processing is semantically flat: programs may cross the meta-structural hierarchy only with the explicit use of level-crossing primitives (↑ and ↑, q.v.). In addition, the processor is idempotent (all normal-form structures normalise to themselves). With the exception of side-effect primitives (REPLACE and SET), the declarative semantics (F) and the procedural semantics (Y) are independently specified.

e.Category aligned: The primitive structural categories are set in one-to-one correspondence with the declarative semantic categories, and with the categories of procedural treatment. In addition, there are corresponding notational categories (although the standard notation, q.v., has some additional complexity for user convenience).

f.Procedures and Functions: The primitive LAMBDA is used purely as a naming operator; recursion is treated with the explicit use of Y-operators. Closures are a distinguished structural type, and are normal-form function designators. There is uniform treatment of EXPRs, IMPRs, and MACROs.

2. Structural Field

Objects in the structural field are called structures (mathematical entities like numbers and sets are called abstractions). All structures are internal to the machine; all abstractions are external (the world consists of both structures and abstractions, and undoubtedly of many other things as well). There are 8 structure types, over which 8 asymmetric first-order locality relationships are defined. (The notational information here is informal; for more detail see section ???.) Note that five and a half of the categories are normal-form structures, of which 4 are canonical.

TypeDesignationNormal?Canonical?Standard Notation

1.NumeralsNumbersYesYesa sequence of digits
2.BooleansTruth valuesYesYes$T or $F
3.HandlesS-expressionsYesYes’EXP
4.CharatsCharactersYesYes#CHAR
5.ClosuresFunctionsYesNo<CLOSURE: TYPE ENV PATTERN BODY>
6.RailsSequencesSomeNo[EXP EXP ... EXP]
7.Pairs(Value of application)No—(EXP . EXP)
8.Atoms(Referent of binding)No—a sequence of alphanumerics

There are 8 accessible binary first-order relationships (all total functions on their domains):

Accessible
NameTypeMutable ←←

1.CARPairs ← S-expressionsYesYesNo
2.CDRPairs ← S-expressionsYesYesNo
3.FIRSTRails ← S-expressions U {T}YesYesNo
4.RESTRails ← Rails U {T}YesYesNo
5.ENVClosures ← RailsYesYesNo
6.PATTERNClosures ← S-expressionsYesYesNo
7.BODYClosures ← S-expressionsYesYesNo
8.REFHandles ← S-expressionsNoYesYes

Notes:

a.There is no derived notion of a list, and no atom NIL.

b.Because of these functions, pairs, rails, and closures are pseudo-composite; the rest are atomic.

c.Only those rails whose elements are in normal form are themselves in normal form.

d.REF is one-to-one and onto; therefore REF-1 is a total function on s-expressions.

e.FIRST and REST are undefined together; thus [ FIRST(P) = T ] W [ REST(P) = T ]

f.Ultimately we intend to support full rational arithmetic, but at the moment only integer numerals are defined.

3. Semantics

The full semantical significance of a 2-LISP structure is consists of two parts: its declarative import and its procedural consequence. Very roughly, the declarative import is what a structure stands for or designates, independent of any active processing on the part of the machine; its procedural consequence is what behaviour it engenders upon being processed (what it returns, how it affects the processor and field state, etc.). For example, declaratively, the boolean constant $T designates Truth, whereas its procedural consequence is that it is returned unchanged, causing no side-effects.

The declarative semantical domain is typed as follows:

2-manual-figure-01.press

The eight structure types divide into two natural classes: the normal-form categories and the general-purpose categories. In the first class are the five types (booleans, numerals, rails, closures, and charats) that serve as normal-form designators for the five abstract types, plus the handles, which serve as normal-form designators for all of the structures. In the second class are the atoms and pairs, which are used for general context-relative expressions as names and redexes, and which can designate entities of any type. These semantical relationships are identified in the following table:

2-manual-figure-02.press

More formally, we associate with each expression its full significance, using the meta-theoretic function S. The procedural consequence is broken into three parts: what is returned (Y), the resulting environment, and the resulting structural field. Thus S is typed as follows:

S : [ STRUCTURES X FIELDS X ENV X CONT ] ← [ OBJECTS X STRUCTURES X ENV X FIELDS ]

In terms of this general signficance function, the declarative and procedural interpretation functions F and Y are defined as simple projections:

F : [ STRUCTURES X FIELDS X ENV X CONT ] ← [ OBJECTS ]
= lS,F,E,C . [S(S,F,E,C)]1

Y : [ STRUCTURES X FIELDS X ENV X CONT ] ← [ STRUCTURES ]
= lS,F,E,C . [S(S,F,E,C)]2

For example, ...

4. Notation

The 2-LISP internalisation function (i.e., the notational interpretation function Q that maps notations into internal structures) is not, strictly speaking, a primitive part of the language definition, since it is not used in internal processing. There is, however, what is called a standard notation that is used in all documentation (including this reference guide), and which is provided with a 2-LISP system upon initialisation. (A user may, however, completely replace it with his/her own version, if desired). This section explains that notation, in two ways: section 4.a gives an informal overview; section 4.b provides a detailed but precise characterization.

4.a. Informal Overview

The lexical notation is designed to satisfy three goals:

1.In so far as possible, to resemble standard LISP notational practice;
2.To maintain category alignment with the field (one lexical type per structural type);
3.To be convenient.

The goal of category alignment is met by having the standard notation for each type be identifiable in the first character, as indicated in the following chart:

TypeLeading CharacterExamples

Numeraldigit0, 1, -24, ...
Boolean‘$’$T, $F
Pair‘(’(A . B), (PLUS 2 3), ...
Rail‘[’[1 2 3], []
Handle‘’’’A, ’(+ 2 3), ’’[]
Atomletter‘A’, ‘REDUCE’, ...
Closure‘<CLOSURE’<CLOSURE IF>
Charat‘#’#A, #/, ...

Some subtleties complicate this clean correspondence. For example:

1.Numerals can have a leading ‘+’ or ‘-’ (i.e., ‘-24’);

2.An atom label may begin with a numeral providing it contains at least one non-digit (i.e., ‘6N237E’ and ‘1+’ are valid atom labels);

3.Left angle bracket is used as a general notational escape, not only for closures, but also for un-labelled atoms, errors, and other notational commentary. Its only use on input is for closures (although it is highly unlikely that the user will ever want to read closure notation directly).

4.Case is ignored in atom labels (converted to upper case on input).

5.Some lexical abbreviations (notational sugar) are supported:

‘(exp1 exp2 ... expk)’abbreviates‘(exp1 . [exp2 ... expk])’

‘"char1 char2 ... chark"’abbreviates‘’[#char1 #char2 ... #chark]’

There are of course many detailed questions (such as how to embed quotation marks within a string) that are answered in section 4.b. In addition, the internalization function is described in that section. The following grammar, however, presents the essence of the standard 2-LISP notation (ignoring such issues as just how tokens must be separated, and so forth):

A Summary of the Standard 2-LISP Grammar:

2.Numeral::=[Sign-character] Digit-character+
Boolean::=‘$T’ | ‘$F’ | ‘$t’ | ‘$f’
Charat::=‘#’ Any-character
Atom::=Atom-character+
Pair::=‘(’ Structure ‘.’ Structure ‘)’
Rail ::=‘[’ Structure* ‘]’
Handle::=‘’’ Structure

4.Sign-character::=‘+’ | ‘-’
Digit-character::=‘0’ | ‘1’ | ‘2’ | ‘3’ | ‘4’ | ‘5’ | ‘6’ | ‘7’ | ‘8’ | ‘9’
Any-character::=Any character in the character set
Atom-character::=Any-character except Space, End-of-line, ‘$’, ‘#’, ‘"’, ‘;’, or Special
String-character::=Any-character except ‘"’
Normal-character::=Any-character except End-of-line
Special::=‘(’ | ‘)’ | ‘[’ | ‘]’ | ‘↑’ | ‘↑’ | ‘.’ | ‘,’ | ‘{’ | ‘}’ | ‘’’ | ‘̀ ’ | ‘<’ | ‘>’

5.Token-sequence::=(Separator* Token Separator* )*
Separator::=Space | End-of-line | Tab | Comment
Comment::=‘;’ Normal-character* End-of-line

For convenience, the following table lists all "special" (i.e., non alpha-numeric) characters that are used for some special purpose:

Character and UseCharacter and Use

(—starts pairs↑—abbreviation for UP
)—ends pairs↑—abbreviation for DOWN
.—separates CAR and CDR̀ —back-quote
[—starts rails,—normalised expression within back-quote
]—ends rails;—starts comments (to CRLF)
’—handles-—starts negative numerals
#—charats+—starts some positive numerals
$—booleans ($T and $F)<—starts notational escapes
"—starts and ends strings>—ends notational escapes

4.b. Detailed Specification

The meta-language used to describe the standard notation is a variant of BNF:

‘?’—terminal symbol
Type—non-terminal symbol
[x]—optionality
x | y—alternation (lowest precedence)
x y—concatenation (medium precedence)
x*—indefinite repetition , zero or more (highest precedence)
x+—indefinite repetition, one or more times (highest precedence)
(x)—grouping

In the standard notation, structures are notated with sequences of lexical tokens, each of which is composed of a sequence of one or more characters chosen from a collection of characters called the character set. Although the exact composition of the character set is unimportant, we assume that it includes all of the ASCII characters. There are six basic kinds of lexical tokens:

Charat-tokens are used to notate charats (i.e., character designators):

Charat-token::=‘#’ Any-character

where Any-character is any character in the character set.

Boolean-tokens are used to notate booleans (i.e. truth-value designators):

Boolean-token::=‘$’ (‘T’ | ‘F’)

‘$T’ (and ‘$t’) notate the truth-designating boolean; ‘$F’ (and ‘$f’) notate the one that designates falsity.

Numeral-tokens are used to notate (decimal) numerals in the conventional way:

Numeral-token::=[Sign-character] Digit-character+
Sign-character::=‘+’ | ‘-’
Digit-character::=‘0’ | ‘1’ | ‘2’ | ‘3’ | ‘4’ | ‘5’ | ‘6’ | ‘7’ | ‘8’ | ‘9’

Examples: ‘100’, ‘+100’, and ‘00100’ all notate the numeral that designates the abstract number one hundred; ‘-100’ notates its additive inverse.

Atom-tokens are used to notate atoms:

Atom-token::=Atom-character+

where Atom-character excludes Space, End-of-line, ‘$’, ‘#’, ‘"’, ‘;’, and all characters used as special tokens. Two notes:

1.Any atom token that also satisfies the rules for numeral tokens will be taken to be the latter. For example, ‘1+’ and ‘+1+’ notate atoms, whereas ‘+1’ notates a numeral.

2.Case is ignored for atom tokens. For example, ‘HELLO’, ‘UP’, ‘1+’, ‘=’, and ‘A-very-long-name’ notate distinct atoms, whereas ‘Zaphod’, ‘ZAPHOD’, ‘zaphod’, and ‘ZaPhOd’ all notate the same atom.

String-tokens have the following form:

String-token::=‘"’ (String-character | ‘"’ ‘"’)* ‘"’

where the legal String-characters exclude only the ‘"’ character. String-tokens notate character sequence designators (i.e. rails of charats) in the conventional way. Examples: ‘"A"’, ‘"Good day"’, ‘""’, ‘""""’, and ‘"--> Your message here <--"’.

Special-tokens do not notate structures by themselves; rather, they are used to punctuate the notation for composite structures.

Special-token::=‘(’ | ‘)’ | ‘[’ | ‘]’ | ‘↑’ | ‘↑’ | ‘.’ | ‘,’ | ‘{’ | ‘}’ | ‘’’ | ‘̀ ’ | ‘<’ | ‘>’

Sequences of characters are broken down into tokens in the conventional way, with the rule that there must always be at least one token separator between adjacent non-special tokens.

Token-sequence::=(Separator* Token Separator*)*
Separator::=Space-character | End-of-line-character | Comment
Comment::=‘;’ Normal-character* End-of-line-character

where Normal-character includes all characters except End-of-line-Character. For example, the character stream ‘(foo [1 $T ↑#’])’ consists of the nine tokens: ‘(’, ‘foo’, ‘[’, ‘1’, ‘$T’, ‘↑’, ‘#’’, ‘]’, and ‘)’.

The main grammar for expressions can now be given in terms of a sequence of tokens:

Examples: ‘(A . B)’ notates a pair whose CAR is notated ‘A’ and whose CDR is notated ‘$T’; ‘[]’ notates an empty rail; ‘[1 2 3 4 5 6]’ notates a rail whose FIRST and REST would be notated ‘1’ and ‘[2 3 4 5 6]’, respectively; ‘’100’ notates the handle for the numeral ‘100’.

Several abbreviations are provided:

The notation ‘↑s’ is shorthand for ‘(UP s)’, ‘↑s’ abbreviates ‘(DOWN s)’, and ‘(s1 s2...sn)’ is notationally equivalent to ‘(s1 . [s2...sn])’. String-tokens are actually abbreviations for rails of charats; e.g., ‘"Ford"’ is short for ‘[#F #o #r #d]’.

The back-quote feature is useful when defining macros, since it allows one to conveniently notate expressions for constructing structures that resemble the lexical expression notated. For example, ‘̀ (A . B)’ is notationally equivalent to ‘(PCONS ’A ’B)’, which notates a structure that normalizes to a structure that would be notated ‘’(A . B)’. The comma notation, meaningful only within the scope of a back-quote, gives one a fill-in-the-blanks-like capability; for example, the notation ‘̀ (A . ,X)’ is shorthand for ‘(PCONS ’A X)’, which notates a structure that would normalize to a structure that would be notated ‘(A . HELLO)’ in any environment where X was bound to ’HELLO. The following examples may help to make the workings of this feature clear (assume that B is bound to ’2 and C to ’’3):

Notationabbreviatesand normalizes to

̀ 1’1’1
̀ (A . B)(PCONS ’A ’B)’(A . B)
̀ [A B](RCONS ’A ’B)’[A B]
̀ [A ,B C](RCONS ’A B ’C)’[A 2 C]
̀ [[A B][,C D]](RCONS ’[A B] (RCONS C ’D))’[[A B][’3 D]]
̀ ̀ [A ,B ,,C](PCONS (’RCONS ̀ [’A B ,C]))’(RCONS [’A B ’3])

A general comment: The internalization function described above is not onto; in other words, there are structures that are not the result of internalizing any lexical expression, for two reasons. First, upon internalization, all pairs and rails notated are created from previously-inaccessible cells. Hence, any structure with a shared sub-structure will have no lexical counterpart. Second, closures and un-named atoms (those created by ACONS) have no standard lexical notation also have no lexical counterparts.

The standard version of PRINT, approximately the inverse of READ, currently makes no attempt to deal in a sophisticated way with either of these problems. In particular, no attempt is made to show shared substructure, and un-notatable structures — closures, nameless atoms, and circular structures — are marked with a standard lexical escape: a note enclosed in angle brackets (e.g., ‘<CLOSURE>’, ‘<CIRCULAR RAIL>’, etc.).

5. Processor

[ ... Needs to be written. ... ]