Page Numbers: Yes X: 306 Y: 0.9" First Page: 1 Not-on-first-page
Margins: Top: 1.1" Bottom: 1.2"
Heading:
3-LISP PRIMER LISP: LANGUAGE AND LITERATURE April 13, 1984
————————————————————————————————————————————
VERY MUCH A FIRST DRAFTVERY MUCH A FIRST DRAFT
3-LISP PRIMER
Filed as:[phylum]<3-lisp>course>manual>primer.bravo
User.cm:[phylum]<BrianSmith>system>user.classic
Last edited:April 14, 1984 2:25 AM
————————————————————————————————————————————
This primer is a guided introduction for first time 3-LISP users. For detailed questions about 3-LISP, especially once you get started, you will want to use the 3-LISP manual. For questions about using 3-LISP on the Dandelions — how to "boot" the machine, how to use the editor and load files, etc. — you should refer to the 3-LISP System Guide.
A — Overview
A.1. A Declarative, Extensional View
Being at least in part a language, all 3-LISP expressions are taken as designating something (think of designating as a kind of naming). For example, the expressions 1, 100, and -6 designate the abstract numbers one, ten and negative six, respectively; $TRUE designates truth and $FALSE designates falsity; [] designates the empty sequence, and [1 2 3] the sequence whose elements are the numbers one, two, and three in that order. More generally, expressions of the form [x1 x2 ... xn] designate the abstract sequence of length n consisting of the objects designated by the xi in the specified order. And expressions of the form (f . a) designate the result of applying the function designated by f to the argument object designated by a (i.e., function applications are expressed in prefix form). The common case of applying a function to a sequence of n (>0) arguments (f . [x1 x2 ... xn]) is abbreviated (f x1 x2 ... xn).
So-called identifiers, such as Knights, f, and Calculus (case is unimportant) designate different objects in different contexts. We use the term ‘identifier’ here rather generally, and include within its scope both variables and what you might think of as constants, such as + and lambda. As it happens, all identifiers are treated in the same manner; there is a most general context, called the global context, in which many identifiers are taken as names of standard functions. For example, in the global context, + designates the addition function, and therefore the expression (+ 1 3), in the global context, designates the number that is the square of two.
The global context includes names for a wide assortment of standard functions defined over numbers, truth-values, and sequences. For numbers, +, -, *, and / designate the conventional arithmetic functions; ** designates exponentiation, and 1+ and 1- are (slightly confusing) names for increment and decrement functions, respectively. Thus (- 100 1) designates ninety-nine, (1+ 3) designates four, and (1- 3) designates two. There are five simple sequence operations, named NULL, LENGTH, FIRST, REST, and CONS. NULL is a simple predicate true only of empty sequences; LENGTH maps sequences onto the number of elements; FIRST selects the first element of a sequence; and REST selects the second through last. In general, sequences in 3-LISP are usually dealt with in terms of their first element and their "rest" (i.e., all but the first element); it turns out that this fits in well with the sorts of recursive program definitions that one typically writes. CONS (for "construct") can be viewed as the inverse of FIRST and REST; it takes an element and a sequence and yields a new sequence whose first element is the given element, and whose rest is the given sequence. I.e., CONS will put together what FIRST and REST take apart. Thus (FIRST [1 2 3]) designates the number one; (REST [10 20 30]) designates the two-element sequence of twenty and thirty in that order; (LENGTH [10 20 30 40]) designates four; and (CONS 1 [2 3]) and [1 2 3] designate the same sequence. In general, (FIRST (CONS x s)) and x will be co-designating expressions, as are (REST (CONS x s)) and s. (NULL s) will designate truth if and only if s designates an empty sequence. Finally, = designates the equality function — a function that has the value truth just in case its two arguments are in fact the same object. Thus (= 1 1) and (= 2 (1+ 1)) both designate truth.
There are several special operators (including LAMBDA, IF, and COND) that do not fit the standard mold in that they cannot be characterized solely in terms of what their arguments designate. So-called lambda-expressions, of the form (LAMBDA pattern body), are commonly used to designate functions for which one does not already have a simple name. In general, the pattern part will be written as a sequence of identifiers (called the formal parameters), and the body part will be a general expression, usually one involving some of the formal parameters. For example, (LAMBDA [X] X) designates the identity function (of one argument); (LAMBDA [X] (FIRST (REST X))) designates the function that yields the second element of its (single) argument, which is assumed to be a sequence. LAMBDA is not extensional because the function is not built up out of the designations of the pattern and body expressions. Those expressions are really acting as templates; in order to determine what function is designated, one must look at the expressions themselves, not at what they designate. (The pattern [X] in the foregoing definition of the identity function, for example, doesn’t designate anything in the global context, because X is not the name of anything in that context.)
Free variables take their binding from the closest binding in the lexically enclosing environment. (Unless otherwise indicated, all uses of standard identifiers will assume their global meaning.)
IF and COND are the two standard conditional operators in 3-LISP. IF designates a function (or at least something like a function — but lets not worry about technicalities yet) that selects its second or third argument, depending on whether its first argument is true or false, respectively. Thus, an expression of the form (IF p x1 x2) can be read (and thought of) as "if p then x1, else x2".
For example, the expression (IF (= 1 2) 100 200) designates two hundred, whereas (IF $TRUE [] [1]) designates the empty sequence. COND is a little more complex; its general form is (COND [p1 x1] [p2 x2] ... [pn xn]), and it designates the designation of the first xi for which the corresponding pi designates truth. Thus for example (COND [(= 1 2) 3] [$TRUE 4]) designates four; the following expression designates sixteen:
(COND [(= 1 2) (+ 3 4)]
[(= 1 3) (* 3 4)]
[(= 1 4) (** 3 4]
[$TRUE (/ 3 4)])
Note how much easier it is to read this expression when it is printed it out with each of the clauses of the conditional lined up vertically. This general style of printing expressions in a structured indented fashion is called pretty-printing.
We said that (LAMBDA pattern body) designated a function, and that (f x1 x2 ... xn) designated the value of applying the function designated by f to the arguments designated by x1 through xn. These two expression forms can be combined. For example, the following four expressions all designate the number sixteen:
1. ((LAMBDA [X] (* X X)) 4)
2. ((LAMBDA [X] (FIRST (REST X)))
[(+ 8 2) (* 8 2) (/ 8 2)])
3. ((LAMBDA [F N] (F N N))
+
8)
4. ((LAMBDA [X Y Z] (X Y Z))
(LAMBDA [N M] (* N (1+ M)))
(FIRST [2 7])
(FIRST (REST [2 7])))
The third and fourth of these illustrate the fact that 3-LISP is what is called higher-order: functions, as well as more normal arguments, can be used as arguments to other functions.
There is a standard abbreviation supported in 3-LISP, that makes expressions like the ones just listed much easier to write. In general, the expression:
(LET [[v1 x1]
...
[vn xn]]
body)
is an abbreviation for:
((LAMBDA [v1 ... vn] body)
x1 ... xn)
Thus the four expressions given just above could equally well have been written as follows:
1. (LET [[X 4]] (* X X))
2. (LET [[X [(+ 8 2) (* 8 2) (/ 8 2)]]
(FIRST (REST X)))
3. (LET [[F +] [N 8]]
(F N N))
4. (LET [[X (LAMBDA [N M] (* N (1+ M)))]
[Y (FIRST [2 7])]
[Z (FIRST (REST [2 7]))]]
(X Y Z))
Two more types of facility will complete this simple introduction. First, 3-LISP contains primitive functions and notations for dealing with strings. In particular, the expression "string-of-characters" designates the string of characters within the double quote marks. For example, the expression "Ethelred" designates the eight-character name of a famous old English king; the expression "" designates a string consisting of no characters at all. Similarly, the expression #c designates the character c. For example, #e designates the last letter in this sentence. Some functions defined on strings include: STRING-LENGTH, STRING-CONS, and STRING-NULL. For example, (STRING-CONS #B "ale") designates the string ‘Bale’.
There is also an interesting function in 3-LISP called TYPE (i.e., the identifier TYPE, in the global context, designates this function), which maps all kinds of objects onto identifiers that indicate what type they are. For example, (TYPE 3) designates the identifier NUMBER; (TYPE [1 2 3]) designates the identifier SEQUENCE, and (TYPE +) designates the identifier FUNCTION.
A.2. A Procedural, but still Extensional, View
We could go on in the spirit of the last section for quite a while, gradually introducing almost all the standard 3-LISP functions. But we still haven’t talked about anything computational — about anything to do with procedures or activity, or about what happens when you type something in to a 3-LISP process. In this section we will shift our attention from what expressions designate to what 3-LISP does with expressions — i.e., from a declarative to a procedural viewpoint. But we will still describe those procedures primarily extensionally, in terms of designated objects. Only in the next section will we talk about how 3-LISP works.
When you are typing things to 3-LISP, we will say you are talking to the 3-LISP processor — a particular process inside the computer that you can have conversations with about particular kinds of behaviour you would like to create. We will start with exceedingly simple interactions, but build up the complexity fairly quickly.
If you type (+ 2 3) to the 3-LISP processor (just processor, for short), it will respond with the expression 5. If you simply type 5 back to the processor, it will respond with the same expression 5. In general, the processor always responds with an expression that designates the same thing that your input expression designates (in this case, the number five), but that is in some simple or canonical form. We will call the expression that the processor types back the response. Just what is viewed as simple or canonical is a complex story in general, but the answer is straightforward for simple cases. In particular, for numbers the response will be the corresponding numeral expression (5 for the number five, in the previous example); for truth-values it will be one of the two boolean expressions $TRUE or $FALSE; for sequences it will be an expression of the form [x1 x2 ... xn], where the xi are, correspondingly, appropriate responses for each of the elements of the sequence; for strings, the response will be the string enclosed in double quote marks: "string-of-characters".
To put this more formally, the 3-LISP processor simplifies input expressions, returning responses that are in normal-form; it is therefore called a normalizing processor. The types of response just described are the normal-form designators for numbers, truth-values, sequences, and strings. We will talk about normal-form designators of functions in the next section. Normal-form expressions have a number of properties: the most important one is that they are independent of context; thus no response from the processor will ever be an identifier, or contain a free occurence of an identifier.
The general thing to remember is this: if you type an expression x1 to the 3-LISP processor, it will respond with an expression x2 that is a normal-form co-designator of x1.
The way that interaction with the processor happens is this: it types out a prompt, which is usually of the form ‘1> ’, and awaits your input. When you have typed one full expression, it will type ‘1= ’ followed by the normal-form response, and then repeat. Thus, the processor performs what is called a read-normalize-print loop. The following gives an example of some typical interactions. For pedagogical convenience of this primer, we have put the responses in italics, but this convention will probably not be supported by the implementation. One detail: anything on a line to the right of a semi-colon is taken as a comment, and is entirely ignored by the processor. This makes it convenient for you to annotate what you are doing, make notes to yourself, etc.
;;; This is a comment. The 3-LISP reader ignores everything between a ";"
;;; and a carriage-return. Here we use three semi-colons for aesthetics.
;;; The semi-colon needn’t be the first character in the line, but it
;;; is convenient for general comments like this one.
1> (+ 2 5) ; "1> " is the standard 3-LISP prompt. The user typed "(+ 2 5)".
; The processor reads (+ 2 5) and responds with a normal-form
1= 7 ; expression that designates the same number — seven.
1> [1 2 3] ; Then it prompts the user for another input.
1= [1 2 3]
We can try out some more complex function applications:
1> (* 100 (IF (= [] []) 3 4))
1= 300
1> [(* 1 1) (+ 1 1) (= 1 1)]
1= [1 2 $TRUE]
1> (TYPE 1)
1= ’number ; Ignore the single quote mark now; it will be explained later.
1> (TYPE +)
1= ’function
1> (TYPE TYPE)
1= ’function
As a notational convenience in this primer, we will use the symbol ‘g’ to mean "normalizes to"; thus we will say (+ 1 2) g 3; [(= 3 4)] g $FALSE, etc.
Given that we have started talking about behaviour and processors, we can start defining names to use in other expressions. In the purely declarative account we gave in section A.1, we talked about identifiers being bound in the scope of lambda expressions, and we used some convenient globally defined names like + and IF. But we didn’t have any way to define new names and have them stick. Now, however, we can do this. There are, in particular, two ways of defining new names in the global environment: SET and DEFINE. As a rule of thumb, you should use SET for simple setting of identifiers to designate objects like numbers and sequences and strings, and DEFINE for defining functions.
The form (SET var x) will cause the name var to designate the designation of x from now on, in the global environment. The response from a SET form is just the response from the x part, which, although sometimes useful, is by and large irrelevant. The expression (SET var x), in other words, is primarily used for its (procedural) effect, not for its (declarative) designation. It is more like a command than like a name (SET is sort of like a verb, whereas + and LENGTH are more like common nouns). Thus we have:
1> (SET X 3)
1= 3
1> X
1= 3
1> (+ X X)
1= 6
1> (SET L (LENGTH [10 20 30 40 50]))
1= 5
Identifiers that are bound in the global context, but that are also bound by patterns in lambda expressions, take their designation from the most local enclosing context. 3-LISP is said to be lexically scoped, because it is always the surrounding lexical context that determines bindings ("lexical" as opposed to "dynamic" — the latter referring to a dynamic aspect of the computation; we will explain all this in class). Some examples:
1> (SET X 3)
1= 3 ; X now designates three in the global context.
1> X
1= 3 ; Confirmation of this fact.
1> (LET [[X 4]] (+ X 1)) ; The local binding of X takes precedence.
1= 5
1> ((LAMBDA [Y] (+ X Y)) ; but only locally; the original binding is still
100) ; in force globally.
1= 103
With DEFINE we can define functions using lambda expressions. For example, the following defines a function called APPEND that concatenates two sequences. (Note: when talking declaratively, we would have said that APPEND was a function from two sequences to a sequence that is their concatenation, or something static like that. Now we say it "takes" two arguments, or "appends" two sequences — a more active way of speaking. This reflects the fact that we are now viewing (APPEND s1 s2) in a procedural light.)
1> (DEFINE APPEND
(LAMBDA [SEQ1 SEQ2]
(IF (NULL SEQ1)
SEQ2
(CONS (FIRST SEQ1)
(APPEND (REST SEQ1) SEQ2)))))
1= ’APPEND ; DEFINE responds with the name of the function defined;
; Ignore the single quote mark for now.
1> (APPEND [1 2 3] [4 5 6])
1= [1 2 3 4 5 6]
1> (APPEND [(= 1 2) (= 2 2)] [])
1= [$FALSE $TRUE]
The definition of APPEND is recursive, because the name APPEND was used within the definition. Recursion is an extremely powerful mechanism for defining functions.
The definitions just given illustrate a very important point about names. When defining a function, like APPEND, the names of the variables (i.e., the formal parameters) are chosen to reflect what happens one level of designation away. Thus we called the formal parameters of the APPEND function by the names SEQ1 and SEQ2, because they will designate sequences when the function is applied. We did not call them SEQUENCE-DESIGNATING-EXPRESSION-1 and SEQUENCE-DESIGNATING-EXPRESSION-2, even though in some sense they are sequence designating expressions. We, as external observers and theorists, look at these two identifiers, and use the phrases "sequence-designating-expression", in our theoretic language (English, as it happens), to describe those expressions in 3-LISP. Those 3-LISP expressions, however, are looking at sequences, and so are named in ways that are appropriate for the things they look at. This practice — of having expressions named in ways that correspond to the things they are looking at — works out very well indeed. (Imagine how complicated it would be if my name were ‘Brian’s-name’, rather than ‘Brian’.) It is also a pretty automatic practice for us native language users, in simple cases (there is very little tendency for you to call me ‘Brian’s name’). When we get to complex intensional situations, however, when we are using language to look at language, we will need to have this practice — which we will maintain — very clearly in mind, in order to avoid confusion. Also, there is an ambiguity in computer science about whether we are theoreticians looking at computer languages, or whether we are theoreticians using computer languages as extensions of our own language to look at the world. This ambiguity can also lead to some confusion. Although in some cases the latter is a powerful strategy, it must be our first priority, in this class, to do the former.
Another issue that comes up, once we adopt the procedural stance, has to do with how long things take. In the problem set, for example, you will encounter two different definitions of the fibonnaci function. In terms of designated function, these two definitions are identical; in terms of procedural consequence, however, they are quite different. The 3-LISP processor, for example, will respond to the input (FIBONNACI 100) with the number ??? in just a few seconds, given one definition; given the other, it would take it thousands of years to complete its calculation. Computational complexity — a large area of active research in computer science — is something we will not consider in too much detail in this course. See the discussion at the end of the first chapter of Abelson and Sussman’s book for more information.
Given expressions that have effects, it sometimes becomes important to delineate rather carefully the order in which things happen. For example, suppose we define a procedure that takes as arguments two strings, and that prints out these strings in a message. We might define such a routine as follows:
1> (DEFINE TEST
(LAMBDA [S1 S2]
(BEGIN (STRING-OUT PS "The first argument was ")
(STRING-OUT PS S1)
(STRING-OUT PS ", and the second argument was ")
(STRING-OUT PS S2)
(CHAR-OUT PS CARRIAGE-RETURN))))
1= ’TEST
1> (TEST "wonderful" "too")
The first argument was wonderful, and the second argument was too.
1= ’OK
STRING-OUT is a routine that prints its second argument on the terminal; it will be explained in due course.
More complex functions, and more complex ways of combining them, are explored in the first problem set, which you will be working through in sections and labs.