Page Numbers: Yes X: 306 Y: 1.0" First Page: 1
Margins: Top: 1.0" Bottom: 1.3"
Heading:
3-LISP IN INTERLISP3-LISP IMPLEMENTATION NOTESMarch 3, 1983
————————————————————
The INTERLISP-D Implementation of 3-LISP
————————————————————
This section briefly summarizes the current INTERLISP-D implementation of 3-LISP by relating it to the 3-LISP in 3-LISP implementation (3X3) presented in chapter 7 of the 3-LISP Reference Manual. Despite the numerous differences, these implementations are very much the same in spirit. To as great an extent possible, each "trick" is described in isolation from the others in an effort to keep this discussion simple; nevertheless, bear in mind that the current implementation incorporates all of these techniques and, consequently, there are a lot of piddling little details that have been omitted.
Structural Field (the Basics)
[This section could be considerably simplified when section 7.e (Implementing the Structural Field) is written.]
The 3X3 processor implements 3-LISP structures directly (that is why it is called a direct embedding of 3-LISP in 3-LISP). As a consequence, the 3X3 processor offers no explanation of how the 3-LISP structural field might be realized with existing data structure techniques.
Each of the 9 3-LISP internal structure types (atoms, numerals, booleans, charats, handles, rails, pairs, closures, and streamers) is represented by an INTERLISP-D DATATYPE. This allows 3-LISP’s TYPE primitive to be defined in terms of INTERLISP-D’s TYPENAME primitive. For example, the 3-LISP numeral 100 is represented by an instance of an INTERLISP-D DATATYPE called 3-NUMERAL-OBJECT whose single field contains the INTERLISP-D numeral (FIXP) 7. The 3-LISP boolean $F is represented by an instance of an INTERLISP-D DATATYPE called 3-BOOLEAN-OBJECT whose single field contains the INTERLISP-D atom NIL. Charats are handled in a similar manner. With the exception of booleans, the simple 3-LISP structures are not uniquely represented — e.g., there may be more than one INTERLISP-D 3-NUMERAL-OBJECT representing the same 3-LISP numeral. Hence, 3-LISP equality is defined in terms of the INTERLISP-D EQUALity of their fields.
A 3-LISP handle is represented by an instance of an INTERLISP-D DATATYPE called 3-HANDLE-OBJECT whose single field contains the INTERLISP-D object that represents the referent of the handle. Handles are not uniquely represented. Hence, equality of handles is determined by the 3-LISP equality of the referents. Note that the process of repeatedly going from a handle to its referent will always terminate — i.e., no handle is self-referential (or, in other words, no handle has an infinite number of ‘’s).
A 3-LISP atom is represented by an instance of an INTERLISP-D DATATYPE called 3-ATOM-OBJECT. Atoms are uniquely represented; two 3-ATOM-OBJECTs represent the same 3-LISP atom iff they are EQ in the INTERLISP-D sense. 3-ATOM-OBJECTs serve mainly as carriers of type and identity information — they have no user-accessible fields. However, since the current implementation doesn’t yet explain how READ and PRINT work, an extra field has been added to carry the atom’s print name. Additionally, global binding information is maintained with each atom (discussed later).
A 3-LISP pair is represented by an instance of an INTERLISP-D DATATYPE called 3-PAIR-OBJECT having two fields: one containing the INTERLISP-D representation of the CAR object, the other, that of the CDR object. Like atoms, pairs are uniquely represented.
A 3-LISP rail is represented by an instance of an INTERLISP-D DATATYPE called 3-RAIL-OBJECT having three fields: EMPTY, FIRST and REST. If the EMPTY flag is set, the rail represented is empty. Otherwise, the FIRST field contains the representation of the 3-LISP object that stands in the corresponding structural field relationship to the rail represented by this INTERLISP-D object; the REST field contains the representation of the 3-LISP rail that represents the first tail of this rail. Like atoms and pairs, rails are uniquely represented. Additionally, global binding information is maintained with rails (discussed later).
A 3-LISP closure is represented by an instance of an INTERLISP-D DATATYPE called 3-CLOSURE-OBJECT having four fields, corresponding to the four component of a closure. Like atoms, pairs, and rails, closures are uniquely represented. (Additional fields having to do with recognizing standard continuations, primitives, etc. are discussed later.)
It is important to note that the INTERLISP-D implementation distinguishes internal objects from external ones is the obvious way: by the use of 3-HANDLE-OBJECTs. For example, if $TYPE (the INTERLISP-D implementation of 3-LISP’s TYPE primitive) is passed a 3-CLOSURE-OBJECT, it will return a 3-HANDLE-OBJECT whose referent is the 3-ATOM-OBJECT that represents the 3-LISP atom FUNCTION; if passed a 3-HANDLE-OBJECT whose referent was a 3-CLOSURE-OBJECT, it would return a handle to the 3-LISP atom CLOSURE; and if passed a 3-HANDLE-OBJECT whose referent was another 3-HANDLE-OBJECT, it would return a handle to the 3-LISP atom HANDLE. As a second example, consider $PCONS, the INTERLISP-D implementation of PCONS. Contrary to what one might first expect, $PCONS does not simply create a new 3-PAIR-OBJECT capturing its two arguments and return it. Instead, $PCONS expects as arguments two 3-HANDLE-OBJECTs, constructs a new 3-PAIR-OBJECT containing the referents of its arguments, and returns a (newly-constructed) 3-HANDLE-OBJECT whose referent is that new 3-PAIR-OBJECT. (This suggests that the implementation may spend a substantial amount of time creating and de-referencing 3-HANDLE-OBJECTs — actual measurment shows that this is indeed the case.)
Structural Field (with REPLACE)
The above would be a true picture of how to represent the 3-LISP structural field if it were not for the REPLACE primitive. Recall that REPLACE can be applied to atoms, rails, pairs, and closures, with both arguments having the same type. After (REPLACE X Y), it will always be the case that the structure formerly designated by X will be the same as the structure designated by Y. To implement this, the representations of the four smashable structures have an INVISIBLE flag that, when set, indicates that this structure has been REPLACEd by another structure. This other structure is located via one of the fields of the DATATYPE (e.g., in the case of pairs, the CAR field of a 3-PAIR-OBJECT contains the representation of the replacement object). The presence of these "invisible pointers" (or "snakes") implies that all operations on instances of these DATATYPEs must first check to see if it is INVISIBLE and, if it is, follow the invisible pointer (and repeat the test). This constant checking for invisible pointers can be a source of inefficiency in the implementation. One observation that can help to lessen the impact is that since REPLACE is TYPE-preserving, there is no need to follow invisible pointers in order to determine the type of an object. However, it is essential that = follow them. Another implementation concern is the avoidance of creating a circular chain of invisible pointers; this can be easily avoided by treating (REPLACE X Y) as a no-op whenever (= X Y) holds beforehand.
Another slight complication: all smashable structures must carry a PROTECTED flag indicating that attempts to REPLACE this structure are to be bounced. The closures for all of the standard system procedures, their patterns and bodies, and the initial global environment rail are all protected through this mechanism.
Special Continuation Closure Representation
Closures for the standard processor continuations (i.e., the ones generated by MAKE-CONTINUATION) are represented with a special DATATYPE called a 3-SPECIAL-CLOSURE-OBJECT. A 3-SPECIAL-CLOSURE-OBJECT carries an explicit environment designator field, but the PATTERN and BODY are inheirited from an explicitly-stored TEMPLATE field, which contains the sample 3-CLOSURE-OBJECT for that family of continuations, and the PROCEDURE-TYPE is always the atom SIMPLE. (Like 3-CLOSURE-OBJECTs, INVISIBLE and PROTECTED flags are necessary.) The motivation behind having a special representation for standard continuation closures is that having fewer fields that full-fledged 3-CLOSURE-OBJECTs makes them faster to create. Space tends not to be much of a concern with closures since a relatively small proportion of them created tend to remain accessible for very long.
Tagging Closures
The 3X3 processor uses PROCESSOR-PROCEDURE to recognize standard processor procedures, PRIMITIVE to recognize the primitive closures, and COMPILED to recognize those closure that may be treated as if they were primitives. Since all of them are used regularly in :C-ARGS!, the speed of these operations will have a bearing on the overall speed of an implementation. In the INTERLISP-D implementation, extra fields have been added to closure representations to hold just this sort of information. The REDUCTION-TYPE field indicates if the closure is primitive (e.g., TYPE), compiled (e.g., 1ST), a processor-procedure (e.g., NORMALISE), or a processor-continuation (e.g., C-FIRST! family). The COMPILATION field carries the name of the INTERLISP-D procedure that is the implementation equivalent of the closure (for reflective closures, this field pertains to the implementation equivalent of a DE-REFLECTed version of the closure). The IMPLEMENTATION-OF procedure simply accesses this field.
Closures constructed with CCONS have a REDUCTION-TYPE field that indicates that they are of no special significance to the implementation. DE-REFLECT preserves the REDUCTION-TYPE and COMPILATION fields of its argument so that DE-REFLECTed versions of compiled kernel reflectives (e.g., IF) will also be given special treatment.
The closures for the standard procedures (all except the processor continuations) are protected from REPLACE and REBIND. This allows PROCESSOR-PROCEDURE to be implemented simply by checking the REDUCTION-TYPE. However, when attempting to recognize a processor continuation, it is necessary to inspect the environment designator of the closure to ensure that it is still intact.
Elimination of DE-REFLECT in :C-PROC!
The handling of reflective procedures in :C-PROC! is done with (CALL (DE-REFLECT PROC!) ARGS ENV CONT). Close inspection of the implementation processor with reveal that the closure constructed by DE-REFLECT is guaranteed to have a short lifespan (i.e., it will never "get out of the bag", so to speak). Provided that PROCESSOR-PROCEDURE and IMPLEMENTATION-OF do not insist that the procedure type of the closure be SIMPLE, we can simply eliminate the call to DE-REFLECT entirely. Taking care not to cause an automatic chicken-out, we can replace the CALL with (CALL-SIMPLE PROC! [ARGS ENV CONT]).
Who Won’t Cause a Shift Up in :C-PROC!
The INTERLISP-D implementation of :C-PROC! will not shift up upon encountering a reflective closure with one of the following COMPILATION tags: :IF, :COND, :LAMBDA, :AND, :OR, :BLOCK, :SET, :LET, and :DEFINE. All other reflective closures will cause a shift up. While not as general as 3X3’s handling of reflective closures, it does catch most of the frequently ocurring cases.
Who Won’t Cause a Shift Up in CONT?
The INTERLISP-D implementation of CALL will only stay level upon encountering a closure with a processor continuation REDUCTION TYPE and one of the following COMPILATION tags: :C-PROC!, :C-ARGS!, :C-FIRST!, :C-REST!, :C-BLOCK, :C-IF, :C-AND, :C-OR, :C-COND, :C-SET, and :C-REPLY. Other continuations will cause the processor to shift up.
Who Will Cause a Shift Down in :C-ARGS!
The INTERLISP-D implementation of :C-ARGS! will only shift down upon encountering a closure with a processor procedure or processor continuation REDUCTION TYPE and one of the following COMPILATION tags: :NORMALISE, :REDUCE, :NORMALISE-RAIL, :C-PROC!, :C-ARGS!, :C-FIRST!, :C-REST!, :BLOCK-HELPER, :C-BLOCK, :C-IF, :C-AND, :AND-HELPER, :C-OR, :OR-HELPER, :C-COND, :COND-HELPER, :READ-NORMALISE-PRINT, :C-SET, and :C-REPLY. Other closures will cause the processor to remain at the same level.
Special Contour Representation
According to the reflective processor, environments are encoded as rails structures. For example, the result of BINDing the pattern ’[A B C] and the arguments ’[1 2 3] in an empty environment will result in the rail [[’A ’1][’B ’2][’C ’3]]. In terms of the standard representation, this will mean that four 3-RAIL-OBJECTs and two 3-HANDLE-OBJECTs will be required per pattern variable. While space is not an issue, the time it would take to initialise all those structures definitely is. Moreover, there are also the independent considerations of being able to look up variables quickly and to rapidly recognize environments that were created when a processor procedure was entered (the latter task being MATCH-ENV’s).
For these reasons, a special DATATYPE called 3-CONTOUR-OBJECT is used to represent the "contours" ("stack frames", "activation records") created by the kernal operation BIND. These objects have four fields (not counting PROTECTION and INVISIBLE flags): VARS, a list of 3-ATOM-OBJECTs bound in this contour; VALS, a list of their respective values; CONTAINER, the object representing the enclosing environment; and TAG to record the source of the contour. If a contour object is poked at with NTH, TAIL, or other rail operations, it will be automatically exploded into the appropriate web of 3-RAIL-OBJECTs (the invisible pointer mechanism allows this to be done without fuss). However, both BINDING and REBIND, the most common operations on contours, do not cause contours to be expanded.
Whenever a contour is created by BIND, TAG is set to the value of the COMPILATION field of the closure that supplied the pattern and enclosing environment. This allows MATCH-ENV to be implemented simply as a check against a contour’s TAG.
Parameter Pattern Matching
The 3X3 processor relies heavily on the parameter pattern matching of the implementation to trap errors on calls to those procedures which the implementation handles specially; the only explicit pattern match is the call to BIND in EXPAND-CLOSURE, an that will only get done when a used procedure is invoked. The INTERLISP-D implementation differs in that it always calls BIND in :C-ARGS! (unless, of course, it chickens out), and REGISTERs the contour resulting from this call rather than the arguments that were matched. The following modified 3X3 processor illustrates this transformation:
(define :C-ARGS!
(lambda simple [args!]
(import [proc! cont]
(if (primitive proc!)
(call cont (block (bind (pattern proc!) ; Just for errors.
args!
(environment proc!))
↑(proc! . args!)))
(let [[new-env (bind (pattern proc!) args! (environment proc!))]]
(if (processor-procedure proc!)
(block (shift-down cont)
(register proc! new-env)
((implementation-of proc!) . args!))
(expand-closure proc! new-env cont)))))))
(define EXPAND-CLOSURE
(lambda simple [proc! env cont]
(call normalise (body proc!) env cont)))
(define CALL
(lambda macro exp
̀ (let [[(fun ,(1st exp)]]
(if (or (reflective ↑fun) (primitive ↑fun))
(expand-closure @last-processor-procedure @last-contour (shift-up))
(call-simple ,(rest exp))))))
(define CALL-SIMPLE
(lambda simple [fun args]
(let [[new-env (bind (pattern ↑fun) ↑args) (environment ↑fun))]]
(if (processor-procedure ↑fun)
(block (register fun new-env)
((implementation-of ↑fun) . args))
(expand-closure ↑fun new-env (shift-up)))
(define REGISTER
(lambda simple [fun env]
(block (set @last-processor-procedure ↑fun)
(set @last-contour env))))
(define MAKE-CONTINUATION
(lambda simple [template]
(simple ↑@last-contour (pattern template) (body template))))
Use of Global Registers
The 3X3 processor uses the implementation language’s parameter-passing mechanism to communicate between the processor procedure implementations. In the INTERLISP-D processor, all communication between processor procedure implementations is via a set of registers (global variables). For example, :NORMALISE accepts its three arguments in the registers @EXP, @ENV, and @CONT. The protocol for calling a continuation is to place the result in @RESULT, the continuation to call in @CONT, and call CONTINUE. Logic in CONTINUE takes care of moving the contents of @RESULT into the register appropriate for the continuation, as well as the restoration of the context (formerly IMPORT’s job).
The following registers are current defined: @PROC, @PROC!, @ARGS, @ARGS!, @EXP, @ENV, @CONT, @RAIL, @FIRST!, @REST, @REST!, @RESULT, and @VAR.
Running with Handles Stripped Off
Within the reflective processor the variables EXP, PROC, ARGS, PROC!, ARGS!, RAIL, FIRST!, and REST! will (almost) always designate internal structures. This means that in the implementation processor, these variables will (almost) always be bound to 3-HANDLE-OBJECTs, which is inconvenient since these handles will be forever being dereferenced. Most of this dereferencing and (a lot of handle creation, too) can be avoided by adopting the convention that the implementation processor will always function with one level of handle stripped off of these variables. Of course, care must be taken to ensure that the handles are re-attached whenever necessary, but this is no real problem. Also, there will be additional situations that will require chickening out — e.g., out of :NORMALISE if a call to BINDING doesn’t return a handle.
Driver
Whereas the 3X3 processor simply uses tail-recursive calls to pass control from one processor procedure to the next, the INTERLISP-D processor is written as a big PROG, with GOs being used to transfer control around. Actually, this is the way it would have been written had the PROG not been so large (20 pp.) as to be too awkward to work with. Instead, there is a driver loop:
(DE 3-DRIVER () (* This is INTERLISP code.)
(PROG ()
LOOP (SETQ *PC* (APPLY* *PC*))
(GO LOOP)))
and a macro called @GO that is used to pass control (but not parameters) to one of the processor procedure implementations indirectly via the driver loop (e.g., (@GO :NORMALISE)).
(DM @GO X
(LIST ’RETURN (LIST ’QUOTE (CAR X))))
Delaying/Avoiding the Creation of Contours and Continuations
The creation of continuation closures by the implementation processor is expensive in a number of ways. There is the obvious cost of allocating and initialising these closures. Less obvious is the cost of later recognizing an intact processor continuation closure, and restoring the context saved in it.
He is what we can do. We maintain a single register called @LAST-REGISTRATION that will keep track of how far behind we are in having up-to-date information recorded in the registers @LAST-PROCESSOR-PROCEDURE and @LAST-CONTOUR. Here are typical values of @LAST-REGISTRATION and how they are interpreted:
NILWe have not registered for anything; @LAST-PROCESSOR-PROCEDURE and @LAST-CONTOUR are completely out of date; the state of the computation is in the registers; e.g., when :NORMALISE calls :REDUCE, it passes arguments in registers but does not bother to REGISTER.
:REDUCEWe are registered for REDUCE; both @LAST-PROCESSOR-PROCEDURE and @LAST-CONTOUR were up-to-date at that time. This situation happens when we shift down to call :REDUCE.
:C-PROC!We are registered for C-PROC!; both @LAST-PROCESSOR-PROCEDURE and @LAST-CONTOUR were up-to-date at that time. This situation happens when we shift down to call :C-PROC!.
:C-PROC!-NO-CONTOURWe are almost registered for C-PROC!; @LAST-PROCESSOR-PROCEDURE is up-to-date, but @LAST-CONTOUR is not. This occurs when :C-PROC! was entered from one of the processor procedures via a CALL to the continuation.
:C-ARGS!We are registered for C-ARGS!; both @LAST-PROCESSOR-PROCEDURE and @LAST-CONTOUR were up-to-date at that time. This situation happens when we shift down to call :C-ARGS!.
:C-ARGS!-NO-CONTOURWe are almost registered for C-ARGS!; @LAST-PROCESSOR-PROCEDURE is up-to-date, but @LAST-CONTOUR is not. This occurs when :C-ARGS!was entered from one of the processor procedures via a CALL to the continuation.
Since it is only possible to chicken out or to call MAKE-CONTINUATION when both @LAST-PROCESSOR-PROCEDURE and @LAST-CONTOUR accurately reflect the state of the processor at the current level, the processor must be prepared to consult @LAST-REGISTRATION and catch up on the backlog of work.
A number of continuation closures that the processor constructs can be eliminated simply by expanding CALLs inline at those sites where some simplification can be done. For example, the CALL to NORMALISE within :REDUCE builds a fresh C-PROC! continuation which might get CALLed immediately if it happens that PROC designates an atom or a normal-form structure. A transformed version of :REDUCE follows:
(define :REDUCE
(lambda simple []
(cond [(normal @proc)
(block (set @proc! @proc)
(call-c-proc!-without-changing-registration))]
[(atom @proc)
(block (set @proc! (binding @proc @env))
(call-c-proc!-without-changing-registration))]
[(pair @proc)
(block (set @cont (build-c-proc!-continuation))
(set @args (cdr @proc))
(set @proc (car @proc))
(call-reduce-after-deregistering))]
[(rail @proc)
(block (set @cont (build-c-proc!-continuation))
(set @rail @proc)
(call-normalise-rail-after-deregistering))])))
Normalisation of Trivial Rails
Call a non-normal-form rail trivial if its elements are either atoms or normal-form designators. Trivial rails are important since we are guaranteed that the implementation will never shift levels while normalising a trivial rail. This means that the implementation has the freedom to significantly change its handling of these structures; in particular, there is no real need to create any C-FIRST! or C-REST! continuation closures.
The current INTERLISP-D processor special cases trivial rails found in the argument position of a redex. The processor constructs the result of normalising the elements and passes it to :C-ARGS! directly.
The major difficulty with this trick is that the processor has no information in advance that indicates whether or not a rail is trivial. Consequently, it can only presume that the rail is trivial and begin to normalise it quickly. Should it turn out that the rail is non-trivial, the intermediate result will be discarded and :NORMALISE-RAIL invoked. The gains made by quickly normalising trivial rails are therefore diminished by the time wasted in determining that a rail is non-trivial.
Variable Lookup
The overall speed of any implementation will be influenced to a great extent by the techniques employed in determining the binding of a variable in an environment. According to BIND and REBIND in the reflective processor, a deep binding scheme is employed. However, this implementation uses a shallow binding scheme for global variables (i.e., atoms having bindings in the global environment rail). When BINDING encounters the global environment’s rail, it consults special fields of the atom that it is looking up in order to obtain its global binding (if any). Similarly, the implementation of REBIND knows how to create/change global bindings. Since most environments are extensions of the global environment, and since a large number of free variables are bound in the global environment, and since the global environment tends to be quite large (150+ bindings initially), this scheme tends to substantially improve the performance of the processor.
Shallow binding of global variables is implemented as follows. Extra fields are added to 3-ATOM-OBJECTs to indicate whether this atom is currently bound in the global environment and, if so, what is the corresponding binding. Moreover, to ensure that arbitary REPLACEs will not invalidate the global binding mechanism, all rails involved in the global environment are specially marked. For example, if X were bound to 1 in the global environment then some tail of the global environment rail would look like [[’X ’1]...]. The 3-RAIL-OBJECT representing this rail would be flagged as being part of the global environment (the GLOBALENV flag). The 3-RAIL-OBJECT corresponding to [’X ’1] would be marked as the first part of a global binding rail (the GLOBALVAR flag), and its first tail ([’1]) would be marked as the portion of a global binding rail containing the variable’s value (the GLOBALVAL flag). The start of the global environment rail is specially marked as such (the GLOBALHEAD flag). The 3-ATOM-OBJECT for X would be marked as having a global binding (the GLOBALLYBOUND flag (actually, its a timestamp)) and would have the [’1] rail associated with it. Whenever the first argument to REPLACE designates one of these specially-marked structures, all global binding information is recomputed from scratch (but REBIND is smart enough to avoid all this work).
There are some bizarre cases to be taken care of. For example, mangling the global environment rail is such a way that BINDING or REBIND would croak — e.g., (RPLACN (LENGTH GLOBAL) ↑GLOBAL ’’DIE) — should disable all fast lookups so that these errors will indeed be detected. Also, care must be taken to ensure that for variables with more that one binding in the global environment are handled in a manner consistent with the definition of BINDING.
Eliminating Calls to NORMAL
The call to NORMAL in :NORMALISE can be all but eliminated by expanding it inline and then rearranging the alternatives. The following rendition of :NORMALISE illustrates this:
(define :NORMALISE
(lambda simple [exp env cont]
(cond [(atom exp) (call cont (binding exp env))]
[(pair exp) (call reduce (car exp) (cdr exp) env cont)]
[(rail exp)
(if (normal-rail exp)
(call cont exp)
(call normalise-rail exp env cont)
)]
[$T (call cont exp)])))
Speeding up NORMAL-RAIL
The global environment rail must always be a non-normal-form structure; otherwise, the tower would crumble since (ENVIRONMENT-DESIGNATOR ’{simple NORMALISE closure}) is being done on a regular basis. Hence all its elements, and all its tails are also normal-form designators. Therefore, the implementation of NORMAL-RAIL may immediately return $T when it encounters a designator of one of these structures. This turns out to be cheap to implement because the global binding mechanism already requires these structures be marked (GLOBALHEAD, GLOBALENV, GLOBALVAR, and GLOBALVAL flags).