[_CD8_]<cedardocs>PE2>Report>pe2.3b1!1

Heading:

Second EPE report (section 3b, Lisp)

Page Numbers: No

Second EPE report (section 3b, Lisp)

Mesa facilities needed in Lisp

The following facilities from the ABC list are present in Mesa but not adequately available in Lisp:

(A2) Statically checked type system
(A4) Abstraction mechanisms, explicit notion of "interface"
(A6) Adequate run-time efficiency
(B1) Encapsulation/protection mechanisms
(B4) Consistent compilation
(B9) (in some contexts) User ability to pack data

In the Lisp tradition, we propose to add these facilities by defining new objects to represent types, name scopes, etc. and associated functions for manipulating them. These objects are accessible to the programmer at run time; however, greater efficiency can be obtained by compiling programs with respect to a particular state of these objects, and paying the price of recompilation (or, more typically, reversion to interpreted or less fully compiled status) if a change is made. A similar comment applies to static checking -- a change in a type or scope requires re-checking the affected parts of the program.

The remainder of this section describes the principal features of a Lisp dialect we have been calling XLisp, which we propose as the EPE base. XLisp includes the current Interlisp facilities as a special case, but it is based primarily on SCHEME, a lexically-scoped Lisp dialect developed at MIT, with additions motivated by the Mesa type and scope system. We intend XLisp not only as a response to the EPE challenge, but as a conceptual framework for thinking about scope and type issues which has some room for future exploration of capabilities not present in either Lisp or Mesa.

To summarize how we propose to provide each of the six facilities listed above:

Static type checking -- by making types be objects, and (as needed) checking that all users of a declared variable refer to identical (EQ) type objects.

Explicit interfaces -- by defining an object called a dictionary that generalizes the notions of parameter list, record, and interface, and by requiring that such an object be associated with every defined or imported function.

Greater run-time efficiency, packed data -- by making the internal representation of a quantity be one of its attributes in the type system, and extending the instruction set to allow more efficient execution when more tightly bound representations are used.

Encapsulation/protection -- by associating protection information with entries in dictionaries.

Consistent compilation -- by extending the notion of type identity and compatibility into the permanent filing system, through an object called a permanent pointer which somewhat resembles Mesa’s time stamps.

Dictionaries

A new kind of object called a dictionary is central to the interpretation of names (and, hence, programs) in XLisp. A dictionary consists of one or more name tables, each of which applies to a different syntactic context ("function" or "variable") and specifies the interpretation of a set of names in that context. Each entry in a name table consists of a name (atom) and an associated property list. Dictionaries implement function argument and result lists, record types, and interfaces: the latter uses are discussed more fully under Types below.

Dictionaries are the building blocks for scopes. Conceptually, a scope is an area of a program within which the interpretation of names does not change; implementationally, it is an object which specifies how to determine the meanings of names. (We will freely confuse the concept and implementation in the discussion that follows.) A scope is made up of components, each of which gives the meaning of some names, and a rule for deciding how to resolve conflicts and what to do with undefined names. Each component may be a scope itself, or it may be a dictionary, which actually associates names and meanings.

Every name in a program must eventually resolve to a meaning of one of three basic kinds: manifest, in which the actual value is found in a dictionary (for example, local and system functions, record names, and what are now thought of as compile-time constants); dynamic, in which the value is found in a run-time data structure (for example, most variables and non-local functions); or unbound, in which only declarative information (if that) is available. In principle, every dynamic name eventually reduces through macro-expansion to a value obtained by applying elementary operations (at roughly the instruction set level) to constants and to pointers which form part of the virtual machine’s state, such as the current stack frame pointer. These elementary operations are available as functions in the system, since they are needed for the interpreter and compiler, but their use is deliberately rendered inconvenient since it can lead to violations of scope rules on which not only compiler optimizations but the correct working of programs may depend. For ordinary use, we provide, instead of these primitives, a set of higher-level constructs for use in name tables: these constructs lend themselves better to static analysis and permit enforceable protection boundaries. These constructs are the following:

(MANIFEST value) -- the actual value is present in the name table. The value may be any Lisp object (data structure, function object, etc.). The value is treated as a constant: changing it may require recompiling any function that uses it. If it is a non-scalar value, it is only given out in read-only form.

(SCRATCH value) -- the value is present in the name table, but is not read-only. One can think of this as an OWN structure in the Algol sense.

(MACRO function-object) -- the name table contains a function which is applied to the form (application or atom), and the result is taken in place of the original form. It is vital that macros produce consistent results, i.e. only depend on the form being expanded and the scope in which it is being expanded. We assert that is sufficient to analyze the macro function and all functions it calls to make sure that they refer only to local and manifest variables, and to pass the macro function its arguments in read-only form; then any operations with side effects, if applied to objects not created during this call of the macro function, will cause an error.

(FLUID fluid-cell) -- this is the traditional Lisp binding method. Note that function and variable names are different for fluid binding purposes, and that the name itself is not a global atom but an object local to the dictionary, so that the binder and the user of a fluid name must share the dictionary.

(UNBOUND)

(LOCAL location-in-frame) -- this is the SCHEME lexically bound variable. It differs from a fluid variable in that closures which use local variables free retain a shared pointer to the binding in effect at the time the closure is created, while a free use of a fluid variable refers to the binding at the time the closure is applied. LOCAL names are also used for fields in records: in this case, location-in-frame defines the location of the field within the record instance.

(RELATIVE name scope base) -- this is used to refer to objects found in other scopes. Examples include argument or outer PROG variables referenced from within a PROG, instance variables and abstract operations in a record, and individual entries (usually manifest or fluid) imported from other scopes. A relative entry contains three parts: a name, a pointer to the scope in which to interpret the name, and, if the interpretation is local, an expression (interpreted in the original scope) which gives the base pointer. This base pointer becomes the default base pointer for the purpose of the interpretation, so cascading of relative entries will work properly. The Mesa OPEN construct amounts to making appropriate RELATIVE entries in the current scope for all the names in the object being opened.

Types

Framework

Types are objects, just like integers or lists. There are three kinds of types: interface or I-types, which describe how objects behave in general; descriptive or D-types, which specify predicates that hold true of objects in particular situations; and representation or R-types, which describe the bit patterns used to represent objects inside the machine. Conceptually every object carries its I-type with it at execution time, but not its R-type (D-types are associated with variables, not with objects). The system provides a default representation for every I-type. The phrases "X is of type T" and "X satisfies type T" are interchangeable -- I- and D-types are predicates.

I-types are built up from elementary types like INTEGER and ATOM by (possibly recursive) applications of the following operators (the syntax is for expository purposes only):

SATISFIES[p:FUNCTION[[t:Type],[Boolean]]] is a type for objects x of type t such that p[x] is true.

ANYOF[t1 ... tn:Type] is a type for objects which can be of any of the types t1 ... tn.

ALLOF[t1 ... tn:Type] is a type for objects which must be of types t1 ... tn simultaneously.

READONLY[t:Type] is a type for objects which are of type t shorn of any operations which produce externally visible side-effects (more on side-effects below).

UNIQUE[t:Type] is a type which is the same as t, except that objects of this type are distinguishable from objects of type t and of any other type created by UNIQUE[t]. This operation is sometimes called "painting".

FUNCTION[argT,resultT:Type] is a type for functions which take arguments of type argT and return results of type resultT. ArgT and resultT must be structure types.

VARID[t:Type] is a type for variable id’s of type t. Variable id’s are used to link together binders and users of variables, leaving the identity of the variable unbound. (Same idea as the Mesa SIGNAL linking mechanism.)

Structuring -- read on.

The other kind of definable I-type is a structure type. It has named components, each of which may have type declarations. There are two kinds of components: fields, which obey certain rules set forth below that make them act like variables, and operations.

Every field component has two associated operations called fetching and replacing that field. These operations obey the following rules:

(1) Fetching a field has no externally visible side-effects (although it may have internal side-effects like caching or statistic-taking -- obviously "visible" can’t be defined rigorously). In particular, it does not affect the contents of any field.

(2) Replacing a field has no externally visible side-effects on any operation other than fetching that same field.

(3) Fetching a field always returns what was last replaced into that field.

(4) No operation other than replacing a field affects what fetching that field will return.

The default R-type for fields is a storage cell: if the programmer specifies a different R-type, its adherence to the above rules is up to him.

Fields in structure types may have names, or an ordering, or both. The ordering is for argument and constructor lists -- it has nothing to do with storage format.

Arrays are a structure type in which the fields, instead of having names, are indexed by a set which may be of any type. If the type is an integer range, the default representation is a traditional array; if not, the default is a hash array.

An interface type, in the Mesa sense, is a structure type with fields which are of various function types. A Mesa module is a group of functions which all agree to use the same instance(s) of the interface types they import. Making an instance of a module consists of making an instance of the interface type it implements, filling in the fields with closures of the exported functions in which the imported interface pointers get bound.

Declarations

A D-type declaration just establishes an invariant for a variable -- checked whenever the variable is stored into, can be assumed whenever the variable is used. D-types can also be used as predicates, and wrapped around expressions. R-type declarations specify the representation of a quantity and hence are only meaningful for variables, as opposed to arbitrary expressions. ("Variables" include record fields, of course.)

Conformity and coercions

Type checking depends only on the I- and D-types of the objects. The conformity relation is straightforward to compute, given the type composition primitives. The compiler does all it can, and then produces code to do the remainder of the check.

Coercions are meaningful at both the I-type and R-type levels. I-type coercions up or down the type lattice are mostly either no-ops or type checks, and can be supplied automatically. Other I-type coercions require user-supplied functions. R-type conversions require user-supplied functions to convert to and from the default R-type, the familiar Lisp typed pointer.

Protection

We implement compile time checking by providing information in a dictionary that specifies the external accessibility on a name-by-name basis. The actual access rights are determined by a function of the originator, the path used to arrive at the dictionary entry, and the information in the entry itself.

Mesa provides four kinds of protection that are not checkable at compile time:

1) Variables, unless otherwise declared, are not accessible at run time by programs outside the lexical scope where they are defined.

2) A client and an implementor must use identical (EQ) versions of an interface.

3) An implementor can define types whose contents are (partially) private, but which the client can use to declare variables.

4) The types of the arguments passed in a control transfer always agree, regardless of how the environment receiving control was obtained (e.g. by binding, through a PORT, through a procedure variable).

In Mesa, all of these are based on intra-module checking by the compiler, and inter-module enforcement based on (2). This, in turn, is accomplished by creating a unique name for an interface when it is compiled, and copying that unique name into both the client and the implementor when they are compiled.

In XLisp, the privacy information in dictionaries and scopes provides (1), since the only way to access a non-local variable is by going from a frame or other object to the dictionary that describes it, and the operations to access objects in general are implemented within dictionaries and check the privacy information.

Our type checker provides (2) by simply using EQ -- XLisp always keeps with any compiled object a pointer to the dictionaries accessed during compilation, which include all imported and exported interfaces. The permanent pointer mechanism described under Miscellaneous below guarantees that the EQ relation is correct even when externally filed data are involved.

(4) is also a type checking question. If the user declares procedure and environment variables appropriately (which is automatic if an interface dictionary is involved), then the check can be made when the interface is bound rather than at the time of the control transfer.

The heart of our protection system is (3). Something like (3) is required to be able to have virtual objects, and objects accessible only through interface procedures (operations). In XLisp one can declare a record type as having sealable instances. What this means is that the record has a piece of code associated with it, and a "sealed" flag. If the record is in unsealed state, anyone may access it freely. If the record is in sealed state, only the associated code may access it. A record must be sealed to be used as the local environment of a function: the operation of calling a function logically breaks down into the steps of creating an instance of its local frame type, filling the parameters into the instance, sealing the frame, and then running the function. Similarly, records which define Smalltalk-like instances are sealed in a way that accepts incoming messages and then executes the implementing function. Such functions can access the sealed record by having a link in their scope structure that says they implement an operation on that record. (This is no less safe than the IMPLEMENTING construct in Mesa.)

A sealable record and its underlying unsealable record are two different types. Coercing the former to the latter (going into an implementor) requires that the sealed flag be set. Coercing the latter to the former is always possible -- the result is always sealed. The implementor can specify an arbitrary error check to be applied to a sealable record in order for the seal to be set: the check and the sealing take place atomically. This can be accomplished by copying the record, doing the check, and then atomically copying the checked result back. Alternatively, the record can be accessed through a level of indirection, and the indirect pointer can be disabled at the start of the check.

Miscellaneous

Filing

XLisp programs contain inter-program references (to dictionaries, in particular) that must be representable on external files, so that a program knows it will have the same environment when run subsequently, rather than some environment made up of objects that happen to have the same strings as their names.

We define a new object called a permanent name (PN) which consists of two parts: a file name (a string), which is to be considered only a hint, and a permanent id (PID, an integer of roughly 64 bits), which is the actual name of the object, and which is guaranteed different for every permanent object. Potentially, many different kinds of objects have permanent names -- function definitions, dictionaries, maybe any object whatever. To find the object addressed by a permanent name, it is necessary to hunt around in the file system, presumably starting at the file named in the PN, until an object with the right PID is found. Each file contains a directory that maps PIDs to text representations of objects in the file.

A PN is only needed for an object if a cross-file reference (or multiple reference within a single file) is made to it. PNs and PN directories are large, so we think it would be satisfactory to assign a PN only when (1) an object was written on a file, and "potentially referenced via a PN" was a property of its type; (2) a program asked for a PN for an object.

PNs are a basic mechanism for constructing permanent pointers. To make them useful, one needs such things as the notion of versions of an object, directories mapping PNs of original versions to PNs of current versions, etc. The crucial question is one of overwriting: how can one change an object without changing its PN? Fortunately most (all?) file storage systems provide some kind of atomic operation which is guaranteed to execute either completely or not at all, and an atomic overwriting operation can be constructed using this.