Heading:
Mesa Response to Lisp Challenge
Page Numbers: Yes X: 527 Y: 10.5"
Inter-Office Memorandum
ToPE-InterestDateOctober 9, 1978
FromLampson, LevinLocationCSL, Palo Alto
SubjectMesa Response to Lisp ChallengeFile[Ivy]<Lampson>mesaresponse.memo
XEROX
We culled five major (read "potentially difficult to implement") facilities from the memos by Bobrow and Masinter:
1)Programs as data
2)
Variable syntax analysis
3)
Dynamic variable binding
4)
Safe language (subset)
5)
Automatic storage allocation
This memo summarizes the analysis presented at the September 21st meeting.
1)Programs as data.
The ability to treat programs as data requires three major facilities within the Mesa environment: (a) an S-expression-like representation of source programs, (b) a means of using atoms (in the LISP sense) uniformly throughout the environment, and (c) a universal pointer mechanism.
S-expression representation. This seems straightforward in principle, though we haven’t looked at the details. We think that a representation other than parse trees should be explicitly defined, permitting the representation of parse trees to change as the compiler/language evolve.
Atoms. The symbol tables built by the Mesa compiler currently include a mapping between source identifiers and internal unique ids (atoms). At present, these mappings exist only on a per-compilation unit basis, implying that they will need to be merged into a global map when the compilation unit "enters the environment". A similar merge presently occurs inside the compiler when the source program accesses an included module (i.e., the symbol table for that module is opened and the identifiers (used by the source program) are remapped into the current atom table). A merge into a global atom table therefore seems feasible, at least in principle.
Universal Pointers. We will need pointers that carry the type of their referent with them. This in turn requires a scheme for universal type IDs, which do not currently exist in Mesa. It seems possible to extend the techniques used to ensure uniqueness of record types to accommodate general types. Run-time efficiency is vital, so the check that two type IDs are equivalent must be fast. We anticipate using the LISP "spacing" technique, which places all data structures of the same type within a well-defined region of the address space (e.g., each page allocated for such dynamic structures can hold objects of precisely one type). This scheme avoids keeping any type ID bits with the pointer, but makes it difficult to point inside a composite data structure with a universal pointer. Two possible remedies have been suggested: (1) an explicit type identifier immediately preceding the (sub)structure, or (2) an indirection mechanism in the data structure itself. In any event, access to statically allocated variables (i.e., those in local and global frames) via universal pointers will have to be prohibited, unless special steps are taken to prevent problems with garbage collection.
2)Variable syntax analysis.
LISP provides three opportunities for the programmer to affect the parsing/interpretation of input text: (a) at scanning time, (b) at compile time, and (c) at function call time. (a) may be accomplished by providing scanner escapes to user-written code, which will replace some appropriate substring of the current parsing context (i.e., the as-yet unclosed lists) with a piece of program in S-expression form. It appears straightforward to provide a facility comparable in power to LISP’s readtables -- a more elegant mechanism might be desirable. (b) may be accomplished by more-or-less conventional macro expansion along the lines of the Mesa mechanisms for in-line procedures. (c) is deemed undesirable and will not be provided.
3)Dynamic variable binding.
Mesa, at present, can only support static binding of identifiers. To be able to bind a name to a variable in an enclosing dynamic context, we need to know what variables are to be found in each frame. Instead of the LISP default, we propose that variables to which dynamic binding may occur be declared as such at compile time. The compiler can then produce a table of entries of the form
atom -> <frame,type>
for each compilation unit. Such a table could reside in the code segment, with a bit in the associated global frame(s) to indicate its existence. Dynamic binding would then occur much in the same way the signal-to-catch phrase binding occurs (but more efficiently).
4) Safe language
By a safe language we mean here one in which a pointer value of a given type POINTER TO T is guaranteed to point to a region of storage containing a value of type T. We will also consider a relaxation of this condition in which certain regions of the storage may not be safe in this sense; as long as values stored in such unsafe regions do not themselves contain pointers to safe regions, the damage which can be caused is confined to the unsafe regions.
There are three ways in which a pointer variable p may acquire an improper value
1)Lack of initialization - in this case the value of p is just the random bit pattern which happens to be in the storage occupied by p when it is instantiated.
2)Incorrect computation - in general a pointer value can only be the value of some pointer variable (which could of course be a record component) or the result returned by a storage allocation. If pointer values can be computed in other ways, e.g. with LOOPHOLEs or their equivalents, or by arithmetic, then there is no guarantee that such values point to storage containing values of the proper type.
3)Premature deallocation - in this case p is pointing at a region of storage which once contained a value of type T, but is now being used for some other purpose.
If none of these bad things can happen, then induction tells us that the language is safe with respect to pointers. In addition, it is necessary to do bounds checking on array references, since otherwise every array index is as dangerous as a pointer. Finally, array descriptors require the same treatment as pointers.
Preventing uninitialized pointers is straightforward, by generating code to initialize all pointers in procedure frames when they are allocated, and in dynamic variables of record and array types when they are allocated. Mesa 5 will have language constructs for allocating dynamic variables, so it is known where to do the initialization.
Preventing incorrect computation requires excluding all the dangerous constructs from the language
a)Loopholes into a pointer type or a pointer-containing type (PC for short)
b)Computed or overlaid variants in which the variant part is PC
c)Pointer arithmetic
d)Use of UNSPECIFIED (this restriction is probably stronger than necessary)
e)What did I forget
Preventing premature deallocation has two aspects: dynamic (explicitly allocated) variables and stack variables. For the latter, a simple and reasonable rule seems to be to disallow the use of @ on local variables. As partial compensation, it seems desirable to introduce var parameters (call by reference); these can be thought of as pointers to stack variables which are always dereferenced, so that it is not possible for the called procedure to make a copy of the pointer which will outlive the call. We also need to enforce a restriction that var parameters are not permitted across process and coroutine calls, since in those cases there is no guarantee that the caller will live longer than the called procedure (again, this seems too strong, but it isn’t clear how to weaken it).
For dynamic variables some support from an automatic deallocation mechanism seems essential, since otherwise there is no way to tell when the programmer says Free(p) whether other pointers to p↑ still exist. We see three possibilities
a)Reference counting. This could be added to Mesa easily enough, but the cost during execution is rather high, and it is probably not a good idea in view of the other alternatives.
b)Deferred freeing. The idea is to make a list of all the storage freed by the programmer, and at suitable intervals to invoke the scanning phase of a garbage collector and check that there are no outstanding references to the storage queued to be freed. At the end of the scan any storage which has passed this check can be actually freed.
c)Automatic deallocation, discussed below.
Quite possibly deferred freeing will be worthwhile, since it is fairly easy to implement and has substantially lower running costs than fully automatic deallocation. With plenty of address space there should be no serious objection to a long interval between the programmer’s call of Free and actual availability of the storage.
Mesa currently has primitive and unsafe facilities for relative pointers, which there is a design to strengthen and improve. Relative pointers have several appealing properties, one of which is particularly relevant in this context
a)They allow an entire complex structure to be moved in storage, or between storage and a file, as long as all its contained pointers are relative to the structure.
b)They can save substantial amounts of space, especially as the address space grows and pointers become longer.
c)Misuse of a relative pointer can only damage data in the region it refers to. If none of the storage in this region contains any external pointers, then no global damage can occur. Embedded external pointers are still possible if they are treated as array indices, i.e. go through fingers. The advantage is that the programmer can forgo the safety precautions for such relation pointers without risking the integrity of the entire system. Of course a bounds check is still required (analogous to an array bounds check), but in many cases of interest it can be provided by storing the pointers in n-bit fields and making the age of the region 2n; an especially nice value for n is 16.
5) Automatic deallocation
Our proposal is to have automatically deallocated (AD) types, and to use the Deutsch-Bobrow scheme for incremental garbage collection. This requires knowledge of the location and type of every pointer to an AD type, but the techniques are well-known and straightforward.