Heading:
Cedar Mesa Proposal
Page Numbers: Yes X: 527 Y: 10.5"
PROPOSAL — Circulated for comment — PROPOSAL
PROPOSAL — Circulated for comment — PROPOSAL
CSL Notebook Entry
ToCSLDateSeptember 1, 1979 9:43 PM
Programming Environment Interest Group
FromCedar Language Features Design CommitteeLocationPARC/CSL, Palo Alto
SubjectCedar Mesa ProposalFile[Ivy]<CedarDocs>Lang>Proposal.press
XEROX
Archive ID:yyCSL-xxx
Attributes:proposal, technical, Cedar, Mesa, Programming research
References:(All filed on [Ivy]<CedarDocs>)
"Some Preliminary Thoughts"—Ed Satterthwaite, Feb. 14, 1979
"Unsafe at any speed?"—Jim Horning, Mar. 2, 1979
"Minutes of 2-28-79 meeting"—Deutsch, Mar. 4, 1979
"Aspirations for run time types"—Deutsch, Bobrow, May 9, 1979
"Apr-May-Design.mail"
"June-Design.mail"
"Language changes for ’objects’ in Mesa"—Bobrow and Deutsch, June 12, 1979
"Garbage Collection Safety"—D. G. Bobrow, S. Owicki, June 13, 1979
"Cedar Mesa Safe Subset Proposal"—June 28, 1979
Invariants Proposal—Owicki, in preparation
"Delayed Binding in Cedar, Round 3"—P. Deutsch,
July 5, 1979
Objects/Exported Types Proposal—Morris and Satterthwaite, in preparation
Abstract:This document summarizes our current thinking on the version of Mesa to be provided by Cedar. Its motivation and required properties are described. Proposals are advanced for specific changes to Mesa, falling into three major areas: safe language, delayed binding, and exported types. The specific language design described here is called Mesa 5C1; since some design will proceed concurrently with implementation, it may not correspond exactly to any implemented version. This memo concludes with a summary of the proposed syntactic changes and a plan covering the work to be done for implementation.
INTRODUCTION
The proposals presented here have evolved after considerable discussion, taking into account reactions to the references cited above, which were circulated to CSL and the Programming Environment Interest distribution list. Comments and objections should be returned speedily if they are to affect the first version of the design of Cedar Mesa (informally called Mesa 5C1).
SAFE LANGUAGE
Motivation
A desirable property of a high-level language is implementation independence, which means that the effects of each program are explicable in terms of the language (rather than its implementation)—even if the program is erroneous. Mesa comes rather close to meeting this goal (as evidenced by the fact that most Mesa programs can be debugged "at the Mesa level," without ever worrying about the format of stack frames or the details of storage management), but it does contain some "unsafe" features whose use can lead to messy implementation dependencies.
It would be desirable to reduce implementation dependencies in Cedar on general grounds. However, the decision to include facilities for automatic storage deallocation ("garbage collection") within Cedar Mesa makes a shift in this direction imperative. The collector can cause storage to be deallocated (permitting its subsequent reallocation and re-use) at times that are completely unpredictable from examination of the source program. A single programming error that smashes a pointer used by the collector can effectively bring the whole system down in a rubble of bits, from which it is difficult to reconstruct any evidence of the cause of the crash.
It has been one of the major goals of the Cedar Language Features Design Committee to design a subset of Mesa for which it would be safe to do automatic storage deallocation. To first approximation, the "safe subset" is that part of the language where even incorrect programs cannot interfere with the reliable operation of the collector.
We intend that the vast majority of programs will be written primarily in the safe subset. However, we recognize that it is not presently feasible to provide acceptably efficient substitutes for all uses of Mesa’s unsafe features. Therefore, we propose instead to provide textual means for indicating that some regions of a program are unchecked. This will inhibit compiler enforcement of the safety restrictions and simultaneously indicate that the programmer has assumed the additional responsibility of ensuring that these regions of the program cannot violate the integrity of the system.
Invulnerability, Safety, and Checking
It is an obviously desirable property of a programming system that no user programming error can reduce its world to a rubble of bits. We will call this property invulnerability. In general, it can be ensured only by maintaining the integrity of certain data structures known to the run-time system (e.g., dispatch vectors, core maps). Collectively, the properties that must be maintained to ensure invulnerability will be called the safety invariants; each portion of the system will be responsible for ensuring that they are not destroyed, and will be free to assume that the rest of the system does likewise.
Invulnerability is not a local property. If any part of the system fails to maintain the invariants, the entire system (including regions that are themselves correct) is potentially vulnerable. We use the term safety for the property that the invariants cannot be invalidated locally (even by incorrect programs). Checked program regions are portions of a system where safety is guaranteed by language-enforced restrictions. Even unchecked regions are supposed to preserve the invariants, but the guarantee must be provided by the programmer, rather than by the language and its implementation. If a system consists entirely of checked regions (and the invariants hold initially), then by induction the program is invulnerable. However, an error in an unchecked region could make even the checked regions vulnerable. Thus the checked/unchecked boundary limits responsibility, but not vulnerability.
Unchecked regions may use unsafe language features, and errors may cause violation of system invariants. These regions are sometimes called "trusted regions," because confidence in the correct operation of the system must be based on trust that they are free of such errors. By contrast, our confidence that errors in checked regions will not cause system crashes is based on the knowledge that the safety restrictions have been automatically enforced.
Type Confusion
Mesa is a strongly typed language, which means that the types of variables are declared, and that the language imposes restrictions to keep values of one type from being accidentally interpreted as values of another. Because knowledge of the type structure of memory is so essential to the collector (it must locate and follow pointers in order to determine storage usage), it is particularly vulnerable to any operations that cause things in memory to be interpreted as having other than their declared types. Thus, much of the effort in designing the safe subset has gone into identifying all the features in the present Mesa language that allow type-checking to be circumvented (accidentally or deliberately) and designing safe replacements for important uses of those features.
LOOPHOLE, of course, is the most obvious means of defeating type-checking. It causes a safety problem if it allows mis-typing of some piece of memory (i.e., if the target type contains an address, such as a pointer or a procedure value); other uses will introduce implementation dependencies that do not threaten safety. We have formalized the distinction between these two sorts of breach of the type system, retaining LOOPHOLE in the unsafe language for arbitrary reinterpretation of the type of a value, and introducing the safe language type transfer function PUN, which is semantically equivalent to LOOPHOLE, except that address-containing types are not allowed as targets.
Narrowing
This proposal introduces a number of new type distinctions into Mesa, frequently leading to a number of separate, but closely related types. It is often desirable to coerce a value of one of these types into a value of a related type. If it can be statically guaranteed that no information can be lost by such a conversion, we call it a widening. Widening will be performed automatically whenever demanded by context—this is analogous to lengthening a CARDINAL to a LONG CARDINAL. In general, conversion in the other direction will require a run-time check to ensure that information is not being lost. For each such narrowing described below we indicate the nature of the necessary check; if the check fails, the signal RangeError will be raised. To make the possibility of such failure explicit in the program text, the NARROW type converter must be applied whenever a narrowing is to be performed; it will also be applicable to present Mesa types (e.g., LONG CARDINAL).
References
Although Mesa pointers are nominally typed, they provide a rich source of opportunities for type confusion, including—but not limited to—the classical "dangling pointer" problem, where a pointer persists after the storage it refers to has been deallocated. We are introducing a new class of reference types to make explicit several distinctions important for safe garbage collection. Conceptually, REFERENCE TO is a type schema much like a variant record schema; we allow it to be particularized in the same way as bound variants. Certain bound variants correspond closely to traditional uses of pointers, and can generally be substituted for them. POINTER TO will be an abbreviation for one such variant of reference.
DELAYED BINDING
Motivation
A desiable property of a high-level language is that it allow a wide range of binding times: that is, it should allow the programmer maximal control over when the attributes of a particular variable are determined, with different choices not requiring changes at each use of the variable. Examples of attributes include type, storage allocation method, implementation (of abstract objects), and actual value; examples of binding times include program-writing time, compilation, program initialization, block entry, and dynamic assignment. Generally speaking, deferring the binding of an attribute leads to greater generality in the program at the cost of decreased static checkability and (often) runtime efficiency.
Experience with languages like Lisp and Smalltalk, which provide only very late binding, suggests that even though late binding may actually be required only in few situations and have significant runtime cost, there are certain kinds of programs which are extremely clumsy to write if late binding of type or implementation is not available. In particular, programming tools (debuggers, performance monitors) and knowledge representation systems seem to be of this nature. Thus, in addition to wanting more binding time flexibility in Cedar Mesa on general grounds, we believe such flexibility will greatly simplify the writing of an important class of programs.
Dynamic typing
Mesa provides very limited variability in the binding time of an object’s type. Variant records allow a deferred choice between specific eunmerated alternatives, and string and array descriptors allow deferring the choice of an object’s length. Otherwise, all types must be fixed at compile time. This requirement makes it impossible to avoid LOOPHOLEs and ad hoc type tagging schemes when writing schedulers, searchers, printers, etc. which must operate on objects of unpredictable type. Our solution to this problem requires two new mechanisms: a runtime representation for types, and a way to associate a type with an object at runtime that is guaranteed consistent with the type system established at compile time. Note that we explicitly embrace the more traditional view that an object’s type is inherent in the object (rather than the Russell view that it is merely a description of the ways in which the object is used).
For runtime types to be useful, programs must be able to perform at least a limited set of operations on types (equality comparison and assignment); since we intend to provide a logically complete set of such operations in the future, we have integrated types into the language as first-class values. Thus the operation that forms the type POINTER TO T from the type T is simply viewed as another operator in the language, like +. In Mesa 5C1, we have kept the present limitation that the arguments to such operators must be compile-time constants, but it is a semantic limitation, not a syntactic one.
For Mesa 5C1, the only requirement that we place on the runtime representation for types is that different types be distinguishable, hence we have few constraints on the choice of representation. To minimize space, we assign short unique identifiers to types; the compiler’s representation of types is converted to these identifiers when an object module is loaded.
Since present machine architectures only handle fixed-size objects efficiently, we have chosen to associate dynamic type information only with pointers (which are fixed-size). We offer three different representations for dynamically typed pointers:
For contexts where it is known at the time an object is allocated that it will require a dynamically typed reference in the future, and where it is known that many objects of the same type will be allocated, we associate the type with an entire block of storage.
For contexts where it is known that a dynamically typed reference will be required, but objects of mixed type are involved, we store the type with the object.
For contexts where it is very likely (but not necessarily certain) that no dynamically typed reference will be required, we create a "hash link" at the time that such a reference is created (from a statically typed reference, hence we know the type at that time).
Opaque procedure invocation
Mesa currently requires that the argument and result lists for a procedure be written out explicitly at each invocation. This rules out two constructs we would very much like to have in the language: the ability to pre-assemble an argument record and treat it like any other kind of record, and the ability to bring together a dynamically typed procedure and a dynamically typed argument record and apply the former to the latter (provided the types match). Both are most useful in the context of programs that store procedures and associated arguments in data structures.
Dynamically bound variables
Mesa’s SIGNAL mechanism can be viewed as a way to bind the name of a procedure to an actual procedure, in a way that cuts across normal static scoping and follows the dynamic chain of control instead. We propose to extend this to cover variables of any type, not just procedure values. Experience in Lisp suggests that this mechanism is very useful if the rebinding of state information depends conditionally on the flow of control.
Items deferred
We deliberately decided not to include a few high-payoff "delayed binding" items in Mesa 5C1 since we felt we did not understand them well enough or simply did not have the resources to pursue them. These items were sequences (arrays of flexible size), the ability at runtime to convert symbolic names back to the objects they denote (using a mixture of dynamic state and compiler-generated tables), an S-expression-like representation for programs which included full type and scope information (although we do have a purely syntactic representation for programs), and a user-definable iteration facility like Alphard or Euclid generators.
EXPORTED TYPES AND OBJECTS
Motivation
Current Mesa systems are constructed by binding together collections of separately written and compiled modules that communicate through explicit and precisely specified interfaces. The langauge has been designed so that specifications of operations, which appear in interfaces, are decoupled from their implementations, which are hidden within program modules. Specifications of types do not currently have similar properties; all information about types used in interfaces must appear within the interface definitions themselves and must be fixed before any client code can be compiled. This shortcoming sacrifices information hiding and reconfigurability of systems; it also creates awkward compilation dependencies.
Support of the so-called "object-oriented" style of programming, as developed in such languages as Simula and (especially) Smalltalk, is an important goal of Cedar. Attempts to use Mesa for such programming particularly suffer from the inability to hide representational detail. In addition, Mesa does not distinguish between abstract and concrete types. The result is that dealing with an abstract class of objects with multiple and coexisting concrete implementations requires considerable circumlocution; the code is awkward and error-prone.
Programmers avoid some of the dependencies and awkwardness by relying upon extralinguistic (and thus uncheckable) conventions and then using various LOOPHOLEs to circumvent normal checking. This approach is obviously undesirable on general principles; it is unacceptable if we want most programming to be done within the safe langauge.
Opaque Types
The current technique for hiding the underlying representation of something is to declare its type as a place holder with no useful structure and then to allow the implementors to use LOOPHOLEs of some sort to map the values into the actual underlying type. The purpose of introducing exported types is to institutionalize this well-understood use of LOOPHOLE.
For Mesa 5C1, we propose to allow types to be exported from implementors into interfaces, just as procedures are. An interface may contain a declaration of an opaque type, the structure of which is largely unspecified and thus invisible to importers of that interface. Any exporter of the interface must supply a fully specified concrete type as the value of each opaque type, as well as a set of procedures implementing the operations upon that type. We do not require that all exporters supply the same concrete type; thus Mesa’s rules for type checking must be modified slightly to prevent the mixing of different concrete types.
Opaque types themselves allow the implementation of a type appearing in any particular interface to be fixed at the time of binding. They permit the use of different concrete types for a single abstract type as long as the concrete types are kept well separated. In addition, they have the practical benefit of eliminating many compilation dependencies.
Generic Types
Opaque types themselves do not handle the situation in which multiple implementations of an abstract type coexist within a system and clients wish to use the various implementations interchangably. This case can, in theory, be handled by merging all the concrete types into a variant record type and by explicitly constructing suitable operations records. The representation of an object is then a pair, with one component pointing to a particular variant and the other to an operations record. This approach requires either LOOPHOLEs or useless discriminations, leads to awkward notation, and generally suffers from the problems noted above.
Our proposal extends the opaque type mechanism to allow the intermixing of different implementations. It thus accommodates a more general style of object-oriented programming. For each opaque interface type, we introduce a generic type, the values of which can have any concrete type supplied for the opaque type. Such a value is represented by a value with the concrete type plus a link to the interface into which that concrete type, and the operations upon it, were exported.
Notational conventions
The notation used for object-oriented programming in Mesa depends upon the degree of generality anticipated by the programmer. If only one implementation is anticipated, he uses standard functional notation, e.g., Interface.Op[object, args]. If multiple implementations are anticipated, an object-oriented notation, e.g., object.Op[args] is more appropriate. We propose conventions that allow these notations to be used interchangably, given suitable declarations. This allows the programmer to adopt a uniform style and delay another level of decisions about representation.
Subclasses
Sometimes several classes of objects share a number of abstract operations; these can all be viewed as subclasses of some other class for which just the common operations are defined. Algorithms applicable to the superclass automatically extend to all the subclasses. Part of our proposal is a means for defining an interface as an extension of other interfaces. The effects of a subclassing can be achieved if one of the interfaces being extended includes an opaque type.
Items deferred
During the design process several approaches to polymorphic types and type parameters were discussed, but they are not proposed for inclusion at this time. We were also unable to find a satisfactory special-purpose solution to a specific problem that had high value to us, namely, to provide singly-linked lists (i.e. a family of types LIST OF T) in a manner that would allow the writing of polymorphic list-processing procedures which could guarantee type consistency to their clients.