2. Catalog of programming environment capabilities
We have divided the capabilities of a programming environment into four categories. Virtual machine / programming language refers to those capabilities that are primitive concepts in the programming language or in the virtual machine on which the programming language runs. Tools refers to capabilities for operating on programs. Packages refers to readily available programs that implement particular clearly defined higher-level concepts. Other includes documentation and non-technical considerations.
It is important to note that the division between tools and packages is a somewhat arbitrary one: in a good environment, there is little distinction between the two, in that all the capability provided by the tools is available to the programmer in the form of packages, and all the capability of the language (including the packages, of course) is available to the human user to extend the functions of the tools.
Additionally, it should be understood that we are discussing a programming environment for a computationally rich environment: many of these capabilities are feasible only when each programmer has substantial computing power available to him at all times. More specifically, we expect our computing facilities to be dominated by high-performance personal computers such as the Dorado [Lampson & Pier, 1980], which is a successor to the now inadequate Alto computer [Thacker, et al., 1979]. Our orientation toward single-user machines in a distributed environment requires us to consider a number of ‘‘operating system’’ capabilities that might be taken for granted in a time-sharing environment.
2.1. Summary
Here we enumerate the programming environment capabilities we have considered and place each in one of the four categories. In subsequent sections we shall discuss each capability in varying amounts of detail.
Virtual machine / programming language
(L1) Large virtual address space (> 24 bits)
(L2) Direct addressing for files
(L2a) Segmenting
(L2b) An enormous virtual address space (> 48 bits)
(L3) Well-integrated access to large, robust data bases
(L4) Memory management—object/page swapping
(L5) Object management—garbage collection, reference counting
(L6) Some support for interrupts
(L7) Adequate exceptional condition handling
(L8) User access to the machine’s capability for packed data
(L9) Program-manipulable representation of programs
(L10) Run-time availability of all information derivable from source program (e.g., names, types, scopes)
(L11) Statically checked type system
(L12) Self-typing data (a la Lisp and Smalltalk), run-time type system
(L13) Encapsulation/protection mechanisms (scopes, classes, import/export rules)
(L14) Abstraction mechanisms; explicit notion of ‘‘interface’’
(L15) Non-hierarchical control (coroutines, backtracking)
(L16) Adequate run-time efficiency
(L17) Inter-language communication
(L18) Uniform screen management
(L19) Inheritance/defaulting (e.g., Smalltalk subclassing; difficulty depends a lot on how much it has to do)
(L20) Ability to extend language (e.g., operator overloading)
(L21) Ability to create fully integrated local sublanguages
(L22) User access to the machine’s capability for multi-precision arithmetic
(L23) Good facilities for processes, monitors, interrupts
(L24) Simple, unambiguous syntax (including infix notation)
(L25) Control over importation of names
(L26) User packages as ‘‘first-class citizens’’
(L27) Closures
(L28) Full-scale inter-language communication
(L29) User microprogramming
(L30) Clean data and control trapping mechanisms
(L31) ‘‘Good’’ exceptional condition handling
Tools
(T1) Fast turnaround for minor program changes (less than 5 seconds)
(T2) Compiler/interpreter available with low overhead at run time
(T3) Cross-reference/annotation capability
(T4) Prettyprinter
(T5) Consistent compilation
(T6) Version control
(T7) Librarian, program-oriented filing system (including Browser)
(T8) Source-language debugger
(T9) Dynamic measurement facilities
(T10) Checkpoint, establishing a protected environment
(T11) History and undoing
(T12) Editor integrated with language system
(T13) More optimizing compiler if user willing to bind more tightly—with full compatibility
(T14) Aids for incremental development (stubs, outstanding task list)
(T15) Regression testing system
(T16) Random testing aids
(T17) (high capability) Masterscope
(T18) Access to on-line documentation (Helpsys)
(T19) Static analyzers: verifier, performance predictor
Packages
(P1) Text objects and images
(P2) Line objects and images
(P3) Scanned (bitmap) objects and images
(P4) Formatted document files
(P5) More elaborate screen management
(P6) Remote file storage
(P7) Small data base manager
(P8) Message transmission system
(P9) Remote procedure call
(P10) Event logging
(P11) Background processing
(P12) Generalized cache
(P13) Document editing
(P14) Forms
(P15) Menus and other standard user interfaces
(P16) History lists
(P17) User access to full bandwidth of disk
(P18) (English) dictionary service
(P19) Teleconferencing
(P20) Audio
(P21) User access to full bandwidth of networks
Other
(X1) Adequate reference documentation
(X2) ‘‘Efficient’’ interface for experts
(X3) Uniformity in command interface
(X4) ‘‘Self-teaching’’ interface for beginners
(X5) Good introductory documentation
2.2. Virtual machine / programming language
(L1) Large virtual address space (> 24 bits)
(L2) Direct addressing for files
(L2a) Segmenting
(L2b) An enormous virtual address space (> 48 bits)
(L3) Well-integrated access to large, robust data bases
The issue here is that things should scale smoothly as programs grow to encompass more functions or larger data bases. As matters stand now, the time required tends to grow in predictable ways, but when the space (addressing or memory) requirements grow beyond a certain point, radical redesign of the program is usually required. A secondary issue is the ability to combine programs without running into the same kind of hard constraint on the space taken up by the code.
An address space of, say, 2^24 items would be adequate to hold all the code actually being used, even in a large system, but not adequate for all of its data (e.g., the American Heritage dictionary); whereas an address space of, say, 2^48 items would be adequate for all the code and data of a very large, even multi-machine system. We tend to favor the former, more conservative definition, since we do not understand how to provide an efficient, robust implementation of the latter (more on the robustness question below). However, we feel that it is essential that a transition across the 2^24-object boundary not require the kind of wholesale reorganization of programs that such a transition requires in current language systems.
System facilities for accessing large, external data bases are required for several reasons:
Many questions of organization and efficient implementation can be solved once, rather than over and over again by applications.
The system itself needs data base facilities for tools like the program librarian.
Access to externally stored data objects needs to be smooth for the debugger and other system facilities, not just the application programs.
Programs normally refer to internal objects with individual references, but to external data bases with both individual references and mass queries. There are three basic techniques for speeding up references to external data:
Caches (good for individual references);
Using sequentiality properties (good for searching);
Inversion (good for searching).
All of these techniques should be available in package form.
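As a concrete illustration, here is a minimal sketch, in Python, of two of these techniques offered in package form: a bounded cache for individual references and an inversion (inverted index) for mass queries. The record layout and the fetch callback are hypothetical, not part of any system described here.

    from collections import OrderedDict

    class Cache:
        """Keep the most recently used externally stored objects in memory."""
        def __init__(self, fetch, capacity=128):
            self.fetch = fetch          # function: key -> object, reads the external store
            self.capacity = capacity
            self.entries = OrderedDict()

        def get(self, key):
            if key in self.entries:
                self.entries.move_to_end(key)        # mark as recently used
                return self.entries[key]
            value = self.fetch(key)                  # an individual reference to the data base
            self.entries[key] = value
            if len(self.entries) > self.capacity:
                self.entries.popitem(last=False)     # evict the least recently used entry
            return value

    def invert(records, field):
        """Build an inverted index: field value -> list of record keys (for searching)."""
        index = {}
        for key, record in records.items():
            index.setdefault(record[field], []).append(key)
        return index

    # Usage sketch: records might live in an external data base; here a dict stands in.
    records = {1: {"author": "Deutsch"}, 2: {"author": "Taft"}, 3: {"author": "Deutsch"}}
    by_author = invert(records, "author")            # {"Deutsch": [1, 3], "Taft": [2]}
    cache = Cache(fetch=lambda k: records[k])
    assert cache.get(1) is cache.get(1)              # second access served from the cache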
Integrity. A programming environment in which the file system is viewed as an extension of the address space must be extremely robust—much more so than any programming system we now have. Our current practice is to make the ‘‘truth’’ for a data base be a text file, and not to expend a lot of effort on armoring the binary version against all imaginable errors. Notable exceptions are the basic file facilities; we believe that a system that provides robustness facilities usable at higher levels would have a lot of advantages.
Long-term integrity seems to require some kind of history or redundancy information. To protect against ‘‘cosmic rays,’’ it is sufficient to record this information in a form that only the implementing program understands. To provide Undo capability, the history must be in a form that makes sense to clients.
Another kind of integrity has to do with not losing information. The following kinds of ideas are important in this connection:
A computed object (e.g., a Mesa configuration) should keep track of how it got made, in enough detail to make it again. The description of the putting-together process should be program-manipulable.
Files should be self-identifying in a way that is not altered by their transfer between storage media or locations.
It should not be possible to destroy all traces of a file containing input (as opposed to purely computed) information. The space requirements can be kept under control by storing descriptions of files (such as changes from other files) rather than all the bits.
Even the dangling reference problem and various garbage collection approaches can be viewed as integrity questions to some extent.
(L4) Memory management—object/page swapping
(L5) Object management—garbage collection, reference counting
These facilities are important for essentially the same reasons as (L1) through (L3), namely, to free programmers from excessive concern for the size and location of their code and data, for both explicitly named and dynamically constructed objects.
(L6) Some support for interrupts
The ability to interrupt program execution is essential, at least to gain control of runaway programs, to allow interaction with long computations, and to provide local multiprocessing; ‘‘good’’ facilities are less important. We have no strong religious feelings about the form of a process mechanism.
(L7) Adequate exceptional condition handling
An integrated mechanism for handling exceptional conditions is required for clean debugging and for the construction of robust programs; it helps clarify program structure by separating normal and exceptional algorithms. Certain mechanisms are required in this area: some way to unwind the stack, some way of catching an unwind, and program control of error processing. What should be provided beyond this is not agreed—the controversies raging around the Mesa ‘‘signal’’ facilities are a reflection of our poor understanding of the problem.
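To make the three required mechanisms concrete, the following minimal sketch (in Python rather than Mesa or Lisp; the handler-installation scheme is hypothetical) shows unwinding the stack, catching an unwind on the way out, and program control of error processing.

    class CleanupNeeded(Exception):
        """Signal used to unwind out of a nested computation."""

    error_handler = None          # program-controlled error processing, installable by the client

    def with_open_resource(body):
        resource = {"open": True}
        try:
            return body(resource)
        finally:
            resource["open"] = False      # catching the unwind: cleanup runs even while unwinding

    def inner(resource):
        problem = "name not found"
        if error_handler is not None:
            return error_handler(problem)    # give the program a chance to proceed
        raise CleanupNeeded(problem)          # otherwise unwind the stack

    try:
        with_open_resource(inner)
    except CleanupNeeded as e:
        print("unwound to top level:", e)

    error_handler = lambda problem: None      # now errors are handled instead of unwinding
    with_open_resource(inner)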
(L8) User access to the machine’s capability for packed data
The ability to pack data is essential to obtain acceptable storage efficiency for large data structures. However, in a system without static (pre-runtime) type checking, the desire to access packed structures efficiently (without checking or indirection through descriptors) conflicts with the desire to prevent corruption of the storage management system.
There was a long discussion of why it is hard to add packed data to Dorado Lisp, which centered around this protection question in the following guise: how carefully should the system prevent the user from smashing its underlying data structures. There wasn’t much agreement on this point, but it does seem to have considerable practical importance, since a highly restrictive attitude makes it difficult to code low-level parts of the system in itself.
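The flavor of the problem can be shown with a minimal Python sketch of packed data (the 1/3/12-bit field layout is hypothetical): field access goes through shifts and masks, and a store that bypasses the checks silently corrupts a neighboring field, which is precisely the protection worry.

    FLAG_BITS, TYPE_BITS, INDEX_BITS = 1, 3, 12       # 1 + 3 + 12 = 16 bits in one word

    def pack(flag, type_code, index):
        assert flag < 2 and type_code < 8 and index < 4096
        return (flag << 15) | (type_code << 12) | index

    def unpack(word):
        return (word >> 15) & 0x1, (word >> 12) & 0x7, word & 0xFFF

    word = pack(1, 2, 1000)
    assert unpack(word) == (1, 2, 1000)

    # An "unsafe" store that skips the range check and the mask:
    bad = (word & ~0xFFF) | 5000          # 5000 does not fit in 12 bits...
    assert unpack(bad)[1] != 2            # ...so it has silently smashed the type field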
(L9) Program-manipulable representation of programs
(L10) Run-time availability of all information derivable from source program (e.g., names, types, scopes)
These two issues are closely related: underlying them is our desire to make it easy to extend the set of tools, to make communication between sublanguages easy, and to break down as many as possible of the artificial distinctions between programs and their compilers, on one hand, and data and their programs, on the other.
A long discussion led to the conclusion that all this information is currently available in Mesa, modulo the question of how stable the compiler’s internal representation of the program as a tree should be expected to be. Straightforward methods (like the ones used in CLisp) will allow compiled code to be attached to source in a user-created data structure, although it is certainly easier to do this in Lisp, where interpretation is simple. Not explored was the usefulness of an interpreter for a subset of the language, such as currently exists in the Mesa debugger.
(L11) Statically checked type system
(L12) Self-typing data (a la Lisp and Smalltalk), run-time type system
The primary value of type systems is descriptive and structural—specifying the intended properties of one’s data, and providing a mechanism for ensuring consistency between the suppliers and clients of an interface. Enabling a compiler to generate better code is secondary. There are at least two dimensions of variability in type systems:
—whether type information is bound early, as in Mesa, or late, as in Lisp and Smalltalk. There is a need to provide the programmer a selection from a spectrum of alternatives—Mesa provides variant records, which are a limited form of run-time binding, and Lisp has DeclTran, which provides some compile-time binding.
—whether types form a strict partition of the data values, a coercion hierarchy, or some even richer structure. We believe that a richer structure is desirable. It might include provision for generic operators, for type-parameterized programs or schemes [Mitchell & Wegbreit, 1977], for a more general notion of type as simply a partial specification of the behavior of an object (perhaps like the Alphard ‘‘needs’’ list [Shaw, et al., 1977]), and for automatic pointwise extension of operators over collections.
It appears feasible to deduce much of the type information for a program automatically, starting from the assumption that each variable only takes on values of one type [Milner, 1978; Cousot & Cousot, 1977]. This would alleviate some of the nuisance of having to write type declarations. Name conventions (e.g., capitalization or standard prefixes), if interpreted by the compiler, would also eliminate most of the need for separate specification of type information.
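A toy illustration of that assumption, sketched in Python (this is not Milner's algorithm, and it handles only literal right-hand sides): walk the assignments of a small program, record the type implied by each one, and complain if a variable is ever given values of two types.

    import ast

    def deduce_types(source):
        types = {}
        for node in ast.walk(ast.parse(source)):
            if isinstance(node, ast.Assign) and isinstance(node.value, ast.Constant):
                inferred = type(node.value.value).__name__
                for target in node.targets:
                    if isinstance(target, ast.Name):
                        seen = types.setdefault(target.id, inferred)
                        if seen != inferred:
                            raise TypeError(f"{target.id} used at both {seen} and {inferred}")
        return types

    print(deduce_types("n = 3\nname = 'Mesa'\nm = n"))   # {'n': 'int', 'name': 'str'}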
This is an area ripe for research. We believe that, at a minimum, both the early-bound Mesa type system and the late-bound Lisp/Smalltalk type system must be supported (as alternatives) by an EPE.
(L13) Encapsulation/protection mechanisms (scopes, classes, import/export rules)
(L14) Abstraction mechanisms; explicit notion of ‘‘interface’’
Abstraction mechanisms are important because they make explicit the logical dependencies of one part of a program on another, while concealing implementation choices irrelevant to the communication between such parts. Thus, these mechanisms make it possible to factor the development, debugging, testing, documentation, understanding, and maintenance of programs into manageable pieces, while leaving individual programmers the appropriate freedom to design those pieces.
The ability to specify interfaces in the abstract, and to conceal their implementation, is important, but a difficult research area. It is possible to derive this information about implicit interfaces after the fact using tools like Masterscope. The code produced by an optimizing compiler need not reflect source-level modularization, if tighter binding improves efficiency and the user is willing to pay the price of more compilations and possibly decreased debugging information.
We believe one facility in this area is absolutely essential: it must be possible for a programmer to control which names get exported from a package. In addition, it is important for the system to conceal the distinction between user-defined packages and system-provided primitives (types, operations, etc.) at least as well as Mesa does.
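A minimal sketch of such export control, using a present-day Python module as a stand-in (the symbol_table package and its contents are hypothetical): the implementor lists exactly the names clients may import, and everything else remains an implementation detail.

    # file: symbol_table.py  (hypothetical package)
    __all__ = ["lookup", "define"]       # the exported interface, chosen by the implementor

    _table = {}                          # internal representation, not exported

    def define(name, value):
        _table[name] = value

    def lookup(name):
        return _table.get(name)

    def _rehash():                       # a helper the implementor is free to change
        pass

    # A client that writes "from symbol_table import *" receives only lookup and define;
    # _table and _rehash remain concealed implementation details.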
(L15) Non-hierarchical control (coroutines, backtracking)
Coroutines and generators are essential: they provide a natural way to write transducers (programs that consume a data stream and produce another), which in turn are often the best way to modularize a data transformation algorithm. Lisp has backtracking, a control discipline sometimes used to explore alternatives in goal-directed searching, but there is considerable disagreement over the mechanism used to provide it, and its uses can probably be covered by more restricted coroutine and Undo mechanisms. Closures are a method of providing a wide variety of interesting control and binding environments, but we are not sure whether they can be implemented efficiently enough, or are structured enough, to replace the more specialized coroutine constructs.
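The transducer idea can be sketched with Python generators (a rough present-day analogue of the coroutine facilities meant here): each stage consumes one stream and produces another, and the stages compose without any stage knowing the shape of the whole pipeline.

    def tokens(chars):
        word = ""
        for c in chars:
            if c.isalnum():
                word += c
            elif word:
                yield word
                word = ""
        if word:
            yield word

    def numbered(items):
        for i, item in enumerate(items, start=1):
            yield f"{i}: {item}"

    # The pipeline: characters -> words -> numbered words, evaluated lazily.
    for line in numbered(tokens("coroutines, generators; transducers")):
        print(line)       # 1: coroutines / 2: generators / 3: transducers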
(L16) Adequate run-time efficiency
The ultimate efficiency criterion is whether a system can meet its external specifications and constraints (responsiveness to human users or program clients). Computational efficiency equivalent to Alto Mesa, coupled with the larger real memory and faster disk of the Dorado, is adequate for many projects; for Lisp and Smalltalk, we think we can attain this through internal re-engineering. For other, more computation-intensive systems, at least another factor of 5 is attainable (based on raw hardware speed).
(L17) Inter-language communication
We believe the best way to attain the benefits of uniform methods for accessing the machine’s facilities at the lowest level is for all language systems to run under a single operating system that is carefully constructed not to impose unnecessary space or time penalties on its clients. For the Dorado, we believe that the Pilot operating system satisfies this criterion [Redell, et al., 1979].
Inter-language communication at a higher level helps reduce duplication of effort and also provides one way of attaining extra efficiency for particular functions. Calling Mesa subroutines from Lisp or Smalltalk will probably be adequate. Communication through the network or file system interface requires very little work, but is too inefficient. The general problem seems difficult and less important.
(L18) Uniform screen management
Use of the display is pervasive in our interactive systems. Lack of uniformity leads to duplicated effort, often of low quality since an individual builder cannot easily draw on all past experience or devote the time to taking advantage of it. On the other hand, too much central control over screen management may frustrate the desire to experiment with new paradigms for interaction.
We believe that it is possible to ‘‘virtualize’’ the screen and the user input devices—that is, require people to write programs on the assumption that they will only have access to a subpart of the screen and to a slightly filtered stream of input events—in a way that will not markedly impede our ability to experiment, and that will have a large payoff in terms of the user’s ability to construct a screen environment containing multiple windows on different programs. At a minimum, the user must have direct access to all the capabilities of RasterOp [Newman & Sproull, 1979], appropriately mapped or confined to work on a virtual screen.
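A minimal sketch of the clipping idea (in Python, with hypothetical names, and ignoring the source/destination combining function of a full RasterOp): the transfer is confined to the caller's virtual screen region, so a program cannot paint outside the window it was given.

    def rasterop(dst, dst_window, dx, dy, src, sx, sy, w, h):
        """Copy a w-by-h rectangle of src into dst at (dx, dy), clipped to dst_window."""
        x0, y0, x1, y1 = dst_window                     # the caller's virtual screen region
        for j in range(h):
            for i in range(w):
                tx, ty = dx + i, dy + j
                if x0 <= tx < x1 and y0 <= ty < y1:     # clip: stay inside the window
                    dst[ty][tx] = src[sy + j][sx + i]

    screen = [[0] * 8 for _ in range(4)]
    glyph  = [[1, 1], [1, 1]]
    rasterop(screen, (0, 0, 4, 4), dx=3, dy=1, src=glyph, sx=0, sy=0, w=2, h=2)
    # Only the part of the glyph inside the window was written; the copy cannot escape it.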
(L19) Inheritance/defaulting (e.g., Smalltalk subclassing)
Languages that provide for programmer-controlled defaulting or inheritance reduce the time and chance for error in the programming process by making it unnecessary to write the same code or parameter values over and over again. The basic idea is that one should be able to write programs in a way that only specifies how they differ from some previously written program. Examples include default standard values for procedure arguments (how does this call differ from a ‘‘standard’’ call?), variant records (how does this particular record distinguish itself from the invariant part?), and the Smalltalk subclass concept (how does this class of objects differ from some more general class?).
We did not discuss this area beyond observing that it is somewhat related to the schemes question discussed under (L12), and that Smalltalk seems to derive considerable benefit from it.
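For concreteness, a minimal Python sketch of the ‘‘specify only the differences’’ idea (the Window and TitledWindow names are hypothetical): default parameter values answer ‘‘how does this call differ from a standard call?’’, and a subclass answers ‘‘how does this class differ from a more general one?’’.

    class Window:
        def __init__(self, width=640, height=480, border=1):   # defaults: state only what differs
            self.width, self.height, self.border = width, height, border

        def paint(self):
            return f"{self.width}x{self.height} window, border {self.border}"

    class TitledWindow(Window):
        def __init__(self, title, **differences):
            super().__init__(**differences)      # inherit everything not overridden
            self.title = title

        def paint(self):
            return self.title + ": " + super().paint()   # extend rather than rewrite

    print(Window(border=3).paint())                       # differs from a standard window only in border
    print(TitledWindow("Browser", width=1024).paint())    # differs in title and width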
(L20) Ability to extend language (e.g., operator overloading)
Languages may be extended by users in a variety of ways. Data structure extension, through user-defined data types and associated operations, makes it possible to write programs in terms of concept-oriented rather than implementation-oriented data objects. Syntax extension, through user definition of new language constructs, allows the user to define specialized notations that may be valuable for particular tasks: this is discussed in detail in (L21). Operator extension, the ability to define meanings for basic language constructs such as arithmetic or iteration when applied to user-defined objects, brings some of the benefits of notational extension with less drastic consequences in program readability.
Data structure extension is accepted as an important part of all modern programming languages. Syntax extension has fallen into disfavor because of a lot of bad experience; we believe this happened partly because the tools did not support extensions as well as they did the base language. Operator extension is already present in a number of languages such as Algol 68: we did not discuss its merits. It is interesting to note that Smalltalk is founded on the notions of data structure and operator extension, but that the syntax extension facilities present in the 1972 version of the language have been removed.
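A minimal sketch of operator extension (Python standing in for the languages above; the Interval type is hypothetical): arithmetic and membership are applied to a concept-oriented, user-defined object without any change to the language's syntax.

    class Interval:
        def __init__(self, low, high):
            self.low, self.high = low, high

        def __add__(self, other):                    # extend "+" to intervals
            return Interval(self.low + other.low, self.high + other.high)

        def __contains__(self, x):                   # extend "in" to intervals
            return self.low <= x <= self.high

        def __repr__(self):
            return f"[{self.low}, {self.high}]"

    print(Interval(1, 2) + Interval(10, 20))         # [11, 22]
    print(15 in Interval(10, 20))                    # True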
(L21) Ability to create fully integrated local sublanguages
All language systems actually have many small sublanguages for special purposes. For example, Mesa has not only the Mesa language, but the C/Mesa configuration language, the debugger command language, the programming language subset acceptable to the debugger’s interpreter, and the very small languages used to control the compiler and binder from the Executive command line. ‘‘Fully integrated,’’ as an ideal, means that control and data should be able to pass freely between sublanguages, and that the facilities (editor, prettyprinter, I/O system, etc.) applicable to the primary programming language should also be applicable to the other sublanguages.
Lisp is unique in that its dozen or so sublanguages all provide the ability to embed arbitrary Lisp computations in them. For example, in the middle of the editor one can compute (by calling an arbitrary program) data to be inserted, possibly as a function of the thing being edited, or even a sequence of edit commands to execute. The following features of Lisp seem to have made the creation of integrated sublanguages easier:
S-expressions are a simple, standard internal representation of parse trees for all sublanguages.
Lisp provides a standard method of sharing names and passing environments, namely a single, very simple name environment (atoms) that all sublanguages share. (This has both advantages and drawbacks: it leads to the ‘‘FLG’’ phenomenon, for example.)
Many internal system ‘‘hooks’’ are available to the sublanguage implementor. (This too has its drawbacks: it tends to make sublanguages more fragile.)
The standard system contains packages (prettyprinter, table-driven lexical scanner and parenthesis parser) which make I/O of program-like structures easy.
The sublanguages that don’t take advantage of these characteristics, such as CLisp, QLisp, and KRL, find their lives a lot more difficult. If we were willing to limit the complexity of sublanguages to that of S-expressions, i.e., procedures and conditionals, then we could devise an S-expression-like representation for Mesa also. (Extending this to, say, arithmetic expressions not only involves a complex parser and prettyprinter, but in a statically typed language like Mesa also requires taking type declarations into account to decide what the operators in the source text actually mean. Admittedly, the S-expression-like approach doesn’t allow embedding of a reasonable subset of Mesa itself in a sublanguage, and it doesn’t address the point that some of the highest payoff comes from the integration of languages, like KRL, that don’t look like S-expressions.)
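A minimal sketch of the S-expression approach, with Python lists standing in for S-expressions and a deliberately tiny, hypothetical operator set (procedures and conditionals only): the parse tree is ordinary data, and one construct escapes to an arbitrary host computation, which is the embedding property described above.

    def evaluate(expr, env):
        if isinstance(expr, str):                   # a name: look it up in the environment
            return env[expr]
        if not isinstance(expr, list):              # a literal
            return expr
        op, *args = expr
        if op == "if":
            test, then, else_ = args
            return evaluate(then if evaluate(test, env) else else_, env)
        if op == "host":                            # embed an arbitrary host-language computation
            return args[0](env)
        return env[op](*[evaluate(a, env) for a in args])   # procedure call

    env = {"double": lambda x: 2 * x, "n": 5}
    program = ["if", ["host", lambda e: e["n"] > 3], ["double", "n"], 0]
    print(evaluate(program, env))                   # 10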
We also noted that no matter what features the language and environment provided, proper proceduralization of facilities was essential: even Lisp has sublanguages—in particular, the compiler control sublanguage—that are implemented so as to interact with the user directly, and that therefore cannot be considered integrated.
To sharpen our ideas about what integration means, we considered a ‘‘straw man’’: a system in which all languages (editor, interpreter, etc.) shared a screen interface (window manager) but were otherwise entirely separate. This led us to the following observations:
This model was proposed as one that imposed minimal requirements on the individual subsystems, at least if they only dealt with the screen as a sequential character I/O device. This may not be a proper assumption, however: despite numerous attempts, no such package has ever been developed for the Alto, and this may be because nobody has been able to develop a satisfactory model for the interface. We agreed that things become more complex as the subsystem’s view of the display becomes more sophisticated (e.g., an editable document, a bitmap).
While this model allowed for considerable communication (by human transfer of characters from output in one window to input in another), it had two serious deficiencies: it made no provisions for communication under program, rather than manual, control, and it required that all transmitted information be in text form. (Note that even transmission of file names requires integration in the sense of sharing a common file system.)
While we did not reach any clear conclusions, we were able to agree on the following:
This is an important area for discussion, but it needs more time than we had available to us.
Integration really means a common model for communication.
The one catalog entry we have now should be broken down into multiple features of differing priorities.
If all the other Priority A features in the catalog were provided, Mesa could readily support sublanguages of the complexity of S-expressions.
Adequate proceduralization is necessary for a package implementing a sublanguage to be usable.
A major source of difficulty is the sharing or passing of environment information between sublanguages. One part of this difficulty is simply addressing or naming objects to be shared. Another part is making sure that shared objects are interpreted the same way (in Mesa, making sure the communicants share the same declarations for the objects). One way around this is to have a limited number of globally agreed-upon structures, such as strings or S-expressions, and then encoding more specialized languages within them and interpreting them by convention or agreement (as Lisp does).
We also do not agree on the importance of fully integrated sublanguages: a number of Lisp users feel this item should have very high priority.
(L22) User access to the machine’s capability for multi-precision arithmetic
Many language systems, though implemented on machines in which multi-precision arithmetic in assembly language is relatively straightforward, make it impossible for the user to get at these facilities (such as the carry from single-precision addition, or the double-by-single division that most machines provide in hardware). There is no excuse for this, especially on machines with relatively short (16-bit) words.
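What access to the carry buys can be seen in a minimal Python sketch of multi-precision addition over 16-bit words: each single-precision add produces a carry that is fed into the next. Here the carry must be recomputed by masking and division; the point of (L22) is that the hardware already produces it.

    WORD = 1 << 16                                  # 16-bit words, least significant first

    def add_multi(a, b):
        result, carry = [], 0
        for x, y in zip(a, b):
            s = x + y + carry
            result.append(s % WORD)                 # low 16 bits of the single-precision sum
            carry = s // WORD                       # the carry the hardware would hand us for free
        if carry:
            result.append(carry)
        return result

    # 0xFFFF + 1 in the low word carries into the high word:
    print(add_multi([0xFFFF, 0x0001], [0x0001, 0x0000]))   # [0, 2]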
(L23) Good facilities for processes, monitors, interrupts
Synchronization between logically asynchronous processes is necessary in many programs, either for functional reasons (a system should be able to listen for incoming mail, send a file to be printed, and carry out interactive editing simultaneously) or for efficiency (overlapping computation with disk transfers). The language system, through an underlying operating system if necessary, should provide mechanisms that help the programmer write programs that involve multiple processes. There are a number of adequate, though conflicting, models of these mechanisms available, such as the Mesa and Pilot facilities and the Smalltalk scheduler. We did not discuss this area at all.
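For concreteness, the monitor idea can be sketched in Python (threading standing in for the Mesa/Pilot facilities named above; the Mailbox is hypothetical): a shared queue whose operations are serialized by a lock, with a condition variable providing the ‘‘wait until non-empty’’ synchronization between a producer and a consumer process.

    import threading

    class Mailbox:
        def __init__(self):
            self._lock = threading.Lock()
            self._nonempty = threading.Condition(self._lock)
            self._items = []

        def put(self, item):
            with self._lock:                    # monitor entry: one process inside at a time
                self._items.append(item)
                self._nonempty.notify()

        def take(self):
            with self._lock:
                while not self._items:          # wait, releasing the monitor, until signaled
                    self._nonempty.wait()
                return self._items.pop(0)

    box = Mailbox()
    consumer = threading.Thread(target=lambda: print("received:", box.take()))
    consumer.start()
    box.put("incoming mail")                    # the producer; the consumer wakes and prints
    consumer.join()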
(L24) Simple, unambiguous syntax (including infix notation)
While CLisp leaves a very large gap between what can be precisely defined and what the user can reasonably do, and while Mesa syntax is regrettably complex (5 closely-spaced pages), we are willing to tolerate problems of this sort under the assumption that the primary users of the system will be experts, and that novices will be able to learn a useful subset easily.
(L25) Control over importation of names
As discussed under (L13) and (L14), import control seems less important than export control. The reason is that the implicit use of an interface can be deduced locally by noting what names are used; for export, the issue is global and is not under the control of the provider of the package without some explicit specification.
(L26) User packages as ‘‘first-class citizens’’
We would like user-defined packages to function as ‘‘first-class citizens’’ on a par with built-in primitives (types, operations, etc.). However, the Euclid experience seems to indicate that this is very difficult [Popek, et al., 1977]. In Smalltalk, this goal has been achieved except for some I/O issues, at the expense of not having static type structure in the language at all.
(L27) Closures
See (L15) for discussion.
(L28) Full-scale inter-language communication
(L29) User microprogramming
The ability to share data and pass control freely between programs written in any of the major languages depends on carefully coordinated use of certain basic resources such as the peripheral devices and the machine’s address space. We think this is less urgent than the more restricted facilities of (L17): the primary motivation for changing languages in mid-program is increased efficiency, and dropping into Mesa from Lisp or Smalltalk provides this, although it does not address the secondary motivation, which is the ability to take advantage of work already done (or more conveniently done) in another language.
In cases requiring extreme speed, it may be necessary to give the programmer a way to write application-dependent microcode and link it to the system in a way that provides at least some checking that it will not destroy the rest of the system. We did not discuss this.
(L30) Clean data and control trapping mechanisms
There was a long and inconclusive discussion. Apparently we don’t really know what this point is about. At one extreme of ‘‘data trapping’’ there is a simple address trap, as on early machines. At the other extreme is KRL. No one was willing to espouse either extreme. We noted that:
Programming with data abstractions can help with this problem, since there is then a better handle on when data is being changed.
Checking some predicate at periodic intervals (e.g., on every control transfer) may be quite adequate when data trapping is being used to catch ‘‘core smashing’’ types of bugs. Mesa already has this facility.
Many interesting cases can probably be handled by using the primitive trapping facilities of the mapping hardware.
(L31) ‘‘Good’’ exceptional condition handling
As discussed above, we really don’t know what this would mean. Roy Levin’s thesis, among other published papers, may be relevant [Levin, 1977].