PE report (section 3a)
VIRTUAL MACHINE / PROGRAMMING LANGUAGE
(L1)Large virtual address space (> 24 bits)
(L2)Direct addressing for files
(L2a)Segmenting
(L2b)An enormous virtual address space (> 48 bits)
(L3)Well-integrated access to large, robust data bases
The issue here is that things should scale smoothly as programs grow to encompass more functions or larger data bases. As things stand, the time required tends to grow in predictable ways, but when the space (addressing or memory) requirements grow beyond a certain point, radical redesign of the program is usually required. A secondary issue is the ability to combine programs without running into the same kind of hard constraint on the space taken up by the code.
An address space of, say, 2↑24 items would be adequate to hold all the code actually being used even in a large system, but not adequate for all of its data (e.g. the American Heritage dictionary), whereas an address space of, say, 2↑48 items would be adequate for all the code and data of a very large, even multi-machine system. We tend to favor the former, more conservative definition, since we do not understand how to provide an efficient, robust implementation of the latter (more on the robustness question below). However, we feel that it is essential that a transition across the 2↑24-object boundary not require the kind of wholesale reorganization of programs that such a transition requires in current language systems.
System facilities for accessing large, external data bases are required for several reasons:
Many questions of organization and efficient implementation can be solved once, rather than over and over again by applications.
The system itself needs data base facilities for tools like the librarian.
Access to externally stored data objects needs to be smooth for the debugger and other system facilities, not just the application programs.
Programs normally refer to internal objects with individual references, but to external data bases with both individual references and "mass" queries. There are three basic techniques for speeding up references to external data:
Caches (good for individual references);
Using sequentiality properties (good for searching);
Inversion (good for searching).
All of these techniques should be available in package form.
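As a rough illustration of the first and third of these techniques (the sketch is in modern notation rather than in any of the languages under discussion, and the names CachedStore and invert are invented), a cache holds recently fetched objects so that repeated individual references avoid the external access, while an inversion maps attribute values back to the objects containing them so that a search need not scan the whole data base:

    # Minimal sketch of two of the techniques named above: a cache for
    # individual references, and an inversion (inverted index) for searching.
    # All names are illustrative only.

    class CachedStore:
        """Front a slow external store with an in-memory cache."""
        def __init__(self, fetch, capacity=1000):
            self.fetch = fetch          # key -> object; the slow, external path
            self.capacity = capacity
            self.cache = {}             # key -> object

        def get(self, key):
            if key in self.cache:       # hit: no external access needed
                return self.cache[key]
            obj = self.fetch(key)       # miss: go to the external data base
            if len(self.cache) >= self.capacity:
                self.cache.pop(next(iter(self.cache)))   # crude eviction
            self.cache[key] = obj
            return obj

    def invert(records, field):
        """Build an inverted index: field value -> list of record keys."""
        index = {}
        for key, record in records.items():
            index.setdefault(record[field], []).append(key)
        return index

    # A search on "author" consults the index instead of scanning every record.
    records = {1: {"author": "Smith"}, 2: {"author": "Jones"}, 3: {"author": "Smith"}}
    by_author = invert(records, "author")
    assert by_author["Smith"] == [1, 3]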
Integrity
A programming environment in which the file system is viewed as an extension of the address space must be extremely robust -- much more so than any programming system we now have. Our current practice is to make the "truth" for a data base be a text file, and not to expend a lot of effort on armoring the binary version against all imaginable errors. Notable exceptions are the basic file facilities like IFS; we believe that a system that provides robustness facilities usable at higher levels would have a lot of advantages.
Long-term integrity seems to require some kind of history or redundancy information.
To protect against "cosmic rays", it is sufficient to record this information in a form that only the implementing program understands.
To provide Undo capability, the history must be in a form that makes sense to clients.
Another kind of integrity has to do with not losing information. The following kinds of ideas are important in this connection:
A computed object (e.g. a Mesa configuration) should keep track of how it got made, in enough detail to make it again. The description of the putting-together process should be program-manipulable.
Files should be self-identifying in a way that does not depend on their transfer between storage media or locations.
It should not be possible to destroy all traces of a file containing input (as opposed to purely computed) information. The space requirements can be kept under control by storing descriptions of files (such as changes from other files) rather than all the bits.
Even the dangling reference problem and various garbage collection approaches can be viewed as integrity questions to some extent.
(L4)Memory management -- object/page swapping
(L5)Object management -- garbage collection/reference counting
These facilities are essential for essentially the same reasons as (L1) through (L3), namely, to free programmers from excessive concern for the size and location of their code and data, for both explicitly named and dynamically constructed objects. Pilot will take care of our memory management needs if we use page swapping. Alternative swapping strategies, such as Ooze, were not discussed: they are most applicable to environments that are very short on high-speed memory, which we believe will not be the case on the Dorado.
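By way of illustration only (modern notation, invented names), reference counting amounts to keeping, with each dynamically constructed object, a count of the references to it, and reclaiming its storage when the count reaches zero; the classic weakness, not addressed in this sketch, is that cyclic structures are never reclaimed:

    # Minimal sketch of reference counting. Illustrative only.

    class ManagedObject:
        def __init__(self, payload):
            self.payload = payload
            self.refcount = 1           # the creating reference

    class Heap:
        def __init__(self):
            self.live = set()

        def allocate(self, payload):
            obj = ManagedObject(payload)
            self.live.add(obj)
            return obj

        def add_ref(self, obj):
            obj.refcount += 1

        def drop_ref(self, obj):
            obj.refcount -= 1
            if obj.refcount == 0:       # no remaining references: reclaim
                self.live.discard(obj)

    heap = Heap()
    x = heap.allocate("some object")    # one reference so far
    heap.add_ref(x)                     # a second reference is created
    heap.drop_ref(x)                    # one reference goes away
    heap.drop_ref(x)                    # the last one goes away; storage is reclaimed
    assert x not in heap.live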
(L6)Some support for interrupts
The ability to interrupt program execution is essential, at least to gain control of runaway programs, to allow interaction with long computations, and to provide local multiprocessing; "good" facilities are less important. We have no strong religious feelings about the form of a process mechanism.
(L7)Adequate exceptional condition handling
An integrated mechanism for handling exceptional conditions is required for clean debugging and for the construction of robust programs; it helps clarify program structure by separating "normal" and "exceptional" algorithms. Certain mechanisms are required in this area: some way to unwind the stack, some way of catching an unwind, and program control of error processing. What should be provided beyond this is not agreed -- the controversies raging around the Mesa signal facilities are a reflection of our poor understanding of the problem.
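To make the three required mechanisms concrete, the following sketch (in modern notation; the condition name and procedures are invented) shows a condition being raised, an intermediate routine noticing the resulting unwind as it passes through in order to release a resource, and a caller taking program control of error processing:

    # Minimal sketch: raising a condition unwinds the stack; an intermediate
    # frame catches the unwind to clean up; a caller handles the condition.

    class Overflow(Exception):
        pass

    def inner():
        raise Overflow("exceptional condition")    # start unwinding the stack

    def middle():
        resource = "open file"
        try:
            inner()
        finally:
            # "Catching an unwind": this runs as the unwind passes through,
            # whether or not the condition is ultimately handled above.
            resource = None

    def outer():
        try:
            middle()
        except Overflow as e:
            # Program control of error processing: the "exceptional" algorithm
            # is kept separate from the "normal" one.
            return "recovered from " + str(e)

    assert outer() == "recovered from exceptional condition"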
(L8)User access to the machine’s capability for packed data
The ability to pack data is essential to obtain acceptable storage efficiency for large data structures. However, in a system without static (pre-runtime) type checking, the desire to access packed structures efficiently (without checking or indirection through descriptors) conflicts with the desire to prevent corruption of the storage management system. There was a long discussion of why it is hard to add packed data to Alto Lisp, which centered on this protection question in the following guise: how carefully should the system prevent the user from smashing its underlying data structures? There wasn’t much agreement on this point, but it does seem to have considerable practical importance, since a highly restrictive attitude makes it difficult to code low-level parts of the system in itself.
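The flavor of packed access, and of the hazard, can be conveyed by a small sketch (modern notation; the layout is invented): two fields share one 16-bit word and are reached by shifting and masking, with no descriptor and no run-time check, which is precisely why a wrong offset can silently corrupt neighboring data:

    # Minimal sketch of packed data: a 4-bit tag and a 12-bit value share
    # one 16-bit word. Layout and names are illustrative only.

    def pack(tag, value):
        assert 0 <= tag < 16 and 0 <= value < 4096
        return (tag << 12) | value      # tag in the high bits, value in the low

    def unpack(word):
        return (word >> 12) & 0xF, word & 0xFFF

    word = pack(tag=3, value=1000)
    assert unpack(word) == (3, 1000)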
(L9)Program-manipulable representation of programs
(L10)Run-time availability of all information derivable from source program (e.g. names, types, scopes)
These two issues are closely related: underlying them is our desire to make it easy to extend the set of tools, to make communication between sublanguages easy, and to break down as many as possible of the artificial distinctions between programs and their compilers on the one hand, and between data and programs on the other. A long discussion led to the conclusion that all this information is currently available in Mesa, modulo the question of how stable the compiler’s internal representation of the program as a tree should be expected to be. Straightforward methods (like the ones used in CLISP) will allow compiled code to be attached to source in a user-created data structure, although it is certainly easier to do this in Lisp, where interpretation is simple. Not explored was the usefulness of an interpreter for a subset of the language, such as currently exists in the Mesa debugger.
(L11)Statically checked type system
(L12)Self-typing data (a la Lisp and Smalltalk), run-time type system
The primary value of type systems is descriptive and structural -- specifying the intended properties of one’s data, and providing a mechanism for ensuring consistency between the suppliers and clients of an interface. Enabling a compiler to generate better code is secondary. There are several dimensions of variability in type systems:
Whether type information is bound early, as in Mesa, or late, as in Lisp and Smalltalk. There is a need to provide the programmer a selection from a spectrum of alternatives -- Mesa provides variant records, which are a limited form of run-time binding, and Lisp has DeclTran, which provides some compile-time binding.
Whether types form a strict partition of the data values, a coercion hierarchy, or some even richer structure. We believe that a richer structure is desirable. It might include provision for generic operators, for type-parameterized programs (Mitchell & Wegbreit "schemes"), for a more general notion of type as simply a partial specification of the behavior of an object (perhaps like the Alphard "needs" list), and for automatic pointwise extension of operators over collections.
It appears feasible to deduce much of the type information for a program automatically, starting from the assumption that each variable only takes on values of one type [Milner, Cousot & Cousot]. This would alleviate some of the nuisance of having to write type declarations. Name conventions such as those of the Software Production Environment, if interpreted by the compiler, would also eliminate most of the need for separate specification of type information.
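The following sketch (modern notation; the "program" is reduced to a list of assignments of constants to variables, far simpler than the cited work) conveys the flavor of such deduction under the one-type-per-variable assumption: the type of each variable is taken from the values assigned to it, and a conflict is reported if two assignments disagree:

    # Minimal sketch of deducing type information from a program, assuming
    # each variable takes on values of only one type. Illustrative only.

    def infer_types(assignments):
        types = {}
        for var, value in assignments:
            t = type(value).__name__
            if var in types and types[var] != t:
                raise TypeError(var + " is used with both " + types[var] + " and " + t)
            types[var] = t
        return types

    program = [("i", 0), ("s", "abc"), ("i", 41)]
    assert infer_types(program) == {"i": "int", "s": "str"}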
This is an area ripe for research. We believe that, at a minimum, both the early-bound Mesa type system and the late-bound Lisp/Smalltalk type system must be supported (as alternatives) by an EPE.
(L13)Encapsulation/protection mechanisms (scopes, classes, import/export rules)
(L14)Abstraction mechanisms; explicit notion of "interface"
Abstraction mechanisms are important because they make explicit the logical dependencies of one part of a program on another, while concealing implementation choices irrelevant to the communication between such parts. These mechanisms thus make it possible to factor the development, debugging, testing, documentation, understanding, and maintenance of programs into manageable pieces, while leaving individual programmers the appropriate freedom to design those pieces.
The ability to specify interfaces in the abstract, and to conceal their implementation, is important, but a difficult research area. It is possible to derive this information about implicit interfaces after the fact using tools like Masterscope. The code produced by an optimizing compiler need not reflect source-level modularization, if tighter binding improves efficiency and the user is willing to pay the price of more compilations and possibly decreased debugging information.
We believe one facility in this area is absolutely essential: it must be possible for a programmer to control which names get exported from a package. In addition, it is important for the system to conceal the distinction between user-defined packages and system-provided primitives (types, operations, etc.) at least as well as Mesa does.
(L15)Non-hierarchical control (coroutines, backtracking)
Coroutines and generators are essential: they provide a natural way to write transducers (programs that consume a data stream and produce another), which in turn are often the best way to modularize a data transformation algorithm. Lisp has backtracking, a control discipline sometimes used to explore alternatives in goal-directed searching, but there is considerable disagreement over the mechanism used to provide it, and its uses can probably be covered by more restricted coroutine and Undo mechanisms. Closures are a method of providing a wide variety of interesting control and binding environments, but we are not sure whether they can be implemented efficiently enough, or are structured enough, to replace the more specialized coroutine constructs.
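A small sketch (modern notation; the particular transformation is invented) shows why transducers written as coroutine-like generators compose so naturally: each stage consumes one stream and produces another, and no stage needs to know anything about its neighbors' internals:

    # Minimal sketch of transducers built from generators.

    def numbers(limit):
        for i in range(limit):
            yield i

    def squares(stream):
        for x in stream:        # consume one stream ...
            yield x * x         # ... and produce another

    def only_even(stream):
        for x in stream:
            if x % 2 == 0:
                yield x

    pipeline = only_even(squares(numbers(10)))
    assert list(pipeline) == [0, 4, 16, 36, 64]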
(L16)Adequate runtime efficiency
The ultimate efficiency criterion is whether a system can meet its external specifications/constraints (responsiveness to human users or program clients). Computational efficiency equivalent to Alto Mesa, coupled with the larger real memory and faster disk of the Dorado, is adequate for many projects [we think we can attain this through internal re-engineering of Lisp and Smalltalk]; for other, more computation-intensive systems, at least another factor of 5 is attainable (based on raw hardware speed).
(L17)Inter-language communication
We believe the best way to attain the benefits of uniform methods for accessing the machine’s facilities at the lowest level is for all language systems to run under a single operating system that is carefully constructed not to impose unnecessary space or time penalties on its clients. For the D0 and Dorado, we believe that Pilot satisfies this criterion.
Inter-language communication at a higher level helps reduce duplication of effort and also provides one way of attaining extra efficiency for particular functions. Calling Mesa subroutines from Lisp or Smalltalk will probably be adequate. Communication through the network or file system interface requires very little work, but is too inefficient. The general problem seems difficult and less important.
(L18)Uniform screen management
Use of the display is pervasive in our interactive systems. Lack of uniformity leads to duplicated effort, often of low quality since an individual builder cannot easily draw on all past experience or devote the time to taking advantage of it. On the other hand, too much central control over screen management may frustrate the desire to experiment with new paradigms for interaction.
We believe that it is possible to "virtualize" the screen and the user input devices -- that is, require people to write programs on the assumption that they will only have access to a subpart of the screen and to a slightly filtered stream of input events -- in a way that will not markedly impede our ability to experiment, and that will have a large payoff in terms of the user’s ability to construct a screen environment containing multiple windows on different programs. At a minimum, the user must have direct access to all the capabilities of BitBlt, appropriately mapped or confined to work on a virtual screen.
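The essence of this virtualization can be sketched as follows (modern notation; the Screen and Window names are invented, and real facilities such as BitBlt clipping are far richer): a program draws into a window in its own coordinates, and the window translates to screen coordinates and clips, so the program cannot disturb anything outside the region it was given:

    # Minimal sketch of a virtual screen region that translates and clips.

    class Screen:
        def __init__(self, width, height):
            self.pixels = [[0] * width for _ in range(height)]

    class Window:
        def __init__(self, screen, x, y, width, height):
            self.screen, self.x, self.y = screen, x, y
            self.width, self.height = width, height

        def set_pixel(self, wx, wy, value=1):
            if 0 <= wx < self.width and 0 <= wy < self.height:   # clip
                self.screen.pixels[self.y + wy][self.x + wx] = value

    screen = Screen(64, 48)
    w = Window(screen, x=10, y=10, width=20, height=10)
    w.set_pixel(0, 0)        # lands at screen position (10, 10)
    w.set_pixel(500, 500)    # outside the window: silently clipped
    assert screen.pixels[10][10] == 1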
(L19)Inheritance/defaulting (Smalltalk subclassing)
Languages that provide for programmer-controlled defaulting or inheritance reduce the time and chance for error in the programming process by making it unnecessary to write the same code or parameter values over and over again. The basic idea is that one should be able to write programs in a way that only specifies how they differ from some previously written program. Examples include default standard values for procedure arguments (how does this call differ from a "standard" call?), variant records (how does this particular record distinguish itself from the invariant part?), and the Smalltalk subclass concept (how does this class of objects differ from some more general class?).
We did not discuss this area beyond observing that it is somewhat related to the schemes question discussed under (L12), and that Smalltalk seems to derive considerable benefit from it.
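The examples mentioned above can be made concrete with a small sketch (modern notation; the names are invented): a default argument states how a call differs from a "standard" call, and a subclass states only how it differs from its more general superclass:

    # Minimal sketch of defaulting and inheritance.

    def draw_line(start, end, width=1, dashed=False):
        # Callers specify only what differs from the standard call.
        return (start, end, width, dashed)

    class Document:
        def title(self):
            return "Document"
        def describe(self):
            return "a " + self.title()

    class FormattedDocument(Document):
        # Only the difference from the more general class is written down;
        # describe() is inherited unchanged.
        def title(self):
            return "FormattedDocument"

    assert draw_line((0, 0), (10, 0)) == ((0, 0), (10, 0), 1, False)
    assert FormattedDocument().describe() == "a FormattedDocument"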
(L20)Ability to extend language (e.g. operator overloading)
Languages may be extended by users in a variety of ways. Data structure extension, through user-defined data types and associated operations, makes it possible to write programs in terms of concept-oriented rather than implementation-oriented data objects. Syntax extension, through user definition of new language constructs, allows the user to define specialized notations that may be valuable for particular tasks: this is discussed in detail in (L21). Operator extension, the ability to define meanings for basic language constructs such as arithmetic or iteration when applied to user-defined objects, brings some of the benefits of notational extension with less drastic consequences in program readability.
Data structure extension is accepted as an important part of all modern languages. Syntax extension has fallen into disfavor because of a lot of bad experience: we believe this happened partly because the tools did not support extensions as well as they did the base language. Operator extension is already present in a number of languages such as Algol 68: we did not discuss its merits. It is interesting to note that Smalltalk is founded on the notions of data structure and operator extension, but that the syntax extension facilities present in the 1972 version of the language have been removed.
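A minimal sketch of operator extension (modern notation; the Vector type is invented) shows the benefit: the meaning of "+" is defined once for a user-defined object, and client code then reads in terms of the concept rather than its representation:

    # Minimal sketch of operator extension ("+" applied to a user-defined type).

    class Vector:
        def __init__(self, x, y):
            self.x, self.y = x, y

        def __add__(self, other):       # extend "+" to vectors
            return Vector(self.x + other.x, self.y + other.y)

        def __eq__(self, other):
            return (self.x, self.y) == (other.x, other.y)

    assert Vector(1, 2) + Vector(3, 4) == Vector(4, 6)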
(L21)Ability to create fully integrated local sublanguages
All language systems actually have many small sublanguages for special purposes. For example, Mesa has not only the Mesa language, but the C/Mesa configuration language, the debugger command language, the programming language subset acceptable to the debugger’s interpreter, and the very small languages used to control the compiler and binder from the Executive command line. "Fully integrated", as an ideal, means that control and data should be able to pass freely between sublanguages, and that the facilities (editor, prettyprinter, I/O system, etc.) applicable to the primary programming language should also be applicable to the other sublanguages.
Lisp is unique in that its dozen or so sublanguages all provide the ability to embed arbitrary Lisp computations in them, e.g. in the middle of the editor one can compute (by calling an arbitrary program) data to be inserted, possibly as a function of the thing being edited, or even a sequence of edit commands to execute. The following features of Lisp seem to have made the creation of integrated sublanguages easier:
S-expressions are a simple, standard internal representation of parse trees for all sublanguages.
Lisp provides a standard method of sharing names and passing environments, namely a single, very simple name environment (atoms) that all sublanguages share. (This has both advantages and drawbacks: it leads to the "FLG" phenomenon, for example.)
Many internal system "hooks" are available to the sublanguage implementor. (This too has its drawbacks: it tends to make sublanguages more fragile.)
The standard system contains packages (prettyprinter, table-driven lexical scanner and parenthesis parser) which make I/O of program-like structures easy.
The sublanguages that don’t take advantage of these characteristics, such as CLISP, QLISP, and KRL, find their lives a lot more difficult. If we were willing to limit the complexity of sublanguages to that of S-expressions, i.e. procedures and conditionals, then we could devise an S-expression-like representation for Mesa also. (Extending this to, say, arithmetic expressions not only involves a complex parser and prettyprinter, but in a statically typed language like Mesa also requires taking type declarations into account to decide what the operators in the source text actually mean. Admittedly, the S-expression-like approach doesn’t allow embedding of a reasonable subset of Mesa itself in a sublanguage, and it doesn’t address the point that some of the highest payoff comes from the integration of languages, like KRL, that don’t look like S-expressions.) We also noted that no matter what features the language and environment provided, proper proceduralization of facilities was essential: even Lisp has sublanguages -- in particular, the compiler control sublanguage -- that are implemented so as to interact with the user directly, and that therefore cannot be considered integrated.
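The first of the Lisp characteristics listed above can be conveyed by a small sketch (modern notation; the command sublanguage and its three operators are invented): the sublanguage is represented as nested lists standing in for S-expressions, its interpreter dispatches on the first element of each list, and arbitrary host-language computation can be embedded at any point:

    # Minimal sketch of an S-expression-like sublanguage and its interpreter.

    def evaluate(expr, env):
        if not isinstance(expr, list):            # a literal
            return expr
        op, *args = expr
        if op == "concat":
            return "".join(str(evaluate(a, env)) for a in args)
        if op == "lookup":                        # fetch a value from the environment
            return env[evaluate(args[0], env)]
        if op == "call":                          # embed arbitrary host computation
            fn, *rest = args
            return fn(*[evaluate(a, env) for a in rest])
        raise ValueError("unknown operator " + str(op))

    env = {"user": "mesa"}
    tree = ["concat", "hello, ", ["lookup", "user"],
            ["call", (lambda n: "!" * n), 2]]
    assert evaluate(tree, env) == "hello, mesa!!"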
To sharpen our ideas about what integration means, we considered a "straw man": a system in which all languages (editor, interpreter, ...) shared a screen interface (window manager) but were otherwise entirely separate. This led us to the following observations:
This model was proposed as one that imposed minimal requirements on the individual subsystems, at least if they only dealt with the screen as a sequential character I/O device. This may not be a proper assumption, however: despite numerous attempts, no such package has ever been developed for the Alto, and this may be because no one has been able to develop a satisfactory model for the interface. We agreed that things become more complex as the subsystem’s view of the display becomes more sophisticated (e.g. an editable document, a bitmap).
While this model allowed for considerable communication (by human transfer of characters from output in one window to input in another), it had two serious deficiencies: it made no provisions for communication under program, rather than manual, control, and it required that all transmitted information be in string form. (Note that even transmission of file names requires integration in the sense of sharing a common file system.)
While we did not reach any clear conclusions, we were able to agree on the following:
This is an important area for discussion, but it needs more time than we had available to us.
Integration really means a common model for communication.
The one catalog entry we now have should be broken down into multiple features of differing priorities.
If all the other priority A features in the catalog were provided, Mesa could readily support sublanguages of the complexity of S-expressions.
Adequate proceduralization is necessary for a package implementing a sublanguage to be usable.
A major source of difficulty is the sharing or passing of environment information between sublanguages. One part of this difficulty is simply addressing or naming the objects to be shared. Another part is making sure that shared objects are interpreted the same way (in Mesa, making sure the communicants share the same declarations for the objects). One way around this is to have a limited number of globally agreed-upon structures, such as strings or S-expressions, and then to encode more specialized languages within them, interpreting them by convention or agreement (as Lisp does).
We also do not agree on the importance of fully integrated sublanguages: a number of Lisp users feel this item should have very high priority.
(L22)User access to the machine’s capability for multi-precision arithmetic
Many language systems, though implemented on machines in which multi-precision arithmetic in assembly language is relatively straightforward, make it impossible for the user to get at these facilities (such as the carry from single-precision addition, or the double-by-single division that most machines provide in hardware). There is no excuse for this, especially on machines with relatively short (16-bit) words.
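A concrete illustration (modern notation; the word size and representation are chosen for the example) of what such access buys: multi-precision addition built from 16-bit words, using exactly the carry that single-precision addition already produces in the hardware:

    # Minimal sketch of multi-precision addition from 16-bit words with an
    # explicit carry. Numbers are little-endian lists of 16-bit words.

    WORD = 1 << 16

    def add_multi(a, b):
        result, carry = [], 0
        for x, y in zip(a, b):
            s = x + y + carry
            result.append(s % WORD)     # low word of the single-precision sum
            carry = s // WORD           # the carry the hardware already provides
        if carry:
            result.append(carry)
        return result

    # 0x0001FFFF + 0x00000001 = 0x00020000, as 16-bit words, low word first.
    assert add_multi([0xFFFF, 0x0001], [0x0001, 0x0000]) == [0x0000, 0x0002]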
(L23)Good facilities for processes, monitors, interrupts
Synchronization between logically asynchronous processes is necessary in many programs, either for functional reasons (a system should be able to listen for incoming mail, send a file to be printed, and carry out interactive editing simultaneously) or for efficiency (overlapping computation with disk transfers). The language system, through an underlying operating system if necessary, should provide mechanisms that help the programmer write programs that involve multiple processes. There are a number of adequate, though conflicting, models of these mechanisms available, such as the Mesa and Pilot facilities and the Smalltalk scheduler. We did not discuss this area at all.
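For concreteness, a small sketch of the monitor style of mechanism (modern notation; the bounded queue is an invented example, and the Mesa/Pilot and Smalltalk facilities mentioned above differ from it in detail): the shared data is touched only while holding a lock, and condition variables express "wait until there is something to take" and "wait until there is room":

    # Minimal sketch of a monitor: a bounded queue shared between processes.

    import threading

    class BoundedQueue:
        def __init__(self, capacity=10):
            self.items = []
            self.capacity = capacity
            self.lock = threading.Lock()                 # the monitor lock
            self.not_empty = threading.Condition(self.lock)
            self.not_full = threading.Condition(self.lock)

        def put(self, item):
            with self.lock:
                while len(self.items) >= self.capacity:
                    self.not_full.wait()
                self.items.append(item)
                self.not_empty.notify()

        def take(self):
            with self.lock:
                while not self.items:
                    self.not_empty.wait()
                item = self.items.pop(0)
                self.not_full.notify()
                return item

    q = BoundedQueue()
    producer = threading.Thread(target=lambda: [q.put(i) for i in range(5)])
    producer.start()
    assert [q.take() for _ in range(5)] == [0, 1, 2, 3, 4]
    producer.join()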
(L24)Simple, unambiguous syntax (including infix notation)
While CLISP leaves a very large gap between what can be precisely defined and what the user can reasonably do, and while Mesa syntax is regrettably complex (5 closely-spaced pages), we are willing to tolerate problems of this sort under the assumption that the primary users of the system will be experts, and that novices will be able to learn a useful subset easily.
(L25)Control over importation of names
As discussed under (L13) and (L14), import control seems less important than export control. The reason is that the implicit use of an interface can be deduced locally by noting what names are used; for export, the issue is global, and is not under the control of the package’s provider without some explicit specification.
(L26)User packages as "first-class citizens"
We would like user-defined packages to function as "first-class citizens" on a par with built-in primitives (types, operations, etc.). However, the Euclid experience seems to indicate that this is very difficult. In Smalltalk, this goal has been achieved except for some I/O issues, at the expense of not having static type structure in the language at all.
(L27)Closures
See (L15) for discussion.
(L28)Full-scale inter-language communication
(L29)User microprogramming
The ability to share data and pass control freely between programs written in any of the major languages depends on carefully coordinated use of certain basic resources such as the peripheral devices and the machine’s address space. We think this is less urgent than the more restricted facilities of (L17): the primary motivation for changing languages in mid-program is increased efficiency, and dropping into Mesa from Lisp or Smalltalk provides this, although it does not address the secondary motivation, which is the ability to take advantage of work already done (or more conveniently done) in another language.
In cases requiring extreme speed, it may be necessary to give the programmer a way to write application-dependent microcode and link it to the system in a way that provides at least some checking that it will not destroy the rest of the system. We did not discuss this.
(L30)Clean data and control trapping mechanisms
There was a long and inconclusive discussion. Apparently we don’t really know what this point is about. At one extreme of "data trapping" there is a simple address trap, as on early machines. At the other extreme is KRL. No one was willing to espouse either extreme. We noted that
programming with data abstractions can help with this problem, since there is then a better handle on when data is being changed.
checking some predicate at periodic intervals (e.g. on every control transfer) may be quite adequate when data trapping is being used to catch "core smashing" types of bugs. Mesa already has this facility.
many interesting cases can probably be handled by using the primitive trapping facilities of the mapping hardware.
(L31)"Good" exceptional condition handling
As discussed above, we really don’t know what this would mean. Roy Levin’s thesis, among other published papers, may be relevant.