--Rick Beach, February 15, 1987 8:01:50 pm PST
Reference Models, Window Systems, and Concurrency
Keith A. Lantz, Peter P. Tanner,
Carl Binding, Kuan-Tsae Huang, Andrew Dwelly
1. Introduction
This group was originally chartered to develop the architecture of the user interface software ``external to the UIMS.'' This charter presented a classic chicken-and-egg problem since, in order to determine what was ``external to'' the UIMS, it was necessary first to determine what was meant by ``the UIMS,'' a determination being made in parallel by the other working groups! We addressed this problem by formulating a reference model for the implementation of interactive software as a whole, and then specifying those components of the model that we believed corresponded to the intended meaning of ``the UIMS.''
Subsequently, we focused on the software comprising what traditionally has been referred to as the ``window system'' [23], leaving it to another working group to discuss the relationship of the UIMS to higher-level (application) software [16]. Of particular concern are the sharing of devices between multiple clients, the ways in which concurrent programming techniques can expedite the implementation of relevant software and an evaluation of standards efforts in this area.
2. A reference model for the implementation of interactive software
This section presents a reference model for the implementation of interactive software. The intent is to emulate the OSI Reference Model [44] by providing a framework within which alternative implementation strategies can be compared, rather than to propose a specific implementation strategy.
2.1 Environmental factors
Any reference model makes assumptions about the environments to which it applies. Our reference model, in particular, was affected significantly by the following considerations, most of which have only recently begun to impact the design of interactive software:
Distribution: We are moving to an increasingly distributed environment, where applications run on machines other than the one(s) managing the user interface. Depending on the type of interconnection, interaction between users and applications may be affected by significant variation in bandwidth and delay.
Concurrency: Users are becoming ever more accustomed to interacting with several applications at the same time, possibly using multiple communications media simultaneously. This requires that the underlying computer system support multi-tasking, which in turn is perhaps best provided by judicious use of processes.
Multimedia: User interfaces are evolving to accommodate media other than text and graphics, for example, voice, touch and video. The devices that must now be supported include a variety of valuators, button devices, digital pen and tablet devices, and audio input and output. Ultimately, the user should be able to employ his entire repertoire of senses to interact with the computer, which may entail the use of eye trackers and other head- or body-mounted devices [8].
Compatibility: It is important to allow existing applications to continue to run without any modifications to the source. It is also desirable to minimize the changes required in existing systems software.
Collaboration: Users are becoming increasingly interested in using computers to support collaborative work, or human-computer-human interaction. Consequently, the ideal user interface should provide support for multi-user dialogues.
In addition, two other factors are of great concern to the user interface developer, namely:
Portability: It should be straightforward to port the interface to another machine or to another operating system. It is not sufficient to be ``device-independent.''
Configurability: It should be straightforward to configure (or extend) the interface to accommodate different users, or classes of users, or application environments. Both static and dynamic configuration should be supported.
The reader is referred to [26] for further discussion of these factors and related issues.
2.2 The reference model
To best describe the various levels of ``I/O'' involved in an interactive dialogue, one can consider the relevant hardware and software as consisting of four major components:
1. Hardware devices (e.g. display/frame buffer, microphones, mice)
2. Workstation agent
3. Dialogue manager
4. Applications, including the workstation manager
The relationships (and interfaces) between these components are depicted in Figure 1 and described further below.
(a) Layered view.
(b) Modular view, with interfaces.
Figure 1. Reference model: Two views.
The hardware devices provide the ``raw'' interface to the user. The workstation agent is responsible for managing those devices, presenting a device-independent (media-dependent; see footnote 1) interface to clients. It also provides the basic support for multi-tasking; specifically, it must multiplex the various devices between multiple clients. The multiplexing (or scheduling) ``policy'' is determined by the workstation manager.
1 The term ``media-dependent'' is actually more descriptive than ``device-independent'' since, in fact, ``device-independent'' software typically is constrained to a particular class of devices, that is, a particular medium.
The workstation agent provides only a set of relatively low-level I/O primitives. The dialogue manager is responsible for composing these I/O primitives into interaction techniques, and for selecting particular interaction techniques to satisfy individual application-specific tasks. The dialogue manager provides the application with the option of seeing a media-independent interface, as opposed to the media-dependent interface presented by the workstation agent. It also provides for the invocation of applications or methods, and for handling their responses. In short, the dialogue manager is responsible for providing true ``dialogues'' between users and applications.
Of course, as witnessed by contemporary window system environments, the user may be engaged in multiple simultaneous user-application dialogues. In order for the user to be able to switch contexts she must be able to engage in a ``meta-dialogue'' with the system. The management of this meta-dialogue is the responsibility of the workstation manager. The workstation manager effectively sets the policy for sharing devices; for example, whether windows on a display should be tiled (and if so, how) or overlapping. It must also provide (through the dialogue manager) the user interface to the meta-dialogue itself. With respect to contemporary window systems, the workstation manager is to the workstation agent as the window manager is to the window server. Different terminology is used to account for the fact that we must accommodate media other than display screens.
The only thing we note about applications is that, in our architecture, no application semantics are ``present'' in the workstation agent. We make no assumptions as to whether parts of the application are contained (downloaded) in the dialogue manager or not. However, we do assume that the workstation manager is, for all intents and purposes, an application — which happens to use the dialogue manager to interact with the user.
Finally, while we have used the expression ``the dialogue manager'' above, the system is not constrained to supporting only one dialogue manager at a time [25]. Suppose, for example, a user interface developer is testing a new dialogue manager; she would like to run it ``on top of'' the existing dialogue manager in order to have a stable base to fall back on. Alternatively, consider a situation where the user wishes to invoke a backward-compatible ``shell,'' such as the UNIX c-shell. Such a ``shell'' is a dialogue manager. Thus, one can imagine a system supporting multiple dialogue managers simultaneously; only for expository purposes will we continue to use the singular.
2.3 Model vs. architecture
The principal motivation for the decomposition of function proposed above is, simply, that it appears that many existing architectures for interactive software can be defined in terms of it. That is, it can be demonstrated to serve its principal purpose as a reference model!
A secondary motivation is to provide the converse, a basis for a well-structured, modular architecture — not just a model to which existing architectures can be compared. The OSI Reference Model has been used in this way to create the ISO protocol suite. We differ from that progression, however, in that we reject the notion of a layered architecture in which to get to layer N the application must ``go through'' layer N+1 (and vice versa, depending on the direction of control flow). Rather, as shown in Figure 1(a), each ``layer'' or module is assumed to have a well-defined interface, to which the application has direct access.
This bias in favor of ``modularity'' vs. ``layering'' has two beneficial side effects with respect to compatibility with existing software. First, applications that were written prior to the introduction of a dialogue manager can continue to function in the context of an existing workstation agent (or window system). Similarly, applications may continue to access devices directly, if necessary. Naturally, in that case, the user should beware of running any other application that uses the same devices at the same time.
2.4 Where's the UIMS?
The reader may have noticed that the expression ``UIMS'' has yet to appear in the presentation of the reference model. Indeed, our group decided that ``UIMS'' should either be scrapped altogether or used to refer to the combination of the dialogue manager, workstation agent and workstation manager as discussed above. That is, if ``user interface management system'' is to be applied solely to software (and not to include support mechanisms external to the computer), then it should be applied to all software that provides for the management of user interfaces. The remainder of this document uses the expression ``UIMS'' in precisely that manner — in contrast to the typical use of ``UIMS'' to mean what is referred to here as the ``dialogue manager.''
3. The workstation agent
The remainder of this report focuses on the workstation agent (WA). This section discusses the interfaces a WA should present to its clients, including dialogue managers, without proposing a specific strategy for implementing those interfaces. Subsequent sections suggest a specific implementation strategy, as well as discuss the relationship of the WA, as proposed, to existing or proposed standards.
The workstation agent provides the basic interface between the hardware and the rest of the system. One of its principal functions is to hide any idiosyncrasies of that hardware through the use of virtual devices. Rather than dealing directly with the hardware, most applications request input from a virtual keyboard or mouse, for example, and write output to a virtual store (see footnote 2). Depending on the kind of real devices being emulated, the characteristics of the virtual devices will vary widely. In the simplest implementation, each workstation agent emulates exactly one input device and one output device. More sophisticated workstation agents might emulate multiple classes of devices simultaneously.
2 This is not to say that idiosyncrasies must be hidden. Indeed, in many situations the details of specific devices are exactly those that determine the quality of interaction [8,12].
Historically, the most common pair of devices emulated has been the keyboard and display of a page-mode (character) terminal, exemplified by the DEC VT-100. Even in this case, the workstation agent can be thought of as emulating different types of devices, corresponding to the various input and output modes provided by a VT-100 — character-at-a-time versus block transmission, local editing facilities, and the like. In general, the workstation agent, through its virtual devices, provides a set of facilities that might be referred to as ``cooked I/O'' — ranging from line-editing to page-editing to graphics-editing and so forth. In support of this, the virtual input devices are usually capable of ``echoing'' the input and local editing operations on an appropriate virtual output device. These facilities are enabled and disabled individually for each virtual device.
Largely due to this linking of virtual input to virtual output devices for purposes of echoing, it has been common to regard (input device, output device) pairs, or virtual terminals, as the entities of interest. Unfortunately, such one-to-one coupling causes problems in situations where an application requires a many-to-many mapping of input to output devices, as is the case in the context of real-time multimedia conferencing (cf. [2,18,24]). Therefore, we advocate a separation of input from output and eschew the expression ``virtual terminal.''
The number of virtual devices may differ from the number of real devices. There may be more, as in the case of multiple virtual displays (windows), or fewer, as in the case of locator and button devices being combined as a mouse device. The latter case is more typical of input devices, where events from multiple real devices are merged into a single input stream (see footnote 3).
3 To be more precise, there may be multiple classes of virtual device for each real device, and there may be multiple instances of each class. We use the expression ``virtual device'' to refer to an instance.
The expression ``workstation agent,'' then, derives from the fact that it ``stands in'' or ``acts as an agent for'' the physical hardware. Many contemporary systems have used the expression ``window system'' to refer to this functionality, but that expression does not suggest a system that can accommodate communications media other than traditional display devices. While for purposes of exposition the bulk of the following discussion does, in fact, focus on the case of I/O being graphical (or textual), we believe most of the concepts are applicable to other media. Our use of the term ``window'' will, in general, be restricted to situations where we are referring to portions of the display.
3.1 An input model
We propose that the workstation agent should, as a minimum, offer one class of (composite) virtual input device, namely, one that interleaves events from all real hardware devices into a single stream. Thus, the interface to clients is a stream interface. A client must ``read'' the stream to receive input (see footnote 4). Naturally, a client may choose to open multiple such streams, perhaps one for each window. Other features pertaining to input handling are best described working from the hardware up. To illustrate these features, we will describe how the WA handles a mouse event (refer to Figure 2).
[Figure 2 shows the following stages, in order: window determination; determination of clients; replication (if more than one client is interested in the window); filtering (possible discarding); multiplexing with other device input; input queue to client.]
Figure 2. Processing a mouse event.
4 This does not mean that the client must issue a new read request every time it wants a new input token. In an asynchronous message-passing system, for example, the client might open a stream for reading, after which the WA would send events to the client as they become available. The client would simply receive those events from its input message queue.
A mouse event, on arrival at the WA, is analyzed to determine which client (or clients) are interested in the event. Multiple clients may be associated with a single window, for example, if one client is responsible for lexical feedback (echo), while a different client is responsible for semantic feedback. Alternatively, there could be a client responsible for the mouse echo for all windows; it would receive a copy of all mouse events, and so be able to update the cursor position on the screen in parallel with another copy of the event being sent to other clients.
The determination of which clients are interested in the event is a three-step process. First, the WA determines the window (or windows) in which the mouse was positioned when the event occurred. Second, the WA determines which of the clients associated with those windows are interested in input from the device that generated the event. If there is more than one such client, the event is replicated, with a copy being targeted for each client. This then is the demultiplexing stage: the single stream of mouse input branches into several streams, one for each client. Of course, if there are no clients interested in an event, it is discarded (see footnote 5).
5 Events should not be discarded until this determination can be made with certainty. There may be situations where a device is generating events faster than the WA can demultiplex them, in which case those events should be queued elsewhere until they can be demultiplexed. In the most pathological situations even these queuing limits may be exceeded, in which case events must be discarded.
At this point, a copy of the input event has been produced for each interested client. The final step in client determination is to determine, for each client selected thus far, whether the client is interested in this particular event. That is, in general, a client may specify a filter that performs an action that is dependent on the actual input values as well as on the originating input device and the client's specifications for that device (see footnote 6). For example, it would be quite reasonable for an input event to be discarded if it is not ``significantly different'' from the most recent previous reading in the client's queue. Alternatively, the ``new'' event (the one being processed) might cause the ``old'' event (the one in the queue) to be discarded. The measure of ``significant difference'' may well vary from one client to another; a typical measure for the mouse would be distance moved since last event, or whether a button transition had occurred.
6 The filter need not be procedural, but could consist instead of a mask or template.
Once it has been decided that the event should be queued for the client, it is entered in the queue in time-sequence order with input from all other devices. This is the multiplexing stage, and is the final step in the input flow through the WA. Subsequently, the event will be returned to the client in response to a ``read'' request.
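The input flow described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation of any system cited in this report: the names Event, Client, Window, moved_enough and dispatch are hypothetical, windows are assumed to be axis-aligned rectangles, and the distance measure (sum of the absolute x and y differences) is an arbitrary choice of ``significant difference.''

```python
from dataclasses import dataclass, field, replace
from typing import Callable, List, Optional, Set, Tuple


@dataclass
class Event:
    device: str            # originating device, e.g. "mouse"
    time: int              # arrival time at the WA
    x: int = 0
    y: int = 0
    button: bool = False   # True if a button transition occurred


# A filter inspects the client's pending queue and the new event,
# and returns True if the event should be queued (hypothetical interface).
Filter = Callable[[List[Event], Event], bool]


@dataclass
class Client:
    name: str
    devices: Set[str]                      # devices this client listens to
    accept: Optional[Filter] = None        # None means "accept everything"
    queue: List[Event] = field(default_factory=list)


@dataclass
class Window:
    rect: Tuple[int, int, int, int]        # x0, y0, x1, y1 (x1, y1 exclusive)
    clients: List[Client]

    def contains(self, x: int, y: int) -> bool:
        x0, y0, x1, y1 = self.rect
        return x0 <= x < x1 and y0 <= y < y1


def moved_enough(min_dist: int) -> Filter:
    """Keep button transitions; drop moves smaller than min_dist."""
    def accept(queue: List[Event], ev: Event) -> bool:
        if ev.button:
            return True
        # Most recent queued event from the same device, if any.
        prev = next((e for e in reversed(queue) if e.device == ev.device), None)
        if prev is None:
            return True
        return abs(ev.x - prev.x) + abs(ev.y - prev.y) >= min_dist
    return accept


def dispatch(windows: List[Window], ev: Event) -> int:
    """Route one event; return the number of copies queued."""
    # Step 1: window determination.
    hit = [w for w in windows if w.contains(ev.x, ev.y)]
    queued = 0
    seen = set()
    for w in hit:
        for c in w.clients:
            # Step 2: client determination by device interest.
            if ev.device not in c.devices or id(c) in seen:
                continue
            seen.add(id(c))
            copy = replace(ev)             # replication: one copy per client
            # Step 3: per-client filtering (possible discarding).
            if c.accept is not None and not c.accept(c.queue, copy):
                continue
            # Multiplexing: keep the client's queue in time order.
            c.queue.append(copy)
            c.queue.sort(key=lambda e: e.time)
            queued += 1
    return queued
```

In this sketch a client that echoes the cursor would register with no filter and so receive every mouse event, while an application client could register moved_enough(5) and see only button transitions and larger movements.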
A final desirable feature is support for out-of-band data. For example, a CTRL-C typed on a keyboard, indicating an abort request, should be processed before any pending events. The necessary input might be inserted at the head of the client's input queue, or it might be sent using an asynchronous event notification mechanism distinct from the stream interface we have discussed thus far. For reasons of simplicity, we advocate the former approach, relying on the client-side interface routines to accommodate out-of-band data. On the other hand, since out-of-band data often means that pending events should be flushed or modified in some other way, the client interface should provide the corresponding functionality.
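As a rough illustration of the approach advocated here, the following Python sketch shows a client-side queue that accepts out-of-band data at its head and lets the client flush pending events. The class and method names (ClientQueue, post_urgent, flush) are hypothetical and are not taken from any of the systems discussed.

```python
from collections import deque


class ClientQueue:
    """Hypothetical client-side input queue with out-of-band support."""

    def __init__(self):
        self._events = deque()

    def post(self, event):
        """Ordinary in-band event: appended at the tail."""
        self._events.append(event)

    def post_urgent(self, event):
        """Out-of-band event: inserted at the head of the queue."""
        self._events.appendleft(event)

    def flush(self, keep=lambda e: False):
        """Discard pending events unless keep(e) is true; return the count discarded."""
        before = len(self._events)
        self._events = deque(e for e in self._events if keep(e))
        return before - len(self._events)

    def read(self):
        """Return the next event, or None if the queue is empty."""
        return self._events.popleft() if self._events else None
```

A client that reads an abort request from the head of the queue can then call flush() to drop the events that were pending when the abort was typed.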
Although we know of no current system that adheres precisely to the above description, the input models adopted by the Adagio project [41,42] and by the ANSI X3H3.6 Committee on Display Management for Graphical Devices differ only slightly. TheWA [30], X [38], and NeWS [21] differ in more details but are similar in spirit — including having adopted the notion of a single composite virtual input device.
3.2 Output: retention issues
The workstation agent is responsible for posting images on a display, sending controls to a sound generator and managing all other devices capable of providing feedback to the user. It is not our intention in this section to describe the media-dependent output primitives available to clients of the WA, but rather to discuss one of the most controversial issues in ``window system'' design: retention, the degree to which information needed to generate the output should be retained in the WA. As in the previous discussion, what follows looks at the case of the output being graphical or textual, and being displayed on a window of a graphics display, although many of the ideas presented are applicable to other media.
There are four principal levels of retention:
1. None
2. Device-dependent (e.g. a bitmap)
3. Partially structured (e.g. a segmented display file)
4. Complete (e.g. a structured display file)
If the WA does not retain any portion of the image that it has placed on the screen, the application must then be responsible for all window regeneration whenever that becomes necessary. The WA's role becomes one of an informer — it informs the application that its window (or a specific portion of its window) has been destroyed, and must be replaced. It is then up to the application to enter the necessary information into its output stream that will cause the WA to repair or replace the contents of the window. The application module must keep, at all times, enough information to regenerate the image on the screen. This is the approach taken in SunWindows [40], X [38], and Andrew [22], for example.
Alternatively, the WA may keep a bitmap representation of the full window, including that part of the window covered by other windows. In this case, window movements do not require any work from the application, but any pan, scroll, zoom or resize of the window will require the application to take on the responsibility of redrawing the window. This approach was taken, for example, in the Teletype 5620 (formerly the Blit) [35].
The WA may keep a partially structured representation of the contents of the screen. For example, a WA handling a text window may keep the textual representation of the contents of the window. In this situation, the WA can handle any regeneration of the output unless a ``fault'' occurs (analogous to a page fault), where the WA requires information from the application which precedes or follows that portion of the output structure retained by the WA.
A model of full retention can be illustrated with the idea of the WA having a built-in implementation of PHIGS [11]. Such a WA would retain the complete structure, the model, of a 3-D graphics image. The WA would be able to redisplay the 3-D image independently after any window modification mentioned above, or after window parameter changes such as movement of the viewpoint or modification of the display attributes. Moreover, it would be possible for the WA to provide multiple independent views of the same image, independent of the application. This is the approach taken (for graphics) in the Virtual Graphics Terminal Service (VGTS) [28,34] and its successor, The Workstation Agent (TheWA) [30]. It is also an approach that can be taken in NeWS (formerly SunDew) [21], where PostScript [1] is supported rather than PHIGS.
The tradeoffs between these approaches are significant. For example, complete retention allows for rapid reposting of the window when it is moved, changes size or is uncovered — in contrast to the sometimes excruciating slowness with which a screen is refreshed when no information is retained. This is especially advantageous when the application is running on a machine remote from that running the WA (cf. [28,29,34]). However, the method used for this retention is highly dependent on the types of applications being supported and may lead to a high degree of redundancy — data structures in the application plus display file in the WA plus frame buffer, for example. Consequently, the idea of ``forcing'' such complexity upon all applications is inappropriate.
Fortunately, as demonstrated by TheWA and NeWS, it is possible for a single workstation agent to implement multiple retention models simultaneously. Indeed, the approach is to provide multiple classes of virtual output devices for each real output device. For example, TheWA supports one class of virtual output device that emulates the store of a VT-100 and one class that emulates a PHIGS-like structured display file. However, adding this much complexity to the workstation agent may be inappropriate in some situations, especially if memory space is a concern.
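The difference between the two extremes of retention, and the idea of offering them as separate classes of virtual output device, might be sketched as follows. This is a simplified illustration under stated assumptions: the class names UnretainedOutput and StructuredOutput are hypothetical, output items are reduced to labelled points, and a flat list stands in for a structured display file.

```python
from typing import Callable, List, Tuple

Rect = Tuple[int, int, int, int]      # x0, y0, x1, y1 (x1, y1 exclusive)
Item = Tuple[int, int, str]           # x, y, label


def inside(item: Item, r: Rect) -> bool:
    x, y, _ = item
    return r[0] <= x < r[2] and r[1] <= y < r[3]


class UnretainedOutput:
    """No retention: the WA can only inform the client of damage."""

    def __init__(self, notify_client: Callable[[Rect], None]):
        self.notify_client = notify_client

    def draw(self, item: Item) -> None:
        pass                           # painted on the screen, not stored

    def repair(self, damaged: Rect) -> List[Item]:
        self.notify_client(damaged)    # the client must regenerate the output
        return []                      # nothing the WA can redraw by itself


class StructuredOutput:
    """Complete retention: the WA redraws from its own display file."""

    def __init__(self):
        self.display_file: List[Item] = []

    def draw(self, item: Item) -> None:
        self.display_file.append(item)

    def repair(self, damaged: Rect) -> List[Item]:
        # Items the WA can repost without involving the client.
        return [i for i in self.display_file if inside(i, damaged)]
```

Because both classes expose the same draw and repair operations, a single agent could hand out either kind of virtual device per window, leaving the cost of retention to the clients that ask for it.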
4. The workstation manager
The workstation manager permits the user to control the meta-dialogue consisting of the combination of all user-application dialogues. With respect to display management, in particular, the workstation manager is the module that enforces the constraints on window placement that the user desires. For example, it enforces the precise position and front-to-back ``ordering'' of windows.
It is crucial to appreciate the distinction between workstation managers and workstation agents. The manager exerts ``control'' over the ``facilities'' provided by the agent. Different users may have different styles of control. For example, one user may wish to position all windows manually, whereas another user may prefer the system to determine the ``best'' position automatically. One user may prefer his windows to be tiled, whereas another may prefer them to overlap. Different users may prefer different border styles. All of these choices are policy decisions that are distinct from the mechanisms provided by the workstation agent, although these policies are then implemented by the WA.
Not only does the manager-agent split reflect a separation of policy from mechanism, but it also removes from the workstation agent the need to be able to interact with the user to set those policies. Rather, the workstation manager, like any other application, is free to interact with the dialogue manager or the workstation agent in any manner it sees fit. Consequently, as with any other application, the manner in which it interacts may be changed at any time.
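The separation of policy from mechanism can be made concrete with a small Python sketch. The agent below offers only a placement mechanism, while two interchangeable manager policies decide where windows go. All names (Agent, place, tile_columns, cascade) and the particular layout rules are assumptions made for illustration.

```python
from typing import Dict, List, Tuple


class Agent:
    """Mechanism only: records where each window has been placed."""

    def __init__(self, width: int, height: int):
        self.width, self.height = width, height
        self.geometry: Dict[str, Tuple[int, int, int, int]] = {}  # x, y, w, h

    def place(self, win: str, x: int, y: int, w: int, h: int) -> None:
        self.geometry[win] = (x, y, w, h)


def tile_columns(agent: Agent, windows: List[str]) -> None:
    """Policy A: tile the windows side by side in equal columns."""
    if not windows:
        return
    w = agent.width // len(windows)
    for i, win in enumerate(windows):
        agent.place(win, i * w, 0, w, agent.height)


def cascade(agent: Agent, windows: List[str], step: int = 20) -> None:
    """Policy B: overlapping windows, each offset from the previous one."""
    for i, win in enumerate(windows):
        agent.place(win, i * step, i * step, agent.width // 2, agent.height // 2)
```

Swapping tile_columns for cascade changes the user's view of the screen without touching the agent, which is the property argued for above.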
In contrast, many existing window systems fail to maintain an adequate separation of policy from mechanism, such that the necessary user interface functions are ``wired-in'' and cannot be changed. Fortunately, both X and NeWS have adopted the more flexible approach suggested here.
5. Relation to standards efforts
International standardization groups have worked hard towards defining generic concepts to be used in computer graphics programming, specifying dialogues for addressing graphical devices, and standardizing techniques for the storage and transfer of graphical information. The goal has been, of course, to permit applications to be portable among hardware configurations and so to give the user of any single graphics workstation access to a far greater amount of application software. However, each of the graphics standards — GKS, PHIGS, CGI, and NAPLPS — imposes its own constraints, based on its own model of a graphics workstation. Whether or not this model matches the model we have discussed above, these standards are so widespread in their use that they must be incorporated into our recommendations. The goal of this incorporation would be to maintain the capabilities of the workstation agent while providing the standardized interface to support existing applications.
There are three basic approaches to accommodating a standard graphics system (SGS) in the context of a workstation agent:
1. Position the workstation agent ``on top'' of the SGS.
2. Position the SGS on top of the workstation agent.
3. Embed the SGS in the workstation agent.
5.1 Workstation agent on top of graphics package
One possible approach to the combining of a standard graphics system (SGS) with the WA is to place the SGS between the WA and the devices. This permits the WA to communicate with the devices through the SGS, using existing software, and certainly gives it access to multiple input and output devices if they are available. At the higher level, the WA allows its clients to specify policies for using and controlling shared interactive resources. This approach yields a degree of device independence in one direction, and possible compatibility with the standard graphics system in the other.
KI [10] and MS Windows [33] are examples of this approach. At the heart of the KI system is GKS, which provides a set of device-independent functions for graphical data processing. On top of GKS are the KI kernel commands, which allow interactive use of the GKS functions with a common set of commands in addition to system functions for echoes, error messages and end-user defined dialogues.
There are several pitfalls to this approach. Current SGS's do not provide the functionality to readily implement the input model we have described above. Either the WA must live with the input model of the graphics package or it must define a new model and provide the mapping between the two. Also, SGS's do not normally provide proper windowing capabilities for graphics devices. For example, GKS was not designed to be run on workstations that might dynamically change size. Another problem, in CGI, is that the limited capability of device drivers may restrict the system's ability to take advantage of the increasing hardware support for complex graphic functions such as clipping to complex boundaries. In brief, we do not believe this to be a viable approach in most situations.
5.2 Graphics package on top of workstation agent
The second approach is to treat the SGS as a layer between the workstation agent and its clients, albeit a layer bypassed when required. The WA interacts directly with the hardware devices, providing its own graphics routines, font files and color maps. Consequently, it is able to implement input and output models other than (and presumably better than) existing standards. However, for each desired SGS a standard library is provided, which can be linked with the client to provide a ``front end'' that makes the WA look like the SGS. In situations where the SGS expects exclusive control of a device, some support may be required in the WA.
X is an example of this approach. While providing its own graphics interface, the X distribution includes a number of front end libraries. NeWS is another example in this category. Any SGS can be implemented on top of the PostScript graphics primitives due to the high level of abstraction it provides. The associated code can either be linked with the client or downloaded into the NeWS server itself, as discussed below.
The major advantage of this approach is that the WA can support applications written for any SGS for which an appropriate front end has been written — one that must include a mapping from the WA's input model to the one required for the SGS. Moreover, the WA can be ported to other systems independent of the SGS and more easily modified to accommodate new styles of interaction.
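The mapping from the WA's input model to the one an SGS expects can be pictured as a thin front-end library linked with the client. The Python sketch below assumes a hypothetical WA stream that returns events as dictionaries and a hypothetical request-style locator call. It does not reproduce the actual interface of GKS or of any other standard.

```python
from typing import Dict, Iterable, Optional, Tuple


class WAStream:
    """Hypothetical client-side handle on a WA event stream."""

    def __init__(self, events: Iterable[Dict]):
        self._events = list(events)

    def read(self) -> Optional[Dict]:
        return self._events.pop(0) if self._events else None


class SGSFrontEnd:
    """Hypothetical front end that presents request-style input to the client."""

    def __init__(self, stream: WAStream):
        self.stream = stream

    def request_locator(self) -> Optional[Tuple[int, int]]:
        """Consume stream events until a mouse button event arrives.

        Returns the position of that event, or None if the stream is exhausted.
        Events that do not satisfy the request are discarded in this sketch.
        """
        while (ev := self.stream.read()) is not None:
            if ev.get("device") == "mouse" and ev.get("button"):
                return (ev["x"], ev["y"])
        return None
```

A real front end would have to decide what to do with the events it skips, for example by buffering them for later requests, which is part of the mapping effort noted above.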
5.3 Graphics package inside the workstation agent
The final approach to accommodating existing standards is to embed the SGS inside the workstation agent. The principal advantage of this approach is reducing the potential for redundancy, particularly of data structures. For example, if PHIGS were embedded inside the workstation agent, then it is reasonable for the workstation agent to retain all output in structured display files, with the corresponding performance advantages discussed above. If, instead, the workstation agent adopted a different retention model and PHIGS were implemented in client libraries, then the data would have to be stored both in structured display files within the client (library) and in the appropriate output structures within the WA.
The principal disadvantage of this approach is considerably increasing the size and complexity of the WA compared to the previous approaches. Nevertheless, it is the approach taken by the VGTS and TheWA, with considerable success. Moreover, it is an approach that could easily be taken with NeWS.
5.4 Display management for graphical devices
In addition to considering standard graphics packages, we must take account of the recent work by the ANSI X3H3.6 Committee on Display Management for Graphical Devices [39]. This work has focused on a variety of issues discussed in this report, primarily input models and allocation of the display between multiple applications. Indeed, their input model and display resource arbiter are quite similar to the input model and workstation manager, respectively, described here. Although this effort is still in an early stage, the obvious overlap between their work and the work reported in this paper makes it imperative that each group (participants at Seattle and members of X3H3.6) continue to track the work of the other group.
6. Concurrency and multi-process structuring
With the proliferation of networks and multiprocessor workstations, concurrency and parallelism have been of increasing concern to systems developers.7 Until the advent of window systems, however, the typical user had little cause to be aware of any underlying concurrency, such as that provided by any multi-tasking operating system. She did one thing at a time. Only when window systems made it not just possible, but easy, for a user to interact with multiple applications (more-or-less) simultaneously did concurrency have a significant impact on her model of the system.
7 By ``parallelism'' we mean truly parallel execution, possible only by using multiple processors. By ``concurrency'' we mean ``logical parallelism'', that is, achieving the same semantic effect as parallel execution but without requiring multiple processors. Thus, from a semantic point of view, ``concurrency'' subsumes ``parallelism''.
Thus, even casual users are becoming accustomed to at least one fundamental concept of concurrency, namely, multiple threads of control. The issues at hand, however, are whether concurrent programming mechanisms and practices can assist in the implementation of interactive software, and, if so, how. The principal mechanisms of interest are processes (threads of control) and the mechanisms to prevent processes from interfering with each other (mutual exclusion), to permit them to synchronize their dialogue (synchronization), and to permit them to exchange data (communication). The selection of and subsequent use of those mechanisms should be determined entirely by the choice of concurrency model — or paradigm for multi-process structuring [13].
We assume that the underlying operating system already supports processes and discuss only the issues surrounding multi-process implementation of (sub)systems outside the kernel. Of course, concurrent systems of this sort predate window systems and UIMS's by at least a decade, so our intent is not to give a comprehensive overview of the field — for which the reader may wish to refer to the excellent tutorial by Andrews and Schneider [4]. Rather, we limit our discussion to the basic motivations for and application of multi-process structuring in the context of a UIMS.
6.1 Motivations
Historically, there have been four principal motivations for composing software systems out of multiple interacting processes:
1. Modular decomposition: The decomposition of large systems into smaller, self-contained pieces, with the possible advantages of improved portability, configurability, maintainability, and . . .
2. Protection: The information hiding that comes with modular decomposition yields increased protection, especially if processes (modules) use disjoint address spaces.
3. Distribution: If processes have disjoint address spaces, and communicate only through messages, it is straightforward to distribute the various processes across multiple (loosely-coupled) processors.
4. Performance: If processes can be distributed across multiple processors (whether loosely- or tightly-coupled), and can run in parallel, performance may be enhanced.
Unfortunately (for the case of multi-process structuring), most if not all the benefits of modular decomposition accrue to contemporary data abstraction languages. Moreover, many early concurrent programming mechanisms, especially message-based systems, gained a reputation for poor performance and difficult-to-get-right programming.
However, more recent concurrent programming environments have, in general, overcome their performance and programming problems. That does not necessarily mean that passing a message is as cheap as a procedure call, for example, but that overall system performance degradation, if it occurs, is outweighed by the advantages of multi-process structuring. In particular, data abstraction by itself does not help when the system in question must deal with multiple threads of control — for example, input from multiple devices or output from multiple applications. In that case multiple processes, one for each thread, say, are preferred over one process that needs to multiplex itself between the threads — from an ease-of-programming point of view [13,19,27,31]. Without employing multiple processes it is also impossible to distribute a software system across multiple processors and, therefore, impossible to take advantage of any available parallelism.
Finally, suppose there were only two modules to worry about, the application and the workstation agent. There are three basic ways to ``connect'' these two modules:
1. Implement the workstation agent entirely in the kernel, accessing it via system calls.
2. Provide some kernel support, but link the bulk of the workstation agent with the application.
3. Implement the workstation agent as a server process, accessing it via interprocess communication. (Still requires some kernel support for low-level device access and for IPC.)
Without repeating all the arguments, method 1 is simply incompatible with current trends in operating system design, and should not be considered for any but the most specialized applications environments. Experience with SunWindows provides ample evidence that method 2 suffers from three severe disadvantages [37]:
1. A high degree of redundancy — if shared libraries are not supported,
2. Complex synchronization problems — multiple clients attempting to repaint the screen at the same time, for example,
3. Inadequate support for remote (distributed) applications.
That leaves method 3. The success of Andrew [22], X [38], NeWS [21], and other server-based window systems for UNIX leaves little doubt as to the superiority of that method over the others for any multi-tasking environment.
Therefore, we see no realistic alternative to employing multiple processes in the construction of interactive software — in the context of emerging hardware environments. The real issue is how to do so in such a way as to gain the indicated advantages while avoiding the twin perils of poor performance and ``difficult-to-get-right'' programming.
6.2 Mechanisms and paradigms
Historically, the mechanisms and paradigms for concurrent programming have fallen into three broad classes, based on their underlying memory architecture:
1. Shared memory
2. Disjoint address spaces
3. Hybrid
The first two classes represent two ends of the spectrum, with competing advantages and disadvantages. In shared-memory-based systems the interface to the various process interaction mechanisms is the procedure call. Data need never be copied, and creating, destroying and switching between processes in the same address space do not require significant memory map manipulation; hence the expression ``lightweight process.'' In disjoint-address-space-based systems the interface is the message,8 data needs to be copied, and creating, destroying and switching between processes requires a significant amount of memory map manipulation; hence the expression ``heavyweight process.'' Consequently, shared-memory-based mechanisms typically offer superior performance. Disjoint-address-space-based mechanisms, on the other hand, offer transparent distribution and better protection.
8 All remote procedure call mechanisms are built on top of messages, by whatever name.
Achieving a more balanced mix of advantages and disadvantages suggests a hybrid system. In such systems, teams of (lightweight) processes share a single address space and may employ shared-memory-based mechanisms to interact with each other. Processes on separate teams interact via messages.9 A team, then, resides on a single host (including shared-memory multiprocessors) and its constituent processes can interact efficiently by virtue of their shared address space. However, the software system as a whole can still be distributed (and possibly made more reliable) by placing different teams on different hosts.
9 The team is the equivalent of a heavyweight process — as in UNIX, say.
A typical server, for example, might employ separate processes on the same team for each outstanding ``connection'' (open resource) — each dialogue, or each display file, or each input stream, for example. In addition, helper processes might be employed as timers or to perform operations concurrently with the main server, such as repainting the contents of one window concurrent with processing updates to the display file associated with another window. In either case, each process can be ``single-threaded'' — taking a request and processing it to completion before asking for another request — and therefore be almost as easy to program as a typical sequential program. The choice of message-passing paradigm — in particular, whether asynchronous or synchronous — can lead to additional simplification; the reader is referred to [4] for an overview of the relevant issues.
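The structure of such a server can be sketched as follows. This is an illustration under stated assumptions, not the design of any system cited here: Python threads stand in for lightweight processes on one team, a dictionary stands in for the shared display files, and the names connection_worker and run_team are hypothetical.

```python
import queue
import threading


def connection_worker(requests, display_files, lock):
    """Single-threaded worker: take one request, finish it, ask for the next."""
    while True:
        request = requests.get()
        if request is None:          # sentinel: the connection has been closed
            break
        window, item = request
        with lock:                   # mutual exclusion on shared team state
            display_files.setdefault(window, []).append(item)


def run_team(per_connection_requests):
    """Run one worker per connection and return the shared display files."""
    display_files = {}               # shared by virtue of the common address space
    lock = threading.Lock()
    workers = []
    for items in per_connection_requests:
        q = queue.Queue()
        for item in items:
            q.put(item)
        q.put(None)                  # mark the end of this connection's requests
        t = threading.Thread(target=connection_worker,
                             args=(q, display_files, lock))
        workers.append(t)
        t.start()
    for t in workers:
        t.join()
    return display_files
```

Each worker reads like a sequential program; the only concession to concurrency is the lock around the shared structure.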
The hybrid ``teams of processes'' paradigm was pioneered in the Thoth operating system [13,15] and employed in all of its descendants, including the V-System [6] and Harmony [20]. Examples of interactive software constructed for these systems are discussed in [5,7,9,24,25,41,42]. The same basic paradigm has also been adopted for the Argus [32], Eden [3], Verdix Ada [43], and Mach [36] systems, as well as by the innumerable groups who have developed ``lightweight process packages'' for UNIX — including those employed internal to the X and NeWS window servers. It should come as no surprise, then, that it is the basic paradigm we advocate for multi-process structuring.10
10 Some of us also believe that there is reason to support the sharing of memory between teams, as might be the case for the semantic support component discussed in [16], for example. However, this issue was not discussed in sufficient depth as to offer it as a consensus view.
6.3 Key performance issues
If the reader remains unconvinced as to the viability of the ``teams of processes'' paradigm for multi-process structuring of interactive software, it is probably due to remaining doubts as to the efficiency of interprocess communication between disjoint address spaces. We offer several observations to assuage those doubts. First, the reader should remember that the ability to distribute software across multiple processors, without fine-tuning the software for the particular hardware environment, can in fact improve performance — by virtue of parallelism or access to more powerful processors. Second, increasingly efficient techniques are being developed for exchanging messages between processes on the same machine. For example, numerous systems exist which, running on Motorola 68020-class hardware, can execute complete send - receive - reply transactions in under 1 millisecond. More importantly, evidence is accumulating that, whatever the costs of interprocess communication, it represents but a small piece of overall computation time (cf. [14,17,29]).
The remaining observations pertain to the situation where large numbers of separate events must be exchanged, as when many input or output events are being generated. If a separate message is sent for every event, then performance (in a single-machine environment) will naturally be slower than if each event resulted in a procedure call. Two fundamental techniques limit this degradation. First, wherever possible, multiple events should be batched in a single message. Second, wherever possible, messages should be pipelined; that is, replies should be eliminated. Experiments with the VGTS have shown order-of-magnitude performance improvement due to application of these techniques [29]. Similar observations have been made with respect to X and Andrew [22,37].
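The effect of the two techniques on message traffic can be shown with a small counting model. It is only a sketch: a Python list stands in for the message channel, the function names are hypothetical, and the model counts messages rather than measuring the performance of any real system.

```python
def send_per_event(events, channel):
    """One request message and one reply per event (no batching, no pipelining)."""
    for event in events:
        channel.append(("request", [event]))
        channel.append(("reply", None))


def send_batched(events, channel, batch_size):
    """Batch several events into each message and omit replies (pipelined)."""
    for start in range(0, len(events), batch_size):
        channel.append(("request", events[start:start + batch_size]))


def delivered(channel):
    """Flatten the events carried by all request messages, in order."""
    return [e for kind, payload in channel if kind == "request" for e in payload]
```

With 100 events, the first function places 200 messages on the channel, while the second, with a batch size of 16, places 7; both deliver the same events in the same order.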
6.4 Application to a UIMS
Applying the above techniques to a UIMS, we postulate the existence of at least three distinct teams (address spaces). One team contains the workstation agent, one contains the dialogue manager and there is one team for each application including the workstation manager. Within the workstation agent team, separate (lightweight) processes may be used for each device, or even for each input or output stream. However, each has access to the same queues, for example, by virtue of their shared address space. Similarly, within the dialogue manager team, separate processes may be used for each dialogue, but each has access to the shared databases needed to maintain user profiles, dialogue specifications and the like. The resulting architecture is precisely that of Figure 1(b), where ovals correspond to teams and the interconnections correspond to message paths (possibly hidden by stream interfaces or the like).
If, for efficiency reasons, three disjoint address spaces prove too many, we believe that a merger of the workstation agent and dialogue manager would be preferable, in most cases, to a merger of the application and the dialogue manager. This merger still permits the UIMS to be run on one host, while the application(s) are distributed. From our experience, the required communication bandwidth within the UIMS is considerably higher than the necessary communication bandwidth between applications and the UIMS. Moreover, as in the case of SunWindows, merging the dialogue manager with the applications typically means merging each application with (a copy of) the dialogue manager, leading to a high degree of redundancy and potential synchronization problems. On the other hand, our experience may not bear out in the event that the dialogue manager becomes more tightly coupled with the application, as proposed in [16].
Fortunately, we believe that the cost of using disjoint address spaces in many systems is already sufficiently low as to render the efficiency argument moot, and that this trend will continue. Experience with the VGTS, TheWA, X, Andrew, NeWS and Adagio supports this observation, at least with respect to the use of disjoint address spaces for applications and the workstation agent.
7. Open issues
During our discussions at the Seattle workshop, many issues were left unresolved, for either lack of time or lack of an appropriate solution. This section gives an overview of the most important unresolved topics.
7.1 Sample mode
Our proposed input model for the WA is based on reliable streams of input events. Physical devices produce events that ultimately are consumed by one or several clients at higher levels. By continuously monitoring the event stream, clients do, in principle, know the current state of the device. If, however, there are many events pending on the stream, this is no longer the case. To handle such situations, it may be necessary to let clients directly sample virtual devices. This can be done by adding additional ``entry points'' to the WA or by having ``modal'' streams.
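The difference between the two access modes can be sketched as follows, here using an additional entry point rather than a modal stream. The VirtualDevice class and its method names are hypothetical and are not drawn from any existing WA.

```python
from collections import deque


class VirtualDevice:
    """Hypothetical virtual device offering both event and sample access."""

    def __init__(self):
        self._stream = deque()   # reliable, ordered event stream
        self._state = None       # most recent device state

    def post(self, value):
        # Called on the device side for every physical event.
        self._stream.append(value)
        self._state = value

    def next_event(self):
        # Event mode: events are consumed strictly in order.
        return self._stream.popleft() if self._stream else None

    def sample(self):
        # Sample mode: current state, independent of any backlog.
        return self._state
```

A client that has fallen behind on the stream can call sample() to learn where the device is now, and still drain the pending events later if it needs the full history.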
7.2 Generalized keymaps
With the exception of the QWERTY keyboard, physical input devices do not strongly follow any standards. For reasons of device independence the WA might define a set of logical input events onto which physical events may be mapped. The required functionality must permit the detection of state transitions of the physical devices and the mapping of these transitions into the set of logical input events.
While systems such as NeWS support the equivalent of procedural keymaps — where a client-provided function is executed on receipt of a particular physical event — we tend to believe that such functionality is best placed in the dialogue manager.
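A table-driven (non-procedural) keymap of the kind the WA might provide can be sketched as follows. The device names, control names and logical event names are all hypothetical, chosen only to show that different physical devices can map onto the same logical event.

```python
# Hypothetical keymap: (device, control, new state) -> logical event name.
KEYMAP = {
    ("mouse", "button1", "down"): "SELECT",
    ("tablet", "stylus_tip", "down"): "SELECT",
    ("keyboard", "F1", "down"): "HELP",
}


class LogicalMapper:
    """Detects state transitions and maps them onto logical input events."""

    def __init__(self, keymap):
        self.keymap = keymap
        self.state = {}   # last known state of each (device, control)

    def feed(self, device, control, new_state):
        """Return a logical event name on a mapped state transition, else None."""
        key = (device, control)
        if self.state.get(key) == new_state:
            return None                    # no transition, nothing to report
        self.state[key] = new_state
        return self.keymap.get((device, control, new_state))
```

A procedural keymap would replace the table lookup with a client-supplied function, which is the functionality we suggest belongs in the dialogue manager.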
7.3 Cut and paste
Direct manipulation interaction often uses a cut-and-paste paradigm to exchange information between different applications. In situations where the WA is retaining the output structures for the applications being modified11 and there is no additional semantic information associated with those structures, cut-and-paste can be effected entirely within the WA. This may even be true in the case of cutting information represented in one medium and pasting it (back) using another medium.
11 Either the source or destination or both depending on whether the operation is a cut, a copy, or a paste.
In most situations, however, applications expect to be informed as to all input, and both cutting and pasting constitute input. Sometimes, as when pasting text at the insertion point of the ``active'' window, the text can simply be inserted into the (active) client's input queue. In other situations the data may have to be handled ``out of band.'' If the operation is legal for the given application, it must update its internal data structures to reflect the user interaction. Otherwise, the application must undo the cut-and-paste operation as effected by the WA. In short, implementing universal cut-and-paste remains an open problem.
7.4 Structure of event records
Different input devices may provide different types of information. For instance, an event from a pointing device must contain a location, whereas a keyboard event may contain only its ASCII value. Event streams must accommodate events from all devices, so a choice must be made between placing variant records in the event stream or defining a single record format that accommodates all possible events. NeWS, in particular, takes the latter approach to its extreme, by encoding the entire state of the keyboard in every event record. It remains to be seen what effect this has on performance.
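The two alternatives can be contrasted in a short sketch. The record types and field names below are hypothetical and do not reproduce the event format of NeWS or of any other system; the sketch only shows that the same information can be carried either way.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union


@dataclass
class PointerEvent:          # variant carrying only a location
    location: Tuple[int, int]


@dataclass
class KeyEvent:              # variant carrying only a character code
    code: int


VariantEvent = Union[PointerEvent, KeyEvent]


@dataclass
class UniformEvent:
    """Single record format: every field present, unused ones left as None."""
    device: str
    location: Optional[Tuple[int, int]] = None
    code: Optional[int] = None


def to_uniform(event: VariantEvent) -> UniformEvent:
    """Convert a variant record into the single uniform format."""
    if isinstance(event, PointerEvent):
        return UniformEvent("pointer", location=event.location)
    if isinstance(event, KeyEvent):
        return UniformEvent("keyboard", code=event.code)
    raise TypeError(f"unknown event type: {type(event).__name__}")
```

Variant records keep each event small but require the consumer to dispatch on type; the uniform record simplifies consumers at the cost of carrying unused fields in every event.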
8. References
[1] Adobe Systems Incorporated. PostScript Language Reference Manual. Addison-Wesley, 1985.
[2] Aguilar, L., Garcia Luna Aceves, J.J., Moran, D., Craighill, E.J. and Brungardt, R. Architecture for a multimedia teleconferencing system. In Proc. SIGCOMM '86 Symposium on Communications Architectures and Protocols, (August 1986). ACM, New York, 126-136.
[3] Almes, G.T., Black, A.P., Lazowska, E.D. and Noe, J.D. The Eden system: A technical review. IEEE Transactions on Software Engineering SE-11, 1, (January 1985), 43-59.
[4] Andrews, G.R. and Schneider, F.B. Concepts and notations for concurrent programming. ACM Computing Surveys 15, 1, (March 1983), 3-43.
[5] Beach, R.J., Beatty, J.C., Booth, K.S., Plebon, D.A. and Fiume, E.L. The message is the medium: Multiprocess structuring of an interactive paint program. Proceedings of SIGGRAPH'82 (Boston, Mass., July 26-30, 1982). In Computer Graphics 16, 3 (July 1982), 277-287.
[6] Berglund, E.J. An introduction to the V-System. IEEE Micro, (August 1986), 35-52.
[7] Berglund, E.J. and Cheriton, D.R. Amaze: A multiplayer computer game. IEEE Software 2, 3, (May 1985), 30-39.
[8] Bolt, R.A. The Human Interface: Where People and Computers Meet. Lifetime Learning Publications, 1984.
[9] Booth, K.S., Cowan, W.B. and Forsey, D.R. Multitasking support in a graphics workstation. In Proc. 1st International Conference on Computer Workstations, (November 1985), IEEE, 82-89.
[10] Borufka, H.G. and Pfaff, G. The design of a general-purpose command interpreter for graphics man-machine communication. In Man-Machine Communication in CAD/CAM. Sata, T. and Warman W. (eds.), North-Holland, 1981.
[11] Brown, M. and Heck, M. Understanding PHIGS: The Hierarchical Graphics Standard. Megatek Corporation, San Diego, CA, 1985.
[12] Buxton, W. Lexical and pragmatic considerations of input structures. Computer Graphics 17, 1 (January 1983), 31-37.
[13] Cheriton, D.R. The Thoth System: Multi-Process Structuring and Portability. North-Holland/Elsevier, 1982.
[14] Cheriton, D.R. The V Kernel: A software base for distributed systems. IEEE Software 1, 2, (April 1984), 19-42.
[15] Cheriton, D.R., Malcolm, M.A., Melen, L.S. and Sager, G.R. Thoth, a portable real-time operating system. Communications of the ACM 22, 2, (February 1979), 105-115.
[16] Dance, J.R. et al. Report on run-time structure for UIMS-supported applications. In this issue, Computer Graphics 21, 2 (April 1987).
[17] Fitzgerald, R. and Rashid, R.F. The integration of virtual memory management and interprocess communication in Accent. ACM Transactions on Computer Systems 4, 2, (May 1986), 147-177.
[18] Forsick, H.C. Explorations in real-time multimedia conferencing. In Proc. 2nd International Symposium on Computer Message Systems, (September 1985). IFIP, 299-315.
[19] Gentleman, W.M. Message passing between sequential processes: The reply primitive and the administrator concept. Software—Practice and Experience 11, 5, (May 1981), 436-466.
[20] Gentleman, W.M. Using the Harmony operating system. Technical Report NRCC-ERB-966, Division of Electrical Engineering, National Research Council of Canada, May, 1985.
[21] Gosling, J. SunDew: A distributed and extensible window system. In Methodology of Window Management, F.R.A. Hopgood, et al. (eds.), Springer-Verlag, 1986, 47-58.
[22] Gosling, J. and Rosenthal, D. A window manager for bitmapped displays and UNIX. In Methodology of Window Management, F.R.A. Hopgood, et al. (eds.), Springer-Verlag, 1986, 115-128.
[23] Hopgood, F.R.A., Duce, D.A., Fielding, E.V.C., Robinson, K., and Williams, A.S. (eds.) Methodology of Window Management. Springer-Verlag, 1986.
[24] Lantz, K.A. An experiment in integrated multimedia conferencing. In Proc. CSCW '86: Conference on Computer-Supported Cooperative Work, (MCC Software Technology Program, December, 1986). 267-275.
[25] Lantz, K.A. Multi-process structuring of user interface software. In this issue, Computer Graphics 21, 2 (April 1987).
[26] Lantz, K.A. On user interface reference models. SIGCHI Bulletin 18, 2, (October, 1986), 36-42.
[27] Lantz, K.A., Gradischnig, K.D., Feldman, J.A. and Rashid, R.F. Rochester's Intelligent Gateway. Computer 15, 10, (October 1982), 54-68.
[28] Lantz, K.A. and Nowicki, W.I. Structured graphics for distributed systems. ACM Transactions on Graphics 3, 1, (January 1984), 23-51.
[29] Lantz, K.A., Nowicki, W.I., and Theimer, M.M. An empirical study of distributed application performance. IEEE Transactions on Software Engineering SE-11, 10, (October 1985), 1162-1174.
[30] Lantz, K.A., Pallas, J., and Slocum, M. TheWA beyond traditional window systems. Internal Memo, Distributed Systems Group, Department of Computer Science, Stanford University.
[31] Liskov, B. and Herlihy, M. Issues in process and communication structure for distributed programs. In Proc. 3rd Symposium on Reliability in Distributed Software and Database Systems, (October 1983). IEEE, 123-132.
[32] Liskov, B.H. and Scheifler, R. Guardians and actions: Linguistic support for robust distributed programs. ACM Transactions on Programming Languages and Systems 5, 3, (July 1983), 381-404.
[33] Microsoft Windows Software Development Kit, Microsoft Corporation, 1986.
[34] Nowicki, W.I. Partitioning of Function in a Distributed Graphics System. PhD thesis, Stanford University, 1985.
[35] Pike, R. Graphics in overlapping bitmap layers. ACM Transactions on Graphics 2, 2, (April 1983), 135-160.
[36] Rashid, R.F. Threads of a new system. UNIX Review 4, 8, (August 1986), 36-49.
[37] Rosenthal, D.S.H. Toward a more focused view. UNIX Review 4, 8, (August 1986), 54-63.
[38] Scheifler, R.W. and Gettys, J. The X window system. To appear in ACM Transactions on Graphics, April 1987.
[39] Steinhart, J.E. Display management reference model: Preliminary first draft. Technical Report ANSI X3H3.6/86-41, ANSI Committee X3H3.6, September, 1986.
[40] Programmer's Reference Manual for the Sun Window System, Sun Microsystems Inc., 1985.
[41] Tanner, P.P., MacKay, S.A., Stewart, D.A. and Wein, M. A Multitasking Switchboard Approach to User Interface Management. Proceedings of SIGGRAPH'86 (Dallas, Texas, August 18-22, 1986). In Computer Graphics 20, 4 (August 1986), 241-248.
[42] Tanner, P.P., Wein, M., Gentleman, W.M., MacKay, S.A. and Stewart, D.A. The user interface of Adagio: A robotics multitasking multiprocessor workstation. In Proc. 1st International Conference on Computer Workstations, (November 1985). IEEE, 90-98.
[43] Verdix Ada Development System, Version 5.1, Verdix Corporation, 1985.
[44] Zimmermann, H. The ISO reference model. IEEE Transactions on Communications COM-28, 4, (April 1980), 425-432.