Rick Beach, February 13, 1987 8:58:36 pm PST
Multi-process Structuring of User Interface Software
Keith A. Lantz
Department of Computer Science
Stanford University
Stanford, CA 94305
Abstract
Many contemporary user interface management systems suffer from the lack of adequate operating system support for multi-process structuring. They either adopt a single-process server approach, resulting in monolithic code, or are implemented as run-time libraries, resulting in a high degree of redundancy and complex synchronization problems. This paper, on the other hand, describes a methodology that takes advantage of lightweight processes and fast interprocess communication to structure user interface software as multiple cooperating processes. It demonstrates that such an approach can lead to exceptional flexibility — with respect to the addition of new functionality, in particular — without sacrificing performance.
1. Introduction
Despite the proliferation of high-performance graphics workstations, user interface design remains more of an art than a science. This is not so much due to lack of knowledge about what makes a good user interface as to the lack of appropriate tools and/or support environments with which to create that interface. One often overlooked deficiency is the underlying operating system.
This paper describes a methodology for implementing user interface software that relies on facilities rarely found in commercial operating systems. Specifically, we assume the ability to construct software out of multiple processes, with negligible performance degradation in the worst case and improved performance (due to parallelism) in the best case. Moreover, we assume the ability to introduce new systems software with negligible impact on existing systems software.
Section 2 outlines the ``ideal'' environment to which we aspire. Section 3 discusses the basic implementation methodology, with additional details and current status being provided in Sections 4 and 5, respectively. Section 6 presents an approach to real-time computer-supported teleconferencing that demonstrates the ability to introduce new software components without impacting existing software.
2. Target environment
Imagine an environment consisting of Sun- and VaxStation-class workstations with a mix of monochrome, gray-scale, and color displays, some with special peripherals (such as speech-processing units, digital scanners, or digital pen-and-tablet sets), all employing shared file servers, backend computation engines, and other specialized facilities. All workstations are interconnected by a high-speed institutional (inter)network. Institutional networks are in turn interconnected by long-haul networks using T1 (or equivalent) interconnections.
From the point of view of a user sitting in front of one of these workstations, the following functionality is available:
· The user may engage in multiple activities simultaneously, each of which may run local to the workstation, on a remote host, or distributed across several hosts.
· The user may interact with each application independent of the others, possibly using multiple communications media simultaneously.
· The user may ``transfer'' one application's output to another application's input, independent of the data types involved. For example, data is automatically converted among text, voice, and graphics whenever possible.
· The user may engage in a real-time conference with any number of other users, and may invoke any available applications from within the framework of the conference. The output from the application is presented at every workstation, and each participant can provide input.
· Typically, the user interacts with applications via techniques of her own choosing. She may request multiple independent presentations of the same underlying data, each associated with different interaction techniques.1 Similarly, in the context of a real-time conference, each user may request different views of the shared data. The mapping between application abstractions and interaction techniques is performed by the system.2
1 The term ``presentation'' is meant to allow for media other than display devices.
2 Some applications rely on being able to use a particular interaction technique or communications medium (or, worse, a particular device), which may be different from the techniques and media the user prefers to use. For example, it does not make much sense to render an image as a menu or to generate a voice description (of the bits).
· The system maintains sufficient context information to provide effective help and error recovery. This might include intelligent modeling of the user, such that the system is capable of taking action on behalf of the user.
· Complete backward compatibility is provided, such that regardless of the evolution of the user interface, the user may always access those applications to which she previously had access.
From the point of view of a user interface developer, on the other hand, the key attributes of the user interface software are as follows:
· It is straightforward to port the software to another computer or to another operating system, not just to another device.
· It is straightforward to configure (or extend) the software to accommodate different users, or classes of users, or application environments. Both static and dynamic configuration are supported.
These two characteristics imply in turn that the interface software is modular — following basic principles of good system design.
Naturally, these characteristics represent the ideal, and achieving all of them at once may be impractical, if not impossible. Nevertheless, they represent a noble goal. Further elaboration on each of them, including (additional) motivations, may be found in [13].
3. Implementation methodology
There are indeed some (if remarkably few) examples of human-computer interfaces that meet many of the ``user goals'' presented above. Unfortunately, the associated software rarely meets the ``developer's goals'', even in those situations where the software conforms to contemporary user interface management system (UIMS) or user interface toolkit models [21,22,26]. For example, most (server-based) user interface management systems are too monolithic, do not interface well with standard graphics packages, and provide inadequate multi-tasking support. Most (library-based) user interface toolkits, on the other hand, tend to result in redundant code and to introduce complex synchronization problems. Moreover, most approaches of either ilk deal only with traditional media, such as keyboards, displays, and mice.
We are addressing these deficiencies via a hybrid of the UIMS and toolkit approaches, one that can accommodate ``new'' media. The implementation methodology is based in large part on experience with the user interfaces for two major distributed operating systems, Rochester's Intelligent Gateway [14] and Stanford's V-System [2] (articles that focus on the related user interfaces include [10,11,15,18,20]). Yet another implementation, which more closely follows the methodology presented here, is underway.
3.1 Decomposition of function
The adopted methodology is based on a derivative of the linguistic analysis model of human-computer interaction [6]. The derivative model introduces the notion of ``medium'' (in addition to ``device''), accounts for multi-tasking, and is stated in terms intended to provide better guidance for the developer. As discussed in a companion report in this issue [19], there are four levels of I/O:
· device-dependent I/O
· media-dependent I/O
· media-independent I/O
· workstation management requests
and five basic hardware/software components:
· hardware devices
· workstation agent
· workstation manager
· dialogue manager(s)
· applications
Figure 1. The basic user interface architecture.
The manner in which these components are interconnected by the various types of I/O is depicted in Figure 1. This figure differs from the corresponding one in [19] in the decomposition of the application into a frontend and a backend. There are two principal ways to view this split:
· The frontend, if it exists, is responsible for implementing application-specific interaction techniques that augment the facilities provided by the dialogue managers and workstation agent, whereas the backend implements the semantics of an application — its algorithms. Splitting the two components can be especially advantageous in the context of distributed applications, where the frontend remains on the user's workstation while the backend is distributed on one or more remote hosts.
· The frontend is actually a protocol translation agent [9,16] for an application that would otherwise be incapable of communicating with the user interface software. Such agents, however they are implemented, are a fundamental mechanism for providing backward compatibility.
3.2 Flow of control
The basic flow of control is as follows: When the system is initialized, the workstation manager creates a default dialogue manager. At a later time, the user may interact with the workstation manager to create other dialogue managers or to create new windows for existing applications. Typically, however, one dialogue manager will suffice; it interacts with the user (via the workstation agent) to determine what applications (or methods) should be invoked, to specify the arguments to those commands (or methods), and to invoke frontend and backend services as necessary. Naturally, any number of applications may be running at any time; the user controls the input focus via the workstation manager.
This flow of control is typical of what has sometimes been referred to as external control, where the user guides the interaction [26]. Unfortunately, the vast bulk of existing applications exert internal control; they drive the interaction. Such applications are accommodated in two ways. First, all requests for arguments can be directed to the dialogue manager rather than to a specific workstation agent (or device driver). The dialogue manager can then determine the best interaction technique to use to specify each argument — just as it determines the interaction technique(s) for specifying a complete command. Second, a pre-existing ``shell'' or ``executive'' may be invoked as an application, from which the ``real'' applications may then be invoked.
3.3 A multiplicity of dialogue managers
From the discussion thus far the reader may have inferred another difference between the model assumed here and that presented in [19]: support for a hierarchy of dialogue managers, rooted at the workstation manager. The manner in which the user interacts with the workstation manager — the meta-dialogue with the system — may be fundamentally different than the manner in which the user interacts with a typical application. Therefore, while the workstation manager remains a client of the workstation agent, it is regarded more as a (special) dialogue manager than as an application. Similarly, pre-existing shells invoked by a dialogue manager can themselves be regarded as dialogue managers, however antiquated. In the remainder of this paper, we will continue to use different terms for these three basic types of dialogue manager, but the reader should remain aware of their underlying similarities.
3.4 Concurrency and multi-process structuring
Another implication of the previous discussion is the availability of facilities for run-time ``instantiation'' of code, as when the workstation manager creates dialogue managers. Although this can be achieved in a shared-memory environment via dynamic linking facilities, the more general approach is to use multiple processes. Aside from support for run-time instantiation and dynamic binding, there are many other motivations for the use of concurrent programming techniques in the construction of interactive software. The reader is referred to the companion report in this issue [19] for a general discussion of motivations and techniques. It suffices to say here that we have adopted the ``team of processes'' approach discussed in that report.
In particular, our implementation methodology relies on the availability of lightweight processes and fast interprocess communication — through message-passing, shared memory, or both. Each of the four major software components discussed above — applications, dialogue manager(s), workstation manager, and workstation agent — consists of one or more processes sharing a single address space. Processes sharing the same address space may synchronize their access to shared data with spin locks or messages. Processes in different address spaces communicate solely via messages.
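The ``team of processes'' structure just described can be sketched in modern terms. The following Python fragment, offered purely as an illustration, uses threads to stand in for lightweight processes sharing an address space and queues to stand in for the message primitives. It models the style of synchronous Send-Receive-Reply rendezvous used by the V-System, but none of the names below are the actual V-System API.

```python
import queue
import threading

class Process:
    """A lightweight process with a single request mailbox."""
    def __init__(self):
        self.mailbox = queue.Queue()

    def send(self, dest, message):
        """Send blocks until the receiver replies (a V-style rendezvous)."""
        reply_slot = queue.Queue(maxsize=1)
        dest.mailbox.put((message, reply_slot))
        return reply_slot.get()          # blocked awaiting the Reply

def server_loop(server, stop):
    """Receive requests and Reply; the Reply unblocks the sender."""
    while not stop.is_set():
        try:
            msg, reply_slot = server.mailbox.get(timeout=0.1)
        except queue.Empty:
            continue
        reply_slot.put(("ok", msg))

client, server = Process(), Process()
stop = threading.Event()
worker = threading.Thread(target=server_loop, args=(server, stop))
worker.start()
status, echoed = client.send(server, "open-input-stream")
stop.set()
worker.join()
```

Note that the sender is descheduled for the entire request-reply cycle; it is this rendezvous that makes a team of single-threaded processes easy to reason about.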
4. Fine structure
4.1 The workstation agent
First, for purposes of this discussion, a device that implements both input and output is treated as consisting of two separate devices.
In order to be responsive to all input devices, each is associated with an independent process that accepts input only from its associated device. However, accumulated experience suggests that it is easier to write applications that accept a single stream of input events rather than multiple streams, one per device. In our model, of course, most applications are not (or need not be) aware of individual devices or even media, but the dialogue manager must be. Therefore, it is desirable that the workstation agent optionally serialize the input from multiple devices that is logically part of the same input stream (dialogue). This suggests adding another process, the input demultiplexor.
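The input side of this structure might be sketched as follows, with one thread standing in for each device process and a shared queue standing in for the serialized stream produced by the input demultiplexor. Again, Python threads are only a stand-in for lightweight processes, and all names are illustrative.

```python
import queue
import threading

def device_reader(device, events, serialized):
    # One independent process per input device; it reads only from
    # its own device (simulated here by a list of events).
    for ev in events:
        serialized.put((device, ev))

serialized = queue.Queue()   # the demultiplexor's single logical stream
readers = [
    threading.Thread(target=device_reader,
                     args=("keyboard", ["a", "b"], serialized)),
    threading.Thread(target=device_reader,
                     args=("mouse", [(10, 20)], serialized)),
]
for r in readers:
    r.start()
for r in readers:
    r.join()

# Each event is tagged with its source device, so the dialogue
# manager can still distinguish devices when it must.
stream = []
while not serialized.empty():
    stream.append(serialized.get())
```

Per-device ordering is preserved by the queue, even though events from different devices may interleave arbitrarily.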
On the output side the situation is somewhat different; the system is driving the device. This means that it is not necessary to have a separate process waiting at all times to serve that device. On the other hand, it is desirable to have separate processes for each communications medium — an audio manager, a video manager, and a display manager, say. The display manager, in particular, would provide all the ``standard'' graphical output facilities of the system — the output facilities of GKS or PHIGS, for example. In fact, it should be capable of providing multiple sets of facilities at the same time.
The workstation agent also must provide the basic support for multitasking. Specifically, the input demultiplexor must be cognizant of multiple input streams and each output handler must be cognizant of multiple output streams. In the case of the display manager, for example, it should support multiple display files. Input and output streams should be kept separate, thus allowing an application to have one input stream but multiple output streams. The one exception to this rule concerns feedback.
Recent experience has shown that the feedback facilities of standard graphics packages conflict with contemporary interest in object-oriented programming and direct manipulation [5]. Specifically, very little of what is referred to as ``feedback'' relies solely on lexical information; most of it therefore cannot be implemented at the workstation agent level. Nevertheless, to the extent that lexical feedback is desirable, it must be possible to indicate to the input handlers when and how they should communicate with their corresponding output handlers.
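To the extent that lexical feedback is wanted, the hook just described might look like the following sketch, in which an input handler is told when (by registering an echo function) and how (by the function itself) to communicate with its output handler. The class and method names are hypothetical.

```python
class InputHandler:
    """Sketch of lexical feedback: an echo function, when registered,
    is applied to each event and forwarded to the output handler."""
    def __init__(self, output_handler):
        self.output = output_handler   # e.g. the display manager's queue
        self.echo = None               # no lexical feedback by default

    def enable_feedback(self, echo_fn):
        self.echo = echo_fn

    def handle(self, event):
        if self.echo is not None:
            self.output.append(self.echo(event))   # immediate echo
        return event                               # event still delivered

display = []                          # stand-in for an output handler
keyboard = InputHandler(display)
keyboard.enable_feedback(lambda ev: ("echo", ev))
keyboard.handle("x")
```

Higher-level (syntactic or semantic) feedback would instead be generated by the dialogue handlers or the application itself.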
The resulting structure looks like that in Figure 2, which depicts the workstation agent (TheWA [17]) currently implemented for the V-System.
Figure 2. The Workstation Agent.
4.2 The dialogue manager
Analogous to the use of multiple input and output handlers, the dialogue manager employs multiple dialogue handlers, one for each ongoing dialogue. A ``master'' dialogue manager may be employed to synchronize access to the various shared databases, or traditional shared memory techniques could be employed. (Remember that all dialogue manager processes share the same address space in this architecture.) However, each dialogue handler is single-threaded, simplifying its implementation.
The information being shared includes:
user profiles: Includes the preferences, authorizations, etc. of the user.
dialogue specifications: The specification of a dialogue in an abstract, media-independent syntax.
media-independent to media-dependent bindings: The different media-dependent representations of each abstract object.
media-dependent to device-dependent bindings: The different devices (and associated interaction sequences) for each communications medium.
help information: Keyed by...
history, context, and names: This information typically will be distributed throughout the various components of the system, not just isolated in a single database.
The common algorithms being shared are those that make decisions, based on the various constraints, as to how best to interact with the user or the application at any given time. For example, when operating under external control, each dialogue handler is ``driven'' by a dialogue specification. For every abstract object, the appropriate decisions must be made as to which input events are acceptable or which output events should be generated. Typically, this entails translation from abstract (media-independent) to media-dependent representations, bearing user preferences and the like in mind. Sometimes, the dialogue specification will mandate the use of a specific medium or even a specific device.
When running an application that is exerting internal control, the dialogue handler merely accepts requests from the application rather than being driven by a dialogue specification. If the application requests input or output in a media-independent fashion, it is still possible for the dialogue handler to decide which specific interaction techniques to employ.
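The decision logic of the last two paragraphs can be illustrated with a deliberately tiny sketch: a hypothetical table binding media-independent argument types to interaction techniques, consulted when an internally-controlled application requests an argument in media-independent fashion. A real dialogue handler would of course also consult user profiles, dialogue specifications, and device availability.

```python
# Hypothetical media-independent-to-media-dependent bindings; the
# argument types and technique names are illustrative only.
TECHNIQUES = {
    "filename": "text-entry",
    "choice":   "menu",
    "position": "pointer-pick",
}

def request_argument(arg_type, prompt):
    """Handle a media-independent argument request: the dialogue
    handler, not the application, picks the interaction technique."""
    technique = TECHNIQUES.get(arg_type, "text-entry")
    return {"prompt": prompt, "technique": technique}
```

For example, a request for a "choice" argument would be rendered as a menu, while an application that mandates a specific device would bypass this table entirely.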
Finally, to the extent possible, the dialogue handlers provide syntactic feedback. Especially when under external control, the handler will do a fair amount of input processing, generating feedback directly to the various output handlers.
4.3 The workstation manager
As noted above, the workstation manager is a ``special'' dialogue manager, one that manages the meta-dialogue consisting of all ongoing dialogues. A single process implementation will suffice in this case. Since it too interacts with the user, it must have access to some of the databases discussed in the previous section. At first glance, then, it seems reasonable to place the workstation manager on the same team as the other dialogue managers; indeed, the workstation manager could then be the ``master'' dialogue manager alluded to above.
However, if the workstation manager is in the same address space as the ``normal'' dialogue manager, neither can be as easily replaced independent of the other. Moreover, as previously discussed, the meta-dialogue for which the workstation manager is responsible may be qualitatively different than the typical user-application dialogue and, therefore, the databases may not be the same after all. Therefore, we suspect that the preferred approach is to place the workstation manager in a separate address space, replicating whatever information is common between it and the normal dialogue manager team.
5. Current status
Currently, the user interface software for the V-System conforms in spirit but not in detail to the methodology just presented. The V-System's workstation agent, TheWA, is implemented exactly as described above (refer to Figure 2), but currently supports only the traditional communications media. Like its predecessor, the Virtual Graphics Terminal Service (VGTS) [15,20], it provides multiple windows and a structured graphics system similar to PHIGS [4]. Unlike the VGTS, it provides a stream input model similar to that described in [19] and is structured as two distinct collections of processes, one for input (TheWAin) and one for output (TheWAout).
The default V-System dialogue manager consists of an exec server and one or more executives, all sharing the same address space, again conforming to the basic methodology. However, executives currently provide only limited functionality, equivalent to that of the UNIX c-shell. Consequently, applications, once created, interact directly with TheWA, rather than going through the dialogue manager.
Finally, the workstation manager, while providing the basic functionality defined for it above, is currently linked with TheWAin, as shown in Figure 2.
6. A flex test: integration of conferencing facilities
The configurability (or extensibility) of the proposed methodology has been put to the test in the form of an experiment to add real-time conferencing facilities. The goal was to incorporate those facilities into the V-System in such a way as to have minimal impact on existing applications and systems software. Elaboration on what follows may be found in [12].
6.1 The basic architecture
Integration of conferencing facilities into the V-System required the addition of three new (types of) processes and a minor modification to TheWA. The three process types added were:
· the conference manager, which provides floor control and other necessary synchronization functions;
· the conference agent, which mediates all I/O between shared applications and users (or, more precisely, their associated workstation agents); and
· the conference frontend, which provides a rudimentary user interface for the conference manager.
Note that the conference frontend and conference manager constitute the frontend and backend, respectively, of the conference management ``application''.
As shown in Figure 3, a fully replicated architecture was adopted, wherein the application is replicated at every participating workstation. Consequently, instances of the conference frontend and conference agent are also replicated at every workstation. There is, however, only one instance of the conference manager. A detail of the team and process structure on Host Y is shown in Figure 4, where boxes demarcate teams.
The conference manager receives commands from the conference frontend, validates them, and if valid, broadcasts them to all conference agents.3 The conference agents actually execute the commands with respect to their individual workstation environments. In this implementation, the conference agent, rather than the conference frontend, is responsible for invocation of shared applications — applications that are to be run in the context of the conference. Of course, before invoking an application, it must first determine which application to invoke, and with what arguments; it must act as a dialogue manager. However, rather than reimplementing dialogue management functions within the conference agent, it was decided that the only application a conference agent could invoke would be an executive. Any other application can then be invoked from the executive.4 Any executives invoked by the conference agents, and any applications invoked by those executives, are shared by all conferees.
3 Here and in the discussion to follow, ``broadcast'' is used as a placeholder for the sending of a message to a group of processes, whether by using bona fide broadcast or multicast facilities, or point-to-point messages.
4 This is another level in the hierarchy of dialogue managers as discussed in Section 3.3.
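The validate-and-broadcast cycle of the conference manager, including simple floor control, might be sketched as follows, with plain lists standing in for the conference agents' message queues. All names are illustrative, not the actual V-System interfaces.

```python
class ConferenceManager:
    """Validates commands from conference frontends and broadcasts
    the valid ones to every conference agent."""
    def __init__(self, agents):
        self.agents = agents          # one message list per workstation
        self.floor_holder = None

    def submit(self, user, command):
        if command[0] == "take-floor":
            if self.floor_holder is not None:
                return False          # floor already held
            self.floor_holder = user
        elif user != self.floor_holder:
            return False              # only the floor holder may act
        for agent in self.agents:     # the ``broadcast'' of footnote 3
            agent.append((user, command))
        return True

agents = [[], []]                     # two participating workstations
manager = ConferenceManager(agents)
ok_take = manager.submit("alice", ("take-floor",))
ok_steal = manager.submit("bob", ("take-floor",))
ok_invoke = manager.submit("alice", ("invoke", "executive"))
```

Because every agent receives exactly the same sequence of validated commands, the replicated application instances stay consistent.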
Figure 3. Fully replicated conferencing architecture.
Figure 4. Process structure on host Y.
6.2 Mediating I/O
Only one user may have the floor at any time. Therefore, all requests destined for each conferee's TheWAin must be intercepted. What the conference agent does with these requests depends on whether or not its user has control of the floor. If not, the conference agent is said to be in passive mode; it simply buffers the input requests and waits for the input to arrive from the controlling conference agent. If the user has control of the floor, the conference agent is said to be in control mode; it responds to input requests by asking its co-resident TheWAin for input.
When user input is received, the (controlling) conference agent broadcasts the input to all passive conference agents and passes it back to the requesting application. On receipt of the relayed input, each passive conference agent returns the input to its instance of the application, or buffers the input until a request for it is received.5 Thus, regardless of source (TheWAin or another conference agent), conference agents always pass user input on to their co-resident application. Each instance of the application then generates the same output, directly to its co-resident TheWAout.
5 Since all instances of the application are not guaranteed to run in lock-step, it is possible for input to arrive from the controlling conference agent before a request for input is received by the passive conference agent.
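The passive-mode buffering just described, in which either the application's input request or the relayed input may arrive first, can be sketched as a pair of queues, one for each arrival order. This is an illustration only; the class and method names are hypothetical.

```python
from collections import deque

class ConferenceAgent:
    """Passive-mode input handling: whichever of (input request,
    relayed input) arrives first is buffered until its counterpart
    appears, then the pair is delivered to the application."""
    def __init__(self):
        self.pending_requests = deque()  # requests awaiting input
        self.pending_input = deque()     # relayed input awaiting a request
        self.delivered = []              # (application, event) pairs

    def on_input_request(self, app):
        if self.pending_input:
            self.delivered.append((app, self.pending_input.popleft()))
        else:
            self.pending_requests.append(app)

    def on_relayed_input(self, event):
        if self.pending_requests:
            self.delivered.append((self.pending_requests.popleft(), event))
        else:
            self.pending_input.append(event)

agent = ConferenceAgent()
agent.on_relayed_input("keystroke-1")   # input arrives first (footnote 5)
agent.on_input_request("editor")
agent.on_input_request("editor")        # request arrives first
agent.on_relayed_input("keystroke-2")
```

Either arrival order yields the same deliveries, which is what allows the replicated application instances to drift without losing input.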
6.3 Impact on existing software
The only change that needed to be made to existing software was the addition of a facility to TheWAin that permitted the conference agent to register itself to mediate all requests to open input streams. Not only was this change trivial in nature, but it provided a general ``intermediary'' facility that can be used in any similar situation (by monitoring programs, for example). Thus, the proposed implementation methodology, as manifested in TheWA, passed this particular flex test with flying colors.
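The ``intermediary'' facility amounts to a registration hook on the service that opens input streams. A minimal sketch follows; the names are hypothetical, as the actual TheWAin interface is not given in this paper.

```python
class InputStreamService:
    """Sketch of the ``intermediary'' facility: a registered mediator
    sees every request to open an input stream before the service
    acts on it, and may substitute itself as the stream's client."""
    def __init__(self):
        self.mediator = None
        self.streams = []

    def register_mediator(self, mediate):
        self.mediator = mediate

    def open_input_stream(self, client):
        if self.mediator is not None:
            client = self.mediator(client)   # mediator may redirect
        self.streams.append(client)
        return client

wa_in = InputStreamService()
wa_in.register_mediator(lambda client: ("conference-agent", client))
stream = wa_in.open_input_stream("editor")
```

A monitoring program would register in exactly the same way, which is why the hook generalizes beyond conferencing.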
7. Performance
Concurrent systems, especially message-based ones, have been much-maligned for poor performance. Our experience is quite the contrary. In brief, the V-System can execute a complete SendReceiveReply cycle in approximately 500 microseconds on a Sun-3 workstation. TheWA, in turn, can respond to a request for input in an average time of 7 milliseconds, can process 2-D vector graphics output at an average rate of 440 vectors per second, and can process fixed-width text output at an average rate of 13000 characters per second.6 Use of the conferencing software described above makes input processing more complex — increasing the average time to handle a request for input to 10 milliseconds — but has no effect on output speed.
6 The output performance includes the cost of storing the data in a structured display file. Any subsequent regeneration of the output is done from the structured display file, for a performance improvement of approximately a factor of three over the quoted figures.
Beyond the obvious fact that response is good, it is important to note that message-passing time is dwarfed by the time the various processes take internally to actually do anything, time that is independent of the multi-process structure.
8. Concluding remarks
We believe that judicious use of multiple processes is the most promising approach to the implementation of user interface software. We have demonstrated that such an approach can lead to exceptional flexibility — with respect to the addition of new functionality — without sacrificing performance. Fortunately, systems that provide the requisite combination of lightweight processes and fast interprocess communication are becoming increasingly common. Even UNIX is evolving in this direction, both through the innumerable ``lightweight process packages'' being developed by various institutions and in the guise of the Mach operating system [23]. It should therefore come as no surprise that more and more research groups are adopting an implementation methodology similar to that described here [1,2,7,8,24,25].
9. References
[1] Beach, R.J., Beatty, J.C., Booth, K.S., Plebon, D.A. and Fiume, E.L. The message is the medium: Multiprocess structuring of an interactive paint program. In Proceedings of SIGGRAPH '82 (Boston, Mass., July 26-30, 1982). In Computer Graphics 16, 3 (July 1982), 277-287.
[2] Berglund, E.J. An introduction to the V-System. IEEE Micro, (August 1986), 35-52.
[3] Booth, K.S., Cowan, W.B. and Forsey, D.R. Multitasking support in a graphics workstation. In Proc. 1st International Conference on Computer Workstations, (November 1985), IEEE, 82-89.
[4] Brown, M. and Heck, M. Understanding PHIGS: The Hierarchical Graphics Standard. Megatek Corporation, San Diego, CA, 1985.
[5] Dance, J.R. et al. Report on run-time structure for UIMS-supported applications. In this issue, Computer Graphics 21, 2 (April 1987).
[6] Foley, J.D. and van Dam, A. Fundamentals of Interactive Computer Graphics, Addison-Wesley, 1982.
[7] Gosling, J. SunDew: A distributed and extensible window system. In Methodology of Window Management, F.R.A. Hopgood, et al. (eds.), Springer-Verlag, 1986. 47-58.
[8] Green, M. The University of Alberta user interface management system. In Proceedings of SIGGRAPH '85 (San Francisco, Calif., July 22-26, 1985). In Computer Graphics 19, 3 (July 1985), 205-213.
[9] Green, P.E. Jr. Protocol conversion. IEEE Transactions on Communications COM-34, 3, (March 1986), 257-268.
[10] Lantz, K.A. An architecture for configurable user interfaces. In Foundation for Human-Computer Communication, K. Hopper and I.A. Newman (eds.), North-Holland, 1986. 257-275.
[11] Lantz, K.A. Command interaction in distributed systems. In Proc. Fall COMPCON, (September 1980). 25-32.
[12] Lantz, K.A. An experiment in integrated multimedia conferencing. In Proc. CSCW '86: Conference on Computer-Supported Cooperative Work, (MCC Software Technology Program, December 1986). 267-275.
[13] Lantz, K.A. On user interface reference models. SIGCHI Bulletin 18, 2, (October 1986), 36-42.
[14] Lantz, K.A., Gradischnig, K.D., Feldman, J.A. and Rashid, R.F. Rochester's Intelligent Gateway. Computer 15, 10, (October 1982), 54-68.
[15] Lantz, K.A. and Nowicki, W.I. Structured graphics for distributed systems. ACM Transactions on Graphics 3, 1, (January 1984), 23-51.
[16] Lantz, K.A., Nowicki, W.I. and Theimer, M.M. An empirical study of distributed application performance. IEEE Transactions on Software Engineering SE-11, 10, (October 1985), 1162-1174.
[17] Lantz, K.A., Pallas, J. and Slocum, M. TheWA beyond traditional window systems. Internal Memo, Distributed Systems Group, Department of Computer Science, Stanford University.
[18] Lantz, K.A. and Rashid, R.F. Virtual terminal management in a multiple process environment. In Proc. 7th Symposium on Operating Systems Principles, (December 1979). ACM, 86-97.
[19] Lantz, K.A. et al. Reference models, window systems, and concurrency. In this issue, Computer Graphics 21, 2 (April 1987).
[20] Nowicki, W.I. Partitioning of Function in a Distributed Graphics System. PhD thesis, Stanford University, 1985.
[21] Olsen, D., Buxton, W., Ehrich, R., Kasik, D., Rhyne, J. and Sibert, J. A context for user interface management. IEEE Computer Graphics and Applications 4, (December 1984), 33-42.
[22] Pfaff, G.E., ed. User Interface Management Systems, Springer-Verlag, 1985.
[23] Rashid, R.F. Threads of a new system. UNIX Review 4, 8, (August 1986), 36-49.
[24] Scheifler, R.W. and Gettys, J. The X window system. To appear in ACM Transactions on Graphics.
[25] Tanner, P.P., MacKay, S.A., Stewart, D.A. and Wein, M. A multitasking switchboard approach to user interface management. In Proceedings of SIGGRAPH '86 (Dallas, Texas, August 18-22, 1986). In Computer Graphics 20, 4 (August 1986), 241-248.
[26] Thomas, J.J. and Hamlin, G. Graphical input interaction technique (GIIT) workshop summary. Computer Graphics 17, 1 (January 1983), 5-30.