Rick Beach, February 13, 1987 8:58:22 pm PST
UIMS Support for Direct Manipulation Interfaces
Scott E. Hudson
Department of Computer Science
University of Arizona
Tucson, AZ 85721
Direct manipulation interfaces have recently received a great deal of attention. The direct manipulation paradigm seems to offer significant advantages, particularly for novice users. In this paper we consider what direct manipulation is, what it means for user interface management systems, and what we should do about it. We conclude that, to support direct manipulation, syntax should be de-emphasized and decentralized; the presentation component of the system should be made more flexible than in most existing systems; and feedback, particularly semantic feedback, is very important and should be supported in a much more automatic fashion.
User interface management systems have traditionally been designed to support the conversational metaphor of user interfaces. Under this metaphor, the interface is seen as a conversation or dialogue between the user and the system. The user expresses actions to be performed in some language, and the system responds. The user then evaluates the result and issues new commands in order to iterate toward some goal. Because of this metaphor, much of previous UIMS work has centered on dialogue management, and implicitly or explicitly made use of the language model of user interface [4].
Recently, a new metaphor for user interface, that of direct manipulation [15,16], has been widely discussed. Under this metaphor, the user has the illusion of directly acting upon the objects of interest without the intermediary of the system. This is contrasted with the conversational metaphor where the objects of interest are usually referred to abstractly (i.e. by name), their current state or status is displayed only on request, and the user manipulates objects only by asking the system, in some language, to perform actions on them.
In this paper we will consider how user interface management systems may better support direct manipulation. In the next section we will consider how the important aspects of direct manipulation interfaces have been characterized. Section 3 will then examine how this characterization might be related to software oriented concepts, and what UIMS research directions this implies. Section 4 will consider the implications of the conclusions drawn in section 3 with respect to a model of the architecture of a UIMS. Section 5 will discuss some constraints imposed by issues of ease of programming, and finally, section 6 will provide conclusions.
What is direct manipulation?
Direct manipulation interfaces are probably best understood by example. One good example is the Apple Macintosh desktop interface for file manipulation. Figure 1 shows a sample screen from the Macintosh. In this interface, the objects of interest, in this case files and application programs, are displayed to the user in iconic form. Actions are expressed primarily by mouse movements and button presses. For example, to delete a file, the user points at the icon representing the file, presses the mouse button down, and while holding the button down, drags the icon into another icon representing a trash can. Similarly, if the user decides to change the volume of the system's bell, he or she uses the slider on the control panel shown in figure 1 instead of issuing a command. Overall, many common actions are performed by pointing at or dragging objects on the screen. Other operations are performed using menus. Textual command languages and other obviously syntactic mechanisms are never used.
Figure 1. A sample screen from the Macintosh desktop interface.
We have briefly characterized direct manipulation in terms of examples and in terms of a user's illusion of acting directly upon the objects of interest. However, this characterization is much too vague to be of much use in building systems. Shneiderman, who coined the term direct manipulation, gave a more concrete characterization by suggesting that a direct manipulation interface should have [15]:
 Continuous representation of the objects of interest.
 Physical actions or labeled button presses instead of complex syntax.
 Rapid incremental reversible operations whose impact on the object of interest is immediately visible.
In addition to Shneiderman's description, Hutchins, Hollan, and Norman [9] give a more thorough and abstract treatment of direct manipulation. They characterize direct manipulation in terms of a ``feeling of directness'' on the part of the user. They go on to discuss a number of distinct aspects of directness, in particular what they call ``engagement'' and ``distance.''
Engagement is the feeling of dealing ``... not with programs, not with the computer, but with the semantic objects of our goals and intentions.'' Distance can be thought of as a measure of ``... the distance between one's thoughts and the physical requirements of the system under use.'' Hutchins et al. further characterize distance in terms of ``Semantic'' and ``Articulatory'' distances.
Articulatory distance concerns the actual form that communication takes; for example, the choice of type and mode of use of input devices, or the kinds of images drawn for the user. A small articulatory distance results when the input techniques and output representations used are well suited to conveying the required information. Articulatory distance is also decreased when the form of inputs and outputs relates to the semantic concepts of the underlying conceptual model.
Semantic distance involves the ease with which the user can express desired actions within the concepts of the system. Is it possible to express the concepts of interest concisely? Semantic distance also involves a measure of how closely the user's conception of the task domain matches that of the system. Consequently, it is not an interface issue alone, but involves the application component of the system as well.
To summarize Hutchins et al., direct manipulation involves a feeling of directness on the part of the user. This feeling is characterized by three aspects:
 Engagement - the feeling of communicating with the objects of interest directly.
 Articulatory distance - the degree to which the form of communication with the system reflects the application objects and tasks involved.
 Semantic distance - the degree to which the semantic concepts used by the system 1) are compatible with those of the user, and 2) can be used to easily accomplish the user's goals.
UIMS implications
The characterizations of direct manipulation made by Shneiderman and by Hutchins et al. are useful in understanding what makes a direct manipulation interface. However, these characterizations are expressed in terms of end results (what the user sees) rather than in terms of the software concepts necessary to implement them. In this section we will begin to explore what the implications of direct manipulation are for the architecture and implementation of user interface management systems.
We will consider each of the aspects of directness discussed by Hutchins et al. in turn. The concept of engagement involves a feeling on the part of the user that he or she is dealing with the objects of interest directly instead of through a third party. One barrier between the user and objects of interest can be syntax. Syntax of some form is needed to communicate the user's intentions to the system. However, a single overall syntax for the interface as a whole implies communication with the system rather than communication directly with the individual objects of interest. This reduces engagement.
Following Shneiderman, we can say that syntax should be in terms of individual objects, should be as simple as possible, and should involve physical actions such as pointing or dragging instead of more linguistic concepts. This implies that previous systems based on transition networks [5,10,18] or grammars [12,13] which control the overall interface are not good candidates for supporting direct manipulation. On the other hand, the techniques used in these systems might still be adapted to the task of handling the remaining syntax of communicating the user's intentions to individual objects. To adapt these techniques will require that the state of multiple dialogues (potentially one for each object) be maintained at once, and that the user be allowed to freely move between these multiple dialogues with simple actions such as moving a mouse.
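The multiple dialogue idea can be illustrated with a minimal sketch in which each screen object owns a small transition network and a dispatcher routes every event to the object under the pointer. This is only one possible reading of the paragraph above, not the design of any cited system; the names ObjectDialogue, Dispatcher, and TRANSITIONS, and the particular states and events, are assumptions made for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical per-object syntax: (state, event) -> next state.
TRANSITIONS = {
    ("idle", "click"): "selected",
    ("selected", "click"): "editing",   # second click starts renaming
    ("editing", "key"): "editing",
    ("editing", "enter"): "idle",
}


@dataclass
class ObjectDialogue:
    """A small transition network owned by a single screen object."""
    name: str
    state: str = "idle"

    def handle(self, event: str) -> None:
        # Events that are not legal in the current state are ignored.
        self.state = TRANSITIONS.get((self.state, event), self.state)


@dataclass
class Dispatcher:
    """Routes each event to the dialogue of the object under the pointer."""
    dialogues: dict = field(default_factory=dict)

    def add(self, dialogue: ObjectDialogue) -> None:
        self.dialogues[dialogue.name] = dialogue

    def dispatch(self, target: str, event: str) -> None:
        self.dialogues[target].handle(event)


ui = Dispatcher()
ui.add(ObjectDialogue("report"))
ui.add(ObjectDialogue("budget"))

# The user begins renaming "report", then moves the mouse to "budget"
# without finishing; the state of the first dialogue is retained.
ui.dispatch("report", "click")
ui.dispatch("report", "click")
ui.dispatch("report", "key")
ui.dispatch("budget", "click")
```

Because each dialogue keeps its own state, the user can return to the first object later and continue where the interaction left off.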
In addition to minimizing syntax and turning to a multiple dialogue model of interface, another way of increasing engagement is to provide representations of objects on the screen which reflect the behavior of objects from the application domain. One of the best ways to do this is through feedback. Acting on objects in the application domain almost always has consequences. Allowing the screen representation of these objects to immediately reflect these consequences greatly adds to the illusion that the representation of the object is the object.
Traditionally feedback has been characterized as lexical, syntactic, or semantic. However, providing the high level of feedback necessary to enhance engagement often requires merging these levels. In particular it is often necessary to provide semantic feedback even at what would normally be considered a lexical level. A good example of this involves dragging. Dragging an object on the screen is normally a lexical operation. However, since moving an object has a semantic consequence, engagement can be greatly increased by providing feedback about potential consequences as dragging is performed. For example, in the Macintosh desktop interface in some cases, such as putting a file in a folder, one may drag objects into other objects. In other cases dragging one object on top of another has no special semantic consequence. The semantic difference between these lexically identical operations is clarified by highlighting objects when something is to be placed in them, and doing nothing when something is to be placed on them. This kind of feedback, along with such techniques as semantically driven gravity fields or sophisticated snapping techniques [2], can greatly enhance the intuitive feeling of manipulating objects directly. However, providing this capability in a general way could be quite expensive, since we must reflect the arbitrary semantics of the application domain. Providing semantic feedback, particularly when it manifests itself at the lexical level, is one of the major challenges UIMS's must face to support direct manipulation.
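As a rough sketch of semantic feedback at the lexical level, the fragment below consults a semantic test on every pointer movement during a drag and highlights only those objects that would accept the dragged item. The ScreenObject class, the accepts rule, and the drag_over function are hypothetical and are not taken from the Macintosh or from any cited system.

```python
from dataclasses import dataclass


@dataclass
class ScreenObject:
    name: str
    kind: str                      # e.g. "file", "folder", "trash"
    highlighted: bool = False

    def accepts(self, dragged: "ScreenObject") -> bool:
        """Semantic test: may `dragged` be placed *in* this object?"""
        # Assumed rule for illustration: only containers accept files.
        return self.kind in ("folder", "trash") and dragged.kind == "file"


def drag_over(dragged, under, all_objects):
    """Update highlighting as the pointer moves during a drag.

    `under` is the object currently beneath the pointer, or None.
    """
    for obj in all_objects:
        obj.highlighted = False
    if under is not None and under is not dragged and under.accepts(dragged):
        under.highlighted = True


report = ScreenObject("report", "file")
memo = ScreenObject("memo", "file")
projects = ScreenObject("projects", "folder")
objects = [report, memo, projects]

drag_over(report, projects, objects)   # placing *in* a folder: highlight
in_folder = projects.highlighted
drag_over(report, memo, objects)       # placing *on* a file: no highlight
```

The lexical operation (dragging) is identical in both calls; only the semantic test decides whether feedback appears.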
The next aspect we will consider is articulatory directness. Articulatory directness concerns the actual form of communication between the user and the system. Consequently, it impacts the presentation component of a UIMS most strongly. On the output side, many UIMS's provide a good deal of flexibility in how images are rendered on the screen by allowing access to a general set of graphics capabilities. However, on the input side, systems are rarely as flexible. Usually only a fixed set of interaction techniques is supported. To achieve a high level of articulatory directness for input, it will be necessary to tailor the input techniques used to the objects which they apply to. It is also important to realize that, just as application objects are normally structured objects composed from smaller objects, it is also often necessary to structure input techniques by building complex techniques from simpler ones. This implies the need for some mechanism which can combine the available physical input devices into composite input techniques or abstract devices as was done by Anson [1].
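One way to picture the composition of input techniques is to treat each technique as a function over lower-level events. The sketch below is an assumption-laden illustration, not Anson's device model: the event tuples and the drag and slider functions are invented here. It builds a drag device from a button and a locator, then builds a slider on top of drag.

```python
from typing import Callable, Iterable, Optional, Tuple

Event = Tuple[str, int, int]          # (kind, x, y) from the physical devices
Point = Tuple[int, int]


def drag(events: Iterable[Event]) -> Optional[Tuple[Point, Point]]:
    """Combine a button and a locator into one 'drag' abstract device.

    Returns (start, end) for the first press/release pair, or None.
    """
    start = None
    for kind, x, y in events:
        if kind == "down" and start is None:
            start = (x, y)
        elif kind == "up" and start is not None:
            return start, (x, y)
    return None


def slider(left: int, right: int) -> Callable[[Iterable[Event]], Optional[float]]:
    """Build a slider technique on top of `drag`, tailored to one object."""
    def technique(events: Iterable[Event]) -> Optional[float]:
        result = drag(events)
        if result is None:
            return None
        _start, (x, _y) = result
        x = min(max(x, left), right)          # constrain movement to the track
        return (x - left) / (right - left)    # value in the range 0.0 to 1.0
    return technique


volume = slider(left=100, right=300)
value = volume([("down", 120, 50), ("move", 180, 55), ("up", 250, 60)])
```

Here value is 0.75, since the release position lies three quarters of the way along the assumed track.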
Finally we turn to the issue of semantic directness. Achieving a high degree of semantic directness involves matching the semantic concepts of the system to those of the user. Discovering how the user or a potential group of users will structure the tasks and concepts of the application is outside the domain of the UIMS. What the UIMS must support is the ability to structure the application in a flexible manner. This flexibility is usually provided by treating the application as a set of semantic action procedures written in a general purpose programming language. This approach to the application interface allows the application to be structured as needed within the general purpose programming language. However, it has drawbacks in a direct manipulation context. In particular, to improve engagement, we would prefer to model the application as a set of objects whose properties and behaviors are being represented on the screen to the user. When the objects change, feedback is generated to reflect those changes in the representations displayed to the user.
While this object oriented approach is only conceptual and can be implemented using the conventional approach of simply calling semantic action routines, it then becomes the application's responsibility to provide all semantic feedback. As we have seen, semantic feedback which occurs in a lexical context can add greatly to the degree of engagement exhibited by the interface. Under the semantic action routine approach to the application interface either the application must be involved in the lexical details of the interface, or we must forego this form of feedback. Because of this limitation, we believe that a shared object model of the application interface is more appropriate for direct manipulation interfaces. Under this model, application objects are accessible to both the application and the interface. When changes to the objects are made by either party the other is notified. This allows semantic feedback to be handled by the UIMS and hence allows for more automatic support to be provided.
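The shared object model might be sketched as follows, assuming a simple callback scheme in which each party registers a handler and is notified of changes made by the other. The SharedObject class and its attach, get, and set methods are hypothetical names chosen for this illustration.

```python
class SharedObject:
    """An application object visible to both the application and the UIMS."""

    def __init__(self, **attributes):
        self._attributes = dict(attributes)
        self._observers = {}        # party name -> callback

    def attach(self, party, callback):
        self._observers[party] = callback

    def get(self, key):
        return self._attributes[key]

    def set(self, key, value, by):
        """Change an attribute and notify every party except the changer."""
        if self._attributes.get(key) == value:
            return                  # nothing changed, so nobody is notified
        self._attributes[key] = value
        for party, callback in self._observers.items():
            if party != by:
                callback(key, value)


log = []
bell = SharedObject(volume=3)
bell.attach("interface", lambda k, v: log.append(("redraw", k, v)))
bell.attach("application", lambda k, v: log.append(("apply", k, v)))

bell.set("volume", 5, by="interface")     # the user moved the slider
bell.set("volume", 0, by="application")   # the application muted the bell
```

After these two calls the log holds one notification to the application followed by one to the interface, so neither side needs to poll the other.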
To summarize conclusions about UIMS support for direct manipulation:
 Syntax should be minimized, using physical actions such as pointing and dragging in preference to more syntactic concepts.
 Syntax should be expressed in terms of individual objects rather than the system as a whole.
 Feedback, particularly semantic feedback is very important, and needs better support.

 Flexibility in the presentation component is important; particularly the ability to design specific interaction techniques and combine these techniques into abstract devices.
 An application interface based on shared objects is preferable to the conventional application interface based on semantic action routines.
Architectural implications
The Seeheim UIMS model [5,6] contains three components, as illustrated in figure 2. The points we have outlined above impact each of these components. The most obvious impact on the model concerns the dialogue management component. Since syntax is being minimized, we feel that a model without an explicit dialogue component, as shown in figure 3, is more appropriate. In a direct manipulation setting, the syntactic component of the system is not entirely missing. However, because of the emphasis taken, syntax is primarily at a lexical or interaction technique level, as well as at a semantic level where we determine the current legal operations based on the state of the objects being interacted with. It should be noted that minimizing syntax is a current trend among a number of UIMS researchers. Some have gone as far as to handle all syntax by means of defaults [14].
Figure 2. The Seeheim UIMS model.
In addition to the dialogue management component, the above conclusions also strongly impact the application interface component. We have proposed that the application interface component be structured as a set of shared objects. This is done in order to support semantic feedback within the UIMS. In addition to simple shared objects, it would also be useful to store information about the semantics of the objects. One way to do this is to store the objects as active entities, that is, entities which do not simply passively store data, but also react to changes in ways that reflect the semantics of the application domain. Active objects provide a convenient vehicle both for implementing semantic feedback and for automatically notifying the application when objects change. The author has implemented a system based on this approach [7,8], as have Sibert et al. [17].
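An active object can be pictured as a data store that runs registered reactions whenever one of its attributes changes. The sketch below is a simplified illustration under that assumption and does not describe the systems of [7,8] or [17]; the ActiveObject class, the react_to method, and the trash example are invented for this purpose.

```python
class ActiveObject:
    """Stores attribute values and reacts when they change."""

    def __init__(self):
        self._values = {}
        self._reactions = {}            # attribute -> list of functions

    def react_to(self, attribute, reaction):
        self._reactions.setdefault(attribute, []).append(reaction)

    def __getitem__(self, attribute):
        return self._values[attribute]

    def __setitem__(self, attribute, value):
        self._values[attribute] = value
        # Reactions encode application semantics next to the data itself.
        for reaction in self._reactions.get(attribute, []):
            reaction(self, value)


def update_icon(obj, items):
    # Assumed domain rule: a trash can holding items is drawn as full.
    obj["icon"] = "full" if items else "empty"


trash = ActiveObject()
trash.react_to("contents", update_icon)

trash["contents"] = []
icon_before = trash["icon"]
trash["contents"] = ["report"]
icon_after = trash["icon"]
```

Neither the application nor the interface sets the icon attribute explicitly; it follows from the change to contents.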
Figure 3. A UIMS model without an explicit dialogue component.
The final area of interest is the presentation component of the system. Many UIMS's handle output quite flexibly by allowing the application access to a fairly general graphics package. However, this has the drawback that the application must get involved with the details of presentation. Also, on the input side, much less flexibility is normally provided, and there is often little connection between outputs and inputs other than being able to pick a component of the output. In a direct manipulation framework, inputs and outputs must work together to produce the illusion of directly acting upon the objects of interest. For example, the elements of output need to be manipulated in more sophisticated ways than simple picking (e.g., dragging with constrained movements or semantically driven highlighting). Further, if the UIMS is to automate all or part of the task of semantic feedback it must also accept more responsibility for automating screen update, and for the creation of outputs.
One approach to this problem is to abstract inputs and outputs into presentation plans which describe how to create presentations based on a series of parameters. This was done for outputs in [3,13] and for a unified description of inputs and outputs in the author's system. The presentation component of the system is responsible for managing the graphical images and input techniques described by the plans for all active presentations. Whenever the parameter values controlling a presentation are changed, either by the application directly, or via the application interface, the presentation component can derive the screen updates which are necessary. This scheme forms a convenient framework for supporting semantic feedback, since it allows flexible presentations to be controlled by a small number of semantically significant values.
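A presentation plan can be approximated as a function from parameter values to a set of named display items, with the presentation component comparing old and new results to derive the necessary screen updates. The following sketch assumes that representation; slider_plan, Presentation, and the item tuples are hypothetical and do not reproduce the plans of [3,13] or of the author's system.

```python
def slider_plan(params):
    """Hypothetical plan: maps parameter values to named display items."""
    knob_x = 100 + 20 * params["volume"]
    return {
        "track": ("line", 100, 50, 240, 50),
        "knob": ("rect", knob_x, 45, 10, 10),
        "label": ("text", 100, 70, f"Volume: {params['volume']}"),
    }


class Presentation:
    """Holds one active presentation and derives updates from its plan."""

    def __init__(self, plan, **params):
        self.plan = plan
        self.params = dict(params)
        self.display = plan(self.params)

    def set(self, name, value):
        """Change one parameter; return the display items to be redrawn."""
        self.params[name] = value
        new_display = self.plan(self.params)
        # Only items whose description changed need a screen update.
        # (Items that disappear entirely are not handled in this sketch.)
        updates = {
            key: item
            for key, item in new_display.items()
            if self.display.get(key) != item
        }
        self.display = new_display
        return updates


volume_slider = Presentation(slider_plan, volume=3)
updates = volume_slider.set("volume", 5)
```

Changing the single semantically significant value volume yields updates for the knob and the label only; the track is left alone.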
Constraints imposed by other issues
Thus far we have concentrated only on supporting direct manipulation interfaces. However, direct manipulation has limitations and drawbacks, and there are other issues that impact the design of a UIMS. In particular, we are concerned with the complexity faced by the application programmer when using the system and the difficulty of learning this can cause. Experience with existing UIMS's has shown that systems which are hard to learn are not used despite the advantages they might offer [14]. Unfortunately, many of the directions proposed for creating UIMS's which better support direct manipulation also create more complexity. One solution to this problem is to apply user interface technology to UIMS's and allow specification by demonstration as was done in [11]. Currently we are faced with a choice between systems which are very easy to program but produce simple or stylized interfaces or systems that are complex to program but can support better interfaces. What is needed is a synthesis of these choices. However, this will only come after exploration of the alternatives at both ends of the spectrum.
Conclusions
In this paper we have considered what direct manipulation interfaces are, how user interface management systems might better support them, and the research directions this indicates. We conclude that syntax should be minimized and decentralized, that the presentation component needs better support for describing new input techniques, and that semantic feedback is a central issue which affects several aspects of the system.
[1] Anson, E. The device model of interaction, Proceedings of SIGGRAPH'86 (Dallas, Texas, August 18-22, 1986). In Computer Graphics 20, 4 (August 1986), 107-114.
[2] Bier, E. and Stone, M. Snap-Dragging. Proceedings of SIGGRAPH'86 (Dallas, Texas, August 18-22, 1986). In Computer Graphics 20, 4 (August 1986), 233-240.
[3] Clemons, E.K. and Greenfield, A.J. The SAGE System Architecture: A System for the Rapid Development of Graphics Interfaces for Decision Support. IEEE Computer Graphics & Applications 5, (November 1985), 38-50.
[4] Foley, J.D. and van Dam, A. Fundamentals of Interactive Computer Graphics, Addison-Wesley, 1982.
[5] Green, M. The University of Alberta user interface management system, Proceedings of SIGGRAPH'85 (San Francisco, Calif., July 22-26, 1985). In Computer Graphics 19, 3 (July 1985), 205-213.
[6] Green, M. Report on Dialog Specification Tools. Computer Graphics Forum 3, (1984), 305-313.
[7] Hudson, S.E. and King, R. A Generator of Direct Manipulation Office Systems. ACM Transactions on Office Information Systems 4, 2 (1986), 132-163.
[8] Hudson, S.E. A User Interface Management System Which Supports Direct Manipulation. PhD thesis, University of Colorado, Boulder, Colorado, August 1986.
[9] Hutchins, E.L., Hollan, J.D. and Norman, D.A. Direct manipulation interfaces. In User centered system design, Norman, D.A. and Draper, S.W. (eds.), Lawrence Erlbaum Associates, Hillsdale, NJ, 1986, 87-124.
[10] Jacob, R.J.K. An Executable Specification Technique for Describing Human-Computer Interaction. In Advances in Human-Computer Interaction, Hartson, H.R. ed., Ablex, Norwood, NJ, 1985, 211-242.
[11] Myers, B.A. and Buxton, W. Creating Highly Interactive and Graphical User Interfaces by Demonstration, Proceedings of SIGGRAPH'86 (Dallas, Texas, August 18-22, 1986). In Computer Graphics 20, 4 (August 1986), 249-258.
[12] Olsen, D.R. Jr. and Dempsey, E. SYNGRAPH: A Graphical User Interface Generator, Proceedings of SIGGRAPH'83 (Detroit, Mich., July 25-29, 1983). In Computer Graphics 17, 3 (July 1983), 43-50.
[13] Olsen, D.R., Dempsey, E.P. and Rogge, R. Input-Output Linkage in a User Interface Management System, Proceedings of SIGGRAPH'85 (San Francisco, Calif., July 22-26, 1985). In Computer Graphics 19, 3 (July 1985), 191-197.
[14] Olsen, D.R. MIKE: The Menu Interaction Kontrol Environment, BYU Tech Report, Brigham Young University, Provo, Utah, 1986.
[15] Shneiderman, B. The Future of Interactive Systems and the Emergence of Direct Manipulation, Behaviour and Information Technology 1, (1982), 237-256.
[16] Shneiderman, B. Direct manipulation: a step beyond programming languages, IEEE Computer 16, 8 (1983), 57-69.
[17] Sibert, J.L., Hurley, W.D. and Bleser, T.W. An Object-Oriented User Interface Management System, Proceedings of SIGGRAPH'86 (Dallas, Texas, August 18-22, 1986). In Computer Graphics 20, 4 (August 1986), 259-268.
[18] Wasserman, A.I. Extending State Transition Diagrams for the Specification of Human-Computer Interaction. IEEE Transactions on Software Engineering SE-11, (August 1985), 699-713.