Software architecture for gestural interfaces
The central software component of the interface is the dialogue machine, which communicates with the other subsystems:
· the input/output subsystem, which manages the input and display devices, performs inking of the stylus trace, and converts input device manipulations into canonical interface events (a sketch of such an event follows this list);
· the recognizer subsystem, comprising a trainable, pattern matching recognizer for handwriting and a feature analysis recognizer for gesture [24];
· the application subsystem.
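To make the notion of a canonical interface event concrete, the following is a minimal sketch in Python; the class name and fields are illustrative assumptions, not the project's actual data structures. The point is only that every device manipulation is reduced to a typed, timestamped record that the dialogue machine and the recognizers can treat uniformly.

# Hypothetical sketch of a canonical interface event; names and fields are
# assumptions for illustration, not the system's actual representation.
from dataclasses import dataclass, field
from typing import List, Tuple
import time

@dataclass
class InterfaceEvent:
    etype: str                                       # e.g. "hwstroke", "geststroke", "closestroke"
    timestamp: float = field(default_factory=time.time)
    points: List[Tuple[float, float]] = field(default_factory=list)   # the inked stylus trace

# Example: a stroke captured from the tablet becomes one canonical event.
stroke = InterfaceEvent(etype="hwstroke", points=[(10.0, 12.5), (11.0, 12.7), (12.2, 12.9)])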
Having two recognizers complicates the dialogue design. The preliminary classification of strokes may send a stroke to the wrong recognizer, and the dialogue must be written to handle this. Some heuristics about handwriting, such as left-to-right sequencing, adjacency, and common baselines, are built into a post-classifier routine that examines strokes to determine whether they belong with previously entered handwriting strokes.
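As a rough illustration of such a post-classifier, the sketch below applies the three heuristics to a new stroke. The StrokeBox fields and the numeric thresholds are assumptions made for this example, not the routine actually used in the system.

# Minimal sketch of the post-classifier heuristics described above; thresholds
# and field names are illustrative assumptions.
from collections import namedtuple

StrokeBox = namedtuple("StrokeBox", "xmin xmax baseline_y")

def belongs_with_handwriting(stroke, prev, x_gap_max=20.0, baseline_tol=8.0):
    """Decide whether `stroke` continues previously entered handwriting `prev`."""
    to_the_right = stroke.xmin >= prev.xmin                        # left-to-right sequence
    adjacent = (stroke.xmin - prev.xmax) <= x_gap_max              # no large horizontal gap
    same_baseline = abs(stroke.baseline_y - prev.baseline_y) <= baseline_tol
    return to_the_right and adjacent and same_baseline

# Example: a stroke starting just to the right of the previous word, on roughly
# the same baseline, is re-routed to the handwriting recognizer.
print(belongs_with_handwriting(StrokeBox(105, 130, 50), StrokeBox(40, 100, 52)))  # True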
Dialogue specification and processing
A command is a grouping of functional elements that is meaningful and complete, typically represented by a tree or DAG. The tree or DAG also captures the surface ordering of the tokens of the command through the notion of a traversal order. The parsing technology developed over the last 30 years depends profoundly on this traversal ordering. In gestural languages, however, the surface ordering and the functional grouping may not coincide, so syntax-directed parsing is not ideal for gestural dialogues. Others attempting to implement direct manipulation interfaces have reached the same conclusion.
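A small, purely illustrative example (the names are ours, not the paper's) of why this matters, using the spreadsheet command developed later in this report: the functional grouping of the command is fixed, while its surface ordering is not.

# The functional grouping of a "sum a column" command, as a tree-like structure.
command_tree = {
    "op": "SUM",              # functional element: the operation
    "range": ("A1", "A4"),    # functional element: the operand range
    "target": "A5",           # functional element: where the result goes
}

# Two equally valid surface orderings of the same command in a gestural dialogue:
surface_order_1 = ["scope gesture over A1..A4", "sigma in A5", "closure stroke"]
surface_order_2 = ["sigma in A5", "scope gesture over A1..A4", "closure stroke"]
# A syntax-directed parser keyed to one traversal order would reject the other.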
Our requirement for order-free, asynchronous dialogues has led us to a rule-based system driving a parallel parser. The parser is similar in operation to parsers designed in the 1950s and 1960s to produce all possible syntactically valid parses. The design of our parser is quite different, however, in that semantic constraints are incorporated within the rules, and the parallel parse is employed to handle order freedom and multiple dialogue threads.
Events are timestamped and linked in temporal order, but the parser sees all events categorized by event types. When a set of events is present that satisfies the requirements of a dialogue rule, the temporal sequence links may be checked to see if the events form a temporal group. The timestamps may also be checked by the rules where the dialogue requires certain actions to be carried out within a prescribed time.
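One plausible way to organize this bookkeeping, sketched here with assumed names (and reusing the InterfaceEvent sketch above), is to keep every event on a single temporal chain and also index it by event type, so rules can retrieve candidate events by type and then verify that a matching set forms a temporal group.

# Sketch of an event store kept both in temporal order and indexed by type.
# The grouping test below (events close together in time) is one plausible
# interpretation of the temporal-group check, not the system's actual rule.
from collections import defaultdict

class EventStore:
    def __init__(self):
        self.by_time = []                  # all events, linked in temporal order
        self.by_type = defaultdict(list)   # events categorized by event type

    def add(self, event):
        self.by_time.append(event)
        self.by_type[event.etype].append(event)

    def temporally_grouped(self, events, max_gap=2.0):
        """True if the events are contiguous in time within max_gap seconds."""
        times = sorted(e.timestamp for e in events)
        return all(b - a <= max_gap for a, b in zip(times, times[1:]))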
Where parallel parses are permitted, deletion of erroneous parses must be handled. In our case, the parses proceed as a kind of competition; the first parse to reach a stage of completion wins and eliminates the other parses. The execution of a rule constructs a graph linking the tokens input to the rule to the tokens built by the rule. This graph may be used to undo the rule, and alternative parses are eliminated by undoing the rules which created them. Parse elimination can be an expensive task. Careful design of the interface should minimize the number of cases in which parse elimination has cascading, exponential behavior.
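The following sketch, under assumed names, illustrates the undo bookkeeping: each rule application records the tokens it consumed and the tokens it built, so a losing parse can be eliminated by replaying that record in reverse.

# Sketch of parse elimination by undoing rule applications; names are
# assumptions, and the real system's bookkeeping is not shown in this report.
class RuleApplication:
    """Records the tokens a rule consumed and the tokens it built."""
    def __init__(self, rule_name, consumed, produced):
        self.rule_name, self.consumed, self.produced = rule_name, consumed, produced

class Parse:
    """One thread of the parallel parse, with enough history to undo it."""
    def __init__(self):
        self.history = []

    def apply(self, rule_name, consumed, produced):
        self.history.append(RuleApplication(rule_name, consumed, produced))
        return produced

    def undo(self):
        """Undo the parse in reverse order: built tokens vanish, consumed tokens return."""
        restored = []
        for app in reversed(self.history):
            restored.extend(app.consumed)   # consumed tokens become available again
        self.history.clear()
        return restored

def close(parses, winner):
    """The first parse to complete wins and eliminates the competitors."""
    return [tok for p in parses if p is not winner for tok in p.undo()]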
The following brief example demonstrates some of the features of the dialogue rules. The application is a spreadsheet, and the task is to sum a column of numbers, placing the result in a particular cell. The dialogue scenario consists of three strokes: a scope gesture, a Greek sigma symbol, and a closure stroke in which the user points at a closure icon displayed on the screen.
[Figure: the spreadsheet column of numbers (11.50, 3.87, 2.03, 99.44) and the token derivation for the scenario: the hwstroke and geststroke tokens are recognized as symbol and gesture, which are placed as ssop and ssrange; together with the closestroke token they yield the sscmd token.]
1  hwstroke(x) =: symbol(y):hwreco(x);
2  geststroke(w) =: gesture(z):gestreco(w);
3  symbol(y) =: ssop(v)|sspick(v):sscorr(y);
4  gesture(z) =: ssrange(u)|sspick(u):sscorr(z);
5  {ssop(v), ssrange(u)} closestroke(t) =:
       sscmd(s):sscmd("\0x95 %a @SUM( %a . %a ) \n",
                      sscell(v), ssulrange(u), sslrrange(u));
6  sscmd(s):(rcsscmd(s) = 0):CLOSE;
The left side of a rule contains the conditions for rule execution, written as predicates with free variables. The free variables are used to specify co-occurrence constraints and to transmit the events to the right sides of the rules. The scope of a free variable is a single rule. Conditions are matched in temporal order except where curly braces are used to delimit order-free groups. The left side ends with the "=:" symbol.
The right side of a rule consists of actions and their resulting events, written as <event-union>:<action>, where an <event-union> is either a single event or a collection of alternative events resulting from execution of the action. Actions are presently atomic procedures in the system, written in a programming language. Later, the rule language will be extended to include specification of action procedures.
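A minimal sketch of how such rules might be represented and matched follows; the class and function names are assumptions, and the matcher is deliberately simplified: it binds each condition to the first available event of the right type, whereas the real parser tries alternatives, respects temporal order, and runs competing bindings as parallel parses. The event store is the one sketched earlier.

# Sketch of a rule of the form  conditions =: <event-union>:<action>.
# Free variables are modeled as a binding dictionary whose scope is one rule.
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class Condition:
    etype: str          # event type the condition matches, e.g. "symbol"
    var: str            # free variable bound to the matching event

@dataclass
class Rule:
    name: str
    conditions: List[Condition]                    # left side
    action: Callable[[dict], Optional[object]]     # right side; returns resulting event or None

def try_rule(rule, store):
    """Bind each condition to an event of the right type; fire the action if all match."""
    bindings = {}
    for cond in rule.conditions:
        candidates = store.by_type.get(cond.etype, [])
        if not candidates:
            return None                     # a condition is unsatisfied; the rule does not fire
        bindings[cond.var] = candidates[0]  # a real matcher would try alternatives and check time
    return rule.action(bindings)            # action failure (None) means no event is created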
Rules 1 and 2 match handwriting and gesture strokes, and invoke the respective recognizers (the actions hwreco and gestreco, which create symbol and gesture events respectively). The event is created when the action completes, and other rules may be invoked while an action is being carried out. If the action fails, the event is not created and the rule is undone.
In rules 3 and 4, the recognized symbol and gesture are sent to a function, sscorr, that analyzes their placement on the spreadsheet display. This function may generate any one of several events, e.g. ssop, sspick, or ssrange, depending on the kind of symbol or gesture and its location on the spreadsheet display.
In rule 5, the ssrange and ssop events may occur in either order, so long as they are adjacent to each other. They must be followed by the closure event described earlier. The result of this rule is to create an event token, sscmd, by invoking a function sscmd that formats a sequence of keystrokes to be sent to the spreadsheet. This sequence of keystrokes causes the cell pointer to be moved to the cell containing the sigma, and a formula to be entered into that cell of the form @SUM(ul.lr), where ul and lr are the cell addresses of the extent of the range denoted by the ssrange token.
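For concreteness, here is a hedged sketch of what the keystroke-formatting half of the sscmd action might do; the cell-address arguments and the \x95 goto prefix mirror the format string in rule 5, but the helper name and the exact spreadsheet protocol are assumptions.

# Sketch of formatting the keystroke sequence sent to the spreadsheet.
def format_sum_command(target_cell, range_upper_left, range_lower_right):
    # "\x95" stands in for the spreadsheet's goto/cell-pointer keystroke,
    # mirroring the "\0x95 %a @SUM( %a . %a ) \n" format string of rule 5.
    return "\x95{} @SUM({}.{})\n".format(target_cell, range_upper_left, range_lower_right)

# Example: put the sum of A1..A4 into cell A5.
keystrokes = format_sum_command("A5", "A1", "A4")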
Rule 6 tests for proper completion of the command execution. The CLOSE action eliminates any competing parses.
Summary
In this report, we have tried to describe some interesting results from a project that is not yet complete. There are many issues in the software design and human factors design of such interfaces for which we do not have answers. These answers will come as we press ahead toward our goal of building an operating prototype.
Our approach to the interface software is based on the technology of language translation, as other efforts have been [8,17,18]. Language translation provides a structure for representing dialogue, but it has not been helpful in characterizing the initial processing of input or the presentation of graphics. Other paradigms for representing dialogues include objects [6,9,15], event-driven processes [7,22], constraint systems [1,5,16], and programming language extensions [4,11,19].
Gestural interfaces have appeared now and again in particular domains, such as text editors [14], graphical editors [3], musical score editors [2], and interfaces to text-based applications [25]. Our work differs from these primarily in the breadth of our attack on the problem. Other systems do not attempt to combine handwriting and gesture, which is crucial if the end user is to avoid switching continually back and forth between pen and keyboard. We are fortunate to be able to take advantage of several years of work at our laboratory on handwriting recognition and feature-based recognition of symbols.
One can argue forcefully for the benefits of gestural interfaces, but such arguments depend on being able to implement such interfaces at reasonable cost and with reasonably useful technologies. It appears that the necessary technologies are at hand now or soon will be. Many problems remain to be solved in the implementation of the software. We hope that others will be encouraged to address these problems in order to realize the tremendous potential of gestural interfaces.
This work could not have been realized without the contributions of our colleagues, particularly Chuck Tappert, Joonki Kim, Shawhan Fox, and Steve Levy, who contributed recognition algorithms and software; Cathy Wolf, whose experimental subjects gave crucial insights into the nature of gestural dialogues; John Gould, who explored the human factors of handwriting and studied users of spreadsheets; and Robin Davies, who guided the entire project. Inspirational credits are due to Alan Kay and Adele Goldberg. Bill Buxton has given many useful comments and criticisms on this endeavour, along with his strong support for its purpose.
References
[1] Borning, A. The Programming Language Aspects of ThingLab, A Constraint-Oriented Simulation Laboratory. ACM Transactions on Programming Languages and Systems 3, 4 (1981), 353-387.
[2] Buxton, W., Sniderman, R., Reeves, W., Patel, S. and Baecker, R. The evolution of the SSSP score editing tools. Computer Music Journal 3, 4 (1979), 14-25.
[3] Buxton, W., Fiume, E., Hill, R., Lee, A. and Woo, C. Continuous hand-gesture driven input. Proceedings of Graphics Interface'83 (1983), 191-195.
[4] Cardelli, L. and Pike, R. Squeak: a language for communicating with mice. Proceedings of SIGGRAPH'85 (San Francisco, Calif., July 22-26, 1985). In Computer Graphics 19, 3 (July 1985), 199-204.
[5] Duisberg, R. Animated Graphical Interfaces using Temporal Constraints. Proceedings CHI'86 Conference on Human Factors in Computing Systems (Boston, April 13-17, 1986), ACM, New York, 131-136.
[6] Goldberg, A. and Robson, D. Smalltalk-80: The Language and Its Implementation, Addison Wesley, 1983.
[7] Green, M. The University of Alberta user interface management system. Proceedings of SIGGRAPH'85 (San Francisco, Calif., July 22-26, 1985). In Computer Graphics 19, 3 (July 1985), 205-213.
[8] Hanau, P. and Lenorovitz, D. Prototyping and Simulation Tools for User/Computer Dialogue Design. Proceedings of SIGGRAPH'80 (Seattle, Wash., July 14-18, 1980). In Computer Graphics 14, 3 (July 1980), 271-278.
[9] Henderson, D.A. Jr. The Trillium User Interface Design Environment. In Proceedings CHI'86 Human Factors in Computing Systems (Boston, April 13-17, 1986), ACM, New York, 221-227.
[10] Hutchins, E.L., Hollan, J.D. and Norman, D.A. Direct manipulation interfaces. In User centered system design, Norman, D.A. and Draper, S.W. (eds.), Lawrence Erlbaum Associates, Hillsdale, NJ, 1986, 87-124.
[11] Kasik, D.J. A user interface management system. Proceedings of SIGGRAPH'82 (Boston, Mass., July 26-30, 1982). In Computer Graphics 16, 3 (July 1982), 99-106.
[12] Kay, A. and Goldberg, A. Personal Dynamic Media. IEEE Computer 10, 3 (1977), 31-41.
[13] Kay, A. Microelectronics and the Personal Computer. Scientific American 237, 3 (Sept. 1977), 231.
[14] Konneker, L.K. A Graphical Interaction Technique Which Uses Gestures. Proceedings of the First International Conference on Office Automation, (1984), IEEE, 51-55.
[15] Lipkie, D., Evans, Newlin, J. and Weissman, R. Star Graphics: An Object Oriented Implementation. Proceedings of SIGGRAPH'82 (Boston, Mass., July 26-30, 1982). In Computer Graphics 16, 3 (July 1982), 115-124.
[16] Nelson, G. Juno: a Constraint-Based Graphics System. Proceedings of SIGGRAPH'85 (San Francisco, Calif., July 22-26, 1985). In Computer Graphics 19, 3 (July 1985), 235-243.
[17] Olsen, D.R. Jr. and Dempsey, E. SYNGRAPH: A Graphical User Interface Generator. Proceedings of SIGGRAPH'83 (Detroit, Mich., July 25-29, 1983). In Computer Graphics 17, 3 (July 1983), 43-50.
[18] Parnas, D. On the Use of Transition Diagrams in the Design of a User Interface for an Interactive Computer System. Proceedings ACM 24th National Conference (1969), 378-385.
[19] Pfister, G. A High Level Language Extension for Creating and Controlling Dynamic Pictures. Computer Graphics 10, 1 (Spring 1976), 1-9.
[20] J. R. Rhyne and C. G. Wolf. Gestural Interfaces for Information Processing Applications. T. J. Watson Research Center, IBM Corporation, 1986, IBM Research Report RC-12179.
[21] Rhyne, J.R. Dialogue Management for Gestural Interfaces. IBM Research Report RC-12244 (An extended version of this article), 1986.
[22] Schulert, A.J., Rogers, G.T. and Hamilton, J.A. ADM - A dialog manager. In Proceedings CHI'85 Human Factors in Computing Systems (San Francisco, April 14-18, 1985), ACM, New York, 177-183.
[23] Shneiderman, B. Direct manipulation: a step beyond programming languages. IEEE Computer 16, 8 (1983), 57-69.
[24] Tappert, C.C., Fox, A.S., Kim, J., Levy, S.E. and Zimmerman, L.L. Handwriting recognition on transparent tablet over flat display. SID Digest of Technical Papers XVII (May 1986), 308-312.
[25] Ward, J.R. and Blesser, B. Interactive Recognition of Handprinted Characters for Computer Input. IEEE Computer Graphics and Applications 5, 9 (Sept. 1985), 24-37.
[26] Wolf, C.G. Can People Use Gesture Commands? SIGCHI Bulletin 18, 2 (October 1986), 73-74. Also IBM Research Report RC 11867.