Apex.tioga
Subhana Menis, May 20, 1987 9:31:46 am PDT
Last edited by: Subhana Menis - May 28, 1987 11:32:00 am PDT
FEINER
APEX: AN EXPERIMENT IN THE AUTOMATED CREATION OF PICTORIAL EXPLANATIONS
SIGGRAPH '87 TUTORIAL COURSE NOTES
DOCUMENTATION GRAPHICS
=== length: 1 in
APEX: An Experiment in the Automated Creation of Pictorial Explanations
How might we automate the design of pictures that are intended to show the viewer how to perform a series of tasks?
Steven Feiner
Brown University*
[Republished from IEEE Computer Graphics and Applications, Vol. 5, No. 11, Nov. 1985, pp. 29-37]
*The author is currently a member of the computer science faculty at Columbia University.
Abstract: APEX is an experimental system that generates pictures portraying the performance by a problem solver of physical actions in a three-dimensional world. It supports rules for automatically determining the objects to be shown in a picture, the style and level of detail with which they should be rendered, the method by which the action itself (such as turning or pushing) should be indicated, and the picture's viewing specifications. A picture crystallizes about a small set of objects inferred from the nature of the action being depicted. Additional objects and detail are added when it is determined that they disambiguate an object from others with which it may be confused.
A number of research projects have explored the use of computer graphics to explain how things work. Their approaches have been as diverse as dynamic, videodisc-based movie manuals [1], directed-graph-structured pictorial documents [2], and animated algorithm simulations [3]. In these systems, creating a presentation may require the collaboration of many people, including subject matter experts, authors, designers, photographers, illustrators, and editors. Thus the presentation design process is expensive and time-consuming. It is our hope that someday there will be computer-based experts that can communicate with a user through pictures, words, and sounds. Systems of this sort would not only have to possess knowledge of the subject areas in which they were expert, but would also have to know how to design and produce effective visual presentations [4].
Although much attention has been paid to the problems of rendering prearranged collections of 2D or 3D objects, automating the design of pictures is an area of computer graphics that is relatively, although not totally, unexplored. Work in knowledge-based graphics has included animation scripting and diagram layout [5], design of business graphics [6], generation of diagrammatic information displays [7], animated instructions for a CAD system [8], and automatic synthesis and layout of icons in a graphical database interface [9].
APEX (Automated Pictorial EXplanations) is an experimental system that we have built to examine some of the problems involved in depicting actions in a 3D world. It contains two components: one part, described here, which creates pictures of parts of the world, and a second which designs and lays out displays containing these pictures.
APEX differs from previous work in that it attempts to depict existing worlds of 3D objects and actions by analyzing their relationships and then determining what part if any each should play in a picture, how each should be rendered, and how each should affect the viewing specifications for the picture. APEX is essentially an intermediary between an expert system that figures out what actions must be performed to solve a problem and rendering software that scan-converts the explanatory pictures that APEX generates. In our work we treat both the expert system and the rendering tools as ``black boxes,'' although they are, of course, subjects of extensive research in their own right.
Picture Generation
What kind of pictures should APEX generate? Much work in computer graphics has concentrated on the synthesis of realistic pictures. There are many situations, however, in which a realistic picture can provide too much detail to communicate as effectively as a simpler, stylized picture. Books on writing technical manuals, for example, recommend deleting unnecessary details and highlighting objects of interest in drawings [10].
A number of attempts have been made to formalize rules for presenting quantitative information in the form of charts and graphs [11,12]. APEX incorporates the crude beginnings of a model for the creation of pictures that show actions, such as pushing, pulling, or turning, being performed on physical objects.
Consider, for example, how to decide what objects to include in a picture. Suppose the picture is meant to tell a person to turn a knob on a piece of equipment. One solution might be to include in the picture the knob, the equipment on which it is located, and, perhaps, the hand turning it. If the equipment is the only thing in the room and the knob the only feature on it, this might be appropriate. But what if the equipment has many knobs and switches and the room contains many similar pieces of equipment? There must be a compromise between the overwhelming detail of showing with photographic accuracy everything that a person might see and the potential ambiguity of showing only those objects that participate in the action.
We would like the contents of the picture that we generate to depend on whether the person knows the location of the knob on the equipment or even the location of the equipment itself. If the person is unfamiliar with the equipment's location, showing it relative to the rest of the room may help. The experienced user, on the other hand, may not need extra context. Also, the appearance of the objects themselves should influence the picture's design. If the knob is one of a row of identical knobs, showing the knob in context may be more worthwhile than it would be if the knob were unique in shape and color. Rendering the knob with greater detail may be useful if the added detail reveals differences between it and similar knobs with which it may be confused.
In APEX, we have attempted to eliminate unneeded detail in pictures while emphasizing important features. Our approach has been to design a system that uses rules to govern each aspect of a picture's composition by determining which objects will be depicted, which rendering style will be used for each, and which viewing specifications will be employed.
System Overview
APEX is initially provided with information about a world of objects, the actions to be performed on it, and what the user already knows.
Actions. The actions to be depicted are those ``performed'' by a problem solver, an AI system that plans how to accomplish the task that we want to tell someone about by ``doing'' the task itself. APEX talks to problem solvers that use rules about a task to be performed, and knowledge about the current state of affairs, in order to break down a high-level task into a hierarchy of lower-level actions [13]. Each kind of action is represented as a frame [14], a collection of facts about the action and its participants. When the problem solver executes an action it instantiates one of these archetypal frames. Associated with an action frame instance are information about the important objects that participate in it and the nature of their roles, any other actions that have to be performed as subactions to accomplish it, when it should be executed relative to the other actions, and those changes that it effects in the environment.
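To make the frame representation concrete, the following minimal sketch (in Python, although APEX itself is written in Franz Lisp) shows the kind of information an instantiated action frame might hold. The field names, role names, and example values are illustrative assumptions, not APEX's actual structures.

```python
from dataclasses import dataclass, field

@dataclass
class ActionFrame:
    # Hypothetical slots mirroring the information the text associates
    # with an action frame instance.
    kind: str            # e.g., "pull", "turn", "push"
    participants: dict   # role name -> object playing that role
    subactions: list = field(default_factory=list)  # lower-level actions
    ordering: list = field(default_factory=list)    # when to execute, relative to siblings
    effects: list = field(default_factory=list)     # changes made to the environment

# Executing "open the transmitter's drawer" might instantiate the
# archetypal "pull" frame roughly like this:
open_drawer = ActionFrame(
    kind="pull",
    participants={"moved": "transmitter-drawer", "grasped": "middle-handle"},
    effects=[("translated", "transmitter-drawer", (0.0, 0.0, 1.0))],
)
```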
Because the problem solvers run quite slowly, APEX does not communicate with them interactively. Instead, the desired problem solver is run in advance and information about the structure of the solution it generates is saved automatically in a detailed execution trace. APEX reads in the trace and uses it to recapitulate the actions of the problem solver incrementally, without incurring the decision-making overhead.
Objects. APEX uses the same object database as does the problem solver. Objects are hierarchically structured as trees of 3D parts. Leaf nodes are physical objects with properties such as material, color, size, shape, and position, while internal nodes are assemblies of leaf and internal nodes containing only transformations. Objects have information about their function and are also characterized by the relationships (such as ``on'' and ``in'') that they bear to one another.
As mentioned above, we would like to be able to display objects at varying levels of detail. Physical properties are initially associated with the leaves of APEX's object trees. Therefore, we have developed a detail-removal process that associates with each nonleaf object a simplified version of the object properties possessed by its children. This makes it possible to draw a high-level approximation of an object without processing its children, selectively progressing down the tree only when more local detail is desired [15] (see Figure 1).
The detail-removal method that we have used begins by processing nodes whose children are all leaves. The projection of a node's children onto a selected view plane is computed. The system then proposes simplifications of the children by merging adjacent objects and eliminating relatively small objects. Each simplification is also projected onto the view plane, and the differences between it and the projection of the unsimplified children are computed. The difference metric is based on the size and color of corresponding areas in the two projections. The simplification that results in the smallest number of objects, and whose difference is less than a pragmatically chosen maximum, is used. The process is then propagated up the tree to the root. Because the detail-removal process runs quite slowly, it is currently performed as a preprocessing step.
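As a toy illustration of the selection step just described, the sketch below treats a node's already-projected children as colored rectangles and substitutes deliberately crude stand-ins for the merging heuristics and the area/color difference metric; the 0.15 threshold and 5% smallness cutoff are invented values standing in for the pragmatically chosen ones.

```python
# Each projected child is a ((x0, y0, x1, y1), color) rectangle.

def area(child):
    (x0, y0, x1, y1), _color = child
    return (x1 - x0) * (y1 - y0)

def bounding_box(children):
    boxes = [b for b, _ in children]
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))

def propose_simplifications(children):
    total = sum(area(c) for c in children)
    # Candidate 1: eliminate relatively small objects.
    yield [c for c in children if area(c) >= 0.05 * total]
    # Candidate 2: merge everything into one box with the dominant color.
    dominant_color = max(children, key=area)[1]
    yield [(bounding_box(children), dominant_color)]

def difference(simplified, children):
    # Crude stand-in for the metric in the text: normalized change in
    # total covered area (color differences are ignored here).
    original = sum(area(c) for c in children)
    return abs(sum(area(s) for s in simplified) - original) / original

MAX_DIFF = 0.15  # stand-in for the pragmatically chosen maximum

def simplify(children):
    acceptable = [s for s in propose_simplifications(children)
                  if difference(s, children) <= MAX_DIFF]
    # Fewest objects wins; keep the children unsimplified if nothing passes.
    return min(acceptable, key=len, default=children)
```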
Associating approximate physical properties with internal nodes also allows objects to be compared hierarchically. The ability to compare two hierarchical objects quickly is exploited by APEX to find objects that are similar to some target object and to determine how they differ from it. First, the root nodes of both objects are compared with regard to size, shape, and material. If the differences are big enough to be considered significant (an ad hoc determination), the comparison stops; otherwise, the objects' next levels are recursively compared, with subobjects matched according to their positions in their parents (so that things occupying roughly the same relative position in their parents are compared).
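The sketch below captures the shape of this comparison; the node attributes, the 25% size cutoff, and nearest-offset child matching are illustrative assumptions standing in for APEX's ad hoc tests.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    size: tuple                      # (width, height, depth)
    material: str
    offset: tuple = (0.0, 0.0, 0.0)  # position relative to parent
    children: list = field(default_factory=list)
    tag: str = ""                    # filled in by compare()

def significantly_different(a, b):
    # Stand-in for the ad hoc significance test: material mismatch, or
    # any dimension differing by more than 25%.
    return a.material != b.material or any(
        abs(x - y) > 0.25 * max(x, y) for x, y in zip(a.size, b.size))

def compare(a, b):
    """Recursively mark where two object trees differ or agree."""
    if significantly_different(a, b):
        a.tag = b.tag = "different"
        return
    a.tag = b.tag = "similar"
    for ca in a.children:
        if not b.children:
            break
        # Match the child of b occupying roughly the same relative
        # position in its parent.
        cb = min(b.children, key=lambda c: sum(
            (p - q) ** 2 for p, q in zip(c.offset, ca.offset)))
        compare(ca, cb)
```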
===
[Artwork node; type 'ArtworkInterpress on' to command tool]
Figure 1. Detail removal. (a) The hierarchical object database originally contains information about the physical properties of leaf nodes—in this example the individual buttons and panel of a console. Internal nodes, such as the console or set of buttons, represent assemblies that may be drawn only by drawing all the objects at their descendent leaves. (b) The automated detail removal procedure described in the text combines adjacent subobjects and eliminates relatively small ones to produce a simplified version of the object. Detail removal is performed for each internal node to produce physical property descriptions in the same format as those of the leaf nodes. A simplified version of the object is displayed by traversing the hierarchy to the desired depth and drawing only the deepest objects encountered. The level of detail displayed may be varied across the object by selectively exploring some parts of the hierarchy more deeply to show them in more detail.
===
User knowledge. Associated with each object and action is some information about what the user knows (for example, an object's approximate location). This information forms an extremely rudimentary user model. APEX updates the user model whenever it depicts an object, reflecting a presumed increase in the user's familiarity with the object's location. This information guides the picture-making process by determining the amount of context to include.
The Picture Representation
APEX creates a picture by performing operations on a frame data structure in which the picture is represented. This picture frame has slots for the important attributes of a picture: the objects that it contains, its lighting, and its viewing specifications. Frame slots may be made to affect one another through the use of rules that fire when a slot is modified, causing changes in others. APEX also marks the objects that are inspected with its comparison routine to indicate where their differences and similarities lie.
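A minimal sketch of such a frame, in which rules fire when a slot is modified and may in turn change other slots, follows; the slot names, the (slot, predicate, action) rule form, and the example rule are assumptions made for illustration.

```python
class PictureFrame:
    def __init__(self):
        # Slots for the picture attributes named in the text.
        self.slots = {"objects": [], "lighting": None, "viewing": None}
        self.rules = []  # (slot name, predicate, action) triples

    def on_change(self, slot, predicate, action):
        self.rules.append((slot, predicate, action))

    def set(self, slot, value):
        self.slots[slot] = value
        for s, predicate, action in self.rules:
            if s == slot and predicate(value):
                action(self)  # a fired rule may modify other slots

# E.g., widen the view whenever frame objects are added:
def fit_view_to_objects(frame):
    frame.slots["viewing"] = ("fit", tuple(frame.slots["objects"]))

picture = PictureFrame()
picture.on_change("objects", lambda objs: bool(objs), fit_view_to_objects)
picture.set("objects", ["transmitter-drawer", "middle-handle"])
```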
In order to render a picture, APEX first turns its internal picture data structure into a precise 3D scene specification format accepted by several locally written rendering systems. APEX generates a picture specification by traversing those parts of the object hierarchy referenced by the picture frame. The determination of what rendering style to use and how far to travel down each part of the hierarchy is based on information associated with the objects while a picture is created, as described below.
Depicting a Single Action
The picture creation process will be illustrated by an example taken from one world for which APEX can generate pictures. Figure 2 shows this world's objects, the components of a stylized sonar system, in full detail. There are three large cabinets—a receiver, a transmitter, and an interface. A small speaker cabinet hangs on the wall. Figures 3-10 show the steps in the creation of a typical picture. They were generated by stopping APEX at each step of the picture-creation process and sending the partially completed picture specification to the renderer. The completed picture is intended to tell the viewer to pull out the drawer of the large center cabinet, using its middle handle. APEX was initially told that the picture's viewer had no idea where any of the objects depicted were located.
First, APEX initializes the viewing and lighting specifications based on the location of the viewer who is to perform the action. Associated with each depictable action frame is a priori information about the objects that play important roles in the action and therefore must be depicted. We refer to these as the picture's frame objects. These are the objects about which the picture will crystallize. APEX finds the frame objects for the particular instance of the action being performed and adds them to the picture.
===
[Artwork node; type 'Artwork on' to command tool]
Figure 2. A fully detailed view of a set of objects for which APEX can create pictures. Shown here are parts of a sonar system. The three large cabinets are (from left to right) the receiver, the transmitter, and the interface. A small speaker hangs on the wall to the right of the transmitter.
===
In Figure 3 they are the transmitter drawer (the object to be opened) and its middle handle (the object by which the opening is performed). APEX has rules that fire when an object is added to a picture. These rules discriminate on the kind of object being added. Frame objects cause the viewing specifications to be modified to include the objects in the viewplane in their entirety. The dark-blue background of each picture represents its extent.
The user may have some knowledge about each of the frame objects, including some notion of where it might be found. This information is currently represented as a part of the object hierarchy in which the object is known to reside. APEX follows the hierarchy up from each frame object until it finds an object with which the user is familiar. These ancestor objects are added to provide context. Figure 4 shows the transmitter of which the drawer is a part.
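A sketch of this upward walk, assuming parent links in the object hierarchy and a familiarity predicate supplied by the user model:

```python
class Obj:
    def __init__(self, name, parent=None):
        self.name, self.parent = name, parent

def context_objects(frame_objects, user_knows):
    """Climb from each frame object, collecting ancestors up to (and
    including) the first one the user is familiar with."""
    context = []
    for obj in frame_objects:
        ancestor = obj.parent
        while ancestor is not None:
            if ancestor not in context:
                context.append(ancestor)
            if user_knows(ancestor):
                break  # familiar: no further context needed
            ancestor = ancestor.parent
    return context

# With a viewer who knows nothing, the walk reaches the top of the hierarchy:
room = Obj("room")
transmitter = Obj("transmitter", room)
drawer = Obj("drawer", transmitter)
handle = Obj("middle-handle", drawer)
context_objects([drawer, handle], user_knows=lambda o: False)
# -> [transmitter, room, drawer]
```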
===
[Artwork node; type 'Artwork on' to command tool]
Figure 3. The first step in the metamorphosis of a picture depicting the command to open the drawer of the transmitter (the center cabinet shown in Figure 2): Frame objects, specified by information associated with the action frame being depicted. The viewer is initially assumed to know nothing about the location of any of the objects. The succeeding steps are shown in Figures 4-10.
===
[Artwork node; type 'Artwork on' to command tool]
Figure 4. Context objects, all ancestors of the frame objects up to the first objects with which the user is familiar.
===
APEX next attempts to find landmarks, objects whose properties should make them good reference points for locating other objects. These properties may be physical properties such as color, shape, or size, but may also be the user's familiarity with an otherwise nondescript object.
APEX currently has a rather naive set of criteria for finding landmarks for an object. It inspects those objects that are physically close to the object, compared to the other objects that comprise the object's parent. From those it selects objects that have relatively unusual physical properties or that are already familiar to the user.
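Schematically, and with invented stand-ins for the closeness and unusualness tests, the criteria might look like this (objects here are dicts with "pos" and "color" entries):

```python
import math

def distance(a, b):
    return math.dist(a["pos"], b["pos"])

def is_unusual(obj, others):
    # Invented test: a color shared by few of the parent's other children.
    return sum(o["color"] == obj["color"] for o in others) <= len(others) // 4

def landmarks_for(obj, siblings, user_knows):
    """Pick landmarks for obj from among its parent's other children."""
    others = [s for s in siblings if s is not obj]
    if not others:
        return []
    # "Physically close" judged relative to the parent's other children.
    mean_d = sum(distance(obj, s) for s in others) / len(others)
    nearby = [s for s in others if distance(obj, s) < mean_d]
    # Keep those that stand out visually or are already familiar.
    return [s for s in nearby if is_unusual(s, others) or user_knows(s)]
```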
===
[Artwork node; type 'Artwork on' to command tool]
Figure 5. Landmark objects, intended to help the viewer locate the frame objects.
===
Figure 5 shows the landmarks that APEX selected for the transmitter, the drawer, and the handle. In this case a landmark is found for the transmitter alone: the speaker cabinet hanging on the wall behind it. Although all the objects included in the picture so far have been depicted in their actual colors, the speaker is not. When the picture is rendered, a subdued rendering style is chosen for objects that were selected as landmarks, because they do not directly participate in the action. The current implementation realizes a subdued rendering style by blending the object's actual color with that of its parent, in this case the background color.
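A simple linear blend realizes such a subdued style; the 0.5 weight below is an illustrative choice, not a parameter taken from APEX:

```python
def subdued(color, parent_color, weight=0.5):
    """Blend an object's RGB color toward its parent's (here, the background)."""
    return tuple((1 - weight) * c + weight * p
                 for c, p in zip(color, parent_color))

# e.g., a tan speaker against a dark-blue background:
# subdued((0.8, 0.7, 0.5), (0.1, 0.1, 0.4)) -> (0.45, 0.4, 0.45)
```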
For each object included so far, APEX searches in its parents for those objects that are roughly similar to it, and thus ones with which it could potentially be confused by the reader. These similar objects are added to the picture in the same manner as the landmark objects. Figure 6 shows the objects found: the receiver and the interface, which are roughly the same size, shape, and color as the transmitter, and the two additional handles of the drawer. No similar objects were found for the drawer or the speaker.
===
[Artwork node; type 'Artwork on' to command tool]
Figure 6. Similar objects, added to help disambiguate the objects added so far from others that may be confused with them.
===
Each similar object is then compared with the object for which it was selected, using the hierarchical comparison routine described previously. As the comparison proceeds, information is stored with each part of the object considered, indicating whether the part was similar to or different from the other parts to which it was compared. This information is used later to determine how the object and its parts will be rendered, with disambiguating detail being included down to the level at which significant differences are found. Figure 7 shows this additional detail. The receiver's children, three drawers and a panel, are depicted, since at this level the major difference between it and the transmitter (two additional drawers) is discovered. The interface's door and panel are shown for the same reason. Lower-level detail is not shown, since it is neither needed for disambiguation nor involved in the action being depicted.
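Building on the comparison sketch given earlier (the Node class and its tag field), the drawing traversal can descend only through parts marked similar, stopping at the level where a difference was found; this is an illustration of the idea, not APEX's code:

```python
def objects_to_draw(node):
    # Descend only through parts tagged "similar"; everything else is
    # drawn at its own (approximated) level.
    if node.tag != "similar" or not node.children:
        return [node]
    drawn = []
    for child in node.children:
        drawn.extend(objects_to_draw(child))
    return drawn
```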
===
[Artwork node; type 'Artwork on' to command tool]
Figure 7. Disambiguating detail, intended to help distinguish the frame and landmark objects from those that have been found to be similar to them.
===
APEX knows about another class of objects, those that must be included in a picture to prevent it from looking incorrect. For example, if an object is supported by another that would be visible given the current viewing specifications, then this supporting object is included. In Figure 8 the floor has been added to the picture because it supports the three cabinets.
===
[Artwork node; type 'Artwork on' to command tool]
Figure 8. Supporting objects, added if they are visible.
===
Finally, any remaining objects in the highest level of the object hierarchy being depicted are added, with the viewing specifications being modified just enough to indicate their presence. Figure 9 shows the only remaining object in this case, the left wall.
===
[Artwork node; type 'Artwork on' to command tool]
Figure 9. Siblings of those objects at the highest level at which APEX added context.
===
[Artwork node; type 'Artwork on' to command tool]
Figure 10. A meta-object arrow, added to indicate the motion of the drawer.
===
So far, the picture does not actually indicate the action to be performed. APEX currently knows how to indicate actions that involve motion by creating and including in the picture what we call a meta-object—an object that doesn't actually exist in the world being depicted, but that will be used to refer to objects in that world. The meta-objects that APEX can create are arrows drawn in the direction of a translation or rotation. Some simple rules are used to position the arrows based on the point at which force is applied, coupled with the angle and center of rotation or the distance and direction of translation. Figure 10 shows the arrow APEX has created to depict the motion of the drawer out along the z-axis.
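The sketch below suggests how such arrows might be built for the two kinds of motion named in the text; rooting the arrow at the force-application point follows the text, while the 2D rotation case, the helper names, and the returned representation are simplifications of our own.

```python
import math

def translation_arrow(grip_point, direction, dist):
    """Straight arrow from the point of applied force along the motion."""
    norm = math.sqrt(sum(d * d for d in direction))
    unit = tuple(d / norm for d in direction)
    tip = tuple(p + dist * u for p, u in zip(grip_point, unit))
    return {"kind": "arrow", "path": [tuple(grip_point), tip]}

def rotation_arrow(grip_point, center, angle, steps=12):
    """Curved arrow sweeping the grip point about a center (2D for brevity)."""
    px, py = grip_point[0] - center[0], grip_point[1] - center[1]
    path = []
    for i in range(steps + 1):
        a = angle * i / steps
        path.append((center[0] + px * math.cos(a) - py * math.sin(a),
                     center[1] + px * math.sin(a) + py * math.cos(a)))
    return {"kind": "arrow", "path": path}

# The drawer's outward pull along the z-axis might become (hypothetical names):
# translation_arrow(handle_position, (0, 0, 1), drawer_depth)
```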
Depicting a Sequence of Actions
APEX considers actions in the order in which they are performed. For each action that APEX knows how to depict (currently those that manipulate objects directly through translation or rotation), a picture will be created. The current implementation enforces a one-to-one mapping between depictable actions and pictures.
Figures 11-16 show the sequence of pictures APEX created to show a series of actions, beginning in Figure 11 with the action depicted in Figure 3. Note that the pictures are not supposed to be self-explanatory, but should ultimately be displayed with accompanying text.
===
[Artwork node; type 'Artwork on' to command tool]
Figure 11. The first picture in a sequence created by APEX to show a series of actions: Open the transmitter's drawer. The rest of the sequence is shown in Figures 12-16.
===
[Artwork node; type 'Artwork on' to command tool]
Figure 12. Rotate the drawer about its central support.
===
Figure 12 is intended to show that the now open drawer is to be rotated about an internal support. The pictures are designed to be presented sequentially, and therefore attempt to take advantage of information presumed to be imparted by previous pictures. Here, since the user is now assumed to know which drawer is being rotated, the transmitter and its landmark and similar objects are no longer included to provide context. In fact, the transmitter cabinet and the supporting rails along which the drawer rides are shown only because of their role in supporting the drawer. Figure 13 shows that a panel on the top side of the rotated drawer is to be opened.
In Figure 14, which instructs the user to close the panel, one of APEX's limitations is obvious. The arrow, although correctly drawn, is difficult to interpret because of its position relative to the viewer; the current system does not check for this. Figure 15 tells the user to rotate the drawer back to its original position. Note that by showing only a limited amount of the drawer's detail, the picture indicates more clearly that the motion is to be applied to the drawer, not, for example, to a panel that might lie at the tail of the arrow. Figure 16 tells the user to close the drawer.
===
[Artwork node; type 'Artwork on' to command tool]
Figure 13. Open the drawer's top panel.
===
[Artwork node; type 'Artwork on' to command tool]
Figure 14. Close the top panel.
===
[Artwork node; type 'Artwork on' to command tool]
Figure 15. Rotate the drawer back to its original position.
===
[Artwork node; type 'Artwork on' to command tool]
Figure 16. Close the drawer.
===
Implementation
The APEX testbed is written in Franz Lisp on a VAX 11/780 running Berkeley UNIX 4.2. The problem solvers for which APEX creates pictures are written in micro-Nasl/Frail [13]. Pictures are generated in a 3D scene representation format that is interpreted by a variety of scan conversion systems [16]. APEX required about 1 minute of CPU time to generate the specifications for the pictures in Figures 11-16. For debugging purposes the system allows pictures to be scan-converted as their specifications are generated, using a Lexidata Solidview z-buffered graphics system. This device was used to make Figures 2-16 and took approximately 10 seconds of real time per picture.
Research Directions
APEX is an actively evolving research tool. Its limitations are many and continually changing. A number of its components are temporary placeholders for better versions yet to be developed. Some of APEX's limitations have been imposed for reasons of efficiency. For example, the current system operates only on cuboid objects. Other restrictions reflect deeper issues, some of which are discussed below.
Pictures and actions. The one-to-one mapping between depictable actions and their pictures is too limiting. It should be possible to make pictures of high-level actions that abstract their low-level actions into fewer pictures than the fully detailed low-level actions would normally require. If a series of actions must be performed in strict sequence, the system should ensure that this sequence is depicted adequately. For example, some sequences of motion performed on one object may be unambiguously concatenated to form a single arrow's trajectory. Other sequences performed on one or more objects may be realized more clearly by coding the set of arrows in one picture as an ordered sequence, or by using separate pictures.
The current system makes a single picture even if an action involving several small objects is surrounded by a much larger context. A better alternative would be a series of successive locator pictures that show with increasing specificity where to find the objects, or a detail inset that shows an enlarged and more detailed version of part of a picture. We would also like to create pictures that abstract or modify certain properties of objects besides detail. For example, interacting objects might be positioned differently in a picture so that their functional relationships rather than their actual physical locations are shown.
Detail removal and comparison. The approximations produced by detail removal may need to change when an object changes, either by motion of its parts or by changes in its hierarchical structure (for example, when a part is removed during disassembly), or when a viewpoint changes. Some performance improvements could be gained by taking advantage of coherence between pictures. The hierarchical nature of information associated with both detail removal and object comparison can also be used to limit how far the effects of a change have to propagate.
The methods currently used for detail removal and comparison are far from satisfactory, even for static objects. In particular, the detail removal procedure pays attention only to the gross area covered in the projection. It does not, for example, take into account silhouette information. The comparison procedure does not deal well with objects that have similar visual appearance but significantly different hierarchical structure. Both detail removal and comparison currently disregard the functional importance of a part or its similarities to others.
Rule base. The implementation of the system's picture-creation process is still far too ad hoc: No satisfactory mechanism is in place for easily developing or encoding design rules, let alone allowing a nonprogrammer graphic designer to specify them.
Conclusions
APEX is a research testbed for exploring problems in the automatic design of sequences of pictures that depict the performance of actions on objects. Facts about actions, participating objects, and user knowledge determine the objects included in a picture, their level of detail, rendering style, and the picture's viewing specification. APEX's graphic design knowledge, however, is limited, static, and difficult to encode. It is a research vehicle, not a practical system. We believe that far more powerful systems will one day make possible interactive, automatically generated presentations that communicate effectively using pictures as well as text.
Acknowledgments
David Kantrowitz designed the detail removal algorithm. Alex Kass implemented an earlier automated maintenance and repair manual that paved the way for APEX. Members of the Brown University computer graphics group produced the modeling and rendering software on which APEX depends for its output. Many thanks for ideas and suggestions are due to Leif Allmendinger, Eugene Charniak, Aaron Marcus, Barb Meier, Mihai Nadin, and Andy van Dam. This work was supported in part by the Office of Naval Research under Contract No. N00014-78-C-0396, Andries van Dam, principal investigator.
References
1. D. Backer and S. Gano, ``Dynamically Alterable Videodisc Displays,'' Proc. Graphics Interface '82, Toronto, May 17-21, 1982, pp. 365-371.
2. S. Feiner, S. Nagy, and A. van Dam, ``An Experimental System for Creating and Presenting Interactive Graphical Documents,'' ACM Trans. on Graphics, Vol. 1, No. 1, January 1982, pp. 59-77.
3. M. Brown and R. Sedgewick, ``A System for Algorithm Animation,'' Computer Graphics (Proc. SIGGRAPH 84), Vol. 18, No. 3, July 1984, pp. 177-186.
4. S. Feiner, ``Research Issues in Generating Graphical Explanations,'' Proc. Graphics Interface '85, Montreal, May 27-31, 1985, pp. 117-123.
5. K. Kahn, ``Creation of Computer Animation from Story Descriptions,'' Ph.D. thesis, AI Tech. Report 540, AI Lab, Massachusetts Institute of Technology, August 1979, 323 pp.
6. S. Gnanamgari, ``Information Presentation through Automatic Graphic Displays,'' Proc. Computer Graphics 1981, On-Line Conferences, Ltd., London, Oct. 1981, pp. 433-445.
7. F. Zdybel, N. Greenfield, M. Yonke, and J. Gibbons, ``An Information Presentation System,'' Proc. IJCAI 81, Vancouver, August 24-28, 1981, pp. 979-984.
8. D. Neiman, ``Graphical Animation from Knowledge,'' Proc. AAAI 82, Pittsburgh, Penn., August 18-20, 1982, pp. 373-376.
9. M. Friedell, ``Automatic Synthesis of Graphical Object Descriptions,'' Computer Graphics (Proc. SIGGRAPH 84), Vol. 18, No. 3, July 1984, pp. 53-62.
10. ``Technical Manual Writing Handbook,'' MIL-HDBK-63038-1 (TM), Department of the Army, Lexington, Ky., 1978, 184 pp.
11. J. Bertin, Semiology of Graphics, translation by W. Berg, University of Wisconsin Press, 1983, 415 pp.
12. E. Tufte, The Visual Display of Quantitative Information, Graphics Press, Cheshire, Conn., 1983, 197 pp.
13. E. Charniak, M. Gavin, and J. Hendler, ``The Frail/Nasl Reference Manual,'' CS Tech. Report CS-83-06, Dept. of Computer Science, Brown University, Providence, R.I., February 1983, 22 pp.
14. M. Minsky, ``A Framework for Representing Knowledge,'' in The Psychology of Computer Vision, P. Winston, ed., McGraw-Hill, 1975, pp. 211-277.
15. J. Clark, ``Hierarchical Geometric Models for Visible Surface Algorithms,'' CACM, Vol. 19, No. 10, October 1976, pp. 547-554.
16. P. Strauss, M. Shantzis, and D. Laidlaw, ``SCEFO: A Standard Scene Format for Image Creation and Animation,'' Brown University Graphics Group Memo, Providence, R.I., 1984, 32pp.
STEVEN FEINER is an assistant professor of computer science at Columbia University. His primary research interests are in the areas of picture synthesis, computer animation, human-machine interfaces, and multimedia information systems. Feiner received the AB degree in music from Brown University in 1973 and expects to receive his Ph.D. in computer science from Brown University in 1985. He is a member of ACM SIGGRAPH, IEEE, SID, and Sigma Xi.
Feiner can be contacted at the Department of Computer Science, Columbia University, New York, NY 10027.