Techniques for Interactive Raster Graphics

Patrick Baudelaire* & Maureen Stone
Xerox Palo Alto Research Center
Palo Alto, California

*Present Address: TECSI, 29, rue des Pyramides, 75001 PARIS, FRANCE

ABSTRACT

The visual quality of raster images makes them attractive for applications such as business graphics and document illustration. Such applications are most fully served using interactive systems to describe curves, areas and text which can be rendered at high resolution for the final copy. However, to present such imagery in an interactive environment at moderate cost is difficult. Techniques are presented that provide solutions to the problems of scan conversion, screen update, and hit testing for a class of interactive systems called illustrators. The design rests on the use of software display file encoding techniques. These ideas have been used in the implementation of several illustration programs on a personal minicomputer.

Key Words and Phrases: computer graphics, interactive graphics, display encoding, chain encoding, run-length encoding, scan conversion, illustration systems.

CR Categories: 8.2

INTRODUCTION

Pictures are a part of a large number of applications. A certain class of pictures can be referred to as illustrations. That is, the point of the picture is to illustrate a principle as part of an article or presentation. Good illustrations are visually interesting, but are judged more on their content than on their artistic merit. With respect to picture complexity, illustrations are simple in that there are a moderate number of shapes, clearly bounded by curves and lines. The image is essentially two dimensional, and color is often restricted to flat filled areas; that is, uniform colors and simple textures.

Illustrations are important in business and publishing environments today. The advantages to be gained from using a computer to generate these images are similar to the ones for word processing: images are easily modifiable, pictures can be filed and copies easily generated, and subimages can be kept in libraries to facilitate the composition of a series of illustrations. However, to be acceptable outside of the experimental environment, the image quality of illustrations must be high. Curves and lines must be smooth, text must be represented in a variety of fonts, and objects must be accurately positioned. While each illustration may contain a limited number of colors, a wide range must be available. Given the type of imagery desired, a raster display will give a much better representation of the picture than a line-oriented display.

The constraint of high image quality, even after scaling and repositioning operations, means that the picture must be stored using a high precision representation. Furthermore, if the data base has sufficient precision, the set of affine transformations can be used as tools for generating images.

The process of designing an image is the mapping of some visualization onto a medium. It follows that an effective tool should provide visual feedback as the image is formed. Furthermore, the image should be built up in a two-dimensional manner, that is, by pointing to positions on a page, which indicates an interactive system. In this paper, such a system will be called an illustrator.

Given a high precision data base, such as endpoints for lines and coefficients for splines, plus information about the resolution of the display, there are many standard techniques for displaying straight lines, curves and areas (5). This process is known as scan conversion.
There are a number of problems in using these techniques in an illustrator, specifically:

- All straight lines and curves must be represented with a finite width. In an illustration, using lines of different widths is part of the visual effect. Therefore, displaying these shapes involves more than just digitizing the curve.
- Areas are described by their outline. Therefore, it is necessary to compute which points are inside the outline and color them accordingly. While there are many techniques which address this problem (10, 11), in an interactive system the speed at which this can be done is an important factor, as the display is constantly changing as the user works.
- In an interactive system, the display is constantly changing. For speed considerations, it is desirable to update the display incrementally, regenerating only the areas that are affected by the change. Incremental refresh can cause problems displaying the correct overlap order for objects.
- In an interactive system it is necessary for the user to select objects to be manipulated. A natural way to do that is to point to the object. Therefore, given a point on the display, it is necessary to determine which object has been selected. This is called hit testing. Hit testing is the inverse of the display problem, and is even more speed critical than refresh.

This paper will describe methods for solving these problems that achieve a reasonable compromise between simplicity of implementation and response time. The solution is based on using display encoding techniques which provide a representation of the image that is compact, structured, and simple to manipulate. These techniques have been implemented in systems which have been used to successfully create illustrations. Examples are included at the end of the paper.

DATA REPRESENTATIONS AND SYSTEM DESIGN

Graphics systems can be categorized by the type of data used to represent the picture. One type of interactive graphics system uses raster dots, or arrays of intensity samples, as the unique representation of the image. These systems use a painting model to manipulate the dots directly. They are easy to use, and have been shown to produce effective imagery (8, 9). However, the picture is unstructured, which limits the types of manipulations that can be performed without special hardware. The resolution is tied strongly to the display resolution, which can limit image quality. In general, to be effective visually, the images from such systems require large amounts of data, either in resolution or in bits per pixel.

We have chosen a more structured approach. Given a geometric representation, one can consider that geometric elements (lines, curves, areas) are all made of filled contours. This paradigm is quite useful for applications such as the production of rasters for high-resolution printing of graphics and text (1). However, this model is difficult to implement in an interactive application because using the geometric data for display and hit testing can be expensive in terms of computation time. An interactive display file can be used to provide a definition of the picture that is structured, yet can be manipulated fast enough for an interactive system.

It is natural for several representations to coexist in the design of a graphics system, usually according to levels of hierarchy.
At the top level may exist a representation that embodies some specific knowledge or meaning relevant to a particular application: architectural drawings, blueprints for mechanical parts, electrical circuit diagrams, etc. Here we will consider that the top level representation we chose aims only at giving an unambiguous and complete description of high-quality business graphics illustrations.

To implement these ideas, the following systems approach was used:

- The workbench is a moderate resolution raster display plus a pointing device attached to a 16-bit minicomputer. The final output device is a high resolution raster device such as a film recorder, phototypesetter, or raster printer.
- Shapes are defined by their geometry, trajectories and contours, plus style information such as line width, color, and texture. Trajectories can be specified by one of several mathematical schemes such as splines or other knot-based approximations, circles or other conic equations.
- Text is unformatted, and described in terms of position and string information plus style parameters such as font, color, and orientation.
- The user builds display objects by pointing at fitting points and indicating fitting methods such as straight lines or curves. All numbers are represented as floating point values to provide sufficient precision.
- The top level description is converted to an interactive display file, where the interactive processing, refresh and hit testing, will take place. The display file is used to generate a bitmap, that is, a one bit per point rectangular array of samples (13), which is used as the refresh buffer for the display.

This structure is summarized in figure 1.

THE INTERACTIVE DISPLAY FILE

The display file representation is used for refresh and hit testing. In designing such a representation, the first consideration is to what precision the objects should be encoded. Clearly, display resolution is sufficient for refresh. Since the user cannot specify a position by pointing more precisely than the display resolution, it is also sufficient for hit testing.

Both hit testing and incremental refresh involve scanning through the display file to determine what objects are in a specified area. Therefore, partitioning the display file to facilitate culling will increase performance. It is important that the display file be reasonably compact, yet not so difficult to generate or decode that it negates the advantages in terms of speed. It is also convenient to be able to encode all objects in a uniform manner.

Given these considerations, we have applied and modified some well known encoding techniques: chain encoding of trajectories (3, 6) and run-length encoding of areas (2, 4). These techniques are essentially compressed representations of the bitmap. Chain encoding is based on the assumption that edges are continuous; therefore, the basic representation is the differences between adjacent pixels. Run-length encoding is based on the observation that for flat areas most of the information is in the outlines; therefore, the basic representation is the position of the edges on each scan line. While it would be possible to run-length encode all objects, the increased structure in the chain encoding is appealing for lines.

The particular encoding schemes chosen permit the segmentation of each object into pieces that are independent and bounded in display size. It follows from this that the time for display of one piece is bounded too. This makes it possible to run the screen refresh as a background process. All shapes can be described by one of the two types of encoding. Thus, pointing detection can be done by a single algorithm, independent of the mathematical definition of the elements (lines, circles, conics, splines, etc.).
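To make the chunked display file concrete, the sketch below (in Python, with illustrative names such as Frame, ChainChunk, RunChunk, and cull; the original Alto implementation is not reproduced here) shows one possible organization: every chunk carries a bounding frame in screen coordinates, so that refresh and hit testing can discard most chunks with a cheap rectangle test before decoding any of them.

    from dataclasses import dataclass, field
    from typing import List, Tuple

    @dataclass
    class Frame:
        # Bounding frame of a chunk: upper-left corner plus width W and height H,
        # all in screen (display) coordinates.
        x: int
        y: int
        w: int
        h: int

        def overlaps(self, other: "Frame") -> bool:
            # True when the two rectangles share at least one raster.
            return (self.x < other.x + other.w and other.x < self.x + self.w and
                    self.y < other.y + other.h and other.y < self.y + self.h)

    @dataclass
    class ChainChunk:
        # A bounded piece of a chain-encoded trajectory: a starting point S
        # plus the stream of increment codes.
        frame: Frame
        start: Tuple[int, int]
        codes: List[int] = field(default_factory=list)

    @dataclass
    class RunChunk:
        # A bounded piece of a run-length encoded area: at most N runs, with X
        # values relative to the frame so the chunk can be relocated by moving
        # only the frame.
        frame: Frame
        runs: List[Tuple[int, int]] = field(default_factory=list)   # (SX, DX) pairs

    def cull(chunks, region: Frame):
        # Both incremental refresh and hit testing begin by discarding chunks
        # whose bounding frames do not touch the region of interest.
        return [c for c in chunks if c.frame.overlaps(region)]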
CHAIN ENCODING

Chain encoding is a differential encoding scheme which records the screen coordinate increments between successive raster points on a trajectory. That is, from one trajectory point to the next, raster coordinates may differ only by -1, 0 or 1. Thus, a point may have eight possible successors, so each new point position could be represented in three bits. But since common curve trajectories are monotonic for reasonably long intervals, one can take advantage of the continuity in direction to further reduce the number of bits per point. A number of schemes are possible for encoding coordinate increments. Two interesting and practical ones are described below.

The first scheme uses two bits to represent the coordinate increments (figure 2). The set of eight possible curve directions is divided into four quadrants. For each direction quadrant, the three possible coordinate increments are assigned code values 1 to 3. Code value 0 is used to indicate a change of quadrant, with the following two bits specifying the new quadrant. Therefore, the trajectory encoding is a stream of two-bit codes, starting with the quadrant number (0 to 3), followed by increment codes (1 to 3), and terminated by a 0.

The second scheme requires two streams (figure 3). The set of eight possible curve directions is divided into eight octants. Within each octant there are two possible directions. Therefore, it is possible to indicate each step within an octant with one bit. One stream contains the one-bit increment codes for a given direction octant, and the second stream contains the octant numbers and the number of steps in each octant. Besides being somewhat more efficient, this scheme makes it possible to get a general idea of the behavior of a curve segment by examining the direction octants alone. An example of where this feature can be used is in defining edges for the scan converter. Using the chain encoding in the scan conversion process is described further below.

As mentioned above, in order to gain efficiency in the screen updating and pointing detection algorithms, the encoding is fragmented into independent pieces of similar length that we call chunks (figure 4). The bounding box for each chunk can be stored to facilitate culling on refresh and hit testing. The display file contains the following information for each chunk:

- Screen coordinates of the starting point S
- Stream(s) for the chain encoding
- The bounding frame: H and W.

It is interesting to note that if the chunk size is such that each segment contains at most N trajectory points, all these points are enclosed in a square of size 2N centered at the starting point of the segment. So even if the bounding box is not explicitly stored with the chunk, a bounding region can be computed.
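The first scheme can be sketched as a small decoder. The sketch below is illustrative only: the exact direction-to-code assignment of figure 2 is not reproduced, and the text does not spell out how the terminating 0 is distinguished from a quadrant change, so one plausible convention is assumed (a 0 with no code after it ends the chain). Quadrant q is assumed to cover the three adjacent directions 2q, 2q+1 and 2q+2 (mod 8), counted counter-clockwise from east.

    # Directions 0..7, counter-clockwise from east; each step moves by (dx, dy).
    DIRS = [(1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1)]

    def decode_chain(start, codes):
        # Decode the two-bit quadrant scheme into raster points.
        # `codes` is a list of values 0..3.  The first code names the starting
        # quadrant; codes 1..3 then select one of that quadrant's three
        # directions; a 0 followed by another code switches to that quadrant,
        # and a trailing 0 (assumed convention) terminates the chain.
        points = [start]
        if not codes:
            return points
        x, y = start
        it = iter(codes)
        quadrant = next(it)               # stream begins with the quadrant number
        for code in it:
            if code == 0:
                try:
                    quadrant = next(it)   # quadrant change: next code is the new quadrant
                except StopIteration:
                    break                 # trailing 0: end of chain
                continue
            dx, dy = DIRS[(2 * quadrant + code - 1) % 8]
            x, y = x + dx, y + dy
            points.append((x, y))
        return points

    # A short run in quadrant 0: two steps east, two steps north-east, end.
    print(decode_chain((10, 10), [0, 1, 1, 2, 2, 0]))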
RUN-LENGTH ENCODING

Run-length encoding defines an area in terms of a starting scan line (Y) value plus a list of pairs of raster (X) values. In practice, it is more efficient to make the second X value relative to the first, so each run is defined as a starting X (SX) and a delta X (DX) value. The list of runs can be broken into chunks, such that each chunk defines a maximum of N runs. Therefore, each chunk has a bounding frame. It is desirable to make the starting X values relative to this frame, so that the chunk can be relocated simply by translating the frame boundaries (figure 5).

The encoding described above works only for convex areas; specifically, it assumes one continuous run of rasters per scan line within the area. For concave shapes, there are two options: break the area into convex pieces, or introduce a flag into the list of runs that defines the number of runs per scan line. We have implemented the second option, choosing a negative starting X as a flag, signaling that the delta X value is the new number of runs per scan line.

Assuming the run-length encoding describes areas at display resolution, the starting and delta X values need be no larger than the maximum display coordinate. In practice, most runs can be described in fewer bits. Therefore, it is worthwhile to consider using a variable field size for X. While maximum compression would be obtained by using a technique such as Huffman coding for the field size, for simplicity we have chosen to implement two fixed fields (8 and 16 bits).

The display file contains for each chunk:

- The frame boundary: the upper left point S in screen coordinates (which includes the starting Y value), plus the frame size H and W.
- The field length: either bytes or words.
- The list of run values, defined as SX relative to S, and DX relative to SX.
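As an illustration, the sketch below expands one chunk's run list into absolute (y, x, width) spans, each one scan line high, which is the form used for display in the next section. The exact file layout and the precise semantics of the concave-area flag (here assumed to take effect at a scan-line boundary) are assumptions, and the name decode_run_chunk is illustrative.

    def decode_run_chunk(frame_x, frame_y, runs):
        # Expand one run-length chunk into absolute (y, x, width) spans.
        # `runs` is the chunk's list of (SX, DX) pairs.  SX is relative to the
        # chunk frame's left edge; DX is the run length.  A pair with a negative
        # SX is the concave-area flag: its DX gives the number of runs on each
        # following scan line (assumed to appear at a scan-line boundary).
        spans = []
        y = frame_y
        runs_per_line = 1          # convex case: one run per scan line
        on_this_line = 0
        for sx, dx in runs:
            if sx < 0:
                runs_per_line = dx # flag entry: change the run count per scan line
                continue
            spans.append((y, frame_x + sx, dx))
            on_this_line += 1
            if on_this_line == runs_per_line:
                y += 1             # finished this scan line, move to the next
                on_this_line = 0
        return spans

    # A small area: one run on the first scan line, two runs on the second.
    print(decode_run_chunk(100, 40, [(2, 6), (-1, 2), (1, 3), (6, 2)]))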
SCAN CONVERSION

The geometry-to-rasters conversion, or scan conversion, process can be decomposed into two operations: converting the geometry into the display file representation, and converting the display file to rasters. The conversion from geometry to display file need only be performed when an object is created or its shape is changed. Because this is a relatively infrequent operation, standard techniques for digitizing curves and filling areas provide acceptable speed. The specific algorithm used for areas is given at the end of this section. Objects are displayed in back to front order to give the correct overlap information. All display, refresh, repositioning, and hit testing operations can be implemented by manipulating the display file.

Display

The display operation is defined as the conversion from the display file representation to the rasters which are stored in the bitmap. The final action of writing raster bits into the bitmap is implemented by an efficient and versatile firmware function called RasterOp. This function copies and modifies bit patterns from an arbitrary rectangular area in picture memory to another rectangular area. RasterOp is described in more detail in (13) and (5). For areas, the run-length encoding can be displayed directly, using RasterOp to display each run as a one line high area.

To draw curves we use the paradigm of painting with a "brush" moving along their trajectories. The chain encoding provides a digitized representation of the trajectory. A round brush will approximate a line of uniform width equal to the brush diameter. In the simplest implementation of this model, RasterOp can be used to paint an image of the brush at each new raster position along the trajectory. However, because the successive brush images overlap, many of the pixels along the trajectory are written several times. The model can be implemented more efficiently by breaking the brush down into a set of horizontal sections and accumulating the rasters filled as the brush moves. Once a scan line is complete, that is, once the brush has moved far enough that it no longer touches the scan line, the entire horizontal section can be displayed using RasterOp. This implementation is more efficient than the traveling brush because the affected rasters are written only once. Furthermore, the form of RasterOp used to display lines is thus reduced to the same case used for areas. This uniformity makes applying colors or halftones to objects straightforward.
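A rough sketch of this accumulation, assuming a trajectory already decoded to raster points: the brush is broken into horizontal rows, the rasters each row touches are collected per scan line, and each horizontal section is emitted once as a (y, x, width) span ready for a RasterOp-style fill. The helper names (brush_rows, stroke_spans) are illustrative, the round brush is only coarsely approximated, and for brevity all scan lines are flushed at the end rather than as soon as the brush has moved past them, as the real implementation does.

    def brush_rows(diameter):
        # Horizontal sections of a coarsely approximated round brush, centered
        # on the trajectory point: (row offset, left x offset, right x offset).
        r = diameter / 2.0
        rows = []
        for i in range(diameter):
            cy = i - (diameter - 1) / 2.0
            half = int((r * r - cy * cy) ** 0.5)
            rows.append((i - diameter // 2, -half, half))
        return rows

    def stroke_spans(points, diameter):
        # Accumulate the rasters touched by the brush along the trajectory,
        # then emit each horizontal section once, so no raster is written twice.
        touched = {}                                   # y -> set of x values
        rows = brush_rows(diameter)
        for (x, y) in points:
            for dy, x0, x1 in rows:
                touched.setdefault(y + dy, set()).update(range(x + x0, x + x1 + 1))
        spans = []                                     # (y, x, width)
        for y in sorted(touched):
            xs = sorted(touched[y])
            start = prev = xs[0]
            for xv in xs[1:]:
                if xv != prev + 1:                     # gap: close the current section
                    spans.append((y, start, prev - start + 1))
                    start = xv
                prev = xv
            spans.append((y, start, prev - start + 1))
        return spans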
Refresh

In an interactive graphics system, the displayed image is constantly changing as the user adds, transforms, and deletes objects. In a system which contains only lines and curves, it is possible to write all new objects as they are created or repositioned, and to simply leave unrefreshed the small areas that are erased out of overlapping objects (by a transform or delete operation). The user can then replot the entire picture when the image degrades too much. For systems which include filled areas, this approach is inadequate because the amount of the picture obliterated by erase operations overlapping other objects is too extensive. It is therefore necessary to find some way to quickly and accurately refresh subareas of the entire picture. We will call this process incremental refresh.

The screen update process should be as fast as possible, yet must leave the screen in the correct state as to the shape of the objects and their overlap order. It follows that the design considerations for an incremental refresh algorithm are:

- The definition of an area to refresh should contain a minimum number of rasters that need to be regenerated.
- Objects within the refresh area must be replotted quickly and with no rippling effects.
- If the incremental refresh is going to run as a background process, the time necessary to refresh an area must be bounded.

In general, the affected area has such a complex boundary that determining exactly which rasters fall inside the outline would be too time consuming a process. It is therefore necessary to use some approximation to the area. The simplest approximation is a rectangle. While for certain operations, such as erasing a diagonal line, the minimum rectangle that describes the affected area covers far more of the picture than actually needs to be redisplayed, rectangles are much easier to manipulate than other shapes such as trapezoids.

Once a refresh area has been defined, the objects which are affected must be found. This is the same process as hit testing, except that each object must be compared to the boundaries of the refresh area instead of a small area around a point. If the display system contains an efficient clipper with variable clipping boundaries, the update problem can be solved simply by setting the clipping region to the boundaries of the affected area and refreshing the entire screen. In the type of system we are describing here, since the rasters are generated from the display file, the partitioning of the display file into bounded chunks provides a method for fast culling for this type of refresh.

If the clipper is not used, the problem of rippling effects can arise, because the object or chunk definition will in general generate rasters outside of the refresh area, which can affect objects not currently in the list of objects to be refreshed. For example, in figure 6a, object A is shown overlapping object B. Part of object B needs to be refreshed (figure 6b). In refreshing object B, care must be taken that the correct overlap order is maintained between A and B. If overlap order is determined simply by back to front refresh of the display, just replotting all of B will result in B appearing to be on top of A (figure 6c) unless object A is also refreshed.

It is common for the user of an interactive graphics system to operate on several objects in a picture at one time, leaving several areas needing to be refreshed. These objects may or may not overlap. One approach is to simply accumulate a maximum area as each object is operated on. However, if the objects are disjoint, this gives a very bad estimate of the affected area. Another approach is to treat each object independently. However, if the objects overlap, intersecting refresh areas will be redisplayed several times, which, while leaving the display in a correct state, is distracting to the user as well as time consuming. A third approach is to accumulate a refresh area for each object, and then process these areas to eliminate overlap cases. In a system where RasterOp is the limiting factor for display operations, the third approach is far superior to the other two.

Area Scan Conversion

Polygon scan conversion is described in detail in chapter 16 of (5), and much of the terminology in this section is taken from that source. Here we would like to outline an approach that has been shown to be adequate for line and curve bounded areas, including concave areas, areas with holes, and areas with twists. The problem is to display solid areas which have overlap order but no depth; that is, areas do not intersect in the Z direction. The areas are bounded by combinations of spline curves and lines, and are not strictly convex. The pictures displayed are of moderate complexity, probably around 20-25 areas. Overlap order is resolved using the painter's algorithm: objects are displayed in overlap order, back to front, such that front objects simply "paint over" objects that are behind them.

The outline of each area is chain encoded. Each chunk is constrained to be monotonic. Scan conversion occurs at display resolution, using the chain encoding as the edge definition. Each area is taken separately, and the outline is broken into a list of "edges" which are monotonic in the scan direction. This is determined by examining the chunks of chain encoding. These edges can be sorted on Y, then the intersections for each scan line are determined by running along the encoding. The X values are sorted, and taken pairwise to define the filled interior of the object. This is essentially the Y-X algorithm described for polygons in (5).

Care must be taken when defining the edges that their endpoints are properly defined. The chain encoding is one continuous stream with one definition for each raster point. However, it is essential for the Y-X algorithm to have an even number of edges for each scan line. Therefore, the endpoints of edges that fall at maxima or minima of the object must be doubled. Furthermore, points which are not at a maximum or minimum in the scan direction must not be defined on two edges. This can be achieved by making edge boundaries only at maxima and minima in the scan direction (figure 7). It is important to note that there may be horizontal sections inside an edge (figure 8). Therefore, it is necessary to return a range of X values for each edge intersection at each scan line. The left or right value is taken depending on whether the edge is a left or right edge. This determination takes place after the X values are sorted.

The main advantages of this approach are speed and consistency. All curves are converted to rasters using some standard algorithm. Once this is done, any analysis of the curve, such as for monotonicity or for intersections, is done using the chain encoding. Furthermore, if the area is outlined, the outline and the edge of the filled interior are guaranteed to match, since they come from the same digitizing algorithm.
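The Y-X fill can be sketched as below, assuming the edges have already been extracted from the chain-encoded chunks as lists of raster points, each monotonic in Y, with maxima and minima doubled as described above. The function name fill_spans and the representation of an edge as a plain point list are illustrative, not the original implementation.

    from collections import defaultdict

    def fill_spans(edges):
        # Y-X fill: `edges` are lists of (x, y) raster points, each monotonic in y.
        # For every scan line, each crossing edge contributes the range of x values
        # it covers on that line (a range, not a single x, because an edge may run
        # horizontally for a while).  The ranges are sorted on x and taken pairwise:
        # the left member of a pair is a left edge, so its leftmost x is used; the
        # right member is a right edge, so its rightmost x is used.
        per_line = defaultdict(list)             # y -> list of (min_x, max_x), one per edge
        for edge in edges:
            line_range = {}
            for x, y in edge:
                lo, hi = line_range.get(y, (x, x))
                line_range[y] = (min(lo, x), max(hi, x))
            for y, rng in line_range.items():
                per_line[y].append(rng)
        spans = []                               # (y, x, width) spans of the interior
        for y, ranges in per_line.items():
            ranges.sort()
            # Pairing assumes the edge list was built with maxima/minima doubled,
            # so every scan line sees an even number of edge crossings.
            for (lx, _), (_, rx) in zip(ranges[0::2], ranges[1::2]):
                spans.append((y, lx, rx - lx + 1))
        return spans

    # A small triangle: a vertical left edge and a slanted right edge.
    left = [(0, 0), (0, 1), (0, 2)]
    right = [(4, 0), (3, 1), (2, 2)]
    print(fill_spans([left, right]))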
HIT TESTING

By the term "hit testing" we mean: given some target point, usually the display coordinates of a cursor, which object has been selected? In hardware-augmented systems, hit testing is done by the clipping and display system. The clipping boundary is set to a small window around the hit point, and the entire picture is refreshed. An object falling inside the window is returned as a possible hit. In a raster system, redisplaying the object list is too slow, and redisplaying the rasters provides no structural information. However, the segmented encoding provides both structural information and a way to quickly determine which objects are candidates. The frame information on the encoding chunks provides a method for quickly culling out those segments which do not fall near the target point. The remaining chunks can be decoded using the same routines which display the encoding, except that the resulting points are compared to the target point instead of being displayed. Once the objects which lie near the target point are identified, some algorithm must be applied to select one of them. Some considerations are: absolute distance from the point, overlap order of the objects, preferred objects, etc.
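A minimal sketch of this process, reusing the Frame and decode_chain sketches above (so every name here is illustrative rather than the original routines): chunks are culled by their bounding frames against a small window around the target, the survivors are decoded, and a simple selection policy is applied.

    def hit_test(chunks, target, tolerance=3):
        # Return the selected chunk, or None if nothing lies near the target point.
        # `chunks` is the display file in back-to-front order; each chunk carries a
        # bounding frame and a chain-encoded stream.  The frame test culls most
        # chunks cheaply; only the survivors are decoded and compared to the target.
        tx, ty = target
        window = Frame(tx - tolerance, ty - tolerance,
                       2 * tolerance + 1, 2 * tolerance + 1)
        hits = []
        for chunk in chunks:
            if not chunk.frame.overlaps(window):
                continue                              # cheap cull on the bounding frame
            for x, y in decode_chain(chunk.start, chunk.codes):
                if abs(x - tx) <= tolerance and abs(y - ty) <= tolerance:
                    hits.append(chunk)
                    break
        # A simple selection policy: prefer the frontmost object, i.e. the last hit
        # in back-to-front order.  Distance from the point or preferred objects
        # could be folded in here, as discussed above.
        return hits[-1] if hits else None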
CONCLUSIONS

Using these design principles, it is possible to build an interactive system which uses a raster display for design, yet has a geometric data structure which can be used to generate quality output on a high resolution raster device such as a film recorder, a phototypesetter, or a laser printer. Two systems using these principles have been designed and implemented on the Alto personal computer. One provides only lines and curves; the other also provides the capability for filled areas. Typical imagery is shown in figures 9 and 10 and in the illustrations in this paper.

ACKNOWLEDGEMENTS

Several people have contributed significantly to the design and implementation of the systems this paper is based on. The authors would like to thank Rick Tiberi, Geoff Brown, and John Dawson for their help and ideas.

BIBLIOGRAPHY

1. Baudelaire, P., Flegal, R.M., and Sproull, R.F., Spline curve techniques, Xerox Palo Alto Research Center Internal Publication, (December 1977).
2. Erdahl, A.C., Displaying computer-generated half-tone pictures in real time, University of Utah Computer Science Technical Report, Salt Lake City, (1969), 4-14.
3. Freeman, H., Computer processing of line-drawing images, ACM Computing Surveys, Vol. 6, No. 1, (March 1974), 57-97.
4. Laws, B., and Newman, W.M., A gray-scale graphics processor using run-length coding, Proceedings of the IEEE Conference on Computer Graphics, Pattern Recognition, and Data Structures, (May 1975).
5. Newman, W.M. and Sproull, R.F., Principles of Interactive Computer Graphics, 2nd edition, McGraw-Hill, New York, 1979.
6. Pitteway, M.L.V., A simple data compression technique for graphics displays or incremental plotters, Symposium on Computer Processing in Communications, (April 1969).
7. Shoup, R., Color table animation, Computer Graphics, Vol. 13, No. 2, (August 1979), 8-13.
8. Shoup, R., Some experiments in television graphics and animation using a digital image memory, presented at the 13th Television Conference of the SMPTE and published in Digital Video, Vol. II, SMPTE, Scarsdale, New York, 1979, 88-98.
9. Smith, A.R., Paint, Technical Memo #7, Computer Graphics Lab, NYIT, Old Westbury, New York, (July 1978).
10. Smith, A.R., Tint fill, Computer Graphics, Vol. 13, No. 2, (August 1979), 276-283.
11. Sproull, R.F. and Newman, W.M., The design of gray-scale graphics software, Proceedings of the IEEE Conference on Computer Graphics, Pattern Recognition and Data Structures, (May 1975).
12. Sproull, R.F., Raster graphics for interactive programming environments, Computer Graphics, Vol. 13, No. 2, (August 1979), 83-93.
13. Thacker, C.P., McCreight, E.M., Lampson, B.W., Sproull, R.F., and Boggs, D.R., Alto: a personal computer, Xerox Palo Alto Research Center Internal Publication, (July 1979).