IEEEInterpressArticle.tioga
Rick Beach, May 22, 1986 4:07:45 pm PDT
Rick Beach, January 2, 1987 3:40:22 pm PST
BHUSHAN AND PLASS
INTERPRESS PAGE AND DOCUMENT DESCRIPTION LANGUAGE
SIGGRAPH '87 TUTORIAL COURSE NOTES
DOCUMENTATION GRAPHICS
The Interpress Page and
Document Description Language
[Republished from IEEE Computer 19(6):72-77, June 1986]
Abhay Bhushan and Michael Plass
Xerox Corporation
Potential users of computer-aided documentation systems almost inevitably face the problem of how to output document files created on a diverse set of tools, including workstations, mainframes, word processors, and scanners, to a broad range of shared printing devices, from low-speed desktop printers and low-resolution proof printers to high-speed document production systems and high-resolution phototypesetters. More often than not, output devices each require different formats to characterize a document's design, making it virtually impossible to achieve any consistency of appearance between versions of the document printed on different devices. They may also require unique interfaces with device-dependent control codes, protocols, and configuration requirements. Such demands effectively prevent the user from exploiting the full capabilities of advanced printing technology. The Interpress page description language addresses this problem with a device-independent interface and a format description methodology that make it possible to efficiently employ a full array of output resources in a manner that is transparent to the user.
Toward Device Independence
Computer-driven raster printers are inherently expressive devices, capable of printing any imaginable combination of text, graphics, and pictures simply by arranging the appropriate pattern of black (or colored) dots. Unlike character- or line-oriented devices, they are limited in the range of images they can print more by the expressive power of the interface to the document creation tool than by their own capabilities.
Because raster printers are driven by computer software and do not use device-specific control codes, it is possible to have a universal interface or interchange standard in which any document may be represented, and that can drive any raster printer, independent of its resolution (expressed in dots per inch) and other device characteristics. Interpress attempts to be such a device-independent standard. Since Interpress is so conceptually different from conventional printer interfaces, it is generally referred to as a page and document description language. The Interpress printing architecture is a software architecture that represents a unified document description scheme for printing in diverse system environments, including stand-alone computers, data processing centers, publishing and office work groups, and large, interconnected networks.
When work first began on Interpress over a decade ago at Xerox Palo Alto Research Center, the original goal was to design a system that permitted the same format to be used for editable (revisable form) as well as printable (final form) documents; but early in development this goal was seen as impractical and was discarded in favor of separate interchange schemes for revisable and final form documents. Interpress emerged as a language for final form document representation. The language was made publicly available in 1984 (an earlier version of the current Interpress 3.0), and Xerox is committed to keeping it an open system.
The Interpress Approach
In one possible document description scheme, the document creation device would send to the raster printer a facsimile picture of the intended output, presented directly in that printer's raster format. Such a scheme, though conceptually simple, has many disadvantages. First, raster data, which amounts to a map of every point that the printer must address, takes up an enormous amount of storage space (millions of bits per page, even when compressed). This increases not only storage costs but also transmission time and costs. Further, if the page contains text, the computer program generating it must have access to the raster image of all of the fonts for all printers that might be used; such a requirement is quite impractical. It would also be difficult to transform the raster images--to rotate, scale, or move them to fit a particular space or to achieve a desired effect. Finally, the raster format is based on the resolution of the printing device and is therefore not device-independent.
By contrast with this pictorial approach, Interpress represents a page not as a set of points for the printer to address but as a series of instructions analogous to a computer program. This technique permits the user to print substantially more complex documents than is possible with the static format specifications of character printers, and to do so with greater efficiency than is possible with conventional raster data.
===
Figure 1. Three different representations of the letter ``A''.
===
Figure 1 illustrates this approach to representing the letter ``A.'' The pictorial representation takes several thousand bits at typical printer resolutions, the geometric representation takes several hundred bits, and the textual representation requires only eight bits once the font has been specified. While Interpress permits all three approaches, the textual approach is preferred in most applications. The geometric approach of outline characters is useful when unique character sizes are desired or when the characters need to be rotated at unusual angles.
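As a rough back-of-envelope check of these magnitudes, the following Python sketch estimates the storage for each representation. The resolution (300 dots per inch), the character size (12 points), the number of outline control points (12), and the 16-bit coordinate size are illustrative assumptions chosen for this example; none of them comes from the Interpress standard.

```python
# Rough storage estimates for one letter in three representations.
# All constants are illustrative assumptions, not values from Interpress.

POINTS_PER_INCH = 72

def bitmap_bits(point_size: float, dpi: int) -> int:
    """Bits for a one-bit-per-pixel square cell holding the character."""
    side = round(point_size / POINTS_PER_INCH * dpi)
    return side * side

def outline_bits(control_points: int, bits_per_coordinate: int = 16) -> int:
    """Bits for an outline stored as a list of (x, y) control points."""
    return control_points * 2 * bits_per_coordinate

def text_bits() -> int:
    """Bits for one character code once the font has been selected."""
    return 8

pictorial = bitmap_bits(12, 300)   # a 50 x 50 pixel cell
geometric = outline_bits(12)       # 12 points, two 16-bit coordinates each
textual = text_bits()
print(pictorial, geometric, textual)
```

Under these assumptions the three estimates fall in the thousands, hundreds, and single digits of bits respectively, which matches the ordering described above.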
The program that drives the printing machine to produce the finished output document is called an Interpress master, written in the Interpress programming language. Although most masters consist only of simple statements, such as text and vectors, the full power of the programming language is available for complex applications. Programming is useful when the master must adapt to the properties of the printing device (such as page size, order of page printing, or color versus black and white) or when Interpress masters must be changed in complex ways.
Though a key task, describing pages is not enough. What is communicated to the printer usually is a document, which is a collection of one or more pages produced in a specific order and intended to have a specific relationship with one another in the document's final form. A document description language therefore must be able to specify how pages are put together. The Interpress design includes a set of printing instructions that enables the user to control the printing of documents--to invoke two-sided printing, for example, or a special finishing such as stapling. Printing instructions also provide information necessary for multi-user environments (the document's name, author, etc.) and enable the declaration of resources required for the document to be printed (e.g., additional files, fonts, and font sizes).
The Interpress Language
Like all programming languages, Interpress has both syntax and semantics. The semantics of the language define how the various operators behave when they are executed by the printer; the syntax of the language defines how the calls to those operators are coded in a master. Since Interpress masters are intended to be created and interpreted by software and not by people, the language syntax was designed to make it easy for computers to produce and interpret, without concern for human readability. As such, Interpress commands are normally encoded in a binary format designed for compactness and decoding ease. For debugging purposes, utility programs translate the binary encoding to and from a human-readable text representation; this ``written encoding'' is used in the examples in this article.
The software within a printer interprets (executes) an Interpress master to print a document. During that execution, the printed document is built up one page at a time. When a page of a master is printed, an interpreter ``executes'' the code that constitutes the page description, much as a machine executes a program. The state of this virtual machine includes a set of 50 ``registers'' called the frame, a (potentially large) stack, and a set of special imaging variables. This computational environment is shown in Figure 2.
===
Figure 2. The Interpress computational environment.
===
The elements stored in the frame and stack are Interpress values, which may be of type number (integer or real), identifier (similar to atoms in other languages), vector (a packaged sequence of values), body (a packaged piece of Interpress code), or operator (an Interpress program that can be executed). Character codes are represented by integers, and character strings by vectors of integers; there are compact encodings available for this kind of vector to keep the master compact. Additionally, there are special types that are constructed and used only by imaging operators, such as color, transformation, pixelArray, font, trajectory, and outline.
Interpress uses a postfix execution model: the occurrence of a literal in the instruction stream simply causes the corresponding value to be pushed on the stack, and an operator may pop some number of parameters off the stack and push some number of results. There are provisions for marking the stack to ensure that defined procedures (composed operators in Interpress parlance) pop and push the expected number of items. There is also a way of saving and restoring some or all of the imaging variables around the invocation of an operator. Furthermore, composed operators do not read or modify the caller's frame; they have their own frame, initialized when they were created and the same at each call. Thus the side effects of composed operators are well controlled.
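The postfix model and the private frame of a composed operator can be sketched with a toy interpreter in Python. This is not the Interpress encoding or its operator set: the operator names ADD, MUL, FGET, and FSET are borrowed for illustration, their stack conventions here are assumptions, and the helper names `run` and `make_operator` are hypothetical.

```python
# A toy postfix machine: literals are pushed, operators pop and push.
# Operator names and their argument order are illustrative assumptions.

FRAME_SIZE = 50  # the article describes a frame of 50 "registers"

def run(tokens, stack, frame):
    """Execute tokens in order against the given stack and frame."""
    for tok in tokens:
        if callable(tok):                  # a composed operator
            tok(stack)
        elif isinstance(tok, (int, float)):
            stack.append(tok)              # literal: push its value
        elif tok == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif tok == "MUL":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif tok == "FSET":                # value index FSET
            index, value = stack.pop(), stack.pop()
            frame[index] = value
        elif tok == "FGET":                # index FGET
            stack.append(frame[stack.pop()])
        else:
            raise ValueError(f"unknown token: {tok!r}")

def make_operator(body, frame):
    """Package a body with a snapshot of the frame taken at creation time."""
    initial = list(frame)
    def call(stack):
        # Each call starts from the same initial frame and works on a copy,
        # so the caller's frame is never read or modified.
        run(body, stack, list(initial))
    return call

frame = [0] * FRAME_SIZE
stack = []
run([3, 4, "ADD", 0, "FSET"], stack, frame)            # frame[0] becomes 7
double_and_clobber = make_operator([2, "MUL", 99, 0, "FSET"], frame)
run([0, "FGET", double_and_clobber], stack, frame)
print(stack, frame[0])
```

The composed operator doubles the value on the stack and writes 99 into element 0 of its own frame; the caller's frame still holds 7 afterwards, which is the controlled side-effect behavior described above.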
The Imaging Model
Of course the most important side effects are those that change the image on the page. The software maintains a page image, which is altered by the imaging operators as the page is built. A complex page is made by starting with a blank page image and making a sequence of simple changes to it. Interpress is defined so that the partially built page image cannot affect the execution of the master; this makes Interpress useful for expressing images destined for non-raster devices such as pen-plotters and certain phototypesetters. All of the changes to the page image are made according to the Interpress imaging model illustrated in Figure 3: a color is instanced through a mask onto the page image, covering up (or perhaps altering) what was there before. In Figure 3, the color (represented by the parallelogram) is set with 0.5 SETGRAY, and the ``b'' on the page image is printed using the filled outline mask defining the character's shape.
Colors
Interpress uses the term ``color'' to designate a concept that is more general than what we mean in our day-to-day use of the term. An Interpress color may be simply black or white, or a shade of gray, or a ``real'' color like red or blue, or a generic color like ``highlight,'' which just means something other than black; these are all examples of constant colors. Constant colors may be specified by means of the SETGRAY operator (for simple grays), or from the printer's environment (by supplying a name to the FINDCOLOR operator), or via a color model, which is simply an operator that accepts a vector of numeric parameters on the stack and returns a color on the stack. (Color models are normally obtained from the environment by means of the FINDCOLORMODEL or FINDCOLORMODELOPERATOR operators.)
===
Figure 3. The Interpress imaging model.
===
The other kind of Interpress color is a sampled color, which consists of a rectangular array of pixel descriptions, along with the specification (color model) for how the pixels are to be interpreted and a transformation to specify how the pixels of the sampled color are to correspond to the pixels of the page image. The sampled color conceptually tiles the whole page, so it may be used to produce textures and wallpaper-like effects, as well as full-color continuous-tone images. The simplest kind of sampled color consists of a one-bit-per-pixel array, with 1 denoting black and 0 denoting either white or clear; this is known as a sampled black. Interpress allows various types of compression to be used in the encoding of the arrays of pixels; the ones that are currently defined in the system's Raster Encoding Standard are expressly for the one-bit-per-pixel case.
Masks
The other half of the imaging model is the notion of masks. A mask is simply a two-dimensional shape used as a stencil in the application of color to the page image; it specifies the portion of the page image to be colored. A mask may be a rectangle, a stroke of a specified width along polygonal or curved trajectories, a dashed or dotted stroke, an area bounded by a set of trajectories, a bitmap (possibly compressed) at any resolution, or a string of text in a particular font. Any of these masks may be used with any color. The current color is one of the imaging variables, as are the current font and the current transformation (more about these later); the different kinds of masks have different operators.
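The combination of a color and a mask can be sketched in a few lines of Python. This is a toy model under stated assumptions: the page is a small grid of numbers, colors and masks are plain functions of pixel position, row 0 is simply the first row of the grid, and all function names are hypothetical rather than Interpress operators.

```python
# Toy imaging model: a color reaches the page only where the mask is set.
# Pages, masks, and colors are plain Python values; all names are illustrative.

def blank_page(width, height, background=0.0):
    """A page image with every pixel set to the background value."""
    return [[background] * width for _ in range(height)]

def constant_color(value):
    """A color that is the same everywhere (for example, a gray level)."""
    return lambda x, y: value

def sampled_color(samples):
    """A color read from a pixel array that conceptually tiles the page."""
    h, w = len(samples), len(samples[0])
    return lambda x, y: samples[y % h][x % w]

def rectangle_mask(x0, y0, w, h):
    """A mask that is true inside the given rectangle of pixels."""
    return lambda x, y: x0 <= x < x0 + w and y0 <= y < y0 + h

def apply_mask(page, mask, color):
    """Paint color wherever mask is true, covering what was there before."""
    for y, row in enumerate(page):
        for x in range(len(row)):
            if mask(x, y):
                row[x] = color(x, y)

page = blank_page(6, 4)
apply_mask(page, rectangle_mask(1, 1, 4, 2), constant_color(0.5))
apply_mask(page, rectangle_mask(2, 0, 2, 4), sampled_color([[1, 0], [0, 1]]))
for row in page:
    print(row)
```

Because any mask function can be paired with any color function, the sketch mirrors the point that masks and colors are independent: the second call lays a tiled checkerboard over part of the gray rectangle painted by the first.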
A Simple Example
To illustrate some of the basic features of Interpress, consider the example in Figure 4. This master will print two pages; lines 1 through 7 specify the first page (containing a rectangular box made up of four strokes) and lines 8 through 13 specify the second (containing the text ``Print this''). Lines 0 and 14 constitute the skeleton that brackets the page bodies (the empty brackets in line 0 are the preamble, which can set up the initial frame used for each page body; thus fonts, for example, need be declared only once rather than in each page). If the page brackets on lines 7 and 8 were eliminated, only a single page would be printed with ``Print this'' inside the rectangular box, as shown on the right in Figure 4.
The operations that actually change the page images occur in lines 3 through 6 (MASKVECTOR) and line 12 (SHOW); the other lines set up the necessary imager variables (strokeWidth for the MASKVECTOR, font and current position for SHOW). All the dimensions in this example are expressed in meters, which is the unit of the initial coordinate system of Interpress; the origin is in the lower left corner of the page, with x increasing to the right and y increasing towards the top of the page.
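To see how coordinates in meters stay independent of printer resolution, the following Python sketch maps a point in the initial coordinate system to device pixels. The resolutions, the 11-inch page height, the convention that device row 0 is at the top of the page, and the function name `to_device` are all assumptions made for this example.

```python
# Converting page coordinates (meters, origin at lower left) to device pixels.
# The resolutions and page height below are assumed example values.

METERS_PER_INCH = 0.0254

def to_device(x_m, y_m, dpi, page_height_m):
    """Map a point in meters to (column, row), with row 0 at the top."""
    scale = dpi / METERS_PER_INCH          # pixels per meter
    column = round(x_m * scale)
    row = round((page_height_m - y_m) * scale)
    return column, row

# One inch in from the left edge and one inch up from the bottom edge
# of an 11-inch-tall page, at two different printer resolutions.
page_height = 11 * METERS_PER_INCH
for dpi in (300, 1200):
    print(dpi, to_device(0.0254, 0.0254, dpi, page_height))
```

The same pair of numbers in the master lands on different pixel addresses at each resolution, so only the printer, not the master, needs to know the device's dots per inch.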
===
Figure 4. A simple Interpress master and the two pages that it describes.
===
Typographic Printing
Typography is the art of designing and placing letterforms to create a legible and pleasing effect. The quality of typography largely depends upon the availability of character sets and fonts. The Interpress printing architecture includes the multilingual Xerox Character Code set and the Font Interchange set. Interpress also allows other fonts and character codes to be used and intermixed freely in a master. However, the ASCII formatting characters such as carriage return and tab are not recognized by the system because there is no accurate way to interpret what these characters should do. Formatting operations are achieved by positioning commands such as SETXY. Interpress also does not generate ligatures automatically. Ligatures must have a separate character code and representation (provided in the Xerox Character Code set). This reflects an important principle of Interpress design: all decisions about presentation and formatting should be made by the creator, not the printer. This principle ensures that documents are printed accurately and have uniform appearance on different output devices.
Further facilities include letterform definitions expressed as character operators; positioning operators used to control the position of the letters; geometric transformations to scale, rotate, and translate a letterform so that it can appear in arbitrary size, rotation, and position on a page; and additional graphical operators to define underlines, strikethroughs, and the like. Positioning may be either absolute (with respect to the page) or relative (with respect to some other coordinate system such as a box within a page). To assure correct justification and margin alignment, the system provides a CORRECT operator for spacing adjustment. It also defines a flexible way to achieve kerning (alteration in the intercharacter spacing of pairs of characters) to achieve a better appearance.
Fonts may be stored at the printer in outline or bitmap forms, or they may be communicated as part of an Interpress master. The bitmap fonts generally are fine-tuned for the characteristics of a specific printer and represent high typographic quality. Outline fonts represent greater versatility since they can be easily scaled and rotated to provide printing in any size and at any angle.
Graphics Printing
In some sense, the term ``graphics'' applies to everything that goes on the page; here, though, the term applies to elements other than black text at normal sizes and orientations and the rectangles that normally appear with such text.
Rectangles are very simple to specify: the operator MASKRECTANGLE expects the four numbers x, y, w, and h on the stack (x,y coordinates of one corner, the width, and the height). Single line segments, as we have seen, can be drawn by supplying the two endpoints (four numbers x1, y1, x2, y2) to the MASKVECTOR operator; Interpress allows control of the stroke width and the style of the end caps (square, butt, or circular). More complicated shapes are specified by building trajectories, representing a sequence of straight or curved segments connected end-to-end. A trajectory is constructed segment-by-segment, using a pen-plotter analogy: the MOVETO operator takes the coordinates of a point (two numbers x0, y0) and returns a single-point trajectory. The other trajectory-building operators all take a trajectory followed by the appropriate number of coordinates and parameters, and return a new trajectory. A trajectory is an Interpress value and may be copied and saved for reuse; the most common pattern, though, is to construct it on the stack and use it right away. The other trajectory-building operators, each applied to a trajectory whose last point is (x0,y0), are:
x1 y1 LINETO straight line segment from (x0,y0) to (x1,y1);
x1 y1 x2 y2 r CONICTO conic segment--part of a circle, ellipse, parabola, or hyperbola;
x1 y1 x2 y2 ARCTO circular arc passing from (x0,y0) through (x1,y1) to (x2,y2); and
x1 y1 x2 y2 x3 y3 CURVETO parametric cubic curve ending at (x3,y3).
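The segment-by-segment construction described above, in which each operator takes a trajectory and returns a new one, can be sketched in Python with immutable tuples. The Python function names mirror the operator names, but the data layout is an assumption, as is the treatment of (x1,y1) and (x2,y2) in the cubic curve as intermediate control points; the article states only that the curve ends at (x3,y3).

```python
# Trajectories as immutable values: each operator returns a new trajectory.
# The tuple layout is an assumption made for this sketch.

def moveto(x, y):
    """Start a trajectory consisting of a single point."""
    return (("move", (x, y)),)

def lineto(trajectory, x1, y1):
    """Extend with a straight segment from the last point to (x1, y1)."""
    return trajectory + (("line", (x1, y1)),)

def curveto(trajectory, x1, y1, x2, y2, x3, y3):
    """Extend with a cubic curve ending at (x3, y3)."""
    return trajectory + (("curve", (x1, y1), (x2, y2), (x3, y3)),)

def last_point(trajectory):
    """The current end of the trajectory: (x0, y0) for the next segment."""
    return trajectory[-1][-1]

base = moveto(0.0, 0.0)
triangle = lineto(lineto(base, 1.0, 0.0), 0.5, 1.0)
rounded = curveto(triangle, 0.2, 1.2, -0.2, 0.6, 0.0, 0.0)
print(len(base), len(triangle), len(rounded), last_point(rounded))
```

Because every call returns a fresh tuple, `base` can be saved and reused as the start of several different shapes, matching the statement that a trajectory is a value that may be copied and saved for reuse.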
Once a trajectory is built, it can be supplied to the MASKSTROKE operator to draw a constant-width stroke; the master may specify whether the joints between the segments should be mitered, beveled, or rounded. For a dashed or dotted stroke, the trajectory and a dash specification are provided to the operator MASKDASHEDSTROKE; fancy combinations of dashed and dotted strokes can be obtained by using the same trajectory with several different sets of dash specifications. MASKSTROKECLOSED draws a stroke with the trajectory closed by joining its two endpoints (with a line segment, if necessary); this allows the proper joints to be made in place of the end caps.
Several trajectories can be combined to form an outline by using the MAKEOUTLINE operator; an outline represents a filled geometric shape, which may have holes (using multiple trajectories that wrap in opposite directions). An outline can be used with the MASKFILL operator to fill the region with the current color. It can also be used as an argument to the CLIPOUTLINE operator to provide a clipping region for all subsequent masking operations (until the imager variables are restored); and a simpler CLIPRECTANGLE operator is also available.
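One way to see why trajectories that wrap in opposite directions produce holes is a nonzero winding-number test, sketched below in Python. The article does not state which fill rule MASKFILL uses, so the nonzero rule here is an assumption consistent with the description; the sketch also handles only straight-sided closed polygons, and the function names are hypothetical.

```python
# Nonzero winding-number test for a point against several closed polygons.
# Assumes straight-sided trajectories given as lists of (x, y) vertices.

def winding_number(point, polygon):
    """Count signed crossings of a rightward ray from point by polygon edges."""
    px, py = point
    wn = 0
    n = len(polygon)
    for i in range(n):
        x0, y0 = polygon[i]
        x1, y1 = polygon[(i + 1) % n]
        # Positive when the point lies to the left of the directed edge.
        is_left = (x1 - x0) * (py - y0) - (px - x0) * (y1 - y0)
        if y0 <= py < y1 and is_left > 0:
            wn += 1        # edge crosses the ray going upward
        elif y1 <= py < y0 and is_left < 0:
            wn -= 1        # edge crosses the ray going downward
    return wn

def inside_outline(point, trajectories):
    """A point is filled when the winding numbers do not cancel to zero."""
    return sum(winding_number(point, t) for t in trajectories) != 0

outer = [(0, 0), (10, 0), (10, 10), (0, 10)]        # counterclockwise
hole = [(3, 3), (3, 7), (7, 7), (7, 3)]             # clockwise
same_way = [(3, 3), (7, 3), (7, 7), (3, 7)]         # counterclockwise

print(inside_outline((1, 1), [outer, hole]))        # in the filled ring
print(inside_outline((5, 5), [outer, hole]))        # in the hole
print(inside_outline((5, 5), [outer, same_way]))    # no hole is formed
```

With the inner square wound opposite to the outer one, its contribution cancels the outer winding and the center is left unfilled; wound the same way, the center stays filled.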
===
Figure 5. An example of Interpress graphics primitives to draw an ice cream cone.
===
The MASKPIXEL operator uses a bitmap as a mask; the bitmap is represented in the same form as that used by the MAKESAMPLEDBLACK operator and may optionally be in a compressed format.
Figure 5 shows an example that uses many of the Interpress graphics primitives to draw an ice cream cone. The written encoding of the entire master is shown along with comments. You may want to try plotting the positions of the vertices and control points on a piece of graph paper to get a better feel about how they work.
One of the more useful capabilities of Interpress is the set of geometric transformations that are at the heart of the imaging operations. Interpress has a set of operators for building primitive transformations (SCALE, ROTATE, TRANSLATE) and two operators for combining transformations (CONCAT, CONCATT). Mathematically, the transformations can be represented as 3 × 3 matrices. Transformations can be applied to any graphic image, including line drawings, scanned images, and character shapes. The transformation capabilities are also useful in publishing applications such as two-sided printing of bound documents and creation of n-up printing signatures.
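The matrix view of these transformations can be sketched in Python as follows. The sketch assumes a row-vector convention (a point [x, y, 1] multiplied on the left of the matrix), counterclockwise rotation in degrees, and a `concat` that applies its first argument first; these conventions and the lowercase function names are assumptions for illustration and are not taken from the Interpress standard.

```python
import math

# 3 x 3 homogeneous transformation matrices, row-vector convention:
# a point is [x, y, 1] and is multiplied on the left of the matrix.

def scale(s):
    """Uniform scaling by the factor s."""
    return [[s, 0, 0], [0, s, 0], [0, 0, 1]]

def rotate(degrees):
    """Counterclockwise rotation about the origin."""
    a = math.radians(degrees)
    c, s = math.cos(a), math.sin(a)
    return [[c, s, 0], [-s, c, 0], [0, 0, 1]]

def translate(tx, ty):
    """Translation by (tx, ty)."""
    return [[1, 0, 0], [0, 1, 0], [tx, ty, 1]]

def concat(m, n):
    """Return the transformation that applies m first, then n."""
    return [[sum(m[i][k] * n[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def apply(m, x, y):
    """Transform the point (x, y) by the matrix m."""
    v = [x, y, 1]
    out = [sum(v[k] * m[k][j] for k in range(3)) for j in range(3)]
    return out[0], out[1]

# Scale by 2, rotate a quarter turn, then shift one unit to the right.
t = concat(concat(scale(2), rotate(90)), translate(1, 0))
x, y = apply(t, 1, 0)
print(round(x, 9), round(y, 9))
```

Because the whole chain collapses into a single matrix, the same letterform or drawing can be placed at any size, angle, and position by changing one transformation. Order matters: translating before scaling gives a different result from scaling before translating.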
Open System Architecture
Interpress has evolved from many years of use in a distributed network environment. The Interpress architecture, with its functional richness in describing pages and documents and its device-independent capability, represents an effective solution to the problem of printing the wide variety of documents in diverse printing environments with a common standard interface. Interpress has proven to be gracefully extensible so that as new printing technology, new applications, and other new requirements emerged, existing masters could be used unmodified.
Xerox has chosen to make the Interpress printing architecture a completely open system. Interpress and related systems have been made publicly available, free of any royalty or licensing fees.
===
Figure 6. The written Interpress commands to draw a spiral of ice cream cones.
===
===
Figure 7. A spiral of ice cream cones.
=