ProcPDL.tioga
Rick Beach, May 22, 1986 12:38:25 pm PDT
Rick Beach, January 2, 1987 3:39:28 pm PST
REID
PROCEDURAL PAGE DESCRIPTION LANGUAGES
SIGGRAPH '87 TUTORIAL COURSE NOTES
DOCUMENTATION GRAPHICS
Procedural Page Description Languages
[Republished from Text Processing and Document Manipulation,
Copyright © 1986, Cambridge University Press]
Brian K. Reid
Stanford University and DEC Western Research Laboratory
April 1986
ABSTRACT
An important goal of document preparation systems is that they be device-independent, which is to say that their output can be produced on a variety of printing devices. One way of achieving that goal is to devise a device-independent page description language, which can describe precisely the appearance of a formatted page, and to produce software that prints the required image on each variety of printer. Most attempts at device-independent page description languages have failed, resulting either in schemes that are only partially device-independent or in proclamations from researchers that device independence is a bad idea [2, 4].
A new generation of procedural page description languages promises a solution. The PostScript language, and to a slightly lesser extent the Interpress language, offers a means of describing a printed page with an executable program; the page is printed by loading the program into the printer and running it.
1. Page Description Languages
An imaging device, such as a typesetter, laser printer, or display, must have some way of knowing what image it is being asked to show. Traditionally that information has been provided in one of two ways: by describing the image as a bitmap or character map, or by sending a sequence of control commands to the imager's electronics.
The bit-map or character-map schemes are the simplest and oldest. For example, a line printer is provided with a character map (in this spot put the character X; in this spot put the character Y, and so forth). A CRT screen normally has a corresponding memory buffer such that each bit on the screen is tied to one bit in memory, and the screen pixel can be made light or dark by turning the bit on or off. Mapping schemes require that the dimensions and spacing of the characters or bits be identical to what the image creator wanted, or the resulting image will be rotated or scaled, perhaps even anamorphically. For example, the pixels on the IBM Personal Computer screen are rectangular rather than square, so that an image that was specified as an evenly-spaced bitmap will appear to be vertically elongated when displayed on its screen.
Schemes to describe images via commands to the controllers that generate the image are potentially more device-independent. For example, if the image is to consist of a horizontal line, then the image description can consist of the commands to move a pen to one end of the line and then swing it to the other end of the line, without needing to know the device resolution or how many pixels must be turned on between the endpoints in order to draw the line. Examples of devices driven by command-stream image descriptions include pen plotters, daisy-wheel printers, and laser printers made by Imagen [7]. Almost all current command-stream image description languages are derivatives of the XCRIBL system done in 1972 at Carnegie-Mellon [5].
Bitmap image descriptions take an enormous amount of storage space, are not device-independent, and require the program generating the images to have access to character font information so that the proper bits can be set. Furthermore, it is extremely difficult to edit or modify a bitmap image description, as for example to change a spelling error in a formed image, or to remove a component of the image and replace it by another: bitmap image descriptions are not suitable for further processing.
By contrast, command-stream image descriptions do not always have the capability of describing every possible image. For example, if the controller has no command for rotating the page or rotating a text character, then it is impossible to describe an image that includes a rotated character. The allure of bitmap image descriptions is that they are universal; the allure of command-stream image descriptions is that they are more compact, more editable, and somewhat device-independent.
Clearly an ideal image description scheme will share the universality of bitmap descriptions with the device-independence and compactness of command-stream image descriptions.
2. Procedural Page Description Languages
A procedural page description is a program, written in some graphics programming language, that, when executed, will create the intended page image. The idea is attributed to Warnock and Sproull, who devised it as the basis for the Interpress page description language [3].
Command-stream page description languages would appear to be ``procedural'' and in fact the descriptions that are written in these languages are definitely procedures in the ordinary meaning of the word: ``move the cursor to [12,354]. Switch to boldface. Draw a `Q'. Move right 9 units.'' The true power of a procedural page description language, however, comes from the ability to write conditionals, to define and call functions, and to perform arbitrary computations based on the value of variables stored inside the printer. I reserve the term ``procedural page description'' for languages with these properties. The ability to redefine built-in functions is valuable but not necessary.
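As a minimal sketch of these properties, the following PostScript fragment keeps a page counter in a variable inside the printer, does arithmetic on it, and uses a conditional to place the page number on alternate sides of the sheet. The names pageno and folio, the 10-point Helvetica font, and the coordinates are illustrative assumptions, not taken from any particular document.

```postscript
%!PS
/pageno 1 def                          % variable stored in the interpreter
/folio {                               % hypothetical helper: paint the page number
  /Helvetica findfont 10 scalefont setfont
  pageno 2 mod 0 eq                    % conditional on the stored value
    { 72 36 moveto }                   % even page: near the left margin
    { 522 36 moveto }                  % odd page: near the right margin
  ifelse
  pageno 4 string cvs show             % convert the number to text and paint it
  /pageno pageno 1 add def             % arithmetic on the stored variable
} def
folio showpage                         % page 1: number at the right
folio showpage                         % page 2: number at the left
```

A pure command stream could express neither the test nor the counter; the generating program would have to compute both positions in advance.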
3. Comparing Procedural and Nonprocedural Page Descriptions
Procedural descriptions are often more compact than nonprocedural descriptions of the same image, for they can take advantage of regularities in the image. For example, consider a procedural description of a piece of graph paper or a geometric grid. It can define a procedure to draw a line, then call it repeatedly in an appropriate loop. A nonprocedural description of the same image, by comparison, must have a separate item describing each line in the grid.
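The graph-paper case can be sketched in PostScript as follows. The procedure names hline and vline, the half-inch spacing, and the page margins are assumptions chosen for illustration.

```postscript
%!PS
/hline {                   % y hline : horizontal rule at height y
  newpath 72 exch moveto 468 0 rlineto stroke
} def
/vline {                   % x vline : vertical rule at position x
  newpath 72 moveto 0 468 rlineto stroke
} def
0.5 setlinewidth
72 36 540 { hline } for    % the loop pushes 72, 108, ..., 540 in turn
72 36 540 { vline } for
showpage
```

Each loop runs 14 times, so the two loops draw 28 rules; a nonprocedural description of the same grid would need 28 separate line items.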
Procedural descriptions allow the use of abstraction and modular construction in image assembly. One can assemble a library of procedures that draw commonly-used images, and call them inside larger diagrams. Naturally the ability to achieve modularity is not a guarantee that image representations will be modular, any more than a structured programming language like Modula-2 will guarantee that the programs written in it are well-structured.
Procedural descriptions can mimic any other page-description language, simply by programming subroutines in the procedural description language that duplicate the effect of the commands in another language.
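For instance, the imagined command stream quoted in section 2 (move the cursor, switch to boldface, draw a character, move right) could be mimicked by a handful of PostScript definitions. The one-letter command names below are hypothetical and do not belong to any real printer language; the 12-point size is likewise an assumption.

```postscript
%!PS
% Hypothetical one-letter commands of an imagined command-stream printer.
/M { moveto } def                                         % x y M : move the cursor
/B { /Helvetica-Bold findfont 12 scalefont setfont } def  % B : switch to boldface
/D { show } def                                           % (text) D : draw characters
/R { 0 rmoveto } def                                      % n R : move right n units
12 354 M  B  (Q) D  9 R  (Q) D
showpage
```

Once such a prologue is loaded, the body of the file is the foreign command stream itself, interpreted by the procedures that share its command names.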
Procedural descriptions can be device-adaptive as well as device-independent, by delaying certain decisions about the appearance of the image until the specifics of the printing device are known. For example, see Figure 1, which shows four instances of the Stanford University logo in 25-point through 50-point size.
All four of these logotypes are generated from the same procedural definition; notice that the detail in the outermost ring, the detail in the trunk of the tree, and the spacing between the two innermost rings changes as a function of the physical size of the logo.
===
Figure 1: Adaptive specification: increasing detail with size
===
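One way such adaptation might be programmed is sketched below; it is not the definition used for the logotype in Figure 1, which is not reproduced here. The sketch asks the interpreter how many device pixels a feature will occupy and omits the feature when it would be too small to render. The names devlen and badge and the 4-pixel threshold are assumptions.

```postscript
%!PS
/devlen {                  % len devlen pixels : device-space length of a user-space distance
  0 dtransform             % distance vector (len, 0) expressed in device units
  dup mul exch dup mul add sqrt
} def
/badge {                   % size badge : draw a ringed badge centered on the origin
  /size exch def
  newpath 0 0 size 0 360 arc stroke                 % outer ring is always drawn
  size 0.1 mul devlen 4 ge                          % would the gap span at least 4 pixels?
    { newpath 0 0 size 0.9 mul 0 360 arc stroke }   % yes: add the inner ring
  if
} def
0 setlinewidth
gsave 150 400 translate   6 badge grestore
gsave 350 400 translate 100 badge grestore
showpage
```

Whether the small badge receives its inner ring depends on the resolution of the device that executes the program, which is exactly the decision being delayed.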
It is also worth noting that procedural descriptions can be abused more easily than nonprocedural descriptions: it is possible to write bad code in any programming language, but there is often only one workable way to describe an image in a nonprocedural description scheme. PostScript printers must be on guard for infinite loops in the pictures they are printing.
4. PostScript and Interpress
There are two extant procedural page description languages, namely the aforementioned Interpress and PostScript [1]. A brief discussion of their relative histories can be found in the preface of the PostScript reference manual; a more detailed explanation is given by Reid [1, 6].
Both PostScript and Interpress assume that the printer contains an interpreter for the executable language, and that a page is printed by executing the page description program on the printer; the image is constructed as a side-effect of the program execution.
PostScript is more interesting because a complete implementation is readily available, and because a language description is widely available. The implementation of Interpress is much more limited and is not widely available; the Interpress documentation can only be obtained by special order from Xerox [3].
Therefore I shall take examples from PostScript. PostScript can do anything Interpress can do; the reverse is not true, as Interpress has a certain number of limitations not present in PostScript [6]. In general, however, the explanations and examples to follow comment on both Interpress and PostScript.
===
while input remains do
begin
   token := nextLexeme(input);
   lexType := lexicalType(token);
   if lexType = name then
   begin
      tokenvalue := lookup(token);
      tokentype := type(tokenvalue);
      if executable(tokentype) then
         execute(tokenvalue)
      else
         push(tokenvalue)
   end
   else push(token)
end
Figure 2: PostScript semantics: outline of the interpreter
===
5. PostScript Language Details
A PostScript image description is a sequence of lexical tokens. Those tokens can be names, numbers, delimited strings, procedure bodies, array bodies, or comments. The tokens are delimited by white-space characters, and by certain other delimiter characters when a token boundary can be determined unambiguously.
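The token types listed above can be illustrated with a short fragment. The names side and twice are arbitrary choices for this sketch.

```postscript
%!PS
% A comment runs from a percent sign to the end of the line.
/side 72 def              % a literal name, a number, and the executable name def
(etaoin shrdlu)           % a delimited string, pushed on the stack
[ 1 2 3 ]                 % an array body, pushed as a single array object
{ side 2 mul }            % a procedure body, pushed without being executed
pop pop pop               % discard the three objects pushed above
/twice{2 mul}def          % slashes and braces delimit tokens; no spaces are needed
side twice                % leaves 144 on the stack
pop
```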
When a PostScript program is presented to a printer, it is executed. That execution takes place on a stack machine, with names stored in dictionaries and graphics state stored in global variables. Because operands are pushed on the stack before the operator that consumes them, PostScript operators are written in postfix order, hence the name. Figure 2 shows an approximation of the PostScript interpreter.
Each page begins completely white. Execution of imaging operators causes ink to be put in the image buffer. Color and grayscale are achieved by changing ink color before calling the imaging operators. All ink is opaque, even white ink, which is to say that ``image priority'' is always obeyed. The imaging operators use the interpreter's global ``graphics state'' variables for such information as current position, ink color, clipping region, line-drawing parameters, halftone parameters, font, transformation matrix, etc.
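A small trace may make the stack discipline of Figure 2 concrete. The names double and limit are illustrative; the comments describe the operand stack after each line.

```postscript
%!PS
3 4 add                   % push 3, push 4; add pops both and pushes 7
/double { 2 mul } def     % def stores the procedure in a dictionary under the name double
double                    % lookup finds an executable procedure, so it runs: 7 becomes 14
/limit 100 def            % limit is bound to a number, not a procedure
limit                     % lookup finds a non-executable value, so 100 is pushed
lt                        % pops 14 and 100, pushes true because 14 < 100
pop                       % leave the stack empty
```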
===
%!PS-Adobe-1.0
72 72 moveto 360 576 lineto
stroke copypage
newpath
200 200 moveto 300 400 lineto
0 -200 rlineto
400 200 100 180 360 arc
400 700 lineto
40 setlinewidth 1 setlinejoin 1 setlinecap
stroke copypage
72 72 translate
360 72 sub 576 72 sub atan neg 90 add rotate
/Helvetica-Bold findfont 90 scalefont setfont
70 10 moveto (etaoinshrdlu) show
Figure 3: PostScript code and the page images that it generates
===
6. Some Illustrative Examples of PostScript
Further explanation best awaits some examples. Each of these figures shows a PostScript program and a 15%-scale image of the page that it generates. Because of space limitations in this volume, these examples are necessarily cryptic; the reader is referred to the PostScript reference manual for further explanation [1].
Figure 3 shows two pages and the PostScript code that generated them; note the use of moveto, lineto, and stroke as active operators. The copypage operator is a debugging operator that prints the page buffer and then continues. The default coordinate system is in points, with the origin in the lower left corner. The program first draws a thin line, using the default line thickness. Then it sets the line width to 40 points, and draws a complex curve. Finally it rotates the coordinate system so that the originally-drawn line is the x-axis, then displays some characters of text horizontally along that axis.
===
%!PS-Adobe-1.0
newpath
100 100 moveto 400 600 lineto
100 setlinewidth 1 setlinecap stroke copypage
400 100 moveto 100 400 lineto
50 setlinewidth 0 setlinecap 0.75 setgray
stroke copypage
100 100 moveto 400 600 lineto
30 setlinewidth 1 setgray stroke
showpage
Figure 4: Building up a page image from overlays of opaque inks
===
Figure 4 shows the mechanism by which the page image is built up from overlays of opaque inks. It sets a very wide line width and draws a diagonal line, then sets the ink color to gray and draws a cross-line, then sets the ink color to white and re-draws the original line with a narrower width and square corners. Notice that in every case the newest ``ink'' covers the older inks.
===
%!PS-Adobe-1.0
/xcoord 100 def /ycoord 200 def
10 setlinewidth
/linefunc {
newpath
xcoord ycoord moveto
100 100 rlineto
stroke
/ycoord ycoord 100 add def} def
linefunc copypage linefunc
20 setlinewidth linefunc copypage
10 setlinewidth linefunc showpage
Figure 5: Variables, functions, and arithmetic
===
Figure 5 shows the use of variables, functions, and arithmetic. It defines two variables, xcoord and ycoord, then defines a function linefunc that will draw a diagonal line at that [x,y], then add 100 to the value of ycoord. The four calls to linefunc generate the four diagonal lines shown.
Figure 6 shows the effect of coordinate system transformations on an image. It defines a function named A, which draws a 300-point letter ``A'' when it is called. The example calls A once, then shrinks the coordinate system anamorphically and calls it again, then rotates the coordinate system, changes to a dark gray ink, reverses the anamorphic scaling, and calls it again.
===
%!PS-Adobe-1.0
/A {
newpath moveto
100 300 rlineto
100 -300 rlineto
-50 130 rmoveto
-100 0 rlineto
stroke
} def
20 setlinewidth 50 50 A copypage
300 400 translate 0.5 0.25 scale
50 50 A copypage
1 2 scale -40 rotate
0.8 0.8 scale 0.5 setgray
100 -150 A
showpage
Figure 6: The effect of coordinate system transformations
===
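Figure 6 undoes its anamorphic scaling by applying a compensating scale. An alternative, sketched here as a variation rather than as the method of the figure, is to bracket the transformation with gsave and grestore, which save and restore the entire graphics state. The procedure A repeats the definition from Figure 6; the placement coordinates are assumptions.

```postscript
%!PS
/A {                                   % x y A : same shape as in Figure 6
  newpath moveto
  100 300 rlineto 100 -300 rlineto
  -50 130 rmoveto -100 0 rlineto
  stroke
} def
20 setlinewidth
50 50 A                                % drawn in the default coordinate system
gsave                                  % remember the whole graphics state
  300 400 translate 0.5 0.25 scale
  0.5 setgray
  50 50 A                              % drawn shrunken, anamorphic, and gray
grestore                               % translation, scale, and ink color all revert
300 50 A                               % drawn full size in black again
showpage
```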
7. Capabilities of Procedural Systems
Having skimmed the basics of how a procedural page-description scheme works, let us turn our attention to some of its capabilities. The power of a procedural page-description language comes from:
· the ability to express geometric shapes in a device-independent fashion, while retaining the ability to be device-dependent if necessary,
· the ability to define and use new abstractions, and
· the ability to do ``late binding'' of shapes, which permits a defined operator to be used for varying effects in varying contexts.
The first of these capabilities is obvious, and was demonstrated by the Stanford-logotype example in section 3. The second of these capabilities is evident to any experienced programmer; its virtues need not be further praised. The third virtue, late binding, can best be explained with more examples. The production of the PostScript figures in this article is a good, though complex, example.
===
Figure 7: Example of late binding of PostScript code
===
Consider Figure 7. It shows a small image of this page, with a drop shadow and a thin line around the outside. This figure [from the original publication by Cambridge University Press] was generated by extracting (with a text editor) one page image from the Scribe-generated PostScript file for this article, surrounding it with some redefinitions, including it as a figure in the page, then repeating the process. Figure 8 shows those redefinitions, which change the scale factor, produce a clipping region exactly equal to the scaled page image, put a drop shadow and a frame outside that clipping region, and redefine operators that might interfere.
This example, though fanciful, demonstrates the flexibility of the procedural scheme.
===
%!PS-Adobe-1.0
0.15 dup scale
/mydict 100 dict def /inch {72 mul} def
/pagepath {
newpath
0 0 moveto 8.5 inch 0 lineto
8.5 inch 11 inch lineto
0 11 inch lineto
closepath
} def
mydict begin
/showpage {} def
/nextpage {grestore 9.6 72 mul 0 translate
initclip initpage} def
/initpage {
gsave 0.3 inch -0.35 inch translate
pagepath 0.8 setgray fill grestore
pagepath gsave 1 setgray fill grestore
0 setgray 0 setlinewidth stroke
pagepath clip newpath gsave
} def
initpage
% ---
% The PostScript page image text goes here
end
Figure 8: PostScript definitions for page-image figures
===
The physical nesting of the diagrams is made possible by the recursive nature of the PostScript execution environment. The ability to redefine a page image to be a figure within itself comes from the ability to specify the page image in terms of late-bound names, i.e. as operators whose definitions can be changed. In this PostScript example, the late binding is achieved by redefining the built-in operators, though the same effect can be had, with a certain amount of discipline on the part of the user, by specifying every page in terms of user-defined functions that merely call the corresponding system function, then accomplishing the late binding by redefining those outermost functions.
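The disciplined alternative described above might look like the following sketch, in which the page never calls showpage directly. The names EndPage and PageBody are hypothetical, and PageBody stands in for a page produced by a formatter.

```postscript
%!PS
/EndPage { showpage } def              % wrapper that merely calls the system operator
/PageBody {                            % stand-in for a formatter-generated page
  /Helvetica-Bold findfont 24 scalefont setfont
  72 700 moveto (Late binding) show
  EndPage                              % looked up by name each time PageBody runs
} def

PageBody                               % prints one full-size page

/EndPage { } def                       % rebind the wrapper to do nothing
gsave
  72 72 translate 0.5 dup scale
  PageBody                             % same description, now a half-size figure
grestore
newpath 72 72 moveto 306 0 rlineto 0 396 rlineto -306 0 rlineto closepath
0 setlinewidth stroke                  % frame around the reduced page
showpage
```

The technique relies on the name EndPage being looked up when PageBody executes; a procedure whose names had already been replaced by their values at definition time could not be redirected in this way.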
This same technique can be used to advantage in many ways. A PostScript file can be wrapped in a set of definitions that will cause it to print as 2 pages per page, or as 4 pages per page, or with decorated borders, or in white letters on a black background, or with a frame for overhead projector slides. As with any other programmable system, it is limited more by imagination than by technology.
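As one hedged illustration, a wrapper that prints two pages per sheet could be sketched as follows. It assumes 8.5 by 11 inch pages (612 by 792 points) and a wrapped file that ends each page with showpage and does not reset the graphics state itself; the names realshowpage, slot, and placepage are inventions of this sketch.

```postscript
%!PS
/realshowpage /showpage load def       % keep the built-in operator under another name
/slot 0 def                            % 0 = left half of the sheet, 1 = right half
/placepage {                           % map a 612 by 792 page into the current slot
  612 0 translate 90 rotate            % landscape axes on a portrait sheet
  slot 396 mul 0 translate             % each slot is 396 points wide
  396 612 div dup scale                % shrink the page to fit the slot width
} def
/showpage {                            % replacement seen by the wrapped file
  grestore
  slot 1 eq { realshowpage } if        % sheet is full: really print it
  /slot 1 slot sub def                 % alternate between the two slots
  gsave placepage
} def
gsave placepage
% --- the original PostScript file goes here ---
grestore
slot 1 eq { realshowpage } if          % flush a sheet that holds only one page
```

The other variations mentioned above follow the same pattern: a prologue that alters the coordinate system or redefines an operator, the unmodified file, and a short epilogue.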
References
[1] Adobe Systems, Inc. PostScript Language Reference Manual. Addison-Wesley, Reading, Massachusetts, 1985.
[2] Earnest, Les. ``Would you want your daughter to be device-independent?'' ARPANET Laser-lovers distribution, March 1985.
[3] —. Interpress Electronic Printing Standard. Xerox Corporation, Stamford, Connecticut 06904, 1984. Document number XSIS 04804.
[4] Newman, William. ``Press: A flexible file format for the representation of printed images.'' In Actes des Journees sur la Manipulation de Documents, Rennes, France, 5 May 1983.
[5] R. Reddy, B. Broadley, L. Erman, R. Johnson, J. Newcomer, G. Robertson, and J. Wright. ``XCRIBL, a hardcopy scan line graphics system for document generation.'' Technical Report, Department of Computer Science, Carnegie-Mellon University, October, 1972.
[6] Reid, Brian K. ``PostScript and Interpress: a comparison.'' ARPANET Laser-lovers distribution, March 1, 1985.
[7] Ryland, Chris. Imprint System Manual. Imagen Corporation, 2660 Marine Way, Mountain View California 94304 USA, 1983.